
A Neural Sequence-to-Sequence Model for Explaining Nonstandard English Expressions

A dual encoder neural network model that generates explanations for nonstandard English words and phrases, trained on data from social media.

15 years of Urban Dictionary data

2K+ new slang terms daily

Dual encoder: a novel architecture

1. Introduction

Natural language processing has traditionally focused on standard English in formal settings, leaving nonstandard expressions largely unaddressed. This study tackles the challenge of automatically explaining the nonstandard English words and phrases found on social media and in everyday communication.

Rapid language change in digital spaces creates a significant gap in language-processing capability. Where traditional dictionary-based methods struggle with coverage, our neural sequence-to-sequence model offers a robust way to infer the meaning of slang and other informal expressions.

2. Related Work

Earlier approaches to nonstandard language relied on dictionary lookup and static resources. Burfoot and Baldwin (2009) used Wiktionary for satire detection, while Wang and McKeown (2010) used a dictionary of 5K slang terms to detect Wikipedia vandalism. These methods are fundamentally limited in coping with the rapid language change seen on social media.

Definition modeling from word embeddings (Noraset et al., 2017) showed promise but does not take context into account. Our approach builds on the sequence-to-sequence architecture introduced by Sutskever et al. (2014), adapting it specifically to the challenge of explaining nonstandard language.

3. Methodology

3.1 Dual Encoder Architecture

The core innovation of our approach is a dual encoder architecture that processes the context and the target expression separately. The architecture comprises:

  • A word-level encoder for understanding the context
  • A character-level encoder for analyzing the target expression
  • An attention mechanism for focused explanation generation
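
The two input views listed above can be illustrated with a small sketch. It assumes lowercase whitespace tokenization and a hypothetical `DualEncoderInput` container; neither is taken from the paper. It only shows that the context is consumed as words while the target expression is consumed as characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DualEncoderInput:
    """Hypothetical container for the two views fed to the two encoders."""
    context_tokens: List[str]  # word-level view of the full sentence
    target_chars: List[str]    # character-level view of the target expression


def build_input(sentence: str, target: str) -> DualEncoderInput:
    # Assumption: lowercasing plus whitespace split stands in for a real tokenizer.
    context_tokens = sentence.lower().split()
    # Spelling out the target lets unseen words still receive a representation.
    target_chars = list(target.lower())
    return DualEncoderInput(context_tokens, target_chars)


example = build_input("That explanation seems kinda sus to me", "sus")
print(example.context_tokens)
print(example.target_chars)
```

Keeping the two views separate means a word missing from the word-level vocabulary can still be described through its spelling.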

3.2 Character-Level Encoding

Character-level processing makes it possible to handle out-of-vocabulary words and the morphological variation found in nonstandard English. The character encoder uses LSTM units to process the input sequence one character at a time:

$h_t = \text{LSTM}(x_t, h_{t-1})$

where $x_t$ is the character at position $t$ and $h_t$ is the hidden state.
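
The recurrence can be unrolled in a minimal pure-Python sketch, given here under stated assumptions: the gate equations are the standard LSTM formulation, the weights are random and untrained, the input is a one-hot character vector, and the helper names (`lstm_step`, `encode_chars`) are hypothetical. The compact formula above hides the cell state $c_t$, which the code carries explicitly.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    # Computes W @ v + b using plain lists.
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, params: Dict) -> Tuple[Vector, Vector]:
    """One standard LSTM update; `params` holds one weight matrix and bias per gate."""
    v = x + h_prev  # concatenate the input with the previous hidden state
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]    # input gate
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]    # forget gate
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]    # output gate
    g = [math.tanh(z) for z in affine(params["W_g"], params["b_g"], v)]  # candidate cell
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def one_hot(ch: str, alphabet: str) -> Vector:
    return [1.0 if ch == a else 0.0 for a in alphabet]


def encode_chars(word: str, alphabet: str, hidden: int, seed: int = 0) -> Vector:
    # Assumption: random untrained weights, used only to exercise the recurrence.
    rng = random.Random(seed)
    dim = len(alphabet) + hidden
    params: Dict = {}
    for gate in "ifog":
        params[f"W_{gate}"] = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
        params[f"b_{gate}"] = [0.0] * hidden
    h, c = [0.0] * hidden, [0.0] * hidden
    for ch in word:
        h, c = lstm_step(one_hot(ch, alphabet), h, c, params)
    return h  # final hidden state summarizes the spelling


final_state = encode_chars("sus", "abcdefghijklmnopqrstuvwxyz", hidden=4)
print(len(final_state))
```

A trained system would learn the weights and would normally use a library implementation rather than hand-written loops.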

3.3 Attention Mechanism

The attention mechanism lets the model focus on the relevant parts of the input sequence while generating an explanation. The attention weights are computed as:

$\alpha_{ti} = \frac{\exp(\text{score}(h_t, \bar{h}_i))}{\sum_{j=1}^{T_x} \exp(\text{score}(h_t, \bar{h}_j))}$

where $h_t$ is the decoder hidden state at output step $t$ (not the encoder state of Section 3.2) and $\bar{h}_i$ are the encoder hidden states over the $T_x$ input positions.
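
The softmax over scores can be traced with a short sketch. The text does not specify the scoring function, so dot-product scoring is assumed here, and the function names are hypothetical. The context vector at the end is the usual next step in attention models, included for illustration only.

```python
import math
from typing import List

Vector = List[float]


def dot_score(h_t: Vector, h_bar: Vector) -> float:
    # Assumption: dot-product scoring; the source leaves score() unspecified.
    return sum(a * b for a, b in zip(h_t, h_bar))


def attention_weights(h_t: Vector, encoder_states: List[Vector]) -> List[float]:
    scores = [dot_score(h_t, h_bar) for h_bar in encoder_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights: List[float], encoder_states: List[Vector]) -> Vector:
    # Weighted sum of encoder states, one value per hidden dimension.
    size = len(encoder_states[0])
    return [sum(w * state[k] for w, state in zip(weights, encoder_states)) for k in range(size)]


decoder_state = [1.0, 0.0]
encoder_states = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(decoder_state, encoder_states)
context = context_vector(weights, encoder_states)
print(weights)
```

The weights sum to one, and the encoder state most aligned with the decoder state receives the largest share.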

4. Experimental Results

4.1 Dataset and Evaluation

We collected 15 years of data from UrbanDictionary.com, comprising millions of definitions of nonstandard English together with usage examples. The dataset was split into training (80%), validation (10%), and test (10%) sets.
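
An 80/10/10 split of this kind might look like the sketch below. It assumes a seeded random shuffle at the entry level; the source does not say whether the split was made per entry or per word, and the `split_dataset` helper and placeholder entries are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * 8 // 10      # 80% for training
    n_valid = len(shuffled) // 10          # 10% for validation
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder (about 10%) for testing
    return train, valid, test


entries = [f"entry_{k}" for k in range(1000)]  # placeholder data, not real entries
train, valid, test = split_dataset(entries)
print(len(train), len(valid), len(test))  # prints: 800 100 100
```

If generalization to unseen expressions is the goal, splitting by headword rather than by entry would keep all definitions of a word in the same partition.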

Evaluation combined BLEU scores, which measure how closely a generated definition matches the reference, with human judgments of explanation quality. The model was tested on both seen and unseen nonstandard expressions to measure how well it generalizes.
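
To make the BLEU metric concrete, here is a simplified sentence-level sketch with a single reference, clipped n-gram precision up to bigrams, and a brevity penalty. It is an illustration under those assumptions, not the evaluation script used in the study, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def simple_bleu(candidate: List[str], reference: List[str], max_n: int = 2) -> float:
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)  # penalize short candidates
    return brevity * math.exp(log_mean)


reference = "suspicious or untrustworthy".split()
exact = simple_bleu(reference, reference)
partial = simple_bleu("suspicious or shady".split(), reference)
miss = simple_bleu("very cool".split(), reference)
print(exact, partial, miss)
```

Short definitions make n-gram overlap a coarse signal, which is one reason to pair BLEU with human evaluation.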

4.2 Performance Comparison

Our dual encoder model outperforms all baseline methods, including standard attentional LSTMs and dictionary lookup. Key results include:

  • A 35% improvement in BLEU score over the baseline LSTM
  • 72% accuracy in human evaluation of quality
  • Successful explanation generation for 68% of unseen expressions

Figure 1: Performance comparison showing our dual encoder model (blue) outperforming the standard LSTM (orange) and dictionary lookup (gray) across several evaluation metrics. Character-level encoding proved especially effective for handling newly coined slang.

5. Conclusion and Future Work

Our study shows that neural sequence-to-sequence models can generate high-quality explanations for nonstandard English expressions. The dual encoder architecture provides a robust framework for handling the contextual nature of slang and informal language.

Future directions include extending the approach to nonstandard expressions in other languages, accounting for rapid language change over time, and developing real-time explanation systems for social media platforms.

6. Technical Analysis

Core Insight

This study challenges the dictionary-based paradigm that has dominated nonstandard language processing. The authors recognize that slang is not just vocabulary: its meaning is performed in context. Their dual encoder approach treats explanation as translation between linguistic registers, a view that aligns with sociolinguistic theories of code-switching and register variation.

Logical Flow

The argument proceeds from identifying the coverage limits of static dictionaries to proposing a generative solution. The chain of reasoning is compelling: if slang changes too quickly for manual curation, and if meaning depends on context, then the solution must be both generative and context-aware. The dual encoder architecture addresses both requirements elegantly.

Strengths & Flaws

Strengths: The scale of the Urban Dictionary data provides unprecedented training coverage. The character-level encoder deals cleverly with the morphological creativity of slang formation. The attention mechanism offers interpretability: we can see which context words influence the explanation.

Flaws: The model may struggle with highly contextual or ironic usage, where surface patterns mislead. Like many neural approaches, it may inherit biases from its training data; Urban Dictionary entries vary widely in quality and may contain offensive content. The evaluation focuses on technical metrics rather than real-world usefulness.

Actionable Insights

For practitioners: This technology could transform content moderation, making platforms more responsive to fast-evolving harmful speech. For educators: Imagine tools that help students decode internet slang while maintaining academic writing standards. The architecture itself is transferable; similar approaches could explain technical jargon or regional dialects.

The work parallels architectural patterns seen in successful multimodal systems such as CLIP (Radford et al., 2021), where separate encoders for different modalities produce richer representations. However, applying the pattern to register translation rather than cross-modal understanding is novel and promising.

Analysis Framework Example

Case Study: Explaining "sus" in Context

Input: "That explanation seems kinda sus to me."
Model Processing:
- The word encoder analyzes the full sentence context
- The character encoder processes "sus"
- Attention identifies "explanation" and "seems" as key context
Output: "suspicious or untrustworthy"

This shows how the model uses both the form of the target expression and its syntactic and semantic context to generate an appropriate explanation.

Future Applications

Beyond the immediate application of explaining slang, this technology could enable:

  • Real-time translation between formal and informal registers
  • Adaptive educational tools for language learners
  • Stronger content moderation systems that keep up with evolving harmful speech
  • Cross-cultural communication aids for global digital spaces

7. References

  1. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27.
  2. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning.
  3. Burfoot, C., & Baldwin, T. (2009). Automatic satire detection: Are you having a laugh?. Proceedings of the ACL-IJCNLP 2009 conference short papers.
  4. Wang, W. Y., & McKeown, K. (2010). Got you!: automatic vandalism detection in wikipedia with web-based shallow syntactic-semantic modeling. Proceedings of the 23rd International Conference on Computational Linguistics.
  5. Noraset, T., Liang, C., Birnbaum, L., & Downey, D. (2017). Definition modeling: Learning to define word embeddings in natural language. Thirty-First AAAI Conference on Artificial Intelligence.