- 15 years: Urban Dictionary data collection
- 2K+: new slang terms every day
- Dual encoders: a novel architecture
1. Introduction
Natural language processing has traditionally focused on standard English in formal contexts, leaving non-standard expressions largely unaddressed. This research addresses the challenge of automatically processing the informal English expressions found on social media and in everyday communication.
The rapid evolution of language in digital spaces creates a significant gap in language processing capabilities. While traditional dictionary-based approaches struggle with coverage, our neural sequence-to-sequence model offers a robust solution for understanding the meaning of slang and informal expressions.
2. Related Work
Earlier approaches to non-standard language processing relied on dictionary lookups and static resources. Burfoot and Baldwin (2009) used Wiktionary for satire detection, while Wang and McKeown (2010) used a 5K-entry slang dictionary to detect Wikipedia vandalism. These methods are fundamentally limited in keeping up with the rapid evolution of language on social media.
The word-embedding-based work of Noraset et al. (2017) showed promise but is not context-sensitive. Our approach builds on the sequence-to-sequence architecture introduced by Sutskever et al. (2014), adapting it specifically to the challenge of explaining non-standard language.
3. Methodology
3.1 Dual-Encoder Architecture
The core innovation of our approach is a dual-encoder architecture that processes the context and the target expression separately. The architecture comprises:
- A word-level encoder for understanding the context
- A character-level encoder for analyzing the target expression
- An attention mechanism for generating focused explanations
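As a rough illustration of how the two encoders receive different views of the same example, the sketch below splits an input into word-level context tokens and target characters. The `DualEncoderInput` container, the `prepare_input` helper, and the whitespace tokenization are assumptions made for illustration; the source does not describe its preprocessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DualEncoderInput:
    """Hypothetical container for the two input views (not from the paper)."""
    context_words: List[str]  # would be fed to the word-level encoder
    target_chars: List[str]   # would be fed to the character-level encoder


def prepare_input(context: str, target: str) -> DualEncoderInput:
    # Assumption: lowercase whitespace tokenization with edge punctuation stripped.
    words = [w.strip(".,!?\"'").lower() for w in context.split()]
    words = [w for w in words if w]
    return DualEncoderInput(context_words=words, target_chars=list(target.lower()))


example = prepare_input("This explanation seems kinda sus to me.", "sus")
print(example.context_words)
print(example.target_chars)
```

Keeping the two views in separate fields mirrors the separation the architecture relies on: the context is a sequence of words, while the target expression is a sequence of characters.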
3.2 Character-Level Encoding
Character-level processing makes it possible to handle out-of-vocabulary words and the morphological variation common in non-standard English. The character encoder uses LSTM units to process the input sequence one character at a time:
$h_t = \text{LSTM}(x_t, h_{t-1})$
where $x_t$ is the character at position $t$ and $h_t$ is the hidden state.
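The recurrence $h_t = \text{LSTM}(x_t, h_{t-1})$ can be unrolled by hand. The following is a minimal sketch of a standard LSTM cell applied character by character, using only the Python standard library. The one-hot encoding over a lowercase alphabet, the hidden size of 4, the random initialization, and all function names are assumptions for illustration; the source does not state the exact LSTM variant or its hyperparameters.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Dict[str, list]]

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed character inventory
GATES = ("i", "f", "o", "g")  # input, forget, output, candidate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b using plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Params:
    """Small random weights and zero biases for each gate (illustrative only)."""
    rng = random.Random(seed)
    cols = input_dim + hidden_dim
    return {
        "W": {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(hidden_dim)] for g in GATES},
        "b": {g: [0.0] * hidden_dim for g in GATES},
    }


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, p: Params) -> Tuple[Vector, Vector]:
    """One step of h_t = LSTM(x_t, h_{t-1}); also returns the cell state c_t."""
    z = x + h_prev  # list concatenation [x_t ; h_{t-1}]
    pre = {g: affine(p["W"][g], z, p["b"][g]) for g in GATES}
    i_gate = [sigmoid(a) for a in pre["i"]]
    f_gate = [sigmoid(a) for a in pre["f"]]
    o_gate = [sigmoid(a) for a in pre["o"]]
    cand = [math.tanh(a) for a in pre["g"]]
    c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c_prev, i_gate, cand)]
    h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
    return h, c


def encode_chars(word: str, p: Params, hidden_dim: int) -> Vector:
    """Run the cell over every character and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for ch in word.lower():
        # One-hot input; characters outside ALPHABET become an all-zero vector.
        x = [1.0 if ch == a else 0.0 for a in ALPHABET]
        h, c = lstm_step(x, h, c, p)
    return h


params = init_params(len(ALPHABET), hidden_dim=4)
h_final = encode_chars("sus", params, hidden_dim=4)
print(h_final)
```

Because the encoder consumes characters rather than whole-word indices, a term never seen in training still maps to a valid input sequence, which is the property the section relies on for out-of-vocabulary handling.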
3.3 Attention Mechanism
The attention mechanism lets the model focus on the relevant parts of the input sequence when generating an explanation. The attention weights are computed as:
$\alpha_{ti} = \frac{\exp(\text{score}(h_t, \bar{h}_i))}{\sum_{j=1}^{T_x} \exp(\text{score}(h_t, \bar{h}_j))}$
where $h_t$ is the decoder hidden state at step $t$, $\bar{h}_i$ is the $i$-th encoder hidden state, and $T_x$ is the length of the input sequence.
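The weight formula is a softmax over alignment scores. The sketch below computes $\alpha_{ti}$ for a toy example, assuming a dot-product `score` function; the source does not say which scoring function is used, so that choice, the helper names, and the toy vectors are all illustrative. The `context_vector` helper shows the usual next step in attention models (a weighted sum of encoder states), which is likewise an assumption about this model.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_weights(decoder_state: Vector, encoder_states: List[Vector]) -> List[float]:
    """Softmax over score(h_t, h_bar_i), with a dot product as the assumed score."""
    scores = [dot(decoder_state, h_bar) for h_bar in encoder_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights: List[float], encoder_states: List[Vector]) -> Vector:
    """Weighted sum of encoder states (standard attention step, assumed here)."""
    dim = len(encoder_states[0])
    return [sum(w * h_bar[k] for w, h_bar in zip(weights, encoder_states))
            for k in range(dim)]


# Toy values: one decoder state and three encoder states of dimension 2.
h_t = [1.0, 0.0]
h_bars = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alphas = attention_weights(h_t, h_bars)
ctx = context_vector(alphas, h_bars)
print(alphas)
print(ctx)
```

The encoder state with the highest score receives the largest weight, and the weights always sum to one, so they can be read as a distribution over input positions.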
4. Experimental Results
4.1 Dataset and Evaluation
We collected 15 years of data from UrbanDictionary.com, comprising millions of definitions of non-standard English along with usage examples. The dataset was split into training (80%), validation (10%), and test (10%) sets.
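An 80/10/10 split can be sketched as follows. The source does not say whether the split was random or grouped by term, so the seeded shuffle at the entry level, the `split_dataset` name, and the placeholder entries are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into 80% / 10% / 10% partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Placeholder entries standing in for (term, definition, example) records.
entries = [f"entry_{k}" for k in range(1000)]
train_set, val_set, test_set = split_dataset(entries)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the partitions reproducible across runs, which matters when results on the test set are compared between models.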
Evaluation metrics include BLEU scores for the quality of generated definitions and human evaluation to assess explanation quality. The model was tested on both seen and unseen non-standard expressions to measure its ability to generalize.
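To make the BLEU metric concrete, here is a simplified single-reference sketch using clipped n-gram precision up to bigrams and the standard brevity penalty. It is not the evaluation script used in the study: the `max_n=2` setting, the lack of smoothing, and the example sentences are assumptions chosen to keep the illustration short.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def simple_bleu(candidate: List[str], reference: List[str], max_n: int = 2) -> float:
    """Geometric mean of n-gram precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "suspicious or untrustworthy".split()
exact = simple_bleu("suspicious or untrustworthy".split(), reference)
partial = simple_bleu("suspicious or shady".split(), reference)
miss = simple_bleu("very cool".split(), reference)
print(exact, partial, miss)
```

A score of this kind rewards word overlap with the reference definition, which is why the study pairs it with human evaluation: a paraphrase can be correct while sharing few n-grams with the reference.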
4.2 Performance Comparison
Our dual-encoder model outperforms all baselines, including standard attentional LSTMs and dictionary-lookup methods. Key results include:
- A 35% improvement in BLEU score over the baseline LSTM
- 72% accuracy in human evaluation of explanation quality
- Successful explanation generation for 68% of unseen expressions
Figure 1: Performance comparison showing our dual-encoder model (blue) outperforming the standard LSTM (orange) and dictionary lookup (gray) across multiple evaluation metrics. Character-level encoding proved especially effective for handling newly coined slang.
5. Conclusion and Future Work
Our research shows that neural sequence-to-sequence models can generate high-quality explanations for non-standard English expressions. The dual-encoder architecture provides a robust framework for handling the contextual nature of slang and informal language.
Future directions include extending the approach to non-standard expressions in other languages, accounting for how quickly language changes over time, and building real-time explanation systems for social media platforms.
6. Technical Analysis
Core Insight
This research challenges the dictionary-based paradigm that has dominated non-standard language processing. The authors recognize that slang is not just vocabulary: it is contextual performance. Their dual-encoder approach treats explanation as translation between linguistic registers, a perspective that aligns with sociolinguistic theories of code-switching and register variation.
Logical Flow
The argument proceeds from identifying the coverage limits of static dictionaries to proposing a generative solution. The chain of reasoning is compelling: if slang evolves too quickly for manual curation, and if meaning depends on context, then the solution must be both generative and context-aware. The dual-encoder architecture addresses both requirements elegantly.
Strengths & Flaws
Strengths: The scale of the Urban Dictionary data provides unprecedented training coverage. The character-level encoder handles the morphological creativity of slang coinages well. The attention mechanism provides interpretability: we can see which context words influence the explanation.
Flaws: The model may struggle with highly contextual or ironic usage where surface patterns mislead. Like many neural approaches, it can inherit biases from its training data; Urban Dictionary entries vary widely in quality and may contain offensive content. The evaluation focuses on technical metrics rather than real-world utility.
Actionable Insights
For practitioners: This technology could transform content moderation, making platforms more responsive to rapidly evolving harmful speech. For educators: Imagine tools that help students decode internet slang while maintaining academic writing standards. The architecture itself is transferable: similar approaches could explain technical jargon or regional dialects.
The research mirrors architectural patterns seen in successful multimodal systems such as CLIP (Radford et al., 2021), where separate encoders for different modalities yield richer representations. However, applying the pattern to register translation rather than cross-modal understanding is novel and promising.
Analysis Framework Example
Case Study: Explaining "sus" in Context
Input: "This explanation seems kinda sus to me."
Model Processing:
- The word encoder analyzes the full sentence context
- The character encoder processes "sus"
- Attention identifies "explanation" and "seems" as key context
Output: "suspicious or untrustworthy"
This shows how the model uses both the form of the target expression and its syntactic and semantic context to generate an appropriate explanation.
Future Applications
Beyond the immediate application of explaining slang, this technology could enable:
- Real-time translation between formal and informal registers
- Adaptive educational tools for language learners
- Stronger content moderation systems that understand rapidly evolving harmful speech
- Cross-cultural communication aids for global digital spaces
7. References
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27.
- Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning.
- Burfoot, C., & Baldwin, T. (2009). Automatic satire detection: Are you having a laugh?. Proceedings of the ACL-IJCNLP 2009 conference short papers.
- Wang, W. Y., & McKeown, K. (2010). Got you!: Automatic vandalism detection in Wikipedia with web-based shallow syntactic-semantic modeling. Proceedings of the 23rd International Conference on Computational Linguistics.
- Noraset, T., Liang, C., Birnbaum, L., & Downey, D. (2017). Definition modeling: Learning to define word embeddings in natural language. Thirty-First AAAI Conference on Artificial Intelligence.