
SLABERT: Modeling Second Language Acquisition with BERT

A study of cross-lingual transfer in second language acquisition, using the BERT architecture and Child-Directed Speech data from five typologically diverse languages.


5 Languages

German, French, Polish, Indonesian, Japanese

BLiMP Benchmark

Grammar evaluation suite

TILT Approach

Cross-lingual transfer learning

1. Introduction

This study addresses a significant gap in the NLP literature on negative transfer in second language acquisition (SLA). Although cross-lingual transfer has been studied extensively in human SLA research, most NLP approaches have focused only on positive transfer effects, overlooking the substantial negative transfer that occurs when the linguistic structures of the native language (L1) interfere with acquisition of a foreign language (L2).

The study introduces SLABERT (Second Language Acquisition BERT), a new framework that models sequential second language acquisition using Child-Directed Speech (CDS) data. This approach gives a more ecologically valid simulation of how humans acquire language, and lets researchers examine both the facilitating and the interfering effects of L1 on L2 acquisition.

2. Methodology

2.1 The SLABERT Framework

The SLABERT framework implements sequential language learning: models are first trained on L1 (native language) data and then fine-tuned on L2 (English) data. This sequential setup mirrors how humans acquire a second language, and lets researchers observe the transfer effects that arise when linguistic knowledge from L1 influences L2 learning.
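
The two-stage schedule described above can be sketched in a few lines. This is only an illustration of the ordering (all L1 data first, then L2 data, with the same model carried across both stages). The names `train_sequentially`, `train_step`, and `count_step` are hypothetical, and the token-count "model" is a toy stand-in, not the BERT setup used in the study.

```python
from typing import Callable, Iterable, List, TypeVar

Model = TypeVar("Model")


def train_sequentially(
    model: Model,
    l1_batches: Iterable[List[str]],
    l2_batches: Iterable[List[str]],
    train_step: Callable[[Model, List[str]], Model],
) -> Model:
    """Train on every L1 batch first, then continue on the L2 batches."""
    for batch in l1_batches:  # stage 1: native-language (L1) exposure
        model = train_step(model, batch)
    for batch in l2_batches:  # stage 2: L2 exposure, starting from the L1 state
        model = train_step(model, batch)
    return model


def count_step(counts: dict, batch: List[str]) -> dict:
    """Toy update rule (assumption): accumulate whitespace-token counts."""
    for sentence in batch:
        for token in sentence.split():
            counts[token] = counts.get(token, 0) + 1
    return counts


toy_model = train_sequentially(
    {},
    l1_batches=[["der Hund schläft"]],
    l2_batches=[["the dog sleeps"]],
    train_step=count_step,
)
```

In a realistic setting `train_step` would wrap one optimizer update of a masked language model; the point of the sketch is only that stage 2 starts from whatever stage 1 learned rather than from a fresh initialization.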

2.2 The MAO-CHILDES Dataset

The researchers built the Multilingual Age-Ordered CHILDES (MAO-CHILDES) dataset, which covers five typologically diverse languages: German, French, Polish, Indonesian, and Japanese. The dataset consists of naturalistic Child-Directed Speech, providing ecologically valid training data that reflects real language-learning conditions.

2.3 TILT Transfer Learning

The study uses the Test for Inductive Bias via Language Model Transfer (TILT) approach established by Papadimitriou and Jurafsky (2020). This approach makes it possible to examine systematically how different kinds of training data induce structural features that facilitate or hinder cross-lingual transfer.

3. Experimental Results

3.1 Effect of Language Family Distance

The experiments show that language family distance is a strong predictor of negative transfer. Languages more distant from English (such as Japanese and Indonesian) showed greater interference, while closer relatives (German and French) showed more positive transfer. This finding agrees with human SLA research and supports the ecological validity of the SLABERT approach.

3.2 Conversational vs. Scripted Speech

A key finding is that conversational speech data facilitates language acquisition more than scripted speech data. This suggests that natural, interactive language input contains structural properties that transfer better across languages, possibly because of universal conversational structures and repair mechanisms.

Key Insights

  • Negative transfer remains under-explored in NLP research despite its importance in human SLA
  • Language family distance is a reliable predictor of the degree of negative transfer
  • Conversational speech data outperforms scripted data for cross-lingual transfer
  • Sequential training mirrors human acquisition patterns more closely than parallel training

4. Technical Analysis

4.1 Mathematical Framework

The transfer effect between L1 and L2 can be quantified with the following framework:

Let $T_{L1 \rightarrow L2}$ denote the transfer effect from L1 to L2, measured as the gain in performance on L2 tasks after L1 pretraining. Transfer efficiency can then be expressed as:

$\eta_{transfer} = \frac{P_{L2|L1} - P_{L2|random}}{P_{L2|monolingual} - P_{L2|random}}$

where $P_{L2|L1}$ is L2 performance after L1 pretraining, $P_{L2|monolingual}$ is monolingual L2 performance, and $P_{L2|random}$ is performance with random initialization.
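
As a worked illustration, the ratio above translates directly into a small function. The function name and the numeric inputs below are assumptions chosen for easy arithmetic; they are not results from the study.

```python
def transfer_efficiency(
    p_l2_given_l1: float,
    p_l2_monolingual: float,
    p_l2_random: float,
) -> float:
    """Compute eta_transfer = (P_L2|L1 - P_random) / (P_mono - P_random)."""
    denominator = p_l2_monolingual - p_l2_random
    if denominator == 0:
        raise ValueError("monolingual and random baselines must differ")
    return (p_l2_given_l1 - p_l2_random) / denominator


# Illustrative inputs only (not measured values).
eta_partial = transfer_efficiency(0.75, 1.0, 0.5)   # halfway between baselines: 0.5
eta_negative = transfer_efficiency(0.25, 1.0, 0.5)  # below the random baseline: -0.5
```

By construction the ratio is 0 when the L1-pretrained model only matches the random-initialization baseline and 1 when it matches the monolingual baseline; a value below 0 means the L1-pretrained model scored under the random-initialization baseline.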

A linguistic distance $D(L1,L2)$ between languages can be computed from typological features drawn from databases such as WALS (World Atlas of Language Structures), following the approach of Berzak et al. (2014):

$D(L1,L2) = \sqrt{\sum_{i=1}^{n} w_i (f_i(L1) - f_i(L2))^2}$

where $f_i$ denotes the typological features and $w_i$ their corresponding weights.
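
A minimal sketch of this weighted Euclidean distance follows. It assumes each language has already been encoded as a numeric feature vector; how categorical WALS features are turned into numbers is an encoding choice not specified here. The function name and the example vectors are made up for illustration and are not real WALS values.

```python
import math
from typing import Optional, Sequence


def typological_distance(
    features_l1: Sequence[float],
    features_l2: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (f_i(L1) - f_i(L2))^2)."""
    if len(features_l1) != len(features_l2):
        raise ValueError("feature vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(features_l1)  # default: every feature counts equally
    if len(weights) != len(features_l1):
        raise ValueError("need exactly one weight per feature")
    squared = sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, features_l1, features_l2)
    )
    return math.sqrt(squared)


# Hypothetical binary feature vectors, for illustration only.
lang_a = [1, 0, 1, 1]
lang_b = [0, 0, 1, 0]
unweighted = typological_distance(lang_a, lang_b)              # sqrt(2)
weighted = typological_distance(lang_a, lang_b, [4, 1, 1, 0])  # 2.0
```

The weights let some features count for more than others; with uniform weights the measure reduces to the ordinary Euclidean distance between the two feature vectors.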

4.2 Example Analysis Framework

The study uses a systematic evaluation framework based on the BLiMP (Benchmark of Linguistic Minimal Pairs) test suite. The benchmark assesses grammatical knowledge through minimal pairs that target specific syntactic phenomena. The evaluation protocol is as follows:

  1. L1 Pretraining: Models are trained on CDS data from each of the five languages
  2. L2 Fine-Tuning: Sequential training on English-language data
  3. Evaluation: Performance is measured on BLiMP grammaticality judgments
  4. Transfer Analysis: Results are compared against monolingual and cross-lingual baselines

This framework allows precise measurement of both positive transfer (facilitation) and negative transfer (interference) across different language pairs and linguistic phenomena.
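
The evaluation step can be illustrated with a short sketch of minimal-pair scoring: a pair counts as correct when the model assigns the grammatical sentence a higher score than the ungrammatical one. The function `minimal_pair_accuracy`, the example pairs, and the length-based `toy_score` are all hypothetical; a real evaluation would plug in a sentence score derived from the trained language model, and how that score is computed is not specified here.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs ranked correctly."""
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pair_list if score(good) > score(bad))
    return correct / len(pair_list)


def toy_score(sentence: str) -> float:
    """Placeholder scorer (assumption): prefers shorter strings, not a real model."""
    return -float(len(sentence))


example_pairs = [
    ("The cats sleep.", "The cats sleeps."),  # toy scorer ranks this one correctly
    ("The dog barks.", "The dog bark."),      # toy scorer ranks this one wrongly
]
accuracy = minimal_pair_accuracy(example_pairs, toy_score)  # 0.5
```

Running the same function on a model pretrained on an L1 and on a baseline model gives the two accuracies whose comparison indicates facilitation or interference.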

5. Future Applications

The SLABERT framework opens several promising directions for future research and applications:

  • Educational Technology: Building personalized language-learning systems that account for learners' native-language background
  • Low-Resource NLP: Exploiting transfer patterns to improve performance for languages with limited training data
  • Cognitive Modeling: Better computational models of human language acquisition
  • Cross-Cultural AI: Building AI systems that better understand and accommodate linguistic diversity

Future work should explore extending the framework to more language pairs, incorporating additional linguistic features, and examining transfer effects at different proficiency levels.

6. References

  1. Papadimitriou, I., & Jurafsky, D. (2020). Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.
  2. Warstadt, A., et al. (2020). BLiMP: The Benchmark of Linguistic Minimal Pairs for English. Transactions of the Association for Computational Linguistics.
  3. Berzak, Y., et al. (2014). Reconstructing Native Language Typology from Foreign Language Usage. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning.
  4. Jarvis, S., & Pavlenko, A. (2007). Crosslinguistic Influence in Language and Cognition. Routledge.
  5. Conneau, A., et al. (2017). Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.

Expert Analysis: Key Insights and Strategic Implications

Core Insight

This study is an important wake-up call for the NLP community: we have been systematically overlooking negative transfer while chasing positive transfer effects. The SLABERT framework exposes this blind spot with technical precision, showing that language models, like humans, suffer from linguistic interference that typological distance can predict. This is more than an academic curiosity; it is a fundamental limitation in how we approach multilingual AI.

Logical Flow

The methodology progresses cleanly: start from human SLA theory, build ecologically valid data (MAO-CHILDES), implement sequential training that mirrors real learning, then measure transfer effects systematically. Grounding in established linguistic theory (Berzak et al., 2014) and the use of a standardized evaluation (BLiMP) produce a solid chain of evidence. The finding that conversational speech outperforms scripted speech is consistent with what developmental psychology tells us about human language acquisition.

Strengths & Flaws

Strengths: The ecological validity is impressive; using Child-Directed Speech instead of Wikipedia dumps is a real change of approach. The sequential training paradigm is biologically plausible and theoretically grounded. The typological diversity of the languages tested gives good external validity.

Critical Flaws: A sample of five languages, although diverse, is still too small to support broad typological claims. The framework does not adequately address proficiency levels; human SLA shows that transfer patterns change considerably across beginner, intermediate, and advanced stages. The evaluation focuses only on grammaticality judgments, leaving out the pragmatic and sociolinguistic dimensions that matter for real-world language use.

Actionable Insights

For industry practitioners: audit your multilingual models for negative transfer effects right away, especially for distant language pairs. For researchers: prioritize developing negative-transfer metrics alongside measures of positive transfer. For educators: this study confirms the importance of taking L1 background into account in language teaching, but it also cautions that AI language tutors need considerable refinement before they can account for cross-linguistic interference properly.

The most promising direction? Combine this work with recent advances in typological databases such as Grambank, and apply the insights to improve performance on truly low-resource languages. As Ruder et al. (2017) showed in their survey of cross-lingual methods, we are only scratching the surface of what becomes possible once the complexities of multilingual learning are modeled properly.