Table of Contents
- 1. Introduction
- 2. Experimental Procedure
- 3. Inductive Biases of L2 Training Methods
- 4. Effects of L1 Training on L2 Grammar Acquisition
- 5. Process of L2 Acquisition
- 6. Core Insight & Analyst's Perspective
- 7. Technical Details & Mathematical Framework
- 8. Experimental Results & Chart Description
- 9. Analysis Framework: Example Case
- 10. Future Applications & Directions
- 11. References
1. Introduction
This work investigates the cross-lingual transferability of neural language models (LMs) from the perspective of second language (L2) acquisition. While previous research has focused on first language (L1) acquisition, this study examines how L1 knowledge affects the efficiency of grammar acquisition in L2. The central research question is: How does first language (L1) acquisition of LMs affect the efficiency of grammar acquisition in a second language (L2)?
The motivation stems from the observation that large English LMs exhibit translation ability with only a small amount of non-English training data, which suggests efficient cross-lingual transfer. However, most evaluations rely on holistic measures such as perplexity or downstream task accuracy. This study aims to fill that gap by analyzing transfer from a linguistic perspective, focusing on the acquisition of grammatical knowledge and on language transfer tendencies.
2. Experimental Procedure
The experimental pipeline mirrors a human-like L2 acquisition scenario:
- L1 Pretraining (First Language Acquisition): Train a masked language model on a specific L1 (French, German, Russian, or Japanese).
- L2 Training (Second Language Acquisition): Further train the model on English (L2) under bilingual settings.
- Evaluation: Analyze the effect of L1 on L2 through a grammatical judgment test in English using the BLiMP benchmark.
Training data size is restricted so that the setup compares more closely with human L2 acquisition tendencies. The chosen L1s represent varying levels of typological distance from English and of presumed difficulty of transfer to English.
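The three stages above can be written down as a small configuration sketch. This is only an illustration of the pipeline's shape: the names `Stage`, `build_pipeline`, and the objective labels are hypothetical and do not come from the study's code, and the assumption that the L2 stage sees both languages reflects the "bilingual settings" wording rather than a confirmed detail.

```python
from dataclasses import dataclass

# The four L1s named in the text; the codes themselves are illustrative.
L1_CHOICES = ("fr", "de", "ru", "ja")


@dataclass(frozen=True)
class Stage:
    name: str          # hypothetical stage label
    languages: tuple   # languages the model is exposed to in this stage
    objective: str     # hypothetical objective label


def build_pipeline(l1: str, l2: str = "en") -> list:
    """Return the three stages described above for one L1 choice."""
    if l1 not in L1_CHOICES:
        raise ValueError(f"unsupported L1: {l1}")
    return [
        Stage("l1_pretraining", (l1,), "masked_lm"),
        # Assumption: the bilingual setting exposes the model to L1 and L2.
        Stage("l2_training", (l1, l2), "masked_lm"),
        Stage("evaluation", (l2,), "blimp_grammaticality_judgment"),
    ]


for stage in build_pipeline("ja"):
    print(stage.name, stage.languages, stage.objective)
```

Keeping the L1 as the only varying argument mirrors the design of the study: everything else is held fixed so that differences in L2 performance can be attributed to the L1 choice.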
3. Inductive Biases of L2 Training Methods
Initial experiments explored different L2 data settings:
- Training on L2 (English) monolingual texts only.
- Training on L1-L2 translation pairs.
Key Finding: Feeding L1-L2 translation pairs to LMs slowed their L2 grammar acquisition compared with feeding only L2 monolingual texts every two epochs. This suggests that the way the model is exposed to L2 significantly affects learning efficiency.
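The two data settings can be contrasted with a minimal sketch. The text does not say how translation pairs were formatted, so joining the two sentences with a separator string is purely an assumption, and the function names and example sentences are hypothetical.

```python
def monolingual_stream(l2_sentences):
    """Setting 1: the model sees only L2 (English) text."""
    for sentence in l2_sentences:
        yield sentence


def translation_pair_stream(pairs, sep=" [SEP] "):
    """Setting 2: each example joins an L1 sentence with its L2 translation.

    Assumption: simple concatenation with a separator; the study's actual
    input format is not described in the text above.
    """
    for l1_sentence, l2_sentence in pairs:
        yield l1_sentence + sep + l2_sentence


# Toy French-English pairs for illustration only.
pairs = [
    ("Le chien court.", "The dog runs."),
    ("Les chiens courent.", "The dogs run."),
]
mono = list(monolingual_stream(l2 for _, l2 in pairs))
para = list(translation_pair_stream(pairs))
print(mono)
print(para)
```

The point of the contrast is that both streams carry the same English sentences; only the surrounding context differs, which is what the key finding attributes the slowdown to.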
4. Effects of L1 Training on L2 Grammar Acquisition
4.1 L1 Knowledge Promotes L2 Generalization
Models with L1 pretraining showed better linguistic generalization in L2 than models trained on L2 from scratch. This suggests that prior linguistic knowledge, even in a different language, provides a useful inductive bias for learning new linguistic structures.
4.2 L1 Choice Affects L2 Performance
The source L1 substantially affected L2 (English) generalization performance. Models with French or German as L1 performed substantially better than those with Japanese or Russian as L1. This ordering matches human-defined language transfer difficulty (Chiswick & Miller, 2004), where typological similarity (e.g., Germanic/Romance languages to English) facilitates transfer.
4.3 Differential Effects on Grammar Types
L1 pretraining affects different grammatical phenomena in L2 to different degrees:
- Larger Gains: Morphological and syntactic items (e.g., subject-verb agreement, word order).
- Smaller Gains: Semantic and syntax-semantics interface items (e.g., quantifier scope, binding).
This suggests that abstract syntactic knowledge may transfer more readily than meaning-specific or interface knowledge.
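The comparison behind "larger" and "smaller" gains amounts to a per-category difference between two accuracy tables. The sketch below shows that arithmetic; the function name is hypothetical and the accuracy values are placeholders chosen for illustration, not results from the study.

```python
def l1_gain_by_category(baseline, pretrained):
    """Accuracy gain (percentage points) from L1 pretraining, per category."""
    return {cat: pretrained[cat] - baseline[cat] for cat in baseline}


# PLACEHOLDER accuracies in percent, for illustration only; NOT study results.
baseline = {"morphology": 60, "syntax": 58, "semantics": 55, "syntax_semantics": 54}
pretrained = {"morphology": 72, "syntax": 69, "semantics": 58, "syntax_semantics": 57}

gains = l1_gain_by_category(baseline, pretrained)
# Categories ordered from largest to smallest gain.
ranked = sorted(gains, key=gains.get, reverse=True)
print(gains)
print(ranked)
```

Reporting gains rather than raw accuracies separates the effect of L1 pretraining from the baseline difficulty of each category.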
5. Process of L2 Acquisition
5.1 Progression and Data Inefficiency
Analysis of the learning trajectory revealed that L2 knowledge acquisition did not progress much until the model had seen the whole L2 dataset many times (e.g., 50-100 epochs). This points to a degree of data inefficiency in how these LMs acquire an L2. In addition, the study observed degradation of L1 knowledge during L2 training, which indicates a trade-off and a need to balance source and target linguistic knowledge.
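Both observations can be read off per-epoch accuracy curves. The following sketch shows two simple summaries one might compute, assuming one accuracy value is logged per epoch; the function names are hypothetical and the curves are placeholders, not measurements from the study.

```python
def first_epoch_reaching(curve, threshold):
    """Return the first epoch (1-indexed) whose accuracy meets the threshold.

    Returns None if the threshold is never reached.
    """
    for epoch, accuracy in enumerate(curve, start=1):
        if accuracy >= threshold:
            return epoch
    return None


def l1_degradation(l1_curve):
    """Drop in L1 accuracy between the first and last logged epoch of L2 training."""
    return l1_curve[0] - l1_curve[-1]


# PLACEHOLDER curves in percent, for illustration only; NOT study results.
l2_curve = [50, 52, 61, 70]
l1_curve = [80, 75, 71]

print(first_epoch_reaching(l2_curve, 60))
print(l1_degradation(l1_curve))
```

A late first-crossing epoch is one way to quantify data inefficiency, and a positive degradation value is one way to quantify L1 forgetting; other summaries are equally plausible.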
6. Core Insight & Analyst's Perspective
Core Insight: This paper delivers an important and often overlooked point: neural LMs are not language-agnostic statistical engines. Their "L1" imprints a deep structural bias that shapes the efficiency and trajectory of "L2" learning. The finding that translation pairs can hinder L2 grammar acquisition is particularly counter-intuitive and challenges the standard assumptions of multilingual training.
Logical Flow: The study bridges computational linguistics and second language acquisition theory elegantly. It starts from a clear hypothesis (L1 affects L2 efficiency), designs a human-analogous paradigm (restricted data, specific L1s), tests training variations systematically, and ends with a fine-grained linguistic analysis. The progression from macro-level transfer (language choice) to micro-level transfer (grammar type) is logically sound.
Strengths & Flaws: The main strength is linguistic granularity. Moving beyond holistic metrics such as accuracy to break down performance on BLiMP's syntactic phenomena is a significant contribution, reminiscent of the probing paradigm popularized by works such as "What Does BERT Look At?" (Clark et al., 2019). The human-LM comparison framework is also novel. The main flaw is scale. The use of smaller LMs (implied by the restricted data) limits direct applicability to modern LLMs such as GPT-4 or LLaMA, whose few-shot cross-lingual abilities are remarkable. The study acknowledges this, but the gap remains. In addition, the "catastrophic forgetting" of L1 is noted but not analyzed in depth, which is a missed opportunity.
Actionable Insights: For practitioners, this research argues against a one-size-fits-all multilingual strategy. When building a model for a target language, choose the pretraining language(s) strategically, based on typological similarity. For example, improving Thai performance may benefit more from pretraining on related Tai-Kadai languages than on English alone. The data inefficiency finding calls for research into curriculum-based or meta-learning approaches to L2 training, rather than brute-force continued training. Finally, the field needs better continual learning techniques to mitigate L1 forgetting during L2 acquisition, a challenge also faced in multimodal learning, as seen in works such as Flamingo (Alayrac et al., 2022).
7. Technical Details & Mathematical Framework
The core of the masked language modeling objective used in pretraining (Devlin et al., 2019) is to maximize the likelihood of reconstructing the masked tokens, which is equivalent to minimizing the negative log-likelihood:
$\mathcal{L}_{MLM} = -\sum_{i \in M} \log P(x_i | \mathbf{x}_{\backslash M}; \theta)$
where $M$ is the set of masked token indices, $x_i$ is the original token at position $i$, $\mathbf{x}_{\backslash M}$ is the sequence with the tokens in $M$ masked, and $\theta$ are the model parameters.
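As a worked instance of the formula, the sketch below computes $\mathcal{L}_{MLM}$ from the probabilities a model assigns to the original tokens at the masked positions. It assumes those probabilities are already available as a mapping from position to probability; the function name and the numbers are hypothetical, and no real model is involved.

```python
import math


def mlm_loss(token_probs, masked_positions):
    """Negative log-likelihood summed over the masked positions M.

    token_probs[i] is the probability the model assigns to the original
    token x_i given the masked sequence.
    """
    return -sum(math.log(token_probs[i]) for i in masked_positions)


# Toy probabilities for two masked positions, for illustration only.
probs = {1: 0.5, 4: 0.25}
loss = mlm_loss(probs, masked_positions=[1, 4])
print(loss)  # equals -(ln 0.5 + ln 0.25) = ln 8
```

Because the sum runs only over $M$, unmasked positions contribute nothing to the loss; a model that assigned probability 1 to every masked token would reach a loss of 0.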
In the L2 acquisition phase, the model parameters $\theta$, initialized from L1 pretraining, are further optimized on a mix of L1 and L2 data or on L2 data only. The key manipulation in the study is the data schedule and composition during this phase, which changes the effective loss function the model optimizes.
8. Experimental Results & Chart Description
Key Result 1 (L1 Acceleration): A line chart (implied by the textual description) would plot L2 grammatical accuracy (on BLiMP) on the y-axis against L2 training epochs on the x-axis. Multiple lines would represent models with different L1s (Fr, De, Ru, Ja) and a baseline without L1 (L2-from-scratch). The chart would show that all L1-pretrained models start higher and learn faster than the baseline, with the Fr and De lines rising steepest and highest.
Key Result 2 (Grammar Type Differential): A grouped bar chart would show final accuracy on BLiMP. The x-axis would carry the categories Morphology, Syntax, Semantics, and Syntax-Semantics. Each category would have two bars: one for "No L1 Pretraining" and one for "With L1 Pretraining". The height difference between the two bars (the gain from L1) would be largest for Morphology and Syntax, and smallest for Semantics.
9. Analysis Framework: Example Case
Case: Analyzing transfer from L1 Japanese (Ja) to L2 English (En) for subject-verb agreement.
- Linguistic Feature: English requires subject-verb agreement in number (e.g., "The dog runs" vs. "The dogs run"). Japanese does not mark verbs for subject agreement.
- Hypothesis: An LM pretrained on Japanese (L1) may have a weaker initial bias for learning this agreement feature in English than an LM pretrained on French (which has agreement).
- Probing Experiment: After L2 training, present the model with minimal pairs from BLiMP:
- Grammatical: "The key to the cabinets is on the table."
- Ungrammatical: "The key to the cabinets are on the table."
- Metric: Compare the probability the model assigns to the correct verb form with the probability it assigns to the incorrect one. A smaller probability gap for the Ja-L1 model than for the Fr-L1 model would support the hypothesis of negative transfer from a non-agreeing L1.
This framework makes it possible to isolate the transfer of specific grammatical features based on L1-L2 structural alignment.
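The metric in this case study can be sketched as a comparison of sentence-level log-probabilities over minimal pairs. The sketch assumes each sentence has already been scored as a list of per-token probabilities by some model; how a masked LM produces those scores is outside its scope, and all function names and numbers are hypothetical.

```python
import math


def sentence_log_prob(token_probs):
    """Sum of per-token log-probabilities for one sentence."""
    return sum(math.log(p) for p in token_probs)


def log_prob_gap(good_token_probs, bad_token_probs):
    """Positive when the grammatical sentence is scored higher."""
    return sentence_log_prob(good_token_probs) - sentence_log_prob(bad_token_probs)


def minimal_pair_accuracy(pairs):
    """Fraction of pairs where the grammatical sentence gets the higher score."""
    correct = sum(log_prob_gap(good, bad) > 0 for good, bad in pairs)
    return correct / len(pairs)


# Toy per-token probabilities (grammatical, ungrammatical); NOT model outputs.
toy_pairs = [
    ([0.5, 0.4], [0.5, 0.1]),  # grammatical sentence preferred
    ([0.2, 0.2], [0.3, 0.3]),  # ungrammatical sentence preferred
]
print(minimal_pair_accuracy(toy_pairs))
```

Comparing `log_prob_gap` across a Ja-L1 and an Fr-L1 model on the same pairs would give the gap comparison described in the Metric bullet, while `minimal_pair_accuracy` gives the usual aggregate score over a set of pairs.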
10. Future Applications & Directions
- Efficient Low-Resource Language Modeling: Strategically choose a high-resource, typologically similar "parent" language for pretraining before fine-tuning on the actual low-resource target language, improving data efficiency.
- Personalized Language Learning Tools: Develop AI tutors that adapt teaching strategies to a learner's native language, predicting areas of difficulty (e.g., article usage for Russian speakers) as informed by LM transfer patterns.
- Interpretable Multilingual LLMs: Use the L1-L2 transfer paradigm as a controlled experimental setup to isolate and visualize what linguistic knowledge is stored and transferred within model parameters, improving model interpretability.
- Neurolinguistic Validation: Collaborate with cognitive scientists to compare LM L2 acquisition trajectories (e.g., error patterns, learning plateaus) with human brain imaging or behavioral data, testing computational theories of language acquisition.
- Dynamic, Non-Forgetting Multilingual Models: Research continual learning algorithms that let an LM acquire multiple languages sequentially without degrading proficiency in earlier languages, moving towards a truly polyglot AI.
11. References
- Oba, M., Kuribayashi, T., Ouchi, H., & Watanabe, T. (2023). Second Language Acquisition of Neural Language Models. arXiv preprint arXiv:2306.02920.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT.
- Chiswick, B. R., & Miller, P. W. (2004). Linguistic Distance: A Quantitative Measure of the Distance Between English and Other Languages. Journal of Multilingual and Multicultural Development.
- Clark, K., Khandelwal, U., Levy, O., & Manning, C. D. (2019). What Does BERT Look At? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP.
- Alayrac, J., et al. (2022). Flamingo: a Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems.
- Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems.
- Papadimitriou, I., & Jurafsky, D. (2020). Pretraining on Non-English Data Improves Cross-lingual Generalization. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the ACL.