1. Introduction

Language acquisition in children follows a consistent order: from discriminating speech sounds, to building vocabulary, and then to mastering complex syntactic structures. This developmental trajectory, observed from infancy to around age six, raises fundamental questions about the underlying computational principles. Is this staged learning a feature unique to human neurobiology, or can it also emerge in artificial systems? This study addresses the question directly by comparing the learning trajectories of 54 children (aged 18 months to 6 years) with those of 48 GPT-2 models trained from scratch. The central hypothesis is that if similar stages appear in both, this may point to shared, data-driven learning principles.

2. Methodology

The study uses a comparative design, examining human learners and artificial models at multiple stages of their development.

2.1 Experimental Design

Children: Language production was analyzed in 54 children. Their spontaneous speech and their ability to repeat sentences of varying syntactic complexity were assessed, following the methods established by Friedmann et al. (2021).

GPT-2 models: 48 instances of GPT-2 (the 124-million-parameter variant) were trained from random initialization with a standard language-modeling objective on a standard corpus (e.g., WebText). Their internal states were probed at regular intervals throughout training.

2.2 Data Collection & Probing Tasks

A set of 96 probing tasks was assembled from established benchmarks:

  • BLiMP: To evaluate grammatical knowledge across 67 syntactic phenomena.
  • Zorro: To probe semantic and commonsense reasoning.
  • BIG-Bench: To assess broader linguistic and cognitive abilities.

These probes were applied to the GPT-2 models at every training checkpoint and served as measures analogous to the children's production tasks.
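
To make the probing step concrete, here is a minimal sketch of how a minimal-pair probe in the style of BLiMP could be scored at a single checkpoint. It assumes the common convention that a pair counts as correct when the model scores the grammatical sentence above the ungrammatical one. The `score` callable and the `toy_score` stand-in are hypothetical; a real setup would plug in the sentence log-probability of the checkpoint being evaluated, and the study's exact scoring procedure may differ.

```python
from typing import Callable, Sequence, Tuple


def minimal_pair_accuracy(
    score: Callable[[str], float],
    pairs: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs in which the
    grammatical sentence receives the strictly higher score."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a model's sentence log-probability.
    It only penalises one hard-coded agreement error."""
    penalty = 5.0 if "cats sleeps" in sentence else 0.0
    return -float(len(sentence.split())) - penalty


# Illustrative pairs (not taken from any benchmark).
pairs = [
    ("the cats sleep", "the cats sleeps"),
    ("the cat sleeps", "the cat sleep"),
]
accuracy = minimal_pair_accuracy(toy_score, pairs)
print(f"probe accuracy: {accuracy:.2f}")
```

Running the same function at every saved checkpoint yields one accuracy curve per probe, which is the raw material for the trajectory comparison.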

3. Results & Analysis

3.1 Comparison of Learning Trajectories

The analysis shows that GPT-2 models, like children, acquire linguistic skills in a systematic order. Simpler tasks (e.g., basic grammatical agreement) are mastered early in training, while more complex tasks (e.g., complex syntactic structures such as relative clauses) require many more training steps (the analogue of developmental time).

3.2 Parallel Learning

A key finding is the parallel nature of learning. Even tasks that are only fully mastered late in training show measurable improvement from the earliest steps. This suggests that the model builds foundational representations that are continually refined, rather than acquiring skills in a strict, isolated sequence.
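
One way to tell parallel from strictly sequential learning is to ask how much of a late-mastered probe's total improvement already happens early in training. The sketch below is only an illustration of that idea: the `early_gain_share` helper, the 25% cutoff, and both accuracy curves are made up for the example and are not taken from the study.

```python
from typing import Sequence


def early_gain_share(accuracies: Sequence[float], early_fraction: float = 0.25) -> float:
    """Share of the total accuracy gain that is achieved within the first
    `early_fraction` of checkpoints. Returns 0.0 if there is no overall gain."""
    if len(accuracies) < 2:
        raise ValueError("need at least two checkpoints")
    n_early = max(1, int(len(accuracies) * early_fraction))
    total_gain = accuracies[-1] - accuracies[0]
    if total_gain <= 0:
        return 0.0
    return (accuracies[n_early] - accuracies[0]) / total_gain


# Synthetic curves for a probe that is mastered late in both cases.
parallel_curve = [0.50, 0.55, 0.60, 0.64, 0.70, 0.78, 0.88, 0.95]
sequential_curve = [0.50, 0.50, 0.50, 0.50, 0.50, 0.60, 0.85, 0.95]

parallel_share = early_gain_share(parallel_curve)
sequential_share = early_gain_share(sequential_curve)
print(f"parallel: {parallel_share:.2f}, sequential: {sequential_share:.2f}")
```

A clearly non-zero early share for late-mastered probes is what the parallel-learning reading predicts; a share near zero would instead point to skills being learned one after another.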

3.3 Shared and Divergent Stages

The study identifies both substantial overlaps and important differences:

  • Shared: The broad progression from simpler to more complex syntactic forms.
  • Divergent: The specific ordering of some sub-skills differs. For example, models may acquire certain formal syntactic rules in a different order than children do, possibly because of differences in the distribution of the training data and in the perceptual and social experience available to humans.

This suggests that, while data-driven pressure gives rise to stages, the details of the stage sequence are shaped by the learner's architecture and its input.

Key Experimental Figures

Models Trained: 48 GPT-2 instances

Probing Tasks: 96 tasks from BLiMP, Zorro, and BIG-Bench

Child Participants: 54 (18 months to 6 years)

Main Finding: The order of learning stages in children and in models is significantly correlated, but not identical.

4. Technical Framework

4.1 Mathematical Formulation

GPT-2's core training objective is next-token prediction via maximum-likelihood estimation. Given a sequence of tokens $x_1, x_2, ..., x_t$, the model parameterized by $\theta$ is trained to minimize the negative log-likelihood:

$L(\theta) = -\sum_{t} \log P(x_t | x_{<t}; \theta)$
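
As a small worked example of the negative log-likelihood objective, the sketch below sums $-\log p$ over the probabilities a model assigned to each token that actually came next. The function name and the three probabilities are illustrative assumptions; in real training the probabilities come from the model's softmax output and the loss is usually averaged over tokens.

```python
import math
from typing import Sequence


def negative_log_likelihood(token_probs: Sequence[float]) -> float:
    """Sum of -log p over the probabilities assigned to each observed
    next token, conditioned on the tokens before it."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


# Hypothetical probabilities for three consecutive observed tokens.
probs = [0.5, 0.25, 0.125]
loss = negative_log_likelihood(probs)  # log 2 + log 4 + log 8 = 6 * log 2
print(f"NLL = {loss:.4f}")
```

Higher probabilities on the observed tokens drive the sum towards zero, which is why minimizing this quantity is the same as maximizing the likelihood of the training text.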

The probe accuracy $A_p(\theta, \tau)$ for a specific linguistic probe $p$ at training step $\tau$ measures the emergent ability. The learning trajectory is the function $\tau \rightarrow \{A_{p_1}(\theta, \tau), A_{p_2}(\theta, \tau), ...\}$. The analysis compares the order in which different probes $p$ cross a performance threshold (e.g., 80% accuracy), over $\tau$ for the models and over age for the children.

4.2 Example Analysis Framework

Case: Tracking the Acquisition of Relative Clauses

Probe Task: Distinguish grammatical sentences (e.g., "The boy that I saw sang") from ungrammatical counterparts.

Analysis Steps:

  1. Data Extraction: For each checkpoint $\tau$, compute accuracy on a standardized set of 100 relative-clause probes.
  2. Thresholding: Define the acquisition step $\tau_{acquire}$ as the first checkpoint at which accuracy > 80% and remains above that level at all subsequent checkpoints.
  3. Correlation: Compare the rank order of $\tau_{acquire}$ for the relative-clause probe with that of other syntactic probes (e.g., subject-verb agreement, question formation).
  4. Human Alignment: Map $\tau_{acquire}$ to the typical age range (e.g., ~42 months) at which children master this structure in production.

This framework enables a quantitative comparison of developmental timelines across fundamentally different learning systems.
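
The thresholding and rank-comparison steps can be sketched in a few lines. Everything below is an assumption-laden illustration rather than the study's implementation: the accuracy curves are synthetic, the child ages other than the ~42-month example are hypothetical placeholders, and the Spearman helper assumes no tied values.

```python
from typing import Dict, List, Optional, Sequence


def acquisition_step(steps: Sequence[int], accuracies: Sequence[float],
                     threshold: float = 0.80) -> Optional[int]:
    """First checkpoint from which accuracy stays above `threshold`
    for all later checkpoints; None if that never happens."""
    acquired: Optional[int] = None
    for step, acc in zip(steps, accuracies):
        if acc > threshold:
            if acquired is None:
                acquired = step
        else:
            acquired = None  # dipped below threshold: restart the search
    return acquired


def ranks(values: Sequence[float]) -> List[int]:
    """Rank positions starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


steps = [100, 1_000, 10_000, 100_000]
curves: Dict[str, List[float]] = {  # synthetic probe accuracies per checkpoint
    "subject_verb_agreement": [0.60, 0.85, 0.92, 0.97],
    "question_formation": [0.52, 0.70, 0.86, 0.93],
    "relative_clauses": [0.50, 0.82, 0.78, 0.90],
}
child_age_months = {  # hypothetical ages, for illustration only
    "subject_verb_agreement": 30, "question_formation": 36, "relative_clauses": 42,
}

acquired = {name: acquisition_step(steps, acc) for name, acc in curves.items()}
names = list(curves)
rho = spearman([acquired[n] for n in names], [child_age_months[n] for n in names])
print(acquired, rho)
```

Note that the relative-clause curve crosses 80% at step 1,000 but dips again, so its acquisition step is 100,000; this is the effect of requiring accuracy to remain above the threshold.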

5. Visualization of Results

Conceptual Diagram: Comparison of Learning Trajectories

The results can be shown on a two-axis plot:

  • X-Axis (Time): For children, age (months). For GPT-2, training steps (log scale).
  • Y-Axis: Task accuracy (%) on a normalized scale.
  • Multiple Lines: Each line represents a different linguistic skill (e.g., Phoneme Discrimination, Basic SVO, Question Formation, Nested Syntax).

The plot would show both learners exhibiting an S-shaped learning curve for each skill, with the ordering of the curves (which skill rises first) similar but not identical. A second key visualization would be a heatmap showing the correlation matrix of acquisition order across all 96 probes for the model ensemble against the order observed in children, highlighting clusters of high and low correlation.

6. Core Insight & Analyst's Perspective

Core Insight: This paper delivers an important, data-rich finding: the staging of language acquisition is not a mystery unique to humans but an emergent property of incremental, data-driven optimization under constraints. However, the blueprint of those stages is co-written by the learner's architecture. GPT-2 and children converge on a "simple-to-complex curriculum" because the data contain that curriculum. They diverge on the details because the "inductive biases" of a transformer (Vaswani et al., 2017) differ from a child's cognitive and perceptual priors.

Logical Flow: The argument is well constructed. It starts from a well-established empirical fact (ordered stages in children), poses a computational question (does this order emerge in AI?), and uses a robust, multi-probe methodology to test it. The move from demonstrating that "an order exists" to analyzing its "parallel nature" and finally to separating "shared/divergent" components is logically sound. It mirrors the analytical progression in foundational works such as the CycleGAN paper (Zhu et al., 2017), which did not merely introduce a new model but decomposed the unpaired image-translation problem into cycle-consistency constraints.

Strengths & Flaws: The study's strength lies in its methodological rigor and direct comparison. Using many model instances and a large probe set reduces noise. The main flaw, acknowledged only implicitly, is a mismatch in measurement: production in children versus internal probe accuracy in models. Is a model "knowing" a syntactic rule in a probe equivalent to a child "using" it in spontaneous speech? Not necessarily. This echoes critiques of benchmarks such as ImageNet, where models learn shortcuts (Geirhos et al., 2020). The probe set, though broad, may not capture the integrated, communicative nature of human language acquisition.

Actionable Insights: For AI researchers, this is a goldmine for curriculum learning and model diagnostics. If we want models to learn like humans, we need to engineer training-data sequences or loss functions that more closely follow the human developmental timeline. For cognitive scientists, the work offers a new, controllable testbed: alter the model architecture (e.g., introduce recurrent connections as in an LSTM) or the training data (e.g., add multimodal input), and observe how the developmental trajectory changes. This could help isolate the contribution of specific human inductive biases. The ultimate insight is that building better AI and understanding human cognition are now a single, intertwined endeavor.

7. Future Applications & Directions

  • Developmental Benchmarks for AI: Create standardized "developmental milestone" benchmarks for LLMs, moving from static evaluation to trajectory-based analysis.
  • Informed Curriculum Design: Use insights from child development to order training data for more efficient and robust model training, potentially reducing data and compute requirements.
  • Architecture Design: Design new neural network architectures that incorporate hypothesized human cognitive biases (e.g., object permanence, social reward signals) to see whether they yield more human-like learning trajectories.
  • Clinical Tools: Develop AI models that follow atypical learning trajectories (simulating developmental language disorders) to generate hypotheses and test interventions in silico.
  • Multimodal Integration: Extend this research to multimodal models (vision, audio, text). Do stages emerge in which multimodal integration (e.g., learning word meanings from visual context) precedes or follows purely linguistic stages, mirroring infant learning?

8. References

  1. Evanson, L., Lakretz, Y., & King, J. (2023). Language acquisition: do children and language models follow similar learning stages? arXiv preprint arXiv:2306.03586.
  2. Friedmann, N., Reznick, J., et al. (2021). The order of acquisition of syntactic structures: A study of Hebrew-speaking children. Language Acquisition.
  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
  4. Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2223-2232).
  5. Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665-673.
  6. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
  7. Bowman, S. R., & Dahl, G. E. (2021). What will it take to fix benchmarking in natural language understanding? Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.