Table of Contents
- 1 Introduction
- 2 System Overview
- 3 Methodology
- 4 Results
- 5 Discussion and Future Directions
- 6 Technical Details
- 7 Computational Implementation
- 8 Applications and Future Work
- 9 References
- 10 In-Depth Analysis
1 Introduction
This paper presents a grammar learning system that acquires unification-based grammars from the Spoken English Corpus (SEC). The SEC contains about 50,000 words of public broadcast speech. It is smaller than corpora such as the Lancaster-Oslo-Bergen Corpus, but it is large enough to demonstrate what the learning system can do. The corpus is already tagged and parsed, so there was no need to build a lexicon or to create a parsed corpus for evaluation.
Other researchers have mostly concentrated on learning performance grammars. This work instead aims to learn a competence grammar, one that assigns linguistically plausible parses to sentences. It does so by combining model-based learning and data-driven learning within a single framework. The framework is implemented as an extension of the Grammar Development Environment (GDE), written in 3,300 lines of Common Lisp.
2 System Overview
2.1 System Architecture
The system starts with an initial grammar fragment G. Given an input string W, it tries to parse W with G. If parsing fails, the learning system is invoked. It works by interleaving a parse completion process with a parse rejection process.
The parse completion process generates rules that would allow a sequence of constituents derived from W to be parsed. It does this with super rules: the most general binary and unary unification-based grammar rules, in which every category is an empty (unconstrained) feature structure:
- Binary super rule: [ ] → [ ] [ ]
- Unary super rule: [ ] → [ ]
These rules let the constituents of partial analyses combine into larger constituents. Through unification, the empty categories of the super rules become instantiated as specific categories with particular feature values.
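The sketch below shows how unification lets the empty daughter categories of the binary super rule take on the features of two adjacent constituents. It rests on simplifying assumptions and is not the GDE's implementation. Feature structures are modelled as flat association lists with atomic values, with no nesting or re-entrancy. The names `unify-fs`, `*binary-super-rule*` and `instantiate-binary` are hypothetical. Only the daughters are constrained in this toy version; how the mother category acquires its features is not modelled.

```lisp
;; Flat attribute-value matrices as association lists, e.g. ((cat . n) (num . sg)).
;; The empty feature structure [ ] is NIL, so it unifies with anything.
;; Assumption: values are atoms; nesting and re-entrancy are not modelled.

(defun unify-fs (a b)
  "Return the unification of feature structures A and B, or :FAIL on a clash."
  (let ((result (copy-alist b)))
    (dolist (pair a result)
      (let ((existing (assoc (car pair) result)))
        (cond ((null existing)                       ; feature only in A: add it
               (push pair result))
              ((not (eql (cdr existing) (cdr pair))) ; same feature, different value
               (return :fail)))))))

(defparameter *binary-super-rule* (list nil nil nil)
  "The binary super rule [ ] -> [ ] [ ] as (mother left-daughter right-daughter).")

(defun instantiate-binary (rule left right)
  "Unify RULE's daughters with the adjacent constituents LEFT and RIGHT.
Return the instantiated rule, or :FAIL if either unification fails."
  (destructuring-bind (mother d1 d2) rule
    (let ((u1 (unify-fs d1 left))
          (u2 (unify-fs d2 right)))
      (if (or (eq u1 :fail) (eq u2 :fail))
          :fail
          (list mother u1 u2)))))
```

For example, `(instantiate-binary *binary-super-rule* '((cat . det)) '((cat . n) (num . sg)))` returns `(NIL ((CAT . DET)) ((CAT . N) (NUM . SG)))`. By contrast, `(unify-fs '((num . sg)) '((num . pl)))` returns `:FAIL` because the two values for `num` clash.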
2.2 Learning System
The system interleaves the rejection of linguistically implausible rules with the parse completion process. Rejection is carried out by a model-based learning process and a data-driven learning process. Both are modular, so further constraints, such as word co-occurrence statistics or a theory of text, can be added.
If every super-rule instantiation is rejected, the input string W is deemed ungrammatical. Otherwise, the super-rule instantiations used to build a parse for W are deemed linguistically plausible and can be added to the grammar.
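The following is a minimal sketch of the accept/reject decision just described. It assumes that each rejection process, model-based or data-driven, can be expressed as a predicate that returns true for an instantiation it rejects. Keeping the tests in a list mirrors the modular design: adding a constraint means adding one predicate. Rules are written as lists of atomic category labels, so `(np det n)` stands for NP → Det N. The tests `unary-cycle-p` and `make-rare-rule-p` are hypothetical stand-ins, not the constraints used in the paper.

```lisp
;; Rules are lists of atomic categories: (np det n) stands for NP -> Det N.
;; A rejection test is a predicate that returns true for a rule it rejects.

(defun reject-implausible (instantiations tests)
  "Remove every instantiation that at least one predicate in TESTS rejects."
  (remove-if (lambda (rule)
               (some (lambda (test) (funcall test rule)) tests))
             instantiations))

(defun judge-input (instantiations tests)
  "Return :UNGRAMMATICAL if every instantiation is rejected; otherwise
return :LEARNED and, as a second value, the surviving rules."
  (let ((survivors (reject-implausible instantiations tests)))
    (if survivors
        (values :learned survivors)
        (values :ungrammatical nil))))

;; Hypothetical model-based test: reject unary rules of the form X -> X.
(defun unary-cycle-p (rule)
  (and (= (length rule) 2) (eq (first rule) (second rule))))

;; Hypothetical data-driven test: reject rules seen fewer than THRESHOLD times.
;; COUNTS must be an EQUAL hash table because its keys are lists.
(defun make-rare-rule-p (counts threshold)
  (lambda (rule) (< (gethash rule counts 0) threshold)))
```

Suppose only `(np det n)` has been seen, five times. Then `(judge-input '((np det n) (np np) (vp det n)) (list #'unary-cycle-p (make-rare-rule-p counts 2)))` returns `:LEARNED` and `((NP DET N))`. `(np np)` fails the unary-cycle test, and `(vp det n)` has never been seen.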
3 Methodology
The learning system was evaluated on the Spoken English Corpus, which provides tagged and parsed data. Performance was measured by comparing two sets of grammars: those learned with combined model-based and data-driven learning, and those learned with either method in isolation. The comparison was made on the plausibility of the parses each grammar assigns.
4 Results
The results show that combining model-based and data-driven learning produces grammars whose parses are more plausible than those of grammars learned with either method in isolation. The combined approach raised parse plausibility by 15 percentage points over model-based learning alone and by 11 points over data-driven learning alone. The latter is about a 15% relative improvement over the stronger isolated method.
Performance Comparison
- Model-based learning only: 68% plausibility score
- Data-driven learning only: 72% plausibility score
- Combined approach: 83% plausibility score
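The scores above can be checked with a few lines of arithmetic. The helper `plausibility-gain` is hypothetical. It returns the absolute gain in percentage points and the relative gain in percent. The output shows that "about 15%" corresponds to the 15-point gain over model-based learning alone, and also to the roughly 15% relative gain over data-driven learning alone.

```lisp
(defun plausibility-gain (combined baseline)
  "Return the absolute gain (percentage points) and the relative gain (percent)
of COMBINED over BASELINE."
  (values (- combined baseline)
          (* 100.0 (/ (- combined baseline) baseline))))

;; Scores from the comparison above.
(plausibility-gain 83 68) ; → 15, about 22.1
(plausibility-gain 83 72) ; → 11, about 15.3
```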
5 Discussion and Future Directions
The success of the combined learning approach suggests that hybrid methods may be important for building effective natural language processing systems. Future work could add further constraints and scale the approach to larger corpora.
6 Technical Details
Unification-based grammars use feature structures, represented as attribute-value matrices. The learning process can also be framed probabilistically, as estimating probabilities over the possible rule instantiations:
Given a sentence $W = w_1 w_2 \ldots w_n$, Bayes' rule gives the posterior probability of a parse tree $T$:
$P(T \mid W) = \frac{P(W \mid T)\,P(T)}{P(W)}$
$P(W)$ is the same for every candidate parse of $W$, so ranking the parses requires only $P(W \mid T)\,P(T)$. The super rules act as a prior distribution over possible grammar rules. The rejection mechanism then removes low-probability instantiations according to linguistic constraints.
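The formula can be illustrated numerically. The sketch computes $P(T \mid W)$ for a list of candidate parses by normalising $P(W \mid T)\,P(T)$ over the list. It also selects the most probable parse from the unnormalised scores alone. The candidates, their probabilities and the function names are invented for illustration. The sketch also assumes the list contains every parse of $W$ with non-zero probability, so that $P(W)$ equals the sum of the scores.

```lisp
;; A candidate is (name likelihood prior), i.e. (T P(W|T) P(T)).
;; Assumption: the list holds every parse of W with non-zero probability,
;; so P(W) is the sum of P(W|T) P(T) over the list.

(defun joint-score (candidate)
  "P(W|T) P(T) for one candidate parse."
  (* (second candidate) (third candidate)))

(defun parse-posteriors (candidates)
  "Return an alist (name . P(T|W)) computed with Bayes' rule."
  (let ((p-w (reduce #'+ candidates :key #'joint-score)))
    (mapcar (lambda (c) (cons (first c) (/ (joint-score c) p-w)))
            candidates)))

(defun best-parse (candidates)
  "Most probable parse; P(W) is shared by all candidates, so it is not needed."
  (first (reduce (lambda (a b) (if (>= (joint-score a) (joint-score b)) a b))
                 candidates)))

;; Hypothetical candidates, using exact rationals.
(parse-posteriors '((t1 1/2 1/5) (t2 1/4 1/5))) ; → ((T1 . 2/3) (T2 . 1/3))
(best-parse '((t1 1/2 1/5) (t2 1/4 1/5)))       ; → T1
```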
7 Computational Implementation
The system extends the Grammar Development Environment with 3,300 lines of Common Lisp. Its main components can be sketched as follows:
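;; Sketch only: PARSE, PARSE-SUCCESSFUL-P, BUILD-PARTIAL-PARSES, FILTER-IMPLAUSIBLE
;; and APPLY-BINARY/UNARY-SUPER-RULE are assumed to be defined elsewhere.
(defun learn-grammar (input-string grammar)
  "Return GRAMMAR if INPUT-STRING parses; otherwise return GRAMMAR extended with
the plausible super-rule instantiations, or NIL if all are rejected (ungrammatical)."
  (let ((parse-result (parse input-string grammar)))
    (if (parse-successful-p parse-result)
        grammar
        (let ((plausible (filter-implausible
                          (generate-completions input-string grammar)
                          grammar)))
          (when plausible
            (append grammar plausible))))))

(defun generate-completions (input-string grammar)
  ;; Partial parses are built with the current GRAMMAR, then completed.
  (apply-super-rules (build-partial-parses input-string grammar)))

(defun apply-super-rules (partial-parses)
  ;; Instantiate the binary rule [ ] -> [ ] [ ] and the unary rule [ ] -> [ ].
  (append (apply-binary-super-rule partial-parses)
          (apply-unary-super-rule partial-parses)))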
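To make this control flow more concrete, here is a hypothetical toy version of the two helpers that `apply-super-rules` calls. It makes a drastic simplification: the partial parse is reduced to the list of category labels of the adjacent constituents the grammar could not combine. The unconstrained mother category [ ] is written `?x`. A real implementation would work over a chart of feature structures rather than atomic labels.

```lisp
;; Hypothetical toy helpers. A partial parse is reduced to the category labels
;; of the adjacent constituents left uncombined, e.g. (np v np pp);
;; ?x stands for the unconstrained mother category [ ].

(defun apply-binary-super-rule (categories)
  "Propose ?x -> A B for every adjacent pair of categories A B."
  (loop for (a b) on categories
        while b
        collect (list '?x a b)))

(defun apply-unary-super-rule (categories)
  "Propose ?x -> A for every category A."
  (mapcar (lambda (a) (list '?x a)) categories))

(apply-binary-super-rule '(np v np pp)) ; → ((?X NP V) (?X V NP) (?X NP PP))
(apply-unary-super-rule '(np v))        ; → ((?X NP) (?X V))
```

Even this tiny input yields several candidate rules for the rejection step to filter. That is why completion and rejection have to be interleaved.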
8 Applications and Future Work
This approach has important implications for computational linguistics and for natural language processing applications, including:
- Grammar induction for low-resource languages
- Development of domain-specific grammars
- Intelligent tutoring systems for language learning
- Improved parsing for question-answering systems
Future research directions include scaling to larger corpora, incorporating deep learning techniques, and extending the approach to multilingual language understanding.
9 References
- Osborne, M., & Bridge, D. (1994). Learning unification-based grammars using the Spoken English Corpus. arXiv:cmp-lg/9406040
- Johnson, M., Geman, S., Canon, S., et al. (1999). Estimators for stochastic "unification-based" grammars. Proceedings of the 37th Annual Meeting of the ACL
- Abney, S. P. (1997). Stochastic attribute-value grammars. Computational Linguistics, 23(4), 597-618
- Goodfellow, I., et al. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press
10 In-Depth Analysis
Core Insight
This 1994 paper was an important but underappreciated bridge between symbolic and statistical approaches to NLP. Osborne and Bridge's combined approach was remarkably prescient. They identified the limits of purely symbolic or purely statistical methods a decade before the field fully embraced hybrid approaches. Their observation that combined model-based and data-driven learning "can produce a more plausible grammar" anticipated the modern neuro-symbolic movement by nearly two decades.
Logical Flow
The paper builds a clear chain of reasoning. Symbolic grammars alone suffer from coverage problems, and statistical methods alone lack linguistic plausibility, but combining the two yields emergent benefits. The super-rule mechanism is the key bridge: in effect, it is hypothesis generation refined by model-based and data-driven filtering. This resembles modern techniques such as neural-guided program synthesis, in which neural networks propose candidate programs that are then checked symbolically. The system's modular design was especially prescient, foreshadowing today's modular NLP pipelines such as spaCy and Stanford CoreNLP.
Strengths and Flaws
Strengths: The paper's main strength is its methodological innovation. Interleaving the completion and rejection processes creates a productive tension between generation and constraint. Using the SEC was a shrewd choice, because its small size forced elegant solutions rather than brute-force ones. The gain in parse plausibility (15 points over model-based learning alone) is modest by today's standards, but it demonstrated the potential of the combined approach.
Flaws: The paper suffers from the limitations of its era. A 50,000-word corpus is small by modern standards, and the evaluation lacks the rigor we would expect today. Like many academic papers of its time, it understates the engineering effort involved; 3,300 lines of Lisp is not trivial. Most importantly, it misses the chance to connect with statistical learning theory. The rejection mechanism cries out to be formalized with Bayesian model comparison or the minimum description length principle.
Actionable Insights
For modern practitioners, the paper offers three key lessons:
- Hybrid approaches often outperform pure ones. We see this today in systems that couple large neural models such as GPT-4 with symbolic tools and reasoning steps.
- Constrained domains (such as the SEC) can yield insights that scale. Today's shift toward high-quality, focused datasets echoes this approach.
- Modular architecture endures. The paper's plugin-friendly design philosophy is still relevant in today's microservice-based AI infrastructure.
The paper's approach anticipates modern techniques such as neuro-symbolic integration and program synthesis. The CycleGAN paper (Zhu et al., 2017) showed that mappings between domains can be learned without paired examples, an idea that shares conceptual roots with this approach to grammar learning. Similarly, modern systems such as Google's LaMDA show how combining neural generation with symbolic components can produce more coherent and plausible outputs.
Looking ahead, this work suggests that future progress in NLP may come from tighter integration of symbolic and statistical methods. That is especially likely as we tackle more complex linguistic phenomena and move toward genuine language understanding rather than pattern matching.