1. Introduction & Overview
This document examines the influential 2016 paper "SQuAD: 100,000+ Questions for Machine Comprehension of Text" by Rajpurkar et al. of Stanford University. The paper introduces the Stanford Question Answering Dataset (SQuAD), a large, high-quality benchmark for machine reading comprehension (MRC). Before SQuAD, the field was held back by datasets that were either too small for data-hungry modern models or synthetic and unrepresentative of real comprehension tasks. SQuAD closed this gap by providing more than 100,000 question-answer pairs based on Wikipedia articles, where every answer is a contiguous segment of text (a span) from the corresponding passage. This design choice produced a well-defined yet challenging task that has since become a cornerstone for evaluating NLP models.
2. The SQuAD Dataset
2.1 Dataset Construction & Statistics
SQuAD was built with crowdworkers on Amazon Mechanical Turk. Workers were shown a Wikipedia paragraph and asked to pose questions that could be answered by a span within that paragraph, and to highlight the answer span. This process yielded a dataset with the following key statistics:
- 107,785 question-answer pairs
- 536 Wikipedia articles
- ~20x larger than MCTest
The dataset is split into a training set (87,599 examples), a development set (10,570 examples), and a hidden test set used for official leaderboard evaluation.
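As a rough illustration of how such question-answer pairs are typically stored, the sketch below walks a nested JSON record. It assumes the layout of the publicly released SQuAD v1.1 files (`data` → `paragraphs` → `qas` → `answers`, with `text` and a character-level `answer_start`); the one-example record and the helper name `iter_examples` are invented for illustration.

```python
import json

# Hypothetical one-example record in the nested layout assumed above.
SAMPLE = json.loads("""
{"data": [{"title": "Example",
  "paragraphs": [{"context": "Rain falls under gravity.",
    "qas": [{"id": "q1", "question": "What causes rain to fall?",
      "answers": [{"text": "gravity", "answer_start": 17}]}]}]}]}
""")

def iter_examples(dataset):
    """Yield (question, context, answer_text, answer_start) tuples."""
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    yield qa["question"], context, ans["text"], ans["answer_start"]

examples = list(iter_examples(SAMPLE))
for question, context, text, start in examples:
    # The stored character offset should recover the answer from the context.
    assert context[start:start + len(text)] == text
```

The final loop doubles as a sanity check: because every answer is a span, slicing the context at the stored offset should reproduce the answer text exactly.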
2.2 Key Characteristics & Design
SQuAD's novelty lies in its span-based answer format. Unlike multiple-choice questions (e.g., MCTest) or cloze-style questions (e.g., the CNN/Daily Mail dataset), SQuAD requires models to identify the exact start and end tokens of the answer within a passage. This format:
- Increases Difficulty: Models must consider all possible spans, not just a handful of candidates.
- Enables Consistent Evaluation: Answers are objective (a text match), allowing automatic evaluation with metrics such as Exact Match (EM) and F1 score (token overlap).
- Reflects Realistic QA: Many factoid questions in real-world settings have answers that are segments of text.
Figure 1 of the paper illustrates question-answer pairs such as "What causes precipitation to fall?" with the answer "gravity" extracted from the passage.
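The two metrics named above, Exact Match and token-overlap F1, can be sketched in a few lines. This is a minimal sketch rather than the official evaluation script: the normalization steps (lowercasing, stripping punctuation and English articles) are an assumption modeled on common practice, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.
    (Assumed normalization; an official script may differ in details.)"""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "force of gravity" against the reference "gravity" has precision 1/3 and recall 1, so it earns no EM credit but an F1 of 0.5. This partial credit is why F1 scores run higher than EM scores throughout the reported results.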
3. Analysis & Methodology
3.1 Question Difficulty & Reasoning Types
The authors carried out qualitative and quantitative analyses of the questions. They categorized questions by the linguistic relationship between the question and the answer sentence, using dependency tree distances. For example, they measured the distance in the dependency parse tree between the question word (e.g., "what," "where") and the head word of the answer span. They found that questions requiring longer dependency paths or more syntactic transformations (e.g., paraphrasing) were harder for their baseline model.
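To make the notion of distance in a dependency tree concrete, the sketch below counts the edges between two tokens given a head-index array. It illustrates only the tree-distance computation, not the authors' full analysis: the parse is written by hand and is hypothetical (it is not parser output), and the helper names are assumptions.

```python
def path_to_root(heads, idx):
    """Return the nodes visited from idx up to the root (head == -1)."""
    path = [idx]
    while heads[idx] != -1:
        idx = heads[idx]
        path.append(idx)
    return path

def tree_distance(heads, a, b):
    """Number of edges between tokens a and b in a dependency tree."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    depth_in_a = {node: d for d, node in enumerate(up_a)}
    for d_b, node in enumerate(up_b):
        if node in depth_in_a:  # first shared node is the lowest common ancestor
            return depth_in_a[node] + d_b
    raise ValueError("tokens are not in the same tree")

# Hand-written, hypothetical parse of "gravity causes precipitation to fall".
tokens = ["gravity", "causes", "precipitation", "to", "fall"]
heads = [1, -1, 1, 4, 1]  # heads[i] is the index of token i's head; -1 marks the root

dist_gravity_fall = tree_distance(heads, 0, 4)  # gravity -> causes -> fall
dist_to_gravity = tree_distance(heads, 3, 0)    # to -> fall -> causes -> gravity
```

In this toy tree, "gravity" and "fall" are two edges apart and "to" and "gravity" three. Longer paths of this kind are what the analysis associates with harder questions.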
3.2 Baseline Model: Logistic Regression
To establish a baseline, the authors implemented a logistic regression model. For each candidate span in a passage, the model computes a score from a rich set of features, including:
- Lexical Features: Word overlap and n-gram matches between the question and the span.
- Syntactic Features: Dependency tree path features connecting question words to the candidate answer words.
- Alignment Features: Measures of how well the question aligns with the sentence containing the candidate.
The model's objective is to select the highest-scoring span. The performance of this feature-engineered model gave the community an important non-neural baseline.
4. Experimental Results
The paper reports the following key results:
- Baseline (Simple Word Matching): Achieved an F1 score of about 20%.
- Logistic Regression Model: Achieved an F1 score of 51.0% and an Exact Match score of 40.0%. This was a substantial improvement, demonstrating the value of syntactic and lexical features.
- Human Performance: Evaluated on a subset, human annotators achieved an F1 score of 86.8% and an EM of 77.0%.
The large gap between the strong baseline (51%) and human performance (87%) clearly showed that SQuAD posed a substantial, meaningful challenge for future research.
5. Technical Details & Framework
The core modeling challenge in SQuAD is framed as a span-selection problem. Given a passage $P$ with $n$ tokens $[p_1, p_2, ..., p_n]$ and a question $Q$, the goal is to predict the start index $i$ and end index $j$ (where $1 \le i \le j \le n$) of the answer span.
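The size of this search space is easy to see by enumerating it. The sketch below lists every span $(i, j)$ with $i \le j$, using 0-based inclusive indices rather than the 1-based indices of the text; the optional `max_len` cap and the function name are assumptions added for illustration, and a practical system may restrict candidates in other ways.

```python
def candidate_spans(tokens, max_len=None):
    """Yield (i, j) pairs, 0-based and inclusive, with i <= j."""
    n = len(tokens)
    limit = n if max_len is None else max_len
    for i in range(n):
        # j runs from i up to the end of the passage or the length cap.
        for j in range(i, min(n, i + limit)):
            yield i, j

tokens = "precipitation falls under gravity".split()
all_spans = list(candidate_spans(tokens))
short_spans = list(candidate_spans(tokens, max_len=2))

# Without a cap there are n(n+1)/2 spans for n tokens.
n = len(tokens)
assert len(all_spans) == n * (n + 1) // 2
```

The quadratic growth in the number of candidates is one reason span selection is harder than choosing among a few multiple-choice options.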
The logistic regression model scores a candidate span $(i, j)$ using a feature vector $\phi(P, Q, i, j)$ and a weight vector $w$:
$\text{score}(i, j) = w^T \cdot \phi(P, Q, i, j)$
The model is trained to maximize the likelihood of the correct span. The main feature groups include:
- Word Matching: Counts of question words that appear in the candidate span and its context.
- Dependency Tree Path: Encodes the shortest path in the dependency tree between question words (such as "what" or "who") and the head word of the candidate answer. The path is represented as a string of dependency labels and word forms.
- Answer Type: Heuristics based on the question word (e.g., expecting a person for "who", a location for "where").
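The linear scoring rule above can be sketched with a toy feature function. Everything specific here is an assumption for illustration: the three features are simple stand-ins for the word-matching group only, the weights are hand-picked rather than learned, and turning scores into probabilities with a softmax over candidates is one common way to define a likelihood, not necessarily the authors' exact formulation.

```python
import math

def features(passage, question, i, j):
    """Toy phi(P, Q, i, j): hypothetical stand-ins for the word-matching features."""
    span = passage[i:j + 1]
    q_words = {w.lower() for w in question}
    window = passage[max(0, i - 2):i] + passage[j + 1:j + 3]
    return [
        sum(w.lower() in q_words for w in span),    # question words inside the span
        sum(w.lower() in q_words for w in window),  # question words in nearby context
        float(len(span)),                           # span length in tokens
    ]

def score(w, phi):
    """Linear score w^T phi."""
    return sum(wk * fk for wk, fk in zip(w, phi))

def span_probabilities(w, passage, question, spans):
    """Softmax over candidate scores (an assumed likelihood model)."""
    scores = [score(w, features(passage, question, i, j)) for i, j in spans]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

passage = "precipitation falls under gravity".split()
question = "what causes precipitation to fall".split()
spans = [(0, 0), (3, 3), (0, 3)]
w = [-1.0, 1.0, -0.5]  # hand-picked, hypothetical weights (not learned)

probs = span_probabilities(w, passage, question, spans)
best = spans[max(range(len(spans)), key=probs.__getitem__)]
```

With these illustrative weights, spans that merely repeat question words are penalized, so the single-token span covering "gravity" receives the highest probability. Training would instead adjust $w$ so that the correct span's probability is maximized across the training set.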
6. In-Depth Analysis & Industry Perspective
Core Insight: SQuAD was not just another dataset; it was a strategic catalyst. By providing a benchmark that was large, automatically evaluable, and genuinely hard, it did for reading comprehension what ImageNet did for computer vision: it created a shared, high-stakes playing field that pushed the whole NLP community to focus its engineering and research effort. The 51% F1 baseline was not a failure; it was a bright flag planted on a distant peak, challenging the field to climb.
Logical Flow: The paper's logic has a sharp, almost commercial discipline. First, identify the market gap: existing RC datasets were either small and narrow (MCTest) or large but synthetic and shallow (CNN/DM). Next, define the product specification: it must be large (for neural networks), high-quality (human-created), and objectively evaluable (span-based answers). Build it through crowdsourcing. Finally, validate the product: present a baseline strong enough to prove feasibility but weak enough to leave a large performance gap, framing it explicitly as a "challenge problem." This is textbook platform creation.
Strengths & Flaws: Its greatest strength is its enormous impact. SQuAD became a central proving ground for the transformer/BERT generation of models; in practice, models were judged by their SQuAD scores. Its flaws, however, emerged later. The span-based constraint is a double-edged sword: it allows clean evaluation but limits the realism of the task. Many real-world questions require synthesis, inference, or multi-part answers, which SQuAD excludes. This produced models that became expert "span hunters," sometimes without deep understanding, a phenomenon later examined in work such as "What Does BERT Look At?" (Clark et al., 2019). In addition, the dataset's focus on Wikipedia introduces biases and a knowledge cutoff.
Actionable Insights: For practitioners and researchers, the lesson lies in dataset design as a research strategy. If you want to drive progress in an area, do not just build a marginally better model; build the definitive benchmark. Make sure it has a clear, quantifiable metric. Seed it with a baseline that is strong but beatable. SQuAD's success also warns against over-optimizing for a single benchmark, a lesson the field absorbed by creating more diverse and challenging successors such as HotpotQA (multi-hop reasoning) and Natural Questions (real user queries). The paper teaches that the most influential research often delivers not just an answer, but the best possible question.
7. Future Applications & Directions
The SQuAD framework has influenced several directions in NLP and AI:
- Model Architecture Design: It directly motivated architectures such as BiDAF and QANet, and served as a testbed for the attention mechanisms in Transformers that are central to BERT.
- Beyond Span Extraction: Successor datasets broadened the scope. Natural Questions (NQ) uses real Google search queries and allows long answers, yes/no answers, or no answer at all. HotpotQA requires multi-hop reasoning across multiple documents. CoQA and QuAC introduce conversational QA.
- Domain-Specific QA: The SQuAD format has been adapted for legal documents (LexGLUE), medical text (PubMedQA), and technical support.
- Explainable AI (XAI): A span-based answer provides a natural, if limited, form of explanation ("the answer is here"). Research has built on this to produce more detailed rationales.
- Integration with Knowledge Sources: Future systems may combine SQuAD-style text comprehension with structured knowledge retrieval, moving toward truly knowledge-grounded question answering as envisioned by work such as Google's REALM or Facebook's RAG.
8. References
- Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2383–2392.
- Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition.
- Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313-330.
- Hermann, K. M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., & Blunsom, P. (2015). Teaching machines to read and comprehend. Advances in Neural Information Processing Systems, 28.
- Clark, K., Khandelwal, U., Levy, O., & Manning, C. D. (2019). What Does BERT Look At? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
- Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., ... & Petrov, S. (2019). Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics, 7, 452-466.