1. Introduction
In today's multimodal and multilingual world, understanding information across modalities and languages is essential. While English-based Vision-Language Pre-training (VLP) has achieved great success, extending these capabilities to non-English languages poses considerable challenges. Conventional Multilingual Vision-Language Pre-training (M-VLP) approaches demand large computational resources and are not flexible enough to extend easily to new languages.
This paper introduces the MultiLingual Acquisition (MLA) framework, inspired by how humans learn languages. Unlike conventional M-VLP models, which handle many languages simultaneously within a single model, MLA efficiently extends an existing monolingual VLP model to multilingual capability through a lightweight language acquisition encoder.
Resource Efficiency
MLA requires less multilingual training data than conventional M-VLP approaches
Computational Savings
Reduces computational requirements while retaining state-of-the-art performance
Language Flexibility
Allows flexible extension to new languages without degrading performance on the original languages
2. Methodology
2.1. MultiLingual Acquisition (MLA) Framework
The MLA framework consists of three main components: a pre-trained monolingual VLP model, a lightweight language acquisition encoder, and a two-stage training strategy. It uses an existing monolingual VLP model (such as CLIP or ALIGN) as the backbone and adds a small number of parameters to adapt it to multiple languages.
2.2. Language Acquisition Encoder
The language acquisition encoder is implemented by inserting lightweight language acquirers into the pre-trained monolingual encoder. These acquirers are designed to be parameter-efficient while still capturing the semantic mapping between languages. The original parameters of the monolingual VLP model stay frozen during training.
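To make the "lightweight module on a frozen encoder" idea concrete, here is a minimal sketch in plain Python. The source does not describe the internal design of the language acquirers, so the bottleneck shape (down-projection, ReLU, up-projection, residual connection), the zero initialization, the class name `BottleneckAdapter`, and the sizes 512 and 64 are all assumptions chosen for illustration.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class BottleneckAdapter:
    """Hypothetical language acquirer: down-project, ReLU, up-project, residual."""

    def __init__(self, hidden, bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection (bottleneck x hidden).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                     for _ in range(bottleneck)]
        # Zero-initialized up-projection (hidden x bottleneck), so the
        # adapter initially leaves the frozen encoder's output unchanged.
        self.up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU
        delta = matvec(self.up, z)
        return [hi + di for hi, di in zip(h, delta)]     # residual connection

    def num_params(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

def frozen_layer_params(hidden):
    # Rough weight count for one Transformer layer (biases and norms ignored):
    # 4 attention projections plus 2 feed-forward matrices with 4x expansion.
    return 4 * hidden * hidden + 2 * hidden * (4 * hidden)

hidden, bottleneck = 512, 64          # illustrative sizes, not from the source
adapter = BottleneckAdapter(hidden, bottleneck)
h = [0.1] * hidden                    # stand-in for a frozen encoder hidden state
out = adapter(h)
ratio = adapter.num_params() / frozen_layer_params(hidden)
print(f"trainable adapter params: {adapter.num_params()}")
print(f"share of one frozen layer: {ratio:.2%}")
```

Under these assumed sizes the adapter adds roughly 2% of the weights of the layer it is attached to, which is the kind of ratio that makes freezing the backbone attractive. In a real system only the adapter weights would receive gradient updates.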
2.3. Two-Stage Training Strategy
Training proceeds in two distinct stages:
- Native Language Transfer Stage: The model learns to align new languages with the native language (usually English) through cross-lingual supervision
- Language Exposure Stage: The model interacts directly with multimodal data in the target language, similar to immersion-based language learning in humans
The training objective combines a cross-modal contrastive loss with a cross-lingual alignment loss: $\mathcal{L} = \lambda_1 \mathcal{L}_{cm} + \lambda_2 \mathcal{L}_{cl}$, where $\mathcal{L}_{cm}$ is the contrastive loss between visual and textual representations, $\mathcal{L}_{cl}$ is the cross-lingual alignment loss, and $\lambda_1$ and $\lambda_2$ weight the two terms.
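The weighted objective above can be sketched in plain Python as follows. The source gives only the combined form, so several details here are assumptions: the contrastive term is written as a symmetric InfoNCE loss with temperature 0.07, the alignment term as a mean squared error between target-language and native-language text embeddings, and the toy embeddings are invented. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cross_entropy_diag(sim):
    """Mean of -log softmax(sim[i])[i]: the matching pair sits on the diagonal."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for numerical stability
        lse = m + math.log(sum(math.exp(s - m) for s in row))
        total += lse - row[i]
    return total / len(sim)

def contrastive_loss(img, txt, temperature=0.07):
    """Assumed form of L_cm: symmetric InfoNCE over a batch of pairs."""
    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    sim = [[dot(i, t) / temperature for t in txt] for i in img]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))

def alignment_loss(tgt, src):
    """Assumed form of L_cl: MSE between target and native text embeddings."""
    n = sum(len(v) for v in tgt)
    return sum((a - b) ** 2 for t, s in zip(tgt, src) for a, b in zip(t, s)) / n

def mla_loss(img, tgt_txt, src_txt, lambda_cm, lambda_cl):
    """L = lambda_cm * L_cm + lambda_cl * L_cl."""
    return (lambda_cm * contrastive_loss(img, tgt_txt)
            + lambda_cl * alignment_loss(tgt_txt, src_txt))

# Toy batch of two image/caption pairs (invented values).
img = [[1.0, 0.0], [0.0, 1.0]]
src_txt = [[1.0, 0.0], [0.0, 1.0]]      # native-language caption embeddings
tgt_txt = [[0.9, 0.1], [0.2, 0.8]]      # target-language caption embeddings

transfer_stage = mla_loss(img, tgt_txt, src_txt, lambda_cm=0.0, lambda_cl=1.0)
exposure_stage = mla_loss(img, tgt_txt, src_txt, lambda_cm=1.0, lambda_cl=0.0)
print(transfer_stage, exposure_stage)
```

Setting one weight to zero per stage is only one possible reading of how the two stages relate to the two terms; the source does not state the weight values it uses.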
3. Experiments & Results
3.1. Experimental Setup
Experiments were run on several multilingual image-text and video-text retrieval benchmarks, including Multi30K, multilingual extensions of MSCOCO, and multilingual subsets of HowTo100M. The model was compared against state-of-the-art M-VLP baselines, including MURAL, UC2, and M3P.
3.2. Performance on Multilingual Retrieval
MLA achieves competitive or better performance than conventional M-VLP models while using only 20-30% of the multilingual training data. Key results include:
- Image-text retrieval: 5-8% improvement over the baselines on non-English languages
- Video-text retrieval: Performance gains across multiple languages
- Zero-shot transfer: Strong performance on unseen languages
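Retrieval results like those above are commonly reported as Recall@K. The source does not name the metric it uses, so the following is a generic sketch of that standard measure rather than the paper's evaluation code; the similarity matrix is invented, and row `i` is assumed to have its ground-truth match in column `i`.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth item (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy text-to-image similarity scores: 3 captions (rows) x 3 images (columns).
sim = [
    [0.9, 0.1, 0.3],   # caption 0: correct image ranked first
    [0.2, 0.4, 0.8],   # caption 1: correct image ranked second
    [0.1, 0.7, 0.6],   # caption 2: correct image ranked second
]

r1 = recall_at_k(sim, 1)
r2 = recall_at_k(sim, 2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

In a multilingual setting the same function would be run once per language, with captions in that language as the queries.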
3.3. Ablation Studies
The ablation studies confirm the importance of both training stages and of the lightweight encoder design. Removing either stage causes a substantial drop in performance, especially for low-resource languages.
4. Technical Analysis & Insights
Core Insight
The MLA framework represents a shift in how multilingual multimodal learning is approached. Instead of the brute-force strategy of training large models on all languages at once, akin to the "bigger is better" philosophy that dominated early deep learning, MLA takes a more targeted and efficient route. It recognizes that language acquisition in AI, as in humans, benefits from building on existing knowledge structures. This mirrors findings from transfer learning research in computer vision, where models such as ResNet showed that reusing learned features is more efficient than learning from scratch (He et al., 2016). The framework's biological inspiration, imitating human language learning, is not merely poetic; it is effective in practice, substantially reducing computational requirements while retaining competitive performance.
Logical Flow
The paper's argument follows a logical progression: identify the limitations of current M-VLP (computational cost, inflexibility), draw inspiration from cognitive science (human language acquisition), propose a new architecture (lightweight language acquirers), implement a biologically inspired training strategy (two-stage learning), and validate it with rigorous experiments. This flow resembles the pattern of successful AI research seen in breakthrough papers such as the original Transformer (Vaswani et al., 2017), which likewise identified a limitation (sequential processing in RNNs), proposed a new solution (attention mechanisms), and validated it with superior results. The link to human learning processes strengthens the paper's theoretical grounding, much as neuroscience-inspired methods have advanced computer vision.
Strengths & Flaws
Strengths: The framework's computational efficiency is its killer feature. At a time when AI's environmental impact is under scrutiny (Strubell et al., 2019), methods that cut training costs by 70-80% while retaining performance deserve attention. The flexibility to add new languages without catastrophic forgetting addresses a major limitation of current M-VLP models. The two-stage training strategy reflects a deep understanding of how language learning works.
Flaws: The paper does not explore the framework's limits with very distant languages. While it shows success with European languages and some Asian languages, performance on low-resource or typologically diverse languages remains unconfirmed. The evaluation focuses heavily on retrieval tasks; broader multimodal understanding (captioning, VQA) needs further study. As with many efficiency-oriented methods, there may be a performance ceiling relative to full retraining for some language pairs.
Actionable Insights
For practitioners: This framework offers a blueprint for extending existing English VLP models to new markets with limited resources. Companies with deployed English multimodal systems can use MLA to expand internationally without full retraining. For researchers: The human-learning-inspired approach suggests exploring other cognitive principles for AI efficiency. The lightweight adapter design could be extended to other multimodal domains (audio-visual, tactile-visual). The two-stage training strategy merits investigation in other transfer learning settings. Most importantly, this work shows that multilingual AI does not require large monolithic models: efficient, modular approaches can reach similar results with far fewer resources, an important insight for democratizing AI across languages.
5. Future Applications & Directions
The MLA framework opens several promising directions for future research and applications:
- Real-Time Language Adaptation: Adding new languages to deployed systems without interrupting service
- Low-Resource Language Support: Extending to languages with limited parallel multimodal data
- Cross-Modal Content Creation: Generating multilingual images and video from text descriptions
- Educational Applications: Language learning tools that draw on multimodal context
- Enterprise Solutions: Cost-effective multilingual content moderation and search systems
Future research should investigate scaling laws for the language acquisition encoder, integration with larger foundation models, and applications in multimodal dialogue systems.
6. References
- Zhang, L., Hu, A., & Jin, Q. (2022). Generalizing Multimodal Pre-training into Multilingual via Language Acquisition. arXiv preprint arXiv:2206.11091.
- Jain, A., et al. (2021). MURAL: Multimodal, Multitask Retrieval Across Languages. arXiv preprint arXiv:2109.05125.
- Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.
- Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS.
- He, K., et al. (2016). Deep Residual Learning for Image Recognition. CVPR.
- Strubell, E., et al. (2019). Energy and Policy Considerations for Deep Learning in NLP. ACL.
- Castello, M. (2015). Second Language Acquisition: From Theory to Practice. Cambridge University Press.
- Ni, M., et al. (2021). M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training. CVPR.