1. Introduction
Scaling laws for Large Language Models (LLMs) have traditionally focused on model parameters and training data size, overlooking vocabulary size as a scaling dimension in its own right. This paper investigates how vocabulary size affects LLM performance and provides methods for determining the optimal vocabulary size for a given training budget.
The study finds that current LLMs such as Llama2-70B use suboptimal vocabulary sizes (32K versus a predicted optimum of 216K), which points to a substantial efficiency gap in current practice.
- Model Range: 33M - 3B trained parameters
- Training Data: 500B characters processed
- Vocabulary Gap: 7x (underestimation for Llama2-70B)
2. Methodology
2.1 Normalized Loss Formulation
To ensure a fair comparison between models with different vocabulary sizes, the authors introduce a normalized loss function that accounts for differences in tokenization efficiency. The normalization prevents a model from gaining an artificial advantage in the loss metric purely because of its vocabulary size.
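The text above does not spell out the normalization. A minimal sketch, assuming one plausible choice: subtract each token's unigram (frequency-only) log-probability from the model's log-probability, so the loss measures how much the model improves on a context-free baseline. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def unigram_log_probs(tokens):
    """Empirical unigram log-probabilities estimated from a token sequence."""
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: math.log(c / total) for tok, c in counts.items()}

def normalized_loss(model_log_probs, tokens, unigram_lp):
    """Mean of -(log p_model(token | context) - log p_unigram(token)).

    model_log_probs[i] is the model's log-probability of tokens[i] given
    its context. Assumed normalization, not confirmed by the source.
    """
    gains = [lp - unigram_lp[tok] for lp, tok in zip(model_log_probs, tokens)]
    return -sum(gains) / len(gains)

# Toy usage: a model no better than the unigram baseline scores zero,
# and a model that beats the baseline scores below zero.
tokens = ["a", "b", "a", "a"]
baseline = unigram_log_probs(tokens)
same_as_baseline = [baseline[t] for t in tokens]
better_model = [math.log(0.9)] * len(tokens)
print(normalized_loss(same_as_baseline, tokens, baseline))
print(normalized_loss(better_model, tokens, baseline))
```

Under this assumption a lower (more negative) value means a larger gain over the frequency baseline, which makes values comparable across tokenizers with different vocabulary sizes.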
2.2 Three Prediction Approaches
The paper proposes three complementary approaches for predicting the optimal vocabulary size:
2.2.1 IsoFLOPs Analysis
Train models with the same compute budget but different vocabulary sizes, then identify the loss minimum at each budget level.
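As a rough illustration of locating the minimum within one budget level, the sketch below fits a parabola in log vocabulary size through the best measured point and its two neighbours. The parabolic interpolation step and the sweep values are assumptions for illustration; the source does not say how the minimum is extracted.

```python
import math

def isoflops_optimum(vocab_sizes, losses):
    """Estimate the loss-minimizing vocabulary size for one FLOPs budget.

    Fits a parabola in log(V) through the best measured point and its two
    neighbours and returns the vertex. Falls back to the best measured
    point when it lies on the edge of the sweep.
    """
    pairs = sorted(zip(vocab_sizes, losses))
    i = min(range(len(pairs)), key=lambda k: pairs[k][1])
    if i == 0 or i == len(pairs) - 1:
        return float(pairs[i][0])
    (x0, y0), (x1, y1), (x2, y2) = [
        (math.log(v), l) for v, l in pairs[i - 1:i + 2]
    ]
    # Vertex of the parabola through three points (parabolic interpolation).
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        return float(pairs[i][0])
    return math.exp(x1 - 0.5 * num / den)

# Hypothetical sweep at a single budget: losses are synthetic, generated
# from a parabola in log(V) whose minimum sits at V = 40000.
sweep = [8000, 16000, 32000, 64000, 128000]
synthetic = [2.0 + 0.1 * (math.log(v) - math.log(40000)) ** 2 for v in sweep]
print(isoflops_optimum(sweep, synthetic))
```

Repeating this for each budget level gives one (budget, optimal vocabulary) pair per level, which is the raw material for a scaling fit.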
2.2.2 Derivative Estimation
Use gradient-based methods to find where the derivative of the loss function with respect to vocabulary size equals zero, which marks the optimal point.
2.2.3 Parametric Fit
Fit power-law relationships between model parameters, vocabulary size, and loss to obtain predictive formulas.
3. Experimental Results
3.1 Model Training Setup
Models ranging from 33 million to 3 billion parameters were trained on up to 500 billion characters with various vocabulary configurations. Training covered a range of FLOPs budgets to establish the full scaling relationships.
3.2 Optimal Vocabulary Findings
The study reveals a power-law relationship, $N_v^{opt} \propto N_{nv}^\gamma$ with $\gamma < 1$, meaning that the optimal number of vocabulary parameters should grow more slowly than the number of non-vocabulary parameters, but should still grow. This contradicts the common practice of using a fixed vocabulary size across model scales.
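A power law of this form is a straight line in log-log space, so the exponent can be estimated with ordinary least squares on the logarithms. The sketch below shows that idea; the data points are synthetic and the function name is hypothetical, so the fitted values say nothing about the paper's actual coefficients.

```python
import math

def fit_power_law(n_nv, n_v_opt):
    """Least-squares fit of log(N_v) = log(k) + gamma * log(N_nv).

    Returns (k, gamma) for the model N_v_opt = k * N_nv ** gamma.
    """
    xs = [math.log(x) for x in n_nv]
    ys = [math.log(y) for y in n_v_opt]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    k = math.exp(mean_y - gamma * mean_x)
    return k, gamma

# Synthetic points generated with k = 3.0 and gamma = 0.8 (made-up values).
sizes = [1e8, 1e9, 1e10]
optima = [3.0 * s ** 0.8 for s in sizes]
k, gamma = fit_power_law(sizes, optima)
print(k, gamma)
```

A fitted exponent below 1 is what "grow more slowly" means in practice: a 10x increase in non-vocabulary parameters calls for less than a 10x increase in vocabulary parameters.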
Figure 1: Vocabulary Scaling Relationship
The figure shows experimental results that agree with the theoretical predictions, with larger circles indicating higher loss values. It shows a distinct optimal vocabulary size for each model scale, and these optima trace a clear power-law curve.
3.3 Downstream Performance Validation
Empirical validation with 3-billion-parameter models shows consistent improvements when the predicted optimal vocabulary size is used. On ARC-Challenge, increasing the vocabulary from 32K to 43K improved performance from 29.1 to 32.0 under the same 2.3e21 FLOPs budget.
Key Insights
- Vocabulary size significantly affects LLM scaling efficiency
- The optimal vocabulary grows with the compute budget and model size
- Current LLMs generally use suboptimal vocabulary sizes
- Tokenization and model scaling need to be considered jointly
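4. Technical Analysis & Framework
4.1 Mathematical Formulation
The core mathematical relationship is expressed as:
$L(N_{nv}, N_v, D) = E + \frac{A}{N_{nv}^\alpha} + \frac{B}{N_v^\beta} + \frac{C}{D^\gamma}$
where $L$ is the normalized loss, $N_{nv}$ is the number of non-vocabulary parameters, $N_v$ is the number of vocabulary parameters, $D$ is the training data size, and $E, A, B, C, \alpha, \beta, \gamma$ are fitted constants. Here $\gamma$ is the data exponent, not the power-law exponent used in Section 3.2.
The optimal vocabulary size satisfies $\frac{\partial L}{\partial N_v} = 0$. The derivative is taken at a fixed compute budget, so a larger $N_v$ leaves less compute for the training data $D$; without that coupling, the $B/N_v^\beta$ term alone would always favor a larger vocabulary.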
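To make the zero-derivative condition concrete, the sketch below ties $D$ to the budget through the common approximation FLOPs $\approx 6 (N_{nv} + N_v) D$ and then bisects on the sign of a numerical derivative. That approximation is an assumption, not something this text states. The coefficient values are placeholders, not fitted constants, and `optimal_n_v` is a hypothetical helper.

```python
import math

def loss(n_v, n_nv, flops, E, A, alpha, B, beta, C, gamma):
    """Loss from the formula above, with D fixed by the compute budget.

    Assumes FLOPs ~= 6 * (N_nv + N_v) * D, so D shrinks as N_v grows.
    """
    d = flops / (6.0 * (n_nv + n_v))
    return E + A / n_nv ** alpha + B / n_v ** beta + C / d ** gamma

def optimal_n_v(n_nv, flops, coeffs, lo=1e3, hi=1e12, iters=200):
    """Find N_v where dL/dN_v = 0 by bisection in log space."""
    def slope(log_nv):
        # Central-difference estimate of dL / d(log N_v).
        h = 1e-4
        up = loss(math.exp(log_nv + h), n_nv, flops, **coeffs)
        down = loss(math.exp(log_nv - h), n_nv, flops, **coeffs)
        return (up - down) / (2 * h)

    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if slope(mid) < 0:
            a = mid  # loss still falling: the optimum is at a larger N_v
        else:
            b = mid  # loss rising: the optimum is at a smaller N_v
    return math.exp(0.5 * (a + b))

# Placeholder coefficients chosen only so the example runs.
coeffs = dict(E=1.0, A=100.0, alpha=0.3, B=50.0, beta=0.5, C=1e9, gamma=1.0)
print(optimal_n_v(n_nv=1e9, flops=1e21, coeffs=coeffs))
```

With positive coefficients and exponents, the vocabulary term falls and the data term rises as $N_v$ grows, so the constrained loss has a single minimum inside the bracket and bisection is sufficient.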
4.2 Example Analysis Framework
Case Study: Determining the Optimal Vocabulary for a 10-Billion-Parameter Model
Given: training budget = 1e23 FLOPs, target domain = general language understanding
Framework Application:
- Estimate the non-vocabulary parameters: $N_{nv} = 9.5\text{B}$ (95% of the total)
- Apply the power law: $N_v^{opt} \propto N_{nv}^{0.7}$ (from the empirical fit)
- Compute: $N_v^{opt} \approx 150\text{K}$ tokens
- Validate with IsoFLOPs analysis for the given budget
- Adjust for domain-specific tokenization
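The first three steps can be sketched in code, with one caveat: a proportionality needs a known reference point before it yields a number, and the case study does not state one. The reference pair below is therefore invented for illustration, as are the function names.

```python
def non_vocab_params(total_params, non_vocab_share=0.95):
    """Step 1: non-vocabulary parameters as a share of the total."""
    return total_params * non_vocab_share

def scale_vocab_quantity(n_nv, ref_n_nv, ref_opt, gamma=0.7):
    """Steps 2-3: extrapolate the optimum from a reference point.

    Applies opt(n_nv) = ref_opt * (n_nv / ref_n_nv) ** gamma, which is the
    power law N_v_opt ∝ N_nv ** gamma anchored at (ref_n_nv, ref_opt).
    """
    return ref_opt * (n_nv / ref_n_nv) ** gamma

n_nv = non_vocab_params(10e9)  # about 9.5e9, as in the case study

# Hypothetical anchor: suppose a 1e9-parameter model had optimum 1.0
# (arbitrary units). The result is then a pure scaling factor.
factor = scale_vocab_quantity(n_nv, ref_n_nv=1e9, ref_opt=1.0)
print(n_nv, factor)
```

Because the exponent is below 1, the scaling factor is smaller than the 9.5x growth in non-vocabulary parameters, which is the sublinear behaviour the power law describes.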
This framework offers a systematic approach to vocabulary sizing, a decision that model developers today often overlook.
5. Industry Analyst Perspective
5.1 Core Insight
The industry has been fundamentally mistaken in treating vocabulary size as a fixed hyperparameter. This paper exposes a critical blind spot: we have been optimizing LLMs with one hand tied behind our backs. The finding that Llama2-70B's vocabulary should be 7 times larger is not merely an academic curiosity; it points to potentially billions of dollars in wasted compute and lost model performance across the AI ecosystem. The oversight is reminiscent of early neural network research that underestimated the importance of activation functions, as documented in the seminal work of Glorot and Bengio (2010) on understanding the difficulty of training deep feedforward neural networks.
5.2 Logical Flow
The paper's argument proceeds with tight logic. First, the authors establish that vocabulary matters, contrary to the assumption built into earlier scaling laws. Second, they show that its effect follows predictable power laws. Third, they provide practical tools for optimization. The chain of reasoning has no gaps, running from problem identification through method development to empirical validation. This is how rigorous research should be conducted, in contrast to the trend of publishing incremental improvements without fundamental insight.
5.3 Strengths & Flaws
Strengths: The three-pronged methodology (IsoFLOPs, derivatives, parametric fit) provides robust validation. The experimental scale (33 million to 3 billion parameters) is impressive and convincing. The practical implications apply immediately to any organization training LLMs.
Flaws: The study focuses mainly on English text; the implications for multilingual settings remain unexplored. The computational cost of the methodology may be prohibitive for smaller research groups. The authors do not address how vocabulary optimization interacts with other architectural choices such as attention mechanisms, an area where the Transformer paper (Vaswani et al., 2017) established foundational principles that still dominate the field.
5.4 Actionable Insights
Every AI lab training LLMs should immediately: 1) re-evaluate its vocabulary sizing strategy, 2) run IsoFLOPs analysis for current projects, and 3) treat vocabulary size as a first-class scaling dimension alongside parameters and data. For hardware companies such as NVIDIA and AMD, this research points to new optimization opportunities in memory architecture for larger embedding tables. The 7x vocabulary gap for Llama2-70B suggests that current hardware is not matched to optimal model configurations.
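For a sense of what larger embedding tables mean for memory, the back-of-the-envelope sketch below compares table sizes at the two vocabulary sizes discussed. It assumes a hidden dimension of 8192, 16-bit weights, separate input and output embedding matrices, and reads "216K" as 216,000; none of these figures come from this text.

```python
def embedding_table_bytes(vocab_size, hidden_dim, bytes_per_param=2, tied=False):
    """Memory for the input embedding and, if untied, the output projection."""
    tables = 1 if tied else 2
    return vocab_size * hidden_dim * bytes_per_param * tables

HIDDEN_DIM = 8192  # assumed hidden size for a 70B-class model

current = embedding_table_bytes(32_000, HIDDEN_DIM)
proposed = embedding_table_bytes(216_000, HIDDEN_DIM)

print(f"32K vocabulary:  {current / 2**30:.2f} GiB")
print(f"216K vocabulary: {proposed / 2**30:.2f} GiB")
print(f"ratio: {proposed / current:.2f}x")
```

Table memory scales linearly with vocabulary size, so the ratio equals the vocabulary ratio whatever hidden dimension or precision is assumed.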
6. Future Applications & Directions
Immediate Applications:
- Redesigning vocabulary strategies for next-generation LLMs (GPT-5, Gemini 2.0, etc.)
- Hardware optimization for larger embedding tables
- Improved efficiency in model serving and inference
Research Directions:
- Multilingual vocabulary optimization across diverse languages
- Dynamic vocabulary sizing during training
- Integration with mixture-of-experts architectures
- Vocabulary optimization for domain-specific models
- Cross-modal vocabulary considerations for multimodal models
The principles established in this work may extend beyond language models to other sequence models in bioinformatics, code generation, and time-series analysis, much as convolutional neural network principles from computer vision (as in the AlexNet paper by Krizhevsky et al., 2012) carried over to other domains.
7. References
- Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models.
- Brown, T., et al. (2020). Language Models are Few-Shot Learners.
- Touvron, H., et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models.
- Vaswani, A., et al. (2017). Attention Is All You Need.
- Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks.
- Krizhevsky, A., et al. (2012). ImageNet Classification with Deep Convolutional Neural Networks.
- Gemma Team (2024). Gemma: Open Models Based on Gemini Research and Technology.
- Hoffmann, J., et al. (2022). Training Compute-Optimal Large Language Models.