1. Introduction & Overview
This study is a landmark at the intersection of computational linguistics and psychology. By analyzing an unprecedented dataset of 700 million words, phrases, and topics from 75,000 Facebook users, the research team pioneered an open-vocabulary approach to understanding how language on social media relates to core human attributes: personality, gender, and age. The work moves beyond traditional analyses of predefined word categories (such as LIWC) and lets the data itself reveal the linguistic markers that distinguish individuals and groups.
The central premise is that the large volume of language produced on platforms like Facebook offers a unique window into human psychology. The study shows that this data-driven approach can surface face-valid associations (e.g., people living at high elevations talk about mountains), replicate known psychological findings (e.g., neuroticism is associated with words such as "depressed"), and, most importantly, generate new hypotheses about human behavior that researchers had not anticipated.
2. Methodology & Data
The study's methodological rigor is a key part of its contribution. It combines large-scale data collection with novel analytical techniques.
2.1 Data Collection & Participants
The dataset was exceptionally large for its time:
- Participants: 75,000 volunteers.
- Data Source: Facebook status updates and messages.
- Text Volume: More than 15.4 million messages, yielding 700 million analyzable language instances (words, phrases, topics).
- Psychological Measures: Participants completed standard personality tests (e.g., the Big Five Inventory), providing ground-truth labels for the analysis.
2.2 The Open-Vocabulary Approach
This is the study's core innovation. Unlike closed-vocabulary methods, which test hypotheses about predefined word categories (e.g., "negative emotion words"), the open-vocabulary approach is exploratory and data-driven. The algorithm scans the entire corpus to identify any linguistic feature (single words, multi-word phrases, or latent topics) that is statistically associated with the target variable (e.g., high neuroticism). This reduces researcher bias in feature selection and makes it possible to discover unexpected language patterns.
2.3 Differential Language Analysis (DLA)
DLA is the specific implementation of the open-vocabulary approach used here. It works in three steps:
- Feature Extraction: Automatically identify all n-grams (word sequences) and latent topics in the corpus.
- Correlation Analysis: Compute the strength of association between each language feature and the demographic or psychological variable of interest.
- Ranking & Interpretation: Rank features by their association strength to identify the most distinguishing markers for a given group or trait.
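The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: whitespace tokenization, per-user relative frequencies, Pearson correlation as the association measure, the function names (`ngrams`, `pearson`, `rank_features`), and the toy texts and scores are all invented for this example.

```python
from collections import Counter
from math import sqrt

def ngrams(text, n_max=2):
    """Return all 1..n_max-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def rank_features(user_texts, user_scores, n_max=2):
    """Step 1: extract n-grams. Step 2: correlate each n-gram's per-user
    relative frequency with the score. Step 3: rank by correlation."""
    counts = [Counter(ngrams(t, n_max)) for t in user_texts]
    totals = [max(sum(c.values()), 1) for c in counts]
    vocab = set().union(*counts)
    ranked = []
    for feat in vocab:
        freqs = [c[feat] / tot for c, tot in zip(counts, totals)]
        ranked.append((feat, pearson(freqs, user_scores)))
    return sorted(ranked, key=lambda fr: fr[1], reverse=True)

# Toy data, invented for illustration (one text and one score per user).
texts = [
    "party tonight so excited",
    "great party with friends",
    "reading alone at home",
    "quiet night at home",
]
extraversion = [4.5, 4.0, 2.0, 1.5]

ranking = rank_features(texts, extraversion)
scores_by_feature = dict(ranking)
```

On this toy input, "party" ends up with a strongly positive correlation and "at home" with an equally strong negative one. A real analysis would also need a proper tokenizer, a minimum-frequency filter, and correction for the very large number of features being tested.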
3. Key Findings & Results
The analysis yielded detailed insights into the psychology of language use.
3.1 Language & Personality Traits
Strong associations were found between language and the Big Five personality traits:
- Neuroticism: Associated with words such as "depressed" and "anxious" and phrases such as "sick of," reflecting a focus on negative emotions and problems.
- Extraversion: Associated with social words ("party," "exciting," "love"), exclamations ("haha," "woo"), and references to social events.
- Openness to Experience: Associated with aesthetic and intellectual words ("art," "philosophy," "universe") and with more complex vocabulary.
- Agreeableness: Marked by prosocial language ("we," "thank you," "wonderful") and less frequent swearing.
- Conscientiousness: Associated with achievement-oriented words ("work," "plan," "success") and fewer references to immediate gratification (e.g., "tonight," "drinking").
3.2 Gender Differences in Language
The study confirmed and refined known gender differences:
- Women used more emotion words, social words, and pronouns ("I," "you," "we").
- Men used more object references, swear words, and impersonal topics (sports, politics).
- Key Insight: Men were more likely to use the possessive "my" when mentioning "wife" or "girlfriend," whereas women did not show the same pattern with "husband" or "boyfriend." This points to subtle differences in how relationship possession is expressed.
3.3 Age-Related Language Patterns
Language use changed systematically with age:
- Younger adults: More references to social activities, nightlife, and technology ("phone," "internet").
- Older adults: More discussion of family, health, and work-related matters, and more positive emotion words overall.
- These findings are consistent with socioemotional selectivity theory, which posits a shift in motivational priorities with age.
4. Technical Details & Framework
4.1 Mathematical Foundation
The core of DLA involves computing pointwise mutual information (PMI) or a correlation measure between a language feature $f$ (e.g., a word) and a binary or continuous attribute $a$ (e.g., gender or a neuroticism score). For a binary attribute:
$PMI(f, a) = \log \frac{P(f, a)}{P(f)P(a)}$
where $P(f, a)$ is the joint probability of the feature and the attribute occurring together (e.g., the word "exciting" appearing in an extravert's messages), and $P(f)$ and $P(a)$ are the marginal probabilities. Features are ranked by PMI or correlation score to identify the most distinguishing markers of group $a$.
The "topic" features appear to have been generated with topic modeling techniques such as Latent Dirichlet Allocation (LDA). LDA models each document as a mixture of $K$ topics and each topic as a distribution over words. The probability of a word $w$ in document $d$ is given by:
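As a worked example of the PMI formula, the sketch below estimates the three probabilities from raw message counts. The function name `pmi`, the choice of base-2 logarithm (the formula above does not fix a base), and all counts are assumptions made for illustration only.

```python
from math import log2

def pmi(count_f_and_a, count_f, count_a, total):
    """PMI(f, a) = log2( P(f, a) / (P(f) * P(a)) ), estimated from counts.

    count_f_and_a: messages that contain feature f AND come from group a
    count_f:       messages that contain feature f
    count_a:       messages that come from group a
    total:         all messages
    """
    p_fa = count_f_and_a / total
    p_f = count_f / total
    p_a = count_a / total
    return log2(p_fa / (p_f * p_a))

# Hypothetical counts: 1,000 messages, 500 of them written by extraverts;
# "party" appears in 80 messages, 60 of which were written by extraverts.
pmi_party = pmi(count_f_and_a=60, count_f=80, count_a=500, total=1000)

# If "party" were independent of extraversion, 40 of the 80 messages
# would come from extraverts and the PMI would be zero.
pmi_independent = pmi(count_f_and_a=40, count_f=80, count_a=500, total=1000)
```

Here `pmi_party` equals log2(1.5), about 0.585: "party" shows up in extraverts' messages 1.5 times as often as independence would predict. Positive values mark features over-represented in group $a$, negative values mark under-represented ones.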
$P(w|d) = \sum_{k=1}^{K} P(w|z=k) P(z=k|d)$
where $z$ is the latent topic variable. The discovered topics then serve as features in DLA.
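The mixture formula can be checked numerically with a tiny hand-built model. This sketch only evaluates $P(w|d)$ from given distributions; it does not fit LDA. The two topics, their word probabilities, and the document's topic weights are invented for illustration, and `word_prob` is a hypothetical helper name.

```python
def word_prob(word, topic_word, doc_topic):
    """P(w|d) = sum over k of P(w|z=k) * P(z=k|d)."""
    return sum(p_topic * topic_word[k].get(word, 0.0)
               for k, p_topic in enumerate(doc_topic))

# P(w|z=k): one word distribution per topic (each sums to 1).
topic_word = [
    {"work": 0.5, "deadline": 0.3, "party": 0.2},   # topic 0: work-like
    {"work": 0.1, "deadline": 0.1, "party": 0.8},   # topic 1: social-like
]

# P(z=k|d): this document is 75% topic 0 and 25% topic 1.
doc_topic = [0.75, 0.25]

p_work = word_prob("work", topic_word, doc_topic)          # 0.75*0.5 + 0.25*0.1
p_deadline = word_prob("deadline", topic_word, doc_topic)  # 0.75*0.3 + 0.25*0.1
p_party = word_prob("party", topic_word, doc_topic)        # 0.75*0.2 + 0.25*0.8
```

Because both inputs are proper distributions, the three results (0.40, 0.25, and 0.35) again sum to 1. When topics are used as DLA features, the per-document weights in `doc_topic` are the quantities that would be correlated with a trait, in the same way as word frequencies.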
4.2 Example Analysis Workflow
Case: Identifying Language Markers of High Conscientiousness
- Data Preparation: Split the 75,000 participants into two groups with a median split on Conscientiousness scores (High-C vs. Low-C).
- Feature Generation: Process all Facebook messages to extract:
  - Unigrams (single words): "work," "plan," "finished."
  - Bigrams (two-word phrases): "my job," "next week," "to do."
  - Topics (via LDA): e.g., Topic 23: {work: 0.05, project: 0.04, deadline: 0.03, team: 0.02, ...}.
- Statistical Testing: For each feature, run a chi-squared test or compute PMI to compare its frequency in the High-C group with its frequency in the Low-C group.
- Interpreting Results: Rank features by their association strength. Top High-C features might include "work," "plan," "finished," the bigram "my goal," and high loadings on LDA topics related to organization and achievement. Together, these features paint a data-driven picture of the linguistic footprint of conscientious people.
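The workflow above can be sketched end to end on toy data. This is an illustrative sketch, not the study's pipeline: it covers unigrams only, treats each user as one observation (mentions the word or not), uses the textbook 2x2 chi-squared statistic without continuity correction, and all names (`median_split`, `chi2_2x2`, `feature_chi2`), scores, and texts are invented for the example.

```python
from statistics import median

def median_split(scores):
    """Label each participant True (High-C) if strictly above the median."""
    cut = median(scores)
    return [s > cut for s in scores]

def chi2_2x2(a, b, c, d):
    """Chi-squared statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def feature_chi2(word, texts, is_high):
    """Cross-tabulate 'user mentions word' against 'user is High-C'."""
    a = b = c = d = 0
    for text, high in zip(texts, is_high):
        used = word in text.lower().split()
        if high and used:
            a += 1      # High-C, uses the word
        elif high:
            b += 1      # High-C, does not use it
        elif used:
            c += 1      # Low-C, uses the word
        else:
            d += 1      # Low-C, does not use it
    return chi2_2x2(a, b, c, d)

# Toy data, invented for illustration (one text per participant).
conscientiousness = [4.8, 4.5, 4.1, 2.9, 2.5, 2.0]
texts = [
    "finished work early",
    "plan for work",
    "work plan done",
    "drinking tonight",
    "party tonight",
    "work later maybe",
]

is_high = median_split(conscientiousness)
results = {w: feature_chi2(w, texts, is_high)
           for w in ["work", "finished", "tonight"]}
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
```

The statistic measures strength of association but not its direction, so "work" (more common in High-C) and "tonight" (more common in Low-C) both score 3.0 here; comparing the usage rates in the two groups recovers the sign. With counts this small the chi-squared approximation is not reliable, and a real analysis testing many thousands of features would also need a multiple-comparison correction.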
5. Results & Data Visualization
The original paper may contain its own figures; the results can be conveyed through the following kinds of visualization:
- Word Clouds/Bar Charts for Traits: Visuals showing the 20-30 words most strongly associated with each Big Five trait. For example, a bar chart for Extraversion would show tall bars for "party," "love," "exciting," and "good time."
- Gender Comparison Heatmap: A matrix showing how the use of word categories (emotion, social, object) differs between men and women, highlighting the strongest contrasts.
- Age Trend Plots: Line charts showing how the relative frequency of certain word categories (e.g., social words, future-oriented words, health words) changes as a function of participant age.
- Correlation Network: A network diagram linking personality traits to clusters of associated words and phrases, visually mapping the relationship between psychology and vocabulary.
The scale of validation is itself a key result: patterns observed across 700 million language instances provide substantial statistical power and robustness.
6. Critical Analyst's Perspective
Core Insight: The 2013 paper by Schwartz et al. is not just a study; it is a paradigm shift. It successfully harnessed social media "big data" to attack a fundamental problem in psychology: measuring latent constructs such as personality through observable behavior. The core insight is that our digital exhaust is a high-fidelity behavioral transcript of our inner lives. The paper shows that with a powerful, hypothesis-agnostic lens (open-vocabulary analysis), this transcript can be decoded with striking accuracy, cutting through stereotypes to reveal subtle, often counterintuitive, linguistic signatures.
Logical Flow: The logic is compelling: 1) Obtain a large real-world corpus linked to gold-standard psychometric data (Facebook + personality tests). 2) Drop the theoretical constraint of a predefined lexicon. 3) Let machine learning algorithms search the full linguistic space for statistical signals. 4) Interpret the strongest signals, which range from the obvious (neurotic people say "depressed") to the strikingly subtle (gendered use of possessive pronouns). The progression from data scale to methodological innovation to novel discovery is persuasive and reproducible.
Strengths & Flaws: Its greatest strength is its exploratory power. Unlike closed-vocabulary work (e.g., using LIWC), which can only confirm or refute hypotheses already in hand, this approach generates hypotheses. It is a discovery engine. This parallels the data-driven philosophy in fields such as computer vision, as seen in unsupervised feature discovery in work such as the CycleGAN paper (Zhu et al., 2017), where the model learns representations without extensive human labeling. However, the flaw is the mirror image of the strength: interpretive risk. Finding an association between "snowboarding" and low neuroticism does not mean that snowboarding causes emotional stability; the association may be spurious or reflect a third variable (age, geography). The paper, while aware of this, leaves the door open to overinterpretation. In addition, reliance on Facebook data from the early 2010s raises questions about generalizability to other platforms (Twitter, TikTok) and to today's online language.
Actionable Insights: For researchers, the mandate is clear: adopt open-vocabulary methods as a complement to theory-driven research. Use them to generate hypotheses, then validate those hypotheses with confirmatory studies. For industry, the implications are broad. This approach is a cornerstone of modern psychographic modeling for targeted advertising, content recommendation, and even risk assessment (e.g., in insurance or finance). The practical takeaway is to build a similar pipeline for your own proprietary text data (customer reviews, support tickets, internal communications) to uncover hidden segments and behavioral predictors. Proceed with ethical caution, however. The ability to infer psychological traits from language is a double-edged sword, and it requires robust governance frameworks to prevent manipulation and bias, a concern raised in critiques from researchers at the AI Now Institute and others.
7. Future Applications & Directions
The open-vocabulary framework established here has opened up numerous research directions and applications:
- Mental Health Screening: Developing language-based screening tools on social media to identify people at risk of depression, anxiety, or suicidal ideation, enabling early intervention.
- Personalized Education & Coaching: Tailoring educational content, career advice, or wellness coaching to linguistic markers of personality and learning style inferred from a user's writing.
- Dynamic Personality Assessment: Moving beyond lengthy questionnaires to continuous assessment of personality states and their changes over time by analyzing email, messaging, or document-writing style.
- Cross-Cultural Psychology: Applying DLA to social media data in different languages to determine which personality-language associations are universal and which are culture-specific.
- Integration with Multimodal Data: The next frontier is combining language analysis with other digital footprints (image preferences, music listening history, social network structure) to build richer multimodal psychological models, a direction seen in later work from the World Well-Being Project and others.
- Ethical AI & Debiasing: Using these techniques to audit and mitigate bias in AI systems. By understanding how language models may associate certain dialects or speech patterns with stereotyped traits, developers can work to debias training data and algorithms.
8. References
- Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., ... & Ungar, L. H. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8(9), e73791.
- Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. University of Texas at Austin.
- Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2223-2232). (Cited as an example of unsupervised, data-driven feature discovery in another field.)
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. (The foundational topic modeling technique.)
- AI Now Institute. (2019). Disability, Bias, and AI. New York University. (For critical perspectives on ethics and bias in algorithmic modeling.)
- Eichstaedt, J. C., et al. (2018). Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences, 115(44), 11203-11208. (An example of applied work in mental health.)