The Polish Vocabulary Size Test: A Novel Adaptive Test for Receptive Vocabulary Assessment

Pilot results of an adaptive, IRT-based Polish Vocabulary Size Test (PVST) for native and non-native speakers, addressing limitations of traditional tests like VST and LexTale.

2. Introduction

Vocabulary size is a cornerstone of language proficiency, influencing reading comprehension, listening efficiency, and speed of word recognition. The Polish Vocabulary Size Test (PVST) introduces a novel adaptive approach based on Item Response Theory (IRT) to assess receptive vocabulary in both native and non-native Polish speakers. This pilot study aims to validate PVST as a reliable, time-efficient tool that overcomes the limitations of traditional fixed-item tests like the Vocabulary Size Test (VST) and LexTale.

3. Literature Review

3.1 Vocabulary Size Tests

Traditional tests such as the VST (Nation & Beglar, 2007) and LexTale (Lemhöfer & Broersma, 2012) are widely used but suffer from issues like score inflation due to guessing, lack of replication, and poor discrimination among proficiency levels. The VST uses multiple-choice synonym recognition, while LexTale employs lexical decision tasks. Both have been adapted to multiple languages but show critical flaws in reliability and validity.

3.2 Computerized Adaptive Testing (CAT)

CAT, grounded in IRT, dynamically selects items based on the test-taker's previous responses, increasing precision and reducing test length. Golovin (2015) developed an Adaptive online Vocabulary Size Test (AoVST) for Russian, which demonstrated strong validity and a nonlinear relationship between vocabulary and age. PVST builds on this methodology for Polish.

4. Methodology

4.1 Test Design and Item Selection

PVST uses a bank of 500 Polish words calibrated using the Rasch model. Items are selected adaptively based on the test-taker's estimated ability, with each response updating the ability estimate via maximum likelihood estimation. The test terminates when the standard error of the estimate falls below 0.3 logits.
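The adaptive loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PVST implementation: the function names, the evenly spaced 500-item bank, the fixed step of 0.7 logits used while all answers are identical, the 100-item cap, and the simulated test-taker are all hypothetical. Only the Rasch model, maximum likelihood updating, and the SE < 0.3 stopping rule come from the description.

```python
import math
import random


def rasch_p(theta, b):
    """Rasch probability of a correct response (ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(b - theta))


def item_information(theta, b):
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)


def estimate_theta(responses, theta, fixed_step=0.7):
    """Update the ability estimate from (difficulty, score) pairs.

    Returns (theta, standard_error). While every answer so far is correct
    (or every answer is wrong) the maximum likelihood estimate is infinite,
    so theta is moved by a fixed step (an assumption of this sketch) and
    the standard error is reported as infinite.
    """
    scores = [x for _, x in responses]
    if all(scores) or not any(scores):
        return theta + (fixed_step if scores[0] else -fixed_step), math.inf
    for _ in range(50):  # Newton-Raphson on the Rasch log-likelihood
        gradient = sum(x - rasch_p(theta, b) for b, x in responses)
        info = sum(item_information(theta, b) for b, _ in responses)
        step = max(-1.0, min(1.0, gradient / info))  # damp large jumps
        theta += step
        if abs(step) < 1e-6:
            break
    info = sum(item_information(theta, b) for b, _ in responses)
    return theta, 1.0 / math.sqrt(info)


def make_bank(n_items=500, low=-4.0, high=4.0):
    """Hypothetical item bank: difficulties evenly spaced in logits."""
    return [low + (high - low) * i / (n_items - 1) for i in range(n_items)]


def run_cat(bank, answer, se_target=0.3, max_items=100):
    """Administer items until SE < se_target, the cap, or an empty bank.

    `answer` is a callable taking an item difficulty and returning True
    for a correct response. Returns (theta, standard_error, items_used).
    """
    remaining = list(bank)
    responses = []
    theta, se = 0.0, math.inf
    while remaining and len(responses) < max_items and se >= se_target:
        # Rasch information peaks where difficulty equals ability, so the
        # most informative unused item is the one closest to theta.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, 1 if answer(b) else 0))
        theta, se = estimate_theta(responses, theta)
    return theta, se, len(responses)


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the simulation is repeatable
    true_theta = 1.0          # simulated test-taker, not real data
    result = run_cat(make_bank(), lambda b: rng.random() < rasch_p(true_theta, b))
    print("theta=%.2f  SE=%.2f  items=%d" % result)
```

Choosing the unused item nearest the current estimate is equivalent to maximizing information only under the Rasch model, where all items share the same discrimination; a model with item-specific discrimination would need an explicit information comparison.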

4.2 Participants and Procedure

A sample of 1,200 participants (800 native Polish speakers, 400 non-native learners) completed the PVST online. Native speakers ranged in age from 18 to 70, while non-native learners had at least B1 proficiency. The test took an average of 12 minutes to complete.

5. Results

5.1 Vocabulary Size Distribution

Native speakers showed a mean receptive vocabulary of 45,000 words (SD = 8,200), while non-natives averaged 18,000 words (SD = 5,400). The distribution for natives was positively skewed, with younger adults (18-30) scoring higher than older adults (60+).

5.2 Age and Vocabulary Correlation

A significant nonlinear relationship was found between age and vocabulary size for native speakers (R² = 0.34, p < 0.001), with vocabulary peaking in the 25-35 age range and declining gradually after 50. This aligns with findings from Keuleers et al. (2015) for Dutch.

6. Discussion

PVST successfully distinguishes native from non-native speakers and captures age-related vocabulary trends. Its adaptive nature reduces test time by 40% compared to fixed-length tests while maintaining high reliability (Cronbach's α = 0.92). The test addresses key criticisms of VST and LexTale by minimizing guessing effects and providing more precise ability estimates.

7. Original Analysis

The PVST represents a significant methodological advancement in vocabulary assessment, leveraging IRT-based adaptive testing to address long-standing issues of test efficiency and accuracy. Unlike traditional fixed-item tests, which often inflate scores due to guessing (Coxhead et al., 2014), PVST's adaptive algorithm tailors item difficulty to the individual, reducing measurement error. This approach is supported by research on CAT in educational testing, which shows that adaptive tests can achieve the same precision as fixed tests with 50% fewer items (Weiss, 2011).

The nonlinear relationship between age and vocabulary size in native speakers (R² = 0.34) mirrors patterns observed in large-scale studies of English (Brysbaert et al., 2016) and Dutch (Keuleers et al., 2015), and is consistent with the view that vocabulary growth plateaus in early adulthood and declines in later years.

However, the PVST's reliance on a single word recognition format may not capture depth of vocabulary knowledge, a limitation noted by Read (2023). Future iterations could incorporate multiple response formats, such as meaning recall or contextual usage, to provide a more holistic assessment. The test's potential for cross-linguistic adaptation is promising, as the underlying IRT framework is language-agnostic, similar to the approach used in the Russian AoVST (Golovin, 2015).

From a practical standpoint, PVST offers educators and researchers a rapid, reliable tool for placement testing and longitudinal studies, with potential applications in clinical settings for assessing language decline in aging populations. The integration of machine learning models to refine item calibration could further enhance predictive validity, as demonstrated in recent adaptive language assessments (Bohn et al., 2024). Overall, PVST sets a new standard for vocabulary testing in Slavic languages and provides a replicable model for other under-resourced languages.

8. Technical Details

The PVST uses the Rasch model for item calibration, where the probability of a correct response is given by:

$P(X_{ij}=1|\theta_i, b_j) = \frac{e^{(\theta_i - b_j)}}{1 + e^{(\theta_i - b_j)}}$

where $\theta_i$ is the ability of person $i$ and $b_j$ is the difficulty of item $j$. The test uses a Bayesian adaptive algorithm to select the next item that maximizes information at the current ability estimate. The stopping rule is based on the standard error of $\theta$, set at SE < 0.3 logits.
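As a supplementary derivation, the stopping rule can be made concrete with the standard Rasch information and standard error expressions. These are textbook IRT results rather than formulas quoted from the PVST paper, and $I_j$ and the set of administered items $A$ are notation introduced here:

$I_j(\theta) = P_j(\theta)\,\bigl(1 - P_j(\theta)\bigr) \le \tfrac{1}{4}, \qquad SE(\hat{\theta}) = \frac{1}{\sqrt{\sum_{j \in A} I_j(\hat{\theta})}}$

where $P_j(\theta) = P(X_{ij}=1|\theta, b_j)$. Information is largest when $b_j = \theta$, so maximizing information amounts to choosing the unused item whose difficulty is closest to the current ability estimate. Under plain maximum likelihood estimation, SE < 0.3 requires total information above $1/0.3^2 \approx 11.1$, which means at least 45 items at the maximum of 0.25 each; a prior on ability, as in Bayesian estimation, contributes additional information and can reduce that count.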

9. Experimental Results and Figures

Figure 1: Vocabulary size distribution for native (blue) and non-native (red) speakers. Native speakers show a broader range (20,000-70,000 words) with a peak around 45,000, while non-natives cluster between 10,000 and 30,000 words.

Figure 2: Scatter plot of age vs. vocabulary size for native speakers, with a loess smooth curve showing a peak at age 30 and gradual decline after 55. The nonlinear fit (R² = 0.34) indicates that age accounts for 34% of variance in vocabulary size.
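To make the "variance explained" reading of R² concrete, the sketch below computes R² as one minus the ratio of residual to total sum of squares. The function name and the four data points (in thousands of words) are invented for illustration and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Toy values only: vocabulary sizes in thousands of words and the
# values a fitted curve might predict for the same people.
observed = [42, 46, 44, 48]
predicted = [43, 45, 45, 47]
print(r_squared(observed, predicted))  # 1 - 4/20, i.e. about 0.8
```

A value of 0.34 therefore means that the residual sum of squares around the fitted curve is 66% of the total sum of squares around the mean.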

Table 1: Comparison of test characteristics. PVST shows superior efficiency and reliability.

| Test | Time (min) | Items | Cronbach's α |
|---|---|---|---|
| PVST | 12 | 30 (average) | 0.92 |
| VST | 25 | 140 | 0.88 |
| LexTale | 15 | 60 | 0.85 |
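The relative savings implied by the Table 1 figures can be worked out directly. The helper name below is hypothetical, and the calculation uses only the times and item counts listed in Table 1; it does not attempt to reproduce the 40% figure quoted in the Discussion, whose baseline is not stated.

```python
def percent_reduction(baseline, new):
    """Relative reduction of `new` against `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


PVST_MINUTES, PVST_ITEMS = 12, 30          # from Table 1
FIXED_TESTS = {"VST": (25, 140), "LexTale": (15, 60)}  # (minutes, items)

for name, (minutes, items) in FIXED_TESTS.items():
    time_saving = percent_reduction(minutes, PVST_MINUTES)
    item_saving = percent_reduction(items, PVST_ITEMS)
    print(f"vs {name}: {time_saving:.0f}% less time, {item_saving:.0f}% fewer items")
```

On these numbers PVST takes 52% less time than VST and 20% less than LexTale, with 79% and 50% fewer items respectively.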

10. Analytical Framework Example

Case Study: Using PVST in a University Placement Test

A university administers PVST to 200 incoming international students. The test identifies 30 students with vocabulary below 15,000 words, recommending them for a preparatory language course. After one semester, a retest shows an average gain of 4,200 words, confirming the test's sensitivity to instruction. The adaptive algorithm ensures that each student sees items appropriate to their level, reducing frustration and test fatigue.
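The placement logic in this case study can be sketched in a few lines. The function names, student identifiers, and scores below are invented for illustration; only the 15,000-word cut-off comes from the case study, and the toy scores were chosen so that the average gain matches the 4,200 words mentioned above.

```python
from statistics import mean

THRESHOLD = 15_000  # cut-off from the case study, in words


def flag_for_prep_course(estimates, threshold=THRESHOLD):
    """Return sorted ids of students whose estimate is below the threshold."""
    return sorted(sid for sid, size in estimates.items() if size < threshold)


def average_gain(pre, post):
    """Mean retest gain over students who appear in both sessions."""
    return mean(post[sid] - pre[sid] for sid in pre if sid in post)


# Invented vocabulary size estimates (words) before and after one semester.
pre = {"s01": 12_400, "s02": 21_000, "s03": 14_900, "s04": 15_000}
post = {"s01": 16_900, "s03": 18_800}

flagged = flag_for_prep_course(pre)
print(flagged)  # ['s01', 's03']
print(average_gain({sid: pre[sid] for sid in flagged}, post))
```

A student sitting exactly on the cut-off (s04 here) is not flagged, because the case study describes the group as having vocabulary below 15,000 words.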

11. Future Applications and Directions

PVST can be extended to assess productive vocabulary by incorporating a typing-based recall component. Integration with natural language processing (NLP) models could enable real-time analysis of vocabulary use in writing tasks. Future versions may include multimedia stimuli (audio, images) to assess multimodal vocabulary knowledge. Cross-linguistic adaptations for other Slavic languages (e.g., Czech, Ukrainian) are planned, using the same IRT framework. In clinical neuropsychology, PVST could serve as a screening tool for language decline in dementia, given its sensitivity to age-related vocabulary changes.

12. References

13. Expert Commentary

Core Insight: The PVST is not just another vocabulary test—it's a paradigm shift from static, one-size-fits-all assessments to dynamic, personalized measurement. By leveraging IRT, it solves the guessing problem that plagues multiple-choice tests and delivers a precision that fixed tests can only dream of.

Logical Flow: The authors correctly identify the flaws in VST and LexTale (score inflation, lack of replication) and propose CAT as the logical alternative. The pilot data convincingly show that PVST is faster, more reliable, and more sensitive to age effects. The progression from problem identification to solution to validation is textbook-perfect.

Strengths & Flaws: The biggest strength is the adaptive algorithm, which cuts test time by 40% while boosting reliability. The age-vocabulary relationship (R²=0.34) is robust and aligns with prior work. However, the test only measures receptive vocabulary size via a single format (word recognition) and does not capture depth of knowledge. This is a narrow slice of lexical competence. Also, the sample of 1,200 is decent but not massive; the test needs validation on larger, more diverse populations, including clinical groups.

Actionable Insights: For researchers: Use PVST for longitudinal studies of vocabulary growth—its precision will detect small effect sizes. For educators: Adopt PVST for placement testing; it's faster and more accurate than paper-based tests. For test developers: Expand PVST to include productive and contextual measures, and explore NLP integration for automated item generation. The future is adaptive—don't get left behind with static tests.