
Reading.help: An LLM-Powered Intelligent Reading Assistant for EFL Learners

Research on Reading.help, an AI-powered tool that provides proactive and on-demand explanations of English grammar and semantics to support English as a Foreign Language (EFL) readers.
learn-en.org | PDF Size: 2.8 MB

1. Introduction

English dominates global academic, professional, and social communication, yet millions of English as a Foreign Language (EFL) readers struggle with comprehension. Traditional resources like formal education or full-text translation tools (e.g., Google Translate) are often inaccessible, costly, or counterproductive for learning. Reading.help addresses this gap by proposing an intelligent reading assistant that leverages Natural Language Processing (NLP) and Large Language Models (LLMs) to provide proactive and on-demand explanations of grammar and semantics, aiming to foster independent reading skills among EFL learners with university-level proficiency.

2. System Design & Methodology

2.1. The Reading.help Interface

The interface (Fig. 1) is designed for clarity and utility. Key components include: (A) Content summaries, (B) Adjustable summary levels (concise/detailed), (C) Contextual support tools triggered by text selection, (D) A tools menu offering Lexical Terms, Comprehension, and Grammar assistance, (E) Proactive identification of challenging content per paragraph, (F) Vocabulary explanations with definitions and context, (G) A two-LLM validation pipeline for explanation quality, and (H) Visual highlighting linking suggestions to the original text.

2.2. Core Modules: Identification & Explanation

The system is built on two specialized modules:

  • Identification Module: Detects potentially difficult words, phrases, and syntactic structures for EFL readers using a combination of rule-based heuristics (e.g., low-frequency vocabulary, complex sentence length) and a fine-tuned neural model.
  • Explanation Module: Generates clarifications for vocabulary, grammar, and overall context. It uses an LLM (like GPT-4) prompted with specific instructions for EFL-level explanations, ensuring clarity and pedagogical value.
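
The two-module split above can be sketched minimally in Python. This is illustrative only: the paper describes a combination of rule-based heuristics and a fine-tuned neural model, while here a crude word-length heuristic stands in for both, and the `Annotation` type is a hypothetical name, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    segment: str   # flagged text span
    category: str  # e.g. "vocabulary", "grammar", "idiom"

def identify(text: str) -> list[Annotation]:
    """Identification module stub: flag likely low-frequency words.

    Word length is used here as a crude proxy for rarity; the real
    system uses corpus frequency, parse depth, and a neural model.
    """
    words = (w.strip(".,;:!?") for w in text.split())
    return [Annotation(w, "vocabulary") for w in words if len(w) > 9]
```

The explanation module would then consume each `Annotation` together with its surrounding paragraph, as described in Section 4.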

2.3. LLM Validation Pipeline

A critical innovation is the dual-LLM validation process. The first LLM generates an explanation. A second, separate LLM acts as a validator, assessing the first LLM's output for factual accuracy, relevance, and appropriateness for the target EFL level. This process, inspired by techniques like self-consistency and chain-of-thought verification seen in advanced AI research, aims to mitigate hallucinations and improve reliability—a common concern in educational applications of LLMs.
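
The generate-then-validate loop can be sketched as below. Both function bodies are stand-ins for actual LLM calls (the paper does not publish its prompts or retry policy), so the wording, retry count, and fallback behavior are assumptions for illustration.

```python
def generate_explanation(segment: str, context: str) -> str:
    """First LLM: produce an EFL-level explanation of the segment (stubbed)."""
    return f"Explanation of '{segment}' in context."

def validate_explanation(segment: str, explanation: str) -> bool:
    """Second LLM: judge factual accuracy, relevance, and EFL suitability.

    Stubbed here as a trivial containment check standing in for an
    LLM-as-judge call.
    """
    return segment in explanation

def explain_with_validation(segment: str, context: str, max_retries: int = 2):
    """Regenerate until the validator accepts, up to a retry budget."""
    for _ in range(max_retries + 1):
        candidate = generate_explanation(segment, context)
        if validate_explanation(segment, candidate):
            return candidate
    return None  # fall back (e.g., to a dictionary entry) if validation keeps failing
```

The design choice here mirrors the paper's motivation: a second, independent judgment is cheap insurance against hallucinated explanations reaching a learner.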

3. Case Study & Evaluation

3.1. Study with South Korean EFL Readers

The development followed a human-centered design process. An initial prototype was tested with 15 South Korean EFL readers. Feedback focused on interface usability, explanation clarity, and the perceived helpfulness of proactive suggestions. This feedback directly informed the revisions leading to the final Reading.help system.

3.2. Results & User Feedback

A final evaluation was conducted with 5 EFL readers and 2 EFL education professionals. Qualitative findings suggested that:

  • Users appreciated the on-demand explanations for specific confusing elements.
  • The proactive highlights helped direct attention to areas of potential difficulty before confusion arose.
  • Participants reported increased confidence in parsing complex sentences independently.
  • Professionals saw potential for the tool as a supplementary self-learning aid outside the classroom.
The study concluded that Reading.help could help bridge the gap when access to human tutors is limited.

At a glance: initial user study — 15 EFL readers (South Korea); final evaluation — 7 participants (5 readers + 2 professionals); core modules — 2 (Identification & Explanation).

4. Technical Implementation

4.1. NLP & LLM Architecture

The system employs a pipeline architecture. The text is first processed through the identification module, which uses features like:

  • Word frequency (e.g., against the Corpus of Contemporary American English).
  • Syntactic parse tree depth.
  • Presence of idiomatic expressions or cultural references.
Annotated text segments are then passed to the explanation module, powered by a prompt-engineered LLM. The prompt includes context (the surrounding paragraph), the target segment, and instructions to generate an explanation suitable for a university-educated non-native speaker.
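
A minimal sketch of that prompt assembly, assuming the three components named in the text (context paragraph, target segment, EFL-targeted instructions). The exact wording and length limit are invented for illustration; the paper does not publish its prompts.

```python
def build_explanation_prompt(paragraph: str, segment: str) -> str:
    """Assemble an explanation-request prompt from context + target segment."""
    return (
        "You are helping a university-educated non-native English speaker.\n"
        f"Context paragraph:\n{paragraph}\n\n"
        f'Explain the phrase: "{segment}"\n'
        "Use simple English, define unfamiliar words, and keep it under 80 words."
    )
```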

4.2. Mathematical Formulation for Difficulty Scoring

The identification module assigns a composite difficulty score $D_s$ to a text segment $s$ (e.g., a sentence or phrase). This score is a weighted sum of normalized feature values: $$D_s = \sum_{i=1}^{n} w_i \cdot f_i(s)$$ Where:

  • $f_i(s)$ is the normalized value (between 0 and 1) of feature $i$ for segment $s$ (e.g., inverse document frequency (IDF) for vocabulary rarity, parse tree depth).
  • $w_i$ is the learned weight for feature $i$, reflecting its importance in predicting EFL reader difficulty, potentially derived from user study data.
  • $n$ is the total number of features.
Segments with $D_s$ exceeding a calibrated threshold are proactively highlighted by the system.
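
The weighted-sum formula above translates directly into code. Feature names, weights, and the threshold below are illustrative; the paper says only that weights could be derived from user study data and that features are normalized to [0, 1].

```python
def difficulty_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """D_s = sum_i w_i * f_i(s), with features pre-normalized to [0, 1]."""
    return sum(weights[name] * value for name, value in features.items())

def highlight(segments, weights, threshold=0.5):
    """Return segments whose composite difficulty exceeds the calibrated threshold."""
    return [seg for seg, feats in segments
            if difficulty_score(feats, weights) > threshold]
```

For example, with weights `{"idf": 0.6, "depth": 0.4}`, a segment with rare vocabulary (idf 0.9) and moderate parse depth (0.5) scores 0.74 and would be proactively highlighted at a 0.5 threshold.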

5. Results & Discussion

5.1. Key Performance Metrics

While the paper emphasizes qualitative findings, implied metrics for success include:

  • Reduction in External Look-ups: Users relied less on separate dictionary or translation apps.
  • Increased Comprehension Accuracy: Measured via post-reading quizzes on tool-assisted vs. non-assisted texts.
  • User Satisfaction & Perceived Usefulness: High ratings in post-study questionnaires.
  • Explanation Validation Accuracy: The percentage of LLM-generated explanations deemed "correct and helpful" by the second validator LLM and/or human evaluators.

5.2. Chart: Comprehension Improvement vs. Tool Usage

Figure 2 (Conceptual): Comprehension Score by Condition. A bar chart comparing average comprehension scores across three conditions: 1) Reading without any aid (Baseline), 2) Reading with a full-text translator, and 3) Reading with Reading.help. The hypothesis, supported by user feedback, is that Reading.help would yield scores significantly higher than the baseline and comparable to or better than translation, while promoting deeper engagement with the English text rather than bypassing it.

Key Insights

  • Proactive + On-Demand is Key: Combining both assistance modes caters to different reader needs and moments of confusion.
  • LLMs Need Guardrails for Education: The dual-LLM validation is a pragmatic step towards reliable, pedagogical AI output.
  • Targets the "Independent Learner" Gap: Effectively addresses the need for scalable support between formal classes and full automation (translation).
  • Human-Centered Design is Non-Negotiable: Iterative testing with real EFL users was crucial for refining tool usefulness.

6. Analysis Framework & Case Example

Framework: The tool's efficacy can be analyzed through the lens of Cognitive Load Theory. It aims to reduce extraneous cognitive load (the effort spent searching for definitions or parsing grammar) by providing integrated explanations, thereby freeing up mental resources for germane cognitive load (deep comprehension and learning).

Case Example (No Code): Consider an EFL reader encountering this sentence in a news article: "The central bank's hawkish stance, intended to curb inflation, has sent ripples through the bond market."

  1. Identification: The system highlights "hawkish stance," "curb inflation," and "sent ripples through" as potentially challenging (low-frequency finance idiom, metaphorical phrase).
  2. On-Demand Explanation (User clicks on 'hawkish stance'): The Lexical Terms tool explains: "In economics, 'hawkish' describes a policy focused aggressively on controlling inflation, even if it raises interest rates. A 'stance' is a position or attitude. So, a 'hawkish stance' means the bank is taking a strong, aggressive position against inflation."
  3. Proactive Comprehension Aid: The Comprehension tool for the paragraph might summarize: "This paragraph explains that the central bank's aggressive actions to fight inflation are causing noticeable effects in the bond market."
This integrated support helps decode jargon and metaphor without removing the reader from the original English context.

7. Future Applications & Research Directions

  • Personalization: Adapting difficulty identification and explanation depth to each user's demonstrated proficiency level and learning history.
  • Multimodal Input: Extending support to audio (podcasts) and video (lectures) with synchronized text and explanation.
  • Gamification & Long-Term Learning Tracking: Incorporating spaced repetition for vocabulary learned through the tool and tracking progress over time.
  • Broader Language Pairs: Applying the same framework to support readers of other dominant languages (e.g., Mandarin, Spanish) as a foreign language.
  • Integration with Formal Learning Management Systems (LMS): Becoming a plug-in for platforms like Moodle or Canvas to assist students with course readings.
  • Advanced Explainable AI (XAI): Making the identification model's reasoning more transparent (e.g., "This sentence is highlighted because it contains a passive voice construction and a low-frequency noun phrase").

8. References

  1. Chung, S., Jeon, H., Shin, S., & Hoque, M. N. (2025). Reading.help: Supporting EFL Readers with Proactive and On-Demand Explanation of English Grammar and Semantics. arXiv preprint arXiv:2505.14031v2.
  2. Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NIPS 2017).
  3. Brown, T., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
  4. Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285.
  5. Google AI. (2023). Best practices for prompting and evaluating large language models. Google AI Blog.
  6. Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge University Press.

9. Expert Analysis: Core Insight, Logical Flow, Strengths & Flaws, Actionable Insights

Core Insight: Reading.help isn't just another translation wrapper; it's a targeted intervention in the cognitive process of reading. Its key contribution is the hybrid proactive/reactive assistance model coupled with a validation mechanism for LLM outputs. This positions it not as a crutch (like full translation), but as a "cognitive scaffold"—a concept well-supported by educational theory like Vygotsky's Zone of Proximal Development. It acknowledges that the goal for proficient learners isn't just understanding this text, but building the skills to understand the next one independently.

Logical Flow: The paper's logic is sound and practitioner-focused: 1) Identify a real, underserved market (independent adult EFL learners), 2) Diagnose the failure of existing solutions (translation promotes dependency, dictionaries lack context), 3) Propose a novel technical architecture (identification + explanation + validation) directly addressing those failures, 4) Validate through iterative, human-centered testing. This is a textbook example of applied HCI research with clear product-market fit logic.

Strengths & Flaws:

  • Strengths: The dual-LLM validation is a pragmatic and necessary hack in today's hallucination-prone AI landscape. The focus on paragraph-level comprehension aids, not just word lookup, is pedagogically astute. The choice of target user (university-level) is smart—they have the base grammar/vocabulary to benefit most from nuanced semantic and syntactic support.
  • Glaring Flaws/Omissions: The evaluation is dangerously light on quantitative, longitudinal data. Does tool use actually improve long-term reading proficiency, or just immediate comprehension? The paper is silent. The "identification module" is described as a "specialized neural model," but its architecture, training data, and accuracy metrics are opaque—a major red flag for technical credibility. Furthermore, it ignores the potential for automation bias; users might uncritically accept LLM explanations, especially after the validator gives a false sense of security.

Actionable Insights:

  1. For Researchers: The next step must be a rigorous, controlled longitudinal study measuring retention and skill transfer. Also, open-source the identification model architecture and benchmark it against standard readability metrics (e.g., Flesch-Kincaid) to establish technical credibility.
  2. For Product Developers: This framework is ripe for commercialization. The immediate product roadmap should focus on personalization (the biggest missing piece) and seamless browser/PDF integration. Consider a freemium model with basic highlights and a premium tier with advanced grammar decomposition and personalized vocabulary decks.
  3. For Educators: Pilot this tool as a mandatory support for intensive reading assignments in university EFL courses. Use it to generate discussion by having students compare the AI's explanation with their own inferences, turning the tool into a debate partner rather than an oracle.
In conclusion, Reading.help presents a compelling blueprint for the next generation of language learning aids. It correctly identifies the limitations of brute-force translation and moves towards a more nuanced, assistive intelligence. However, its current evidence is more suggestive than conclusive. Its success will hinge not on fancier LLMs, but on robust, transparent evaluation and a deep commitment to the long-term learning outcomes of its users.