1. Introduction
This study investigates how English as a Foreign Language (EFL) students use Natural Language Generation (NLG) tools for idea generation in creative writing. Writing is a fundamental skill for communication and academic success, and it is particularly challenging for EFL learners. Creative writing offers distinct benefits, including personal knowledge construction and the development of meaningful insight. The integration of AI-powered NLG tools presents new opportunities and challenges in educational contexts.
The research addresses a significant gap in understanding how EFL students interact with NLG tools during the creative process, specifically examining their strategies for searching, evaluating, and selecting ideas generated by these tools.
2. Methodology
The study employed a qualitative research design with four secondary school students in Hong Kong. Participants attended workshops where they learned to write stories using both their own words and NLG-generated content. Following the workshops, students completed written reflections about their experiences.
The written reflections were analyzed thematically to identify patterns and strategies in students' interactions with NLG tools. The analysis focused on three areas: search strategies, evaluation methods, and tool selection criteria.
3. Results & Findings
3.1 Idea Search Strategies
Students often approached NLG tools with pre-existing ideas or thematic directions. Rather than using the tools for completely open-ended inspiration, they used them to expand, refine, or find variations on initial concepts. This suggests guided rather than exploratory search behavior.
3.2 Idea Evaluation
A notable finding was students' aversion or skepticism toward ideas generated solely by NLG tools. They showed a preference for blending AI-generated content with their own original thoughts, indicating a desire to maintain authorship and creative control. Evaluation criteria included relevance, originality (perceived human-like quality), and coherence with their intended narrative.
3.3 Tool Selection Criteria
When choosing between different NLG tools or prompts, students tended to favor options that produced a larger quantity of ideas. This "quantity over initial quality" approach provided them with a broader pool of raw material to sift through and adapt, aligning with the brainstorming phase of creative writing.
4. Discussion
The findings reveal that EFL students use NLG tools not as autonomous idea creators but as collaborative partners or idea amplifiers. The observed aversion to purely AI-generated content highlights the importance of student agency in creative processes. These insights are crucial for educators seeking to effectively integrate AI tools into writing curricula, emphasizing the need for pedagogical strategies that teach critical evaluation and synthesis of AI-generated content.
The study underscores the potential of NLG tools to lower the cognitive load associated with idea generation in a second language, potentially reducing writer's block and increasing engagement.
5. Technical Framework & Analysis
Core Insight: This paper isn't about building a better NLG model; it's a human-computer interaction (HCI) study that exposes the "last-mile problem" in educational AI. The real bottleneck isn't the AI's ability to generate text, since modern models like GPT-4 are proficient at that. The challenge lies in the user's ability, and an EFL learner's ability in particular, to prompt effectively, evaluate critically, and integrate the output creatively. The study reveals that students use NLG not as an oracle but as a brainstorming partner, favoring tools that produce high-volume, low-commitment ideas they can sift through, much as writers use traditional inspiration boards.
Logical Flow: The research logic is sound but limited. It correctly identifies the gap between NLG capability and pedagogical application. It moves from observing behavior (students using tools) to inferring strategy (guided search, evaluative aversion). However, it stops short of a robust theoretical framework. It hints at concepts like cognitive load theory (NLG reducing effort in L2 ideation) and Vygotsky's Zone of Proximal Development (AI as a scaffold), but doesn't explicitly ground the findings in them, missing an opportunity for deeper explanatory power.
Strengths & Flaws: The strength is its grounded, qualitative approach with real students in an authentic learning context, a rarity in early EdTech AI research, which is often dominated by technical proofs-of-concept. The major flaw is scale. With only four participants, the findings are suggestive, not generalizable. It's a compelling pilot study, not a definitive guide. Furthermore, it treats "NLG tools" as a monolith without distinguishing template-based, prompt-driven, and fine-tuned models, differences that would significantly affect user strategy. Compared to foundational works like the CycleGAN paper (Zhu et al., 2017), which presented a novel technical architecture with clear, measurable outcomes, this study's contribution is sociological rather than algorithmic.
Actionable Insights: For educators: Don't just drop an AI tool into the classroom. Design structured activities that teach "prompt literacy"—how to ask the AI productive questions—and "output triage"—how to critically assess and hybridize AI suggestions. For developers: Build NLG tools for education with interfaces that support iterative refinement (e.g., "generate more like this," "simplify language," "make it darker") and metadata explaining why the AI made certain suggestions, moving beyond black-box generation. The future isn't in more fluent AI, but in more pedagogically intelligent human-AI collaboration frameworks.
Technical Details & Mathematical Formulation
The core process can be abstracted. Let a student's internal idea state be represented as a vector $I_s$. An NLG tool, given a prompt $p$, generates a set of idea variants $\{I_{\mathrm{ai},1}, I_{\mathrm{ai},2}, \ldots, I_{\mathrm{ai},n}\}$. The student's evaluation and selection function $f_{\mathrm{eval}}$ operates on these, often seeking to minimize a distance metric $d(I_s, I_{\mathrm{ai}})$ while maximizing a novelty measure $N(I_{\mathrm{ai}})$. The final adopted idea is a fusion: $I_{\mathrm{final}} = g(I_s, I_{\mathrm{ai,selected}})$, where $g$ is a student-specific composition function.
The study's finding about quantity preference suggests students are optimizing for a higher probability of finding an $I_{\mathrm{ai}}$ where $d(I_s, I_{\mathrm{ai}}) < \theta$ (a personal threshold), hence preferring tools with larger $n$.
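The abstraction above can be made concrete with a small sketch. It is illustrative only and rests on assumptions that are not in the study: ideas are plain numeric vectors, distance is Euclidean, novelty is a score supplied with each candidate, the trade-off is a simple weighted difference, and each candidate independently falls within the threshold with the same probability q. The names `p_at_least_one`, `select_idea`, and `novelty_weight` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A candidate is (idea vector, novelty score). Both are toy stand-ins.
Candidate = Tuple[Sequence[float], float]


def p_at_least_one(q: float, n: int) -> float:
    """Chance that at least one of n candidates lands within the threshold,
    assuming each does so independently with probability q."""
    return 1.0 - (1.0 - q) ** n


def select_idea(
    i_s: Sequence[float],
    candidates: List[Candidate],
    theta: float,
    novelty_weight: float = 0.5,
) -> Optional[Candidate]:
    """Discard candidates at distance >= theta from the student's own idea,
    then return the one with the best novelty-minus-distance score."""
    best: Optional[Candidate] = None
    best_score = -math.inf
    for vector, novelty in candidates:
        d = math.dist(i_s, vector)  # Euclidean distance (an assumption)
        if d >= theta:
            continue  # too far from what the student had in mind
        score = novelty_weight * novelty - d
        if score > best_score:
            best, best_score = (vector, novelty), score
    return best
```

Under the independence assumption, `p_at_least_one` grows with n for any fixed q between 0 and 1, which is one way to read the preference for tools that return more ideas. `select_idea` returns `None` when no candidate is close enough, the case in which a student would likely prompt again.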
Analysis Framework Example Case
Scenario: An EFL student wants to write a story about "a lost robot in a forest."
Without Structured Framework:
Student prompts NLG: "Write a story about a robot lost in a forest." Gets one long, generic story. Student feels overwhelmed or uninspired, dislikes the AI's voice.
With a Pedagogical Framework (Informed by this study):
1. Idea Expansion: Student prompts for components: "Generate 10 descriptive words for a futuristic forest" and "List 5 emotional states for a lost robot." (Leverages quantity preference).
2. Evaluation & Selection: Student selects 3 words from list A ("bioluminescent," "overgrown," "silent") and 2 states from list B ("curious," "lonely"). (Applies critical triage).
3. Hybridization: Student writes: "In the silent, bioluminescent forest, the robot felt a deep loneliness mixed with curiosity." (Fuses AI output with personal syntax and narrative control).
This framework systematizes the effective behaviors observed in the study.
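The three steps can also be sketched as a small pipeline. This is a hypothetical illustration, not a tool used in the study: `canned_generate` stands in for a real NLG call and returns only the words named in the scenario (the full lists are not given), and the function names `expand`, `triage`, and `hybridize` are invented for this sketch.

```python
from typing import Callable, Dict, List

Generator = Callable[[str], List[str]]


def expand(generate: Generator, prompts: Dict[str, str]) -> Dict[str, List[str]]:
    """Step 1: ask for many small components instead of one finished story."""
    return {name: generate(prompt) for name, prompt in prompts.items()}


def triage(pools: Dict[str, List[str]], wanted: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 2: keep only the items the student chose, in the student's order."""
    return {
        name: [item for item in wanted.get(name, []) if item in pool]
        for name, pool in pools.items()
    }


def hybridize(picks: Dict[str, List[str]]) -> str:
    """Step 3: the student's own sentence frame and word forms wrap the picks."""
    noun_form = {"lonely": "loneliness", "curious": "curiosity"}  # student's rewording
    first, second = picks["setting"][:2]
    feeling, undertone = (noun_form[s] for s in picks["emotion"][:2])
    return (
        f"In the {first}, {second} forest, the robot felt a deep "
        f"{feeling} mixed with {undertone}."
    )


def canned_generate(prompt: str) -> List[str]:
    """Stand-in for an NLG tool; lists are truncated to the scenario's words."""
    canned = {
        "Generate 10 descriptive words for a futuristic forest":
            ["bioluminescent", "overgrown", "silent"],
        "List 5 emotional states for a lost robot": ["curious", "lonely"],
    }
    return canned.get(prompt, [])


prompts = {
    "setting": "Generate 10 descriptive words for a futuristic forest",
    "emotion": "List 5 emotional states for a lost robot",
}
pools = expand(canned_generate, prompts)
picks = triage(pools, {"setting": ["silent", "bioluminescent"],
                       "emotion": ["lonely", "curious"]})
sentence = hybridize(picks)
```

Keeping the sentence frame inside `hybridize` reflects the point of the framework: the AI supplies raw material, while syntax and narrative decisions stay with the student.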
Experimental Results & Chart Description
The qualitative data suggests behavioral patterns that could be quantified in a larger study. A hypothetical bar chart would show:
- Y-axis: Frequency of Strategy Use.
- X-axis: Strategy Categories: "Guided Search (with pre-idea)," "Open Exploration," "Favor High-Quantity Output," "Express Skepticism of AI Idea," "Blend AI & Own Ideas."
- Result: Bars for "Guided Search," "Favor High-Quantity Output," and "Blend AI & Own Ideas" would be noticeably taller than the bar for "Open Exploration," indicating the dominant, pragmatic approach students adopt towards NLG as a tool for augmentation, not replacement.
The primary "result" is the thematic map derived from student reflections, identifying the core tensions between the desire for creative assistance and the need for authorial ownership.
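If a larger study coded each reflection segment with one of the strategy categories, the hypothetical chart described above could be produced from a simple tally. The sketch below assumes such coded segments exist as (participant, strategy) pairs; the function names are hypothetical, and the entries in `placeholder_codes` are made-up placeholders, not data from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Strategy categories taken from the chart description.
STRATEGIES = [
    "Guided Search (with pre-idea)",
    "Open Exploration",
    "Favor High-Quantity Output",
    "Express Skepticism of AI Idea",
    "Blend AI & Own Ideas",
]


def strategy_frequencies(coded_segments: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often each strategy code appears across coded segments."""
    counts = Counter(code for _participant, code in coded_segments)
    unknown = set(counts) - set(STRATEGIES)
    if unknown:
        raise ValueError(f"Unrecognized strategy codes: {sorted(unknown)}")
    # Report every category, including those never observed.
    return {strategy: counts.get(strategy, 0) for strategy in STRATEGIES}


def text_bar_chart(freqs: Dict[str, int]) -> str:
    """Render the frequencies as a plain-text horizontal bar chart."""
    width = max(len(label) for label in freqs)
    return "\n".join(
        f"{label.ljust(width)} | {'#' * n} ({n})" for label, n in freqs.items()
    )


# Made-up placeholder input, NOT data from the study.
placeholder_codes = [
    ("P1", "Guided Search (with pre-idea)"),
    ("P2", "Guided Search (with pre-idea)"),
    ("P1", "Blend AI & Own Ideas"),
]
print(text_bar_chart(strategy_frequencies(placeholder_codes)))
```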
6. Future Applications & Directions
Short-term (1-3 years): Development of specialized educational NLG plugins for platforms like Google Docs or Word that offer scaffolded prompting (e.g., "brainstorm characters," "describe a setting using senses") and integration with formative assessment tools to provide feedback on the creativity and coherence of the human-AI co-written text.
Medium-term (3-5 years): "Adaptive Ideation Partners," AI systems that learn individual students' creative profiles, preferred genres, and linguistic competence levels to tailor idea suggestions and vocabulary support dynamically, acting as a personalized writing tutor.
Long-term (5+ years): Convergence with immersive technologies. NLG coupled with multimodal AI could generate dynamic story worlds in VR/AR environments where the narrative adapts to the student's written choices, creating a deeply engaging feedback loop for practicing narrative construction and descriptive language.
The critical research direction is longitudinal studies on how sustained use of NLG tools affects the development of original creative thinking and writing proficiency in EFL learners, ensuring these tools enhance rather than atrophy foundational skills.
7. References
- Woo, D. J., Wang, Y., Susanto, H., & Guo, K. (2023). Understanding EFL Students’ Idea Generation Strategies for Creative Writing with NLG Tools. Manuscript in preparation.
- Graham, S., & Perin, D. (2007). A meta-analysis of writing instruction for adolescent students. Journal of Educational Psychology, 99(3), 445–476.
- Kaufman, J. C., & Beghetto, R. A. (2009). Beyond big and little: The Four C Model of creativity. Review of General Psychology, 13(1), 1–12.
- Dawson, P. (2005). Creative Writing and the New Humanities. Routledge.
- Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- OpenAI. (2023). GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
- Swanson, H. L., & Berninger, V. W. (1996). Individual differences in children's working memory and writing skill. Journal of Experimental Child Psychology, 63(2), 358–385. (For cognitive load theory context).