1. Introduction
The advent of state-of-the-art (SOTA) generative AI chatbots like ChatGPT has created a paradigm shift in language learning and writing support. Unlike rule-based predecessors, these models, built on neural network architectures like the Transformer, can generate coherent and contextually relevant text. For English as a Foreign Language (EFL) learners, this presents a powerful, yet complex, tool. The core challenge identified in this study is prompt engineering—the skill of crafting effective instructions to elicit desired outputs from the AI. Without this skill, users, especially non-technical students, are relegated to a frustrating trial-and-error process, limiting the tool's pedagogical potential.
This paper investigates the nascent prompt engineering behaviors of secondary school EFL students using ChatGPT for the first time to complete a writing task. It moves beyond theoretical discussion to present empirical, qualitative case studies that map distinct user pathways.
2. Methodology & Data Collection
The research employs a qualitative case study approach, analyzing real-world interaction data from novice users.
2.1. Participants & Task
Participants were secondary school EFL students with no prior formal experience using SOTA chatbots like ChatGPT. The study captured their process via iPad screen recordings as they engaged with the AI to complete a defined writing task. This methodology provides a raw, unfiltered view of the human-AI collaboration process.
2.2. Data Analysis Framework
The screen recordings were transcribed and coded for:
- Prompt Content: The linguistic and instructional components of each student query (e.g., task description, style requests, constraints).
- Prompt Quantity: The number of prompts used to complete the task.
- Interaction Pattern: The sequence and nature of follow-up prompts based on AI responses.
- Outcome Quality: The fitness of the final AI-generated text for the assigned task.
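The coding scheme above can be sketched as a simple data record. This is an illustrative reconstruction, not the authors' actual codebook: the class and field names (`PromptTurn`, `Session`, `components`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the study's coding scheme as a data record.
# Field names are illustrative, not the authors' actual codebook.

@dataclass
class PromptTurn:
    text: str                 # the student's prompt (Prompt Content)
    components: list[str]     # e.g. ["task description", "style request"]
    ai_response_useful: bool  # coder judgment of the reply's fitness

@dataclass
class Session:
    student_id: str
    turns: list[PromptTurn] = field(default_factory=list)

    @property
    def prompt_quantity(self) -> int:
        """Prompt Quantity: number of prompts used in the session."""
        return len(self.turns)

session = Session("S01", [
    PromptTurn("Write an essay about climate change",
               ["task description"], ai_response_useful=True),
])
print(session.prompt_quantity)  # 1
```

The Interaction Pattern dimension would then fall out of the ordering of `turns`, and Outcome Quality would be a rating attached to the final turn.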
From this analysis, four archetypal user pathways were identified and developed into detailed case studies.
3. Case Studies: Four Prompt Engineering Pathways
The analysis crystallized four distinct behavioral patterns, representing a spectrum of prompt engineering sophistication.
3.1. Pathway A: The Minimalist
This student used very few prompts (e.g., 1-2). The initial prompt was often a simple, direct translation of the task instruction (e.g., "Write an essay about climate change"). They showed minimal engagement with the AI's output, accepting the first result with little to no refinement. This pathway highlights a tool-as-oracle misconception, in which the AI is seen as providing a complete, final answer rather than as a collaborative partner.
3.2. Pathway B: The Iterative Refiner
This student used a moderate number of prompts in a linear, iterative sequence. They started with a basic prompt, reviewed the output, and issued follow-up commands for specific improvements (e.g., "Make it longer," "Use simpler words"). This pathway demonstrates an emerging understanding of the AI's responsiveness to instruction but remains within a basic revision-request framework.
3.3. Pathway C: The Structured Inquirer
This student employed a higher number of prompts with a strategic, multi-stage approach. They might first ask the AI to "brainstorm three ideas for an essay on X," then select one, then ask for an outline, and finally request a draft based on that outline. This pathway reflects a more sophisticated meta-cognitive strategy, breaking down the writing process and using the AI for structured support at each stage.
3.4. Pathway D: The Trial-and-Error Explorer
This student used a high volume of prompts with significant variation but little apparent strategy. Prompts shifted dramatically in focus and style (e.g., from formal to colloquial, from broad to narrow) without clear progression. This pathway embodies the unstructured experimentation that characterizes the novice experience, often resulting in confusion and inefficient use of time, though it may occasionally yield creative results.
4. Key Findings & Analysis
4.1. Prompt Quality & Quantity Patterns
The study found no simple correlation between the number of prompts and the quality of the final output. Pathway C (Structured Inquirer) often produced the most task-appropriate text, not necessarily through the most prompts, but through the most strategic and high-quality prompts. Quality was defined by specificity, context provision, and decomposition of the task. A single well-engineered prompt (e.g., "Write a 300-word persuasive essay for a school magazine arguing for more recycling bins on campus, using two statistics and a call to action") could outperform a dozen vague ones.
Interaction Summary: Pathway C (Structured) consistently yielded the highest-rated final drafts by independent evaluators, despite not always using the most turns. Pathway D (Trial-and-Error) had the highest variance in outcome quality.
4.2. The Role of AI Literacy
The pathways starkly illustrate varying levels of implicit AI literacy. Students in Pathways A and D lacked a functional mental model of how ChatGPT processes requests. In contrast, students in Pathways B and C demonstrated a budding understanding of the AI as a stochastic, instruction-following system. They intuitively grasped that clearer, more structured inputs lead to more predictable and useful outputs. This finding directly supports calls from organizations like the International Society for Technology in Education (ISTE) to integrate AI literacy fundamentals into K-12 curricula.
5. Technical Framework & Analysis
Understanding these pathways requires a technical lens. ChatGPT and similar models are based on the Transformer architecture and are fundamentally next-token predictors. Writing $x$ for the input prompt (to avoid overloading $P$, the probability function), the probability of generating a specific output sequence $O$ is modeled as:
$$P(O \mid x) = \prod_{t=1}^{|O|} P(o_t \mid x, o_1, \ldots, o_{t-1})$$
where $o_t$ is the token at position $t$. A student's prompt $x$ sets the initial context and thus the probability distribution over the output.
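The chain-rule factorization above can be made concrete with a toy calculation. The per-token probabilities below are invented for illustration; the point is only that a prompt which makes each desired token more likely raises the probability of the whole target sequence multiplicatively.

```python
import math

# Toy illustration of the chain-rule factorization: the log-probability
# of an output sequence is the sum of per-token conditional log-probs.
# The numbers here are invented for illustration, not model outputs.

def sequence_log_prob(token_probs):
    """Sum of log P(o_t | prompt, o_1..o_{t-1}) over the sequence."""
    return sum(math.log(p) for p in token_probs)

# Suppose a vague prompt makes each token of the desired output less
# likely than a specific prompt does:
vague    = [0.10, 0.20, 0.15]   # P(o_t | vague prompt, ...)
specific = [0.60, 0.70, 0.55]   # P(o_t | specific prompt, ...)

print(sequence_log_prob(specific) > sequence_log_prob(vague))  # True
```

Because the probabilities multiply, even modest per-token gains from a better prompt compound across a 300-word essay.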
Analysis Framework Example: We can model a student's prompt engineering session as a state machine. Let the State (S) be the current context window of the conversation (the last $k$ tokens). The Action (A) is the student's next prompt. The Reward (R) is the perceived usefulness of the AI's response (e.g., a subjective score from 1-5). The student's goal is to learn a policy $\pi$ that maps states to actions to maximize cumulative reward. The four pathways represent different, often suboptimal, exploration policies for this reinforcement learning problem faced by the human user.
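The state-action-reward loop described above can be sketched as a short simulation. Everything here is a stand-in under stated assumptions: the reward function, the two one-line "policies," and the heuristic that specific prompts earn higher rewards are all invented for illustration, not measurements from the study.

```python
import random

# Minimal sketch of the paper's framing: a prompting session as a
# state-action-reward loop. The reward function and policies are
# invented stand-ins, not measurements from the study.

def simulate_session(policy, n_turns=5, seed=0):
    """Run one session; the state is the conversation history so far."""
    rng = random.Random(seed)
    state, total_reward = [], 0.0
    for _ in range(n_turns):
        action = policy(state)  # the student's next prompt
        # Stand-in reward: prompts with a word-count constraint score
        # higher on average (a crude proxy for specificity).
        reward = rng.uniform(3, 5) if "word" in action else rng.uniform(1, 3)
        state.append((action, reward))
        total_reward += reward
    return total_reward

# Two toy policies echoing Pathways A and C:
minimalist = lambda state: "Write an essay about climate change"
structured = lambda state: "Write a 300-word persuasive essay with two statistics"

print(simulate_session(structured) > simulate_session(minimalist))  # True
```

In this framing, the four pathways differ in how the policy updates from the growing state: the Minimalist's policy ignores the state entirely, while the Structured Inquirer's conditions each new prompt on what the previous stage produced.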
Chart Description: A conceptual chart would plot Prompt Specificity (X-axis) against Task Decomposition (Y-axis). Pathway A (Minimalist) would cluster in the low-low quadrant. Pathway D (Trial-and-Error) would show a scattered cloud across the graph. Pathway B (Iterative Refiner) would show rightward movement (increasing specificity) along the X-axis. Pathway C (Structured Inquirer) would occupy the high-high quadrant, demonstrating both high specificity and high use of task decomposition in their prompts.
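The conceptual chart can be given illustrative coordinates. The values below are invented to match the quadrant descriptions, not measured data; the small classifier just makes the quadrant assignments explicit.

```python
# Illustrative coordinates for the conceptual chart described above:
# x = prompt specificity, y = task decomposition, both on a 0-1 scale.
# Values are invented to match the quadrant descriptions, not data.

pathways = {
    "A: Minimalist":        (0.15, 0.10),
    "B: Iterative Refiner": (0.70, 0.20),
    "C: Structured":        (0.80, 0.85),
    "D: Trial-and-Error":   (0.45, 0.40),  # one point of a scattered cloud
}

def quadrant(x, y, cut=0.5):
    """Classify a point into the chart's four quadrants."""
    return ("high" if x >= cut else "low",
            "high" if y >= cut else "low")

for name, (x, y) in pathways.items():
    print(name, quadrant(x, y))
```

Pathway C lands in ("high", "high"), Pathway A in ("low", "low"), matching the clusters described above.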
6. Educational Implications & Future Directions
Core Implication: Leaving students to discover prompt engineering through trial-and-error is pedagogically inefficient and inequitable. It favors students who naturally develop strategic thinking (Pathway C) and disadvantages others.
Actionable Strategy: Explicit, scaffolded prompt engineering instruction must be integrated into EFL writing pedagogy. This includes:
- Teaching the "Role-Goal-Format-Constraints" prompt framework.
- Demonstrating iterative refinement (e.g., using ChatGPT's "regenerate" or "continue" functions strategically).
- Critically evaluating AI outputs for bias, accuracy, and style.
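The "Role-Goal-Format-Constraints" framework from the first bullet can be sketched as a prompt template. The builder function is illustrative, not a tool used in the study; only the four field names come from the framework itself.

```python
# Minimal sketch of the "Role-Goal-Format-Constraints" framework as a
# prompt template. The builder is illustrative, not a study artifact.

def build_prompt(role, goal, fmt, constraints):
    """Assemble a prompt from the four RGFC components."""
    parts = [
        f"You are {role}.",
        f"Your task: {goal}.",
        f"Format: {fmt}.",
        "Constraints: " + "; ".join(constraints) + ".",
    ]
    return " ".join(parts)

prompt = build_prompt(
    role="a writer for a school magazine",
    goal="write a persuasive essay arguing for more recycling bins on campus",
    fmt="about 300 words, with an introduction, two body paragraphs, "
        "and a call to action",
    constraints=["use two statistics", "keep the vocabulary at B1 level"],
)
print(prompt)
```

Filling the template this way reproduces the kind of single well-engineered prompt the findings in Section 4.1 credit with outperforming a dozen vague ones.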
Future Research & Development:
- Adaptive Learning Interfaces: Future AI writing assistants could detect a user's pathway (e.g., flagging minimalist prompts) and offer contextual hints or tutorials to scaffold them toward more effective strategies.
- Prompt Libraries & Templates: Developing curated, level-appropriate prompt templates for common EFL writing tasks (e.g., "Compare and contrast essay generator").
- Longitudinal Studies: Tracking how students' prompt engineering pathways evolve with instruction and experience over time.
- Cross-linguistic & Cultural Studies: Investigating if prompt engineering strategies differ significantly across languages and educational cultures.
7. References
- Woo, D. J., Guo, K., & Susanto, H. (2023). Cases of EFL Secondary Students’ Prompt Engineering Pathways to Complete a Writing Task with ChatGPT. Manuscript in preparation.
- Caldarini, G., Jaf, S., & McGarry, K. (2022). A Literature Survey of Recent Advances in Chatbots. Information, 13(1), 41.
- Long, D., & Magerko, B. (2020). What is AI Literacy? Competencies and Design Considerations. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–16.
- Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NIPS 2017).
- International Society for Technology in Education (ISTE). (2023). AI Explorations for Educators. Retrieved from iste.org.
- Zhao, W. X., et al. (2023). A Survey of Large Language Models. arXiv preprint arXiv:2303.18223.
8. Analyst's Perspective: Deconstructing the Human-AI Writing Dance
Core Insight: This study isn't really about ChatGPT; it's a stark revelation of the unprepared human in the human-AI feedback loop. The tool is exponentially more capable than the user's ability to direct it. The four pathways aren't just behaviors; they're diagnostic markers for a new form of digital illiteracy. The real product gap isn't a better LLM, but a better human interface layer that teaches interaction strategy in real-time.
Logical Flow: The paper correctly identifies the problem (trial-and-error is the default) and provides elegant, empirical evidence through the pathway taxonomy. The logical leap it makes—and this is crucial—is that these novice behaviors are not a temporary phase. Without intervention, the Minimalist and Trial-and-Error Explorer pathways can solidify into permanent, suboptimal usage patterns, cementing a power asymmetry where the user is led by the tool's defaults rather than directing it. This aligns with broader concerns in HCI research, such as those discussed in works on "automation bias" and "skill decay" in highly assisted systems.
Strengths & Flaws: The strength is its grounded, observational methodology. Screen recordings don't lie. The major flaw, acknowledged implicitly, is scale. Four pathways from a limited sample are compelling archetypes, not definitive categories. The study also sidesteps the elephant in the room: assessment. If a Minimalist gets a passing grade from an overworked teacher using an AI-generated essay, what incentive do they have to learn prompt engineering? The paper's educational recommendations hinge on a system that values process over product, which most current educational assessment frameworks do not.
Actionable Insights: For EdTech investors and developers, the takeaway is clear: the next wave of value creation is in prompt engineering scaffolding. Think Grammarly for prompts—an overlay that analyzes a student's initial vague command and suggests, "Try adding a target audience and word count. Click here to see an example." For school administrators, the mandate is to fund professional development not just on using AI, but on teaching the pedagogy of interacting with AI. This study provides the perfect evidence to argue for that budget line. Finally, for researchers, the pathway framework is a replicable lens. Apply it to professionals using AI for coding (GitHub Copilot), design, or legal research. I predict you'll find the same four archetypes, proving this is a fundamental human-computer interaction challenge, not just an EFL issue.