Is Data-Driven Learning with AI-Generated Example Sentences Effective for Elementary-Level Learners? A Case Study of the Synonyms “collect” and “gather”

Ryosuke NAKAHARA

Asian Journal of English Language Teaching ›› 2025, Vol. 34 ›› Issue (1) : 5-22.

Abstract

Despite its potential, data-driven learning (DDL) remains underutilized in pre-tertiary education, partly because few tools provide ample, level-appropriate example sentences. This classroom-based case study investigates two questions: whether generative artificial intelligence (GenAI) can generate suitable example sentences to help elementary-level learners distinguish between the synonyms “gather” and “collect”, and whether learners can discern the differences between the two words through hands-off DDL with the AI-generated sentences, as measured with a pre- and post-test design. Analyses based on the English Vocabulary Profile (EVP) and Coh-Metrix revealed that GenAI can generate sentences that align with the CEFR levels specified in the prompt (i.e., A1-A2). Furthermore, a Bayesian Wilcoxon signed-rank test yielded a Bayes factor (BF₁₀) of 31.564 and an effect size (δ) of -0.68, indicating that the data are approximately 32 times more likely under the hypothesis that post-test scores are higher than pre-test scores than under the null hypothesis, with a medium effect size. Although this is a small-scale, classroom-based case study, the findings suggest that GenAI can function as a user-friendly concordancer, offering accessible tools for younger learners and potentially addressing some of the key barriers to implementing DDL in schools.

Cite this article

Ryosuke NAKAHARA. Is Data-Driven Learning with AI-Generated Example Sentences Effective for Elementary-Level Learners? A Case Study of the Synonyms “collect” and “gather”[J]. Asian Journal of English Language Teaching. 2025, 34(1): 5-22
