Is Data-Driven Learning with AI-Generated Example Sentences Effective for Elementary-Level Learners? A Case Study of the Synonyms “collect” and “gather”

Ryosuke NAKAHARA

Asian Journal of English Language Teaching ›› 2025, Vol. 34 ›› Issue (1) : 5-22.

Abstract

Despite its potential, data-driven learning (DDL) remains underutilized in pre-tertiary education, partly because few tools provide ample, level-appropriate example sentences. This classroom-based case study addresses two questions: whether generative artificial intelligence (GenAI) can generate example sentences suitable for helping elementary-level learners distinguish between the synonyms “gather” and “collect”, and whether learners can discern the differences through hands-off DDL with those AI-generated sentences, as measured with a pre- and post-test design. Analyses based on the English Vocabulary Profile (EVP) and Coh-Metrix showed that GenAI can generate sentences that align with the CEFR levels specified in the prompt (i.e., A1-A2). Furthermore, a Bayesian Wilcoxon signed-rank test yielded a Bayes factor (BF₁₀) of 31.564 and an effect size (δ) of -0.68. The data are thus approximately 32 times more likely under the hypothesis that post-test scores are higher than pre-test scores than under the null hypothesis, with a medium effect size. Although this is a small-scale, classroom-based case study, the findings suggest that GenAI can function as a user-friendly concordancer, giving younger learners an accessible tool and potentially addressing some of the key barriers to implementing DDL in schools.
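
As a rough illustration of how the reported Bayes factor can be read, the sketch below converts BF₁₀ into a posterior probability for the alternative hypothesis. It assumes equal prior odds for the two hypotheses, which the abstract does not state, and the function name `posterior_prob_h1` is hypothetical; it is not part of the study's analysis.

```python
def posterior_prob_h1(bf10: float, prior_prob_h1: float = 0.5) -> float:
    """Posterior probability of H1 from a Bayes factor BF10.

    Uses posterior odds = BF10 * prior odds. The default prior of 0.5
    (equal prior odds) is an assumption made for this illustration.
    """
    if bf10 <= 0:
        raise ValueError("bf10 must be positive")
    if not 0 < prior_prob_h1 < 1:
        raise ValueError("prior_prob_h1 must lie strictly between 0 and 1")
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1 + posterior_odds)


# BF10 as reported in the abstract
bf10 = 31.564
print(round(posterior_prob_h1(bf10), 3))  # prints 0.969
```

Under that assumption, a Bayes factor of about 32 corresponds to a posterior probability of roughly 0.97 that post-test scores are higher; a more skeptical prior would lower this figure.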
