The Predictive Power of Linguistic Features on EFL Writing Quality Among Chinese Secondary Students: A Multilayer Perceptron Approach 

Gavin Bui

Asian Journal of English Language Teaching ›› 2026, Vol. 35 ›› Issue (3) : 40-56.

DOI: 10.65961/AJELT-2026-3-003
Research Article


Abstract

As one of the early applications of machine learning to quantitative data analysis in applied linguistics, this study investigates the predictive power of linguistic features on English as a Foreign Language (EFL) writing quality. To overcome two limitations of traditional linear regression, namely multicollinearity and non-normal data distributions, we employed a Multilayer Perceptron (MLP) neural network to analyse 96 essays written by Chinese secondary school students. The model evaluated the relative predictive importance of lexical, syntactic, fluency, and accuracy features. The neural network fit the data well, accounting for approximately 80% of the variance in overall writing scores with no evidence of overfitting. Sensitivity analyses revealed that lexical diversity was the most robust predictor of writing quality (100% normalised importance), followed by grammatical accuracy (60.4%). By contrast, syntactic complexity, text length, and lexical sophistication exhibited comparatively minimal influence. These findings underscore the critical role of vocabulary diversity in evaluating adolescent EFL writing and support neural networks as a powerful, innovative methodological tool for future applied linguistic research.
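The "normalised importance" figures above are commonly obtained by dividing each predictor's raw importance by the largest raw importance and expressing the result as a percentage, so that the strongest predictor always reads 100%. The abstract does not state which software or exact formula was used, so the following is only a minimal sketch under that assumption. The function name `normalise_importance` and the raw values are hypothetical; the raw values are not taken from the study and were chosen only so that the ratio reproduces the reported 60.4%.

```python
def normalise_importance(raw):
    """Scale raw importance scores so that the largest one becomes 100%."""
    if not raw:
        raise ValueError("raw importance mapping is empty")
    top = max(raw.values())
    if top <= 0:
        raise ValueError("largest importance must be positive")
    # Each score is expressed relative to the strongest predictor.
    return {name: round(100.0 * value / top, 1) for name, value in raw.items()}


# Hypothetical raw importances (NOT the study's values), for illustration only.
raw_importance = {
    "lexical_diversity": 0.250,
    "grammatical_accuracy": 0.151,
}

normalised = normalise_importance(raw_importance)
for feature, pct in normalised.items():
    print(f"{feature}: {pct}%")
```

Under this convention a value such as 60.4% means the predictor carries roughly six tenths of the weight of the top predictor; it is not a share of explained variance.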

Key words

EFL writing / multilayer perceptron / linguistic predictors / lexical diversity / machine learning as statistical modelling

Cite this article

Gavin Bui. (2026). The Predictive Power of Linguistic Features on EFL Writing Quality Among Chinese Secondary Students: A Multilayer Perceptron Approach. Asian Journal of English Language Teaching, 35(3), 40–56. https://doi.org/10.65961/AJELT-2026-3-003

Funding

This research was supported by the Research Grants Council, University Grants Committee, Hong Kong (Ref. No. UGC/FDS14/H13/20). Gemini 3.0 was used to proofread the manuscript for language accuracy and lexical appropriateness. The author reviewed and edited all AI-generated suggestions. All study design, data collection, and analysis were performed by the author and research assistants, who assume full responsibility for the content, including any remaining errors.