The Predictive Power of Linguistic Features on EFL Writing Quality Among Chinese Secondary Students: A Multilayer Perceptron Approach
Gavin Bui
Asian Journal of English Language Teaching, 2026, Vol. 35, Issue 3, pp. 40-56.
As one of the first applications of machine learning to quantitative data analysis in applied linguistics, this study investigates the predictive power of linguistic features on English as a Foreign Language (EFL) writing quality. To overcome two limitations of traditional linear regression (multicollinearity and non-normal data distributions), we employed a Multilayer Perceptron (MLP) neural network to analyse 96 essays written by Chinese secondary school students. The model estimated the relative predictive importance of lexical, syntactic, fluency, and accuracy features. The network showed strong fit, accounting for approximately 80% of the variance in overall writing scores with no evidence of overfitting. Sensitivity analyses revealed that lexical diversity was the strongest predictor of writing quality (100% normalised importance), followed by grammatical accuracy (60.4%). By contrast, syntactic complexity, text length, and lexical sophistication contributed comparatively little. These findings underscore the central role of vocabulary diversity in evaluating adolescent EFL writing and establish neural networks as a powerful, innovative methodological tool for future applied linguistics research.
EFL writing / multilayer perceptron / linguistic predictors / lexical diversity / machine learning as statistical modelling
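The abstract's two headline statistics rest on simple formulas. Variance explained is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, and "normalised importance" (as reported by some neural-network software) divides each predictor's raw importance by the largest one, so the top predictor appears as 100%. The minimal Python sketch below illustrates both under stated assumptions: it does not train an MLP, it assumes predicted scores and raw importance values have already been produced by whatever software fitted the network, and the function names (`r_squared`, `normalised_importance`) and all numbers are hypothetical toy values, not data from this study. Comparing R^2 on training and holdout essays is shown as one common, informal check for overfitting.

```python
"""Sketch: variance explained (R^2) and normalised predictor importance.

Assumes predictions and raw importance values were already produced by the
software that fitted the MLP; nothing here trains a network. All names and
numbers are hypothetical illustrations, not the study's data.
"""
from typing import Dict, Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Proportion of variance in `observed` accounted for by `predicted`.

    R^2 = 1 - SS_residual / SS_total
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed scores have zero variance")
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def normalised_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw importances so the largest equals 100 (per cent)."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("largest raw importance must be positive")
    return {name: 100.0 * value / top for name, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical toy scores and predictions (not from the study).
    r2_train = r_squared([10, 12, 14, 16, 18], [10.5, 11.5, 14.0, 16.5, 17.5])
    r2_holdout = r_squared([11, 13, 15, 17], [12, 13, 14, 17])
    # A large drop from training to holdout R^2 would suggest overfitting.
    print(f"train R^2 = {r2_train:.3f}, holdout R^2 = {r2_holdout:.3f}")

    # Hypothetical raw importances, e.g. from a sensitivity analysis.
    raw = {"lexical_diversity": 0.25, "accuracy": 0.10, "text_length": 0.05}
    ranked = sorted(normalised_importance(raw).items(),
                    key=lambda item: item[1], reverse=True)
    for name, pct in ranked:
        print(f"{name}: {pct:.1f}%")
```

Because the rescaling divides by the maximum rather than the sum, normalised values need not add up to 100%; they preserve only the ranking of predictors and each one's ratio to the strongest predictor.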