Corpus-driven language learning: A scientometric analysis of contemporary trends and their trajectories for pedagogical innovation within the Indonesian context

Authors

  • Danang Satria Nugraha, Sanata Dharma University; University of Szeged

DOI:

https://doi.org/10.26555/bs.v45i1.1214

Keywords:

Corpus-driven, Language learning, Indonesian, Pedagogical innovation

Abstract

In the contemporary landscape of language education, the integration of technology and data-driven approaches has emerged as a significant catalyst for pedagogical innovation. This study investigates contemporary trends and trajectories in corpus-driven language learning (CDLL) through a scientometric analysis of 1,118 journal articles indexed in Scopus from 2014 to 2024. To capture the most recent decade of research, the study selected articles that substantively explored CDLL principles and applications using empirical methodologies and excluded those lacking a direct CDLL focus or empirical data. Employing bibliometric analysis and keyword co-occurrence visualization with VOSviewer, the study identified key themes, prominent actors, and emerging patterns in CDLL research, with the aim of informing pedagogical innovation within the Indonesian context. Results revealed substantial growth in CDLL publications, with research concentrated in China, the United States, and the United Kingdom. Keyword analysis identified four distinct thematic clusters: (1) computational linguistics and artificial intelligence (37%), highlighting the integration of deep learning, language modeling, and machine translation in language learning; (2) specialized applications of CDLL (22%), particularly in information management; (3) human-centered language learning (21.5%), emphasizing social interaction, cognitive processes, and technology integration; and (4) foundational principles of CDLL (19.5%), encompassing corpus linguistics, language acquisition, and pedagogical practices. These findings underscore the growing prominence of CDLL in language education and its potential to transform pedagogical practices in Indonesia by leveraging technology, promoting learner autonomy, and integrating authentic language data into diverse learning contexts.
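The keyword co-occurrence step described in the abstract can be illustrated with a minimal sketch. It assumes each article is reduced to a list of author keywords and counts, for every pair of keywords, the number of articles in which both appear (full counting), together with each keyword's total link strength. The sample records, the function names `keyword_cooccurrence` and `total_link_strength`, and the `min_occurrences` threshold value are hypothetical; this is not the author's pipeline and does not reproduce VOSviewer's normalization, layout, or clustering.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each inner list stands for the author keywords of one
# article. These are illustrative values, not records from the study's dataset.
SAMPLE_RECORDS = [
    ["Corpus linguistics", "Data-driven learning", "EFL writing"],
    ["data-driven learning", "Corpus linguistics"],
    ["Machine translation", "Deep learning"],
    ["Deep learning", "Corpus linguistics", "Data-driven learning"],
]


def keyword_cooccurrence(records, min_occurrences=1):
    """Return (occurrences, pair_counts) for keywords meeting the threshold.

    Keywords are lower-cased and de-duplicated within each record, so every
    keyword and every keyword pair is counted at most once per record.
    """
    normalized = [
        sorted({kw.strip().lower() for kw in rec if kw.strip()}) for rec in records
    ]

    # Number of records in which each keyword appears.
    occurrences = Counter(kw for rec in normalized for kw in rec)

    # Discard keywords below the (hypothetical) minimum-occurrence threshold.
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}

    # Count each unordered pair once per record; sorting keeps (a, b) with a < b.
    pair_counts = Counter()
    for rec in normalized:
        pair_counts.update(combinations([kw for kw in rec if kw in kept], 2))

    return {kw: occurrences[kw] for kw in kept}, pair_counts


def total_link_strength(pair_counts):
    """Sum, for each keyword, its co-occurrence counts with all other keywords."""
    strength = Counter()
    for (a, b), n in pair_counts.items():
        strength[a] += n
        strength[b] += n
    return strength


if __name__ == "__main__":
    occ, pairs = keyword_cooccurrence(SAMPLE_RECORDS, min_occurrences=2)
    for (a, b), n in pairs.most_common():
        print(f"{a} -- {b}: {n}")
    print(dict(total_link_strength(pairs)))
```

With `min_occurrences=2`, "efl writing" and "machine translation" drop out because each appears in only one sample record. VOSviewer also offers a fractional counting option, which this sketch does not implement.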

References

Alruwaili, A. K. (2024). Exploring language teachers’ perceptions of corpus literacy skills at pre-tertiary level. International Journal of Computer-Assisted Language Learning and Teaching, 14(1). https://doi.org/10.4018/IJCALLT.352064

Bal-Gezegin, B., Akbaş, E., & Başal, A. (2022). Corpus made my job easier: Preservice language teachers’ corrective feedback practices in writing with corpus consultation. In English Language Education, 30. https://doi.org/10.1007/978-3-031-13540-8_14

Balouchzahi, F., Sidorov, G., & Gelbukh, A. (2023). PolyHope: Two-level hope speech detection from tweets. Expert Systems with Applications, 225. https://doi.org/10.1016/j.eswa.2023.120078

Charles, M. (2022). The gap between intentions and reality: Reasons for EAP writers’ non-use of corpora. Applied Corpus Linguistics, 2(3). https://doi.org/10.1016/j.acorp.2022.100032

Chen, M., & Flowerdew, J. (2018). A critical review of research and practice in data-driven learning (DDL) in the academic writing classroom. International Journal of Corpus Linguistics, 23(3), 335–369. https://doi.org/10.1075/ijcl.16130.che

Chen, Z., & Jiao, J. (2019). Effect of the blended learning approach on teaching corpus use for collocation richness and accuracy. In Communications in Computer and Information Science, 1048. https://doi.org/10.1007/978-981-13-9895-7_6

Crosthwaite, P. (2017). Retesting the limits of data-driven learning: Feedback and error correction. Computer Assisted Language Learning, 30(6), 447–473. https://doi.org/10.1080/09588221.2017.1312462

___________. (2019). Data-driven learning and younger learners: Introduction to the volume. In Data-Driven Learning for the Next Generation: Corpora and DDL for Pre-tertiary Learners. https://doi.org/10.4324/9780429425899-1

Crosthwaite, P., & Baisa, V. (2023). Generative AI and the end of corpus-assisted data-driven learning? Not so fast! Applied Corpus Linguistics, 3(3). https://doi.org/10.1016/j.acorp.2023.100066

Crosthwaite, P., Luciana, & Schweinberger, M. (2021). Voices from the periphery: Perceptions of Indonesian primary vs secondary pre-service teacher trainees about corpora and data-driven learning in the L2 English classroom. Applied Corpus Linguistics, 1(1). https://doi.org/10.1016/j.acorp.2021.100003

Crosthwaite, P., & Stell, A. (2019). It helps me get ideas on how to use my words: Primary school students’ initial reactions to corpus use in a private tutoring setting. In Data-Driven Learning for the Next Generation: Corpora and DDL for Pre-tertiary Learners. https://doi.org/10.4324/9780429425899-9

Emir, G., & Yangın-Ekşi, G. (2023). Corpus used as a data-driven learning tool in L2 academic writing: Evidence from Turkish contexts. TEFLIN Journal, 34(2), 209–225. https://doi.org/10.15639/teflinjournal.v34i2/209-225

Esfahani, M. J. B., & Ketabi, S. (2024). The effect of corpus-assisted language teaching on academic collocation acquisition by Iranian advanced EFL learners. Journal of Applied Research in Higher Education, 16(4), 1188–1213. https://doi.org/10.1108/JARHE-05-2023-0199

Flowerdew, L. (2022). Using corpora for writing instruction. In The Routledge Handbook of Corpus Linguistics (2nd ed.). https://doi.org/10.4324/9780367076399-31

Flowerdew, L., & Petrić, B. (2024). A critical review of corpus-based pedagogic perspectives on thesis writing: Specificity revisited. English for Specific Purposes, 76, 1–13. https://doi.org/10.1016/j.esp.2024.05.003

Gardner, S. (2024). Corpus approaches to discourse and second language research. In The Routledge Handbook of Second Language Acquisition and Discourse. https://doi.org/10.4324/9781003177579-14

Henneken, E. A., & Kurtz, M. J. (2019). Usage bibliometrics as a tool to measure research activity. https://doi.org/10.1007/978-3-030-02511-3_32

Hu, B., Tang, B., Chen, Q., & Kang, L. (2016). A novel word embedding learning model using the dissociation between nouns and verbs. Neurocomputing, 171, 1108–1117. https://doi.org/10.1016/j.neucom.2015.07.046

Ihrmark, D. (2023). Revisiting the computer as informant from a teacher-mediated perspective: Suggested implementation of an automated language diagnostics tool. NJES Nordic Journal of English Studies, 22(1), 42–67. https://doi.org/10.35360/njes.794

Kızıl, A. S. (2023). Data-driven learning: English as a foreign language writing and complexity, accuracy and fluency measures. Journal of Computer Assisted Learning, 39(4), 1382–1395. https://doi.org/10.1111/jcal.12807

Lin, M. H. (2021). Effects of data-driven learning on college students of different grammar proficiencies: A preliminary empirical assessment in EFL classes. SAGE Open, 11(3). https://doi.org/10.1177/21582440211029936

Liu, S., Tang, B., Chen, Q., & Wang, X. (2015). Effects of semantic features on machine learning-based drug name recognition systems: Word embeddings vs. manually constructed dictionaries. Information (Switzerland), 6(4), 848–865. https://doi.org/10.3390/info6040848

Liu, T., & Chen, M. (2023). An investigation into learners’ cognitive processes in data-driven learning: Case studies of six learners of Chinese. Chinese Journal of Applied Linguistics, 46(4), 544–561. https://doi.org/10.1515/CJAL-2023-0404

Lusta, A., Demirel, Ö., & Mohammadzadeh, B. (2023). Language corpus and data driven learning (DDL) in language classrooms: A systematic review. Heliyon, 9(12). https://doi.org/10.1016/j.heliyon.2023.e22731

Lyu, Y., & Han, Z. (2023). Applying data-driven learning in self-translation of academic discourse: A case study of a Chinese medical student. Frontiers in Psychology, 14. https://doi.org/10.3389/fpsyg.2023.1071123

Ma, Q., Tang, J., & Lin, S. (2022). The development of corpus-based language pedagogy for TESOL teachers: A two-step training approach facilitated by online collaboration. Computer Assisted Language Learning, 35(9), 2731–2760. https://doi.org/10.1080/09588221.2021.1895225

Mamta, Ekbal, A., & Bhattacharyya, P. (2022). Exploring multi-lingual, multi-task, and adversarial learning for low-resource sentiment analysis. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(5). https://doi.org/10.1145/3514498

Muftah, M. (2023). Data-driven learning (DDL) activities: Do they truly promote EFL students’ writing skills development? Education and Information Technologies, 28(10), 13179–13205. https://doi.org/10.1007/s10639-023-11620-z

Nugraha, D. S. (2021). Morphosemantic features of derivational affix {Me(N)-} in the Indonesian denumeral verb constructions. Sirok Bastra, 9(2). https://doi.org/10.37671/sb.v9i2.317

___________. (2024a). Analyzing prefix /me(N)-/ in the Indonesian affixation: A corpus-based morphology. Theory and Practice in Language Studies, 14(6), 1697–1711. https://doi.org/10.17507/tpls.1406.10

___________. (2024b). A morphological analysis of the Indonesian suffixation: A look at the different types of affixes and their semantic changes. GEMA Online® Journal of Language Studies, 24(4), 109–132. https://doi.org/10.17576/gema-2024-2404-07

___________. (2024c). Navigating challenges and opportunities: Incorporating multimodal analysis into corpus linguistics for social media research. In Corpora for Language Learning: Bridging the Research-Practice Divide.

___________. (2024d). Quantitative analysis within language studies: An analytical views based on the bibliometrics method. Script Journal: Journal of Linguistics and English Teaching, 9(2), 16–34. https://doi.org/10.24903/sj.v9i2.1695

___________. (2025). Complex word formation in contemporary syntactic frameworks: Scientometric investigation and its relevance to grammar pedagogy. JOALL (Journal of Applied Linguistics and Literature), 10(1), 283–315. https://doi.org/10.33369/joall.v10i1.40034

Nugraha, D. S., Widharyanto, W., Setyaningsih, Y., & Rahardi, R. K. (2025). Linguistik edukasional: Telaah masalah pendidikan bahasa [Educational linguistics: An examination of problems in language education]. Sanata Dharma University Press.

Pawlak, M., & Kruk, M. (2022). Individual differences in computer assisted language learning research. Routledge. https://doi.org/10.4324/9781003240051

Pérez-Paredes, P. (2022). How learners use corpora. In The Routledge Handbook of Corpora and English Language Teaching and Learning. https://doi.org/10.4324/9781003002901-31

Saeed, A., Nawab, R. M. A., & Stevenson, M. (2022). Investigating the feasibility of deep learning methods for Urdu word sense disambiguation. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(2). https://doi.org/10.1145/3477578

Sooryamoorthy, R. (2020). Scientometrics for the humanities and social sciences. Routledge. https://doi.org/10.4324/9781003110415

Sun, X., & Hu, G. (2023). Direct and indirect data-driven learning: An experimental study of hedging in an EFL writing class. Language Teaching Research, 27(3), 660–688. https://doi.org/10.1177/1362168820954459

van Eck, N. J., & Waltman, L. (2023). VOSviewer (Version 1.6.20) [Computer software]. Universiteit Leiden.

Waltman, L., & van Eck, N. J. (2019). Field normalization of scientometric indicators. https://doi.org/10.1007/978-3-030-02511-3_11

Wicher, O. (2019). Data-driven learning in the secondary classroom: A critical evaluation from the perspective of foreign language didactics. In Data-Driven Learning for the Next Generation: Corpora and DDL for Pre-tertiary Learners. https://doi.org/10.4324/9780429425899-3

Yao, G. (2019). Vocabulary learning through data-driven learning in the context of Spanish as a foreign language. Research in Corpus Linguistics, 7, 18–46. https://doi.org/10.32714/ricl.07.02

Yu, X., & Altunel, V. (2023). Data-driven learning for foreign and second language education in diverse contexts. In New Approaches to the Investigation of Language Teaching and Literature. https://doi.org/10.4018/978-1-6684-6020-7.ch005

Zare, J., & Delavar, K. A. (2024). Enhancing English learning materials with data-driven learning: A mixed-methods study of task motivation. Journal of Multilingual and Multicultural Development, 45(9), 4011–4027. https://doi.org/10.1080/01434632.2022.2134881

Published

2025-04-10

Section

Articles