arXiv Open Access 2023

Turkish Native Language Identification V2

Ahmet Yavuz Uluslu Gerold Schneider

Abstract

This paper presents the first application of Native Language Identification (NLI) for the Turkish language. NLI is the task of automatically identifying an individual's native language (L1) based on their writing or speech in a non-native language (L2). While most NLI research has focused on L2 English, our study extends this scope to L2 Turkish by analyzing a corpus of texts written by native speakers of Albanian, Arabic and Persian. We leverage a cleaned version of the Turkish Learner Corpus and demonstrate the effectiveness of syntactic features, comparing a structural Part-of-Speech n-gram model to a hybrid model that retains function words. Our models achieve promising results, and we analyze the most predictive features to reveal L1-specific transfer effects. We make our data and code publicly available for further study.
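The contrast between the two feature sets named in the abstract can be illustrated with a minimal sketch. It assumes the text has already been tokenized and POS-tagged. The tag names, the `FUNCTION_TAGS` set, the helper functions, and the example sentence are all hypothetical choices made for illustration, not the authors' implementation or tag inventory.

```python
from collections import Counter

# Assumption: coarse tags treated as function-word categories (hypothetical set).
FUNCTION_TAGS = {"ADP", "CCONJ", "SCONJ", "DET", "PRON", "AUX", "PART"}


def pos_sequence(tagged):
    """Structural view: keep only the POS tag of every token."""
    return [tag for _, tag in tagged]


def hybrid_sequence(tagged):
    """Hybrid view: keep function words as lowercased surface forms,
    replace every other token with its POS tag."""
    return [word.lower() if tag in FUNCTION_TAGS else tag for word, tag in tagged]


def ngram_counts(seq, n):
    """Count contiguous n-grams in a sequence of symbols."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


# Hypothetical pre-tagged learner sentence as (token, tag) pairs.
sentence = [
    ("Ben", "PRON"),
    ("okula", "NOUN"),
    ("ve", "CCONJ"),
    ("eve", "NOUN"),
    ("gittim", "VERB"),
]

pos_bigrams = ngram_counts(pos_sequence(sentence), 2)
hybrid_bigrams = ngram_counts(hybrid_sequence(sentence), 2)
```

In this sketch the n-gram counts would serve as the feature vector handed to a classifier. The hybrid view keeps lexical information only for closed-class items, so content vocabulary (and with it, essay topic) does not leak into the features.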

Topics & Keywords

cs.CL

Authors (2)

Ahmet Yavuz Uluslu

Gerold Schneider

Citation Format

Uluslu, A. Y., & Schneider, G. (2023). Turkish Native Language Identification V2. arXiv. https://arxiv.org/abs/2307.14850

Journal Information

Year of Publication: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓