arXiv Open Access 2023

Turkish Native Language Identification V2

Ahmet Yavuz Uluslu, Gerold Schneider

Abstract

This paper presents the first application of Native Language Identification (NLI) for the Turkish language. NLI is the task of automatically identifying an individual's native language (L1) based on their writing or speech in a non-native language (L2). While most NLI research has focused on L2 English, our study extends this scope to L2 Turkish by analyzing a corpus of texts written by native speakers of Albanian, Arabic and Persian. We leverage a cleaned version of the Turkish Learner Corpus and demonstrate the effectiveness of syntactic features, comparing a structural Part-of-Speech n-gram model to a hybrid model that retains function words. Our models achieve promising results, and we analyze the most predictive features to reveal L1-specific transfer effects. We make our data and code publicly available for further study.
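The contrast between the structural and hybrid feature sets can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hand-tagged toy sentence, and the choice of Universal Dependencies tags treated as function-word classes (`FUNCTION_POS`) are all assumptions made for illustration. The structural view replaces every token with its POS tag, while the hybrid view keeps the surface form of function words and replaces only the remaining tokens.

```python
from collections import Counter

# Assumption: these Universal Dependencies tags mark "function words".
# The paper's actual function-word definition may differ.
FUNCTION_POS = {"ADP", "AUX", "CCONJ", "DET", "PART", "PRON", "SCONJ"}


def structural_sequence(tagged):
    """Replace every token with its POS tag."""
    return [pos for _, pos in tagged]


def hybrid_sequence(tagged):
    """Keep function words as surface forms; replace other tokens with POS tags."""
    return [word if pos in FUNCTION_POS else pos for word, pos in tagged]


def ngram_counts(sequence, n_max=3):
    """Count all contiguous n-grams of length 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(sequence) - n + 1):
            counts[" ".join(sequence[i:i + n])] += 1
    return counts


# Hypothetical hand-tagged sentence ("I am going to school but I am tired").
# The tags are illustrative, not the output of a real tagger.
sentence = [
    ("Ben", "PRON"),
    ("okula", "NOUN"),
    ("gidiyorum", "VERB"),
    ("ama", "CCONJ"),
    ("yorgunum", "ADJ"),
]

structural = ngram_counts(structural_sequence(sentence))
hybrid = ngram_counts(hybrid_sequence(sentence))
```

Counts like these could be fed to any standard text classifier. The hybrid view exposes lexical choices such as the conjunction `ama`, which the purely structural view collapses into `CCONJ`.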

Authors (2)

Ahmet Yavuz Uluslu

Gerold Schneider

Citation Format

Uluslu, A.Y., Schneider, G. (2023). Turkish Native Language Identification V2. https://arxiv.org/abs/2307.14850

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓