DOAJ Open Access 2024

Automatic Gender Identification from Text

Vladimir Younkin Marina Litvak Irina Rabaev

Abstract

The gender identification of authors in literary texts is a compelling research area at the intersection of computational linguistics and natural language processing, offering insights into historical biases and socio-cultural dynamics while enriching our understanding of literary traditions. This study is inspired by the historical context of women adopting male pseudonyms to navigate a male-dominated literary domain. By leveraging machine learning and state-of-the-art language models, we investigate the feasibility and accuracy of inferring an author’s gender from their writings. Our key contributions include (1) the creation of a large-scale, diverse dataset of literary texts spanning various literary epochs and (2) the evaluation of multiple classification models. Our experiments reveal that the best-performing model achieves an accuracy above 90%, highlighting the potential of computational methods to uncover stylistic and linguistic markers tied to gender. These findings open avenues for further research into stylistic and linguistic patterns across literary history and their relationship to authorial identity.

Citation Format

Younkin, V., Litvak, M., Rabaev, I. (2024). Automatic Gender Identification from Text. https://doi.org/10.3390/app142412041

Journal Information

Publication Year: 2024
Source Database: DOAJ
DOI: 10.3390/app142412041
Access: Open Access