Automatic Gender Identification from Text
Abstract
The gender identification of authors in literary texts is a compelling research area at the intersection of computational linguistics and natural language processing, offering insights into historical biases and socio-cultural dynamics while enriching our understanding of literary traditions. This study is inspired by the historical context of women adopting male pseudonyms to navigate a male-dominated literary domain. By leveraging machine learning and state-of-the-art language models, we investigate the feasibility and accuracy of inferring an author’s gender from their writings. Our key contributions include (1) the creation of a large-scale, diverse dataset of literary texts spanning various literary epochs and (2) the evaluation of multiple classification models. Our experiments reveal that the best-performing model achieves an accuracy above 90%, highlighting the potential of computational methods to uncover stylistic and linguistic markers tied to gender. These findings open avenues for further research into stylistic and linguistic patterns across literary history and their relationship to authorial identity.
Authors (3)
Vladimir Younkin
Marina Litvak
Irina Rabaev
Quick Access
- Publication Year: 2024
- Database Source: DOAJ
- DOI: 10.3390/app142412041
- Access: Open Access