A Multidimensional Approach to Linguistic Variation in Short Turkish Texts
Abstract
This study investigates linguistic variation in Turkish using a large-scale social media corpus of neutral, offensive, and hate speech tweets. Drawing on a dataset annotated for parts of speech and grammatical structures, it identifies the main dimensions of linguistic variation within the framework of Multidimensional Analysis (MDA), using Multiple Correspondence Analysis (MCA). The paper demonstrates the application of MCA to Turkish, filling a notable gap in Turkish linguistic analysis, since MCA is well suited to short, contextually limited texts such as social media posts. The analysis is conducted in R with the FactoMineR package and the widely used visualization package ggplot2. As a practical guide, the paper shows how to interpret the dimensions generated by MDA and how to present the results through different data visualization techniques. The study also traces temporal shifts in linguistic patterns using the time-stamped, category-labeled data, displayed in various plots and heatmaps. The article is intended as a practical resource for researchers applying MDA to short-text corpora and for those interested in data visualization in linguistic analysis.
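To make the MCA step concrete, here is a minimal sketch of the textbook formulation: MCA as correspondence analysis of the complete disjunctive (one-hot) table, computed through an SVD of standardized residuals. It is an illustration under assumptions, not the author's pipeline: the study used FactoMineR in R, whereas this sketch is in Python with NumPy, and the function names `indicator_matrix` and `mca`, the feature names, and the six toy "tweets" are all hypothetical.

```python
import numpy as np


def indicator_matrix(table, names):
    """Build the complete disjunctive (one-hot) table from rows of categorical values.

    `names` are hypothetical variable names used only to label the columns.
    """
    columns, labels = [], []
    for j, name in enumerate(names):
        for level in sorted({row[j] for row in table}):
            columns.append([1.0 if row[j] == level else 0.0 for row in table])
            labels.append(f"{name}={level}")
    return np.array(columns).T, labels  # shape: (n rows, J categories)


def mca(Z):
    """MCA as correspondence analysis of the indicator matrix Z (n x J)."""
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column (category) masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = sigma ** 2             # principal inertias, in descending order
    row_coords = (U * sigma) / np.sqrt(r)[:, None]      # principal coords of rows
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]   # principal coords of categories
    return eigenvalues, row_coords, col_coords


# Hypothetical toy data: each tweet coded for its label and two binary features.
names = ["label", "imperative", "second_person"]
tweets = [
    ("neutral", "no", "no"),
    ("neutral", "no", "yes"),
    ("offensive", "yes", "yes"),
    ("offensive", "yes", "no"),
    ("hate", "yes", "yes"),
    ("hate", "no", "yes"),
]

Z, labels = indicator_matrix(tweets, names)
eigenvalues, row_coords, col_coords = mca(Z)
share = eigenvalues / eigenvalues.sum()
for label, (x, y) in zip(labels, col_coords[:, :2]):
    print(f"{label:>20}  dim1={x:+.3f}  dim2={y:+.3f}")
print("share of inertia, dims 1-2:", share[:2].round(3))
```

With Q variables and J categories in total, the eigenvalues sum to J/Q - 1 (here 7/3 - 1), which is a quick sanity check. FactoMineR's `MCA()` applied to the same table should report matching eigenvalues, although its coordinate scaling conventions may differ from this sketch.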
Author
Hülya MISIR
Publication Details
- Year of publication: 2025
- Database source: DOAJ
- DOI: 10.18492/dad.1675004
- Access: Open Access