Keelekorpus kui leksikograafi abiline kõnekeelsuse tuvastamisel (The language corpus as the lexicographer's aid in identifying informal language)
Abstract
Using corpus data to support lexicographers in identifying informal language

This study examines how new corpus analysis tools can help lexicographers decide whether to give a word an informal register label in a dictionary. Register labels are necessary for dictionary users seeking information about register. Moreover, there have been calls for the upcoming Dictionary of Standard Estonian (DSE 2025) to clearly distinguish standard language from other language varieties. Informal language was chosen for analysis because it is more difficult to define than other marked registers. In DSE 2018, some words were labelled as informal on the basis of language planning decisions rather than empirical analysis. Since register labels should be data-driven and grounded in corpus evidence, these words need a systematic review for the revised edition. Our study investigates how corpus genre data can support lexicographers in deciding whether to add or remove the informal label. We found that corpus data provided useful insights in 82.1% of cases. Based on our experiment, we developed a guideline to assist in labelling word meanings as informal: if a word occurs in blogs and forums in 36% or more of its total corpus occurrences, it may be considered to tend towards informal usage. This guideline is not a rigid rule but a supportive tool, as lexicographers should weigh additional factors using their linguistic expertise. Users value reliable linguistic information in dictionaries, and the proposed guideline helps lexicographers make more systematic decisions while keeping expert judgment as the ultimate determinant.
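The 36% guideline boils down to a simple share computation over a word's genre distribution. The sketch below is a minimal illustration, assuming per-genre occurrence counts for a word are already available from the corpus; the genre keys `blog` and `forum`, the function names, and the example counts are hypothetical, and only the 0.36 threshold comes from the study. In keeping with the guideline, the result is a flag for the lexicographer to review, not a final label.

```python
from collections.abc import Mapping

# Threshold from the guideline: 36% of a word's total corpus occurrences.
INFORMAL_SHARE_THRESHOLD = 0.36

# Hypothetical genre labels; a real corpus may name its text types differently.
INFORMAL_GENRES = ("blog", "forum")


def informal_share(genre_counts: Mapping[str, int]) -> float:
    """Return the fraction of a word's occurrences that fall in blogs and forums."""
    total = sum(genre_counts.values())
    if total == 0:
        raise ValueError("word has no corpus occurrences")
    informal = sum(genre_counts.get(g, 0) for g in INFORMAL_GENRES)
    return informal / total


def tends_towards_informal(genre_counts: Mapping[str, int]) -> bool:
    """Flag a word for review; the lexicographer still makes the labelling decision."""
    return informal_share(genre_counts) >= INFORMAL_SHARE_THRESHOLD


# Invented example counts, for illustration only.
counts = {"blog": 250, "forum": 130, "news": 400, "fiction": 220}
print(f"{informal_share(counts):.1%}")   # 380 / 1000 -> 38.0%
print(tends_towards_informal(counts))    # True
```

Using `>=` mirrors the guideline's "36% or more"; a word sitting exactly at the threshold is flagged.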
Authors (7)
Lydia Risberg
Maria Tuulik
Margit Langemets
Kristina Koppel
Ene Vainik
Esta Prangel
Eleri Aedmaa
Quick Access
- Publication year: 2025
- Database source: DOAJ
- DOI: 10.54013/kk811a3
- Access: Open Access