Semantic Scholar, Open Access, 2019, 31 citations

Digitising Swiss German: how to process and study a polycentric spoken language

Yves Scherrer, T. Samardžić, Elvira Glaser

Abstract

Unlike many dialects of other standardised languages, Swiss dialects of German are widely used in everyday communication. Despite this, automatic processing of Swiss German is still a considerable challenge, because it is mostly a spoken variety and is subject to considerable regional variation. This paper presents the ArchiMob corpus, a freely available, general-purpose corpus of spoken Swiss German based on oral history interviews. The corpus is the result of a long design process, intensive manual work and specially adapted computational processing. We first describe how the corpus can be accessed for linguistic, historical and computational research. We then describe how the documents were transcribed, segmented and aligned with the sound source. This work involved a series of experiments that led to automatically annotated normalisation and part-of-speech tagging layers. Finally, we present several case studies that motivate the use of the corpus for digital humanities in general and for dialectology in particular.


Citation

Scherrer, Y., Samardžić, T., Glaser, E. (2019). Digitising Swiss German: how to process and study a polycentric spoken language. https://doi.org/10.1007/s10579-019-09457-5

Journal Information
Year of publication: 2019
Language: English
Total citations: 31
Database source: Semantic Scholar
DOI: 10.1007/s10579-019-09457-5
Access: Open Access