Semantic Scholar Open Access 2022 6 citations

CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages

R. Zariquiey Arturo Oncevay Javier Vera

Abstract

Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories. Natural language processing (NLP) offers the potential to complement and exploit these repositories through the development of language technologies that may contribute to improving the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, present a diagnosis of how the outputs of recent documentation projects for endangered languages are underutilised for the NLP community, and discuss how the situation could change from both the documentary linguistics and NLP perspectives. All this is introduced as a bridging paradigm dubbed as Computational Language Documentation and Development (CLD²). CLD² calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerization of endangered languages, as one way to contribute to their revitalization.

Authors (3)

R. Zariquiey

Arturo Oncevay

Javier Vera

Citation Format

Zariquiey, R., Oncevay, A., Vera, J. (2022). CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages. https://doi.org/10.18653/v1/2022.computel-1.4

Journal Information
Publication Year
2022
Language
en
Total Citations
6
Source Database
Semantic Scholar
DOI
10.18653/v1/2022.computel-1.4
Access
Open Access ✓