Semantic Scholar Open Access 2025

Use of Pre-Trained Multilingual Models for Karelian Speech Recognition

I. Kipyatkova, I. Kagirov, Mikhail Dolgushin

Abstract

This paper presents an experimental study of training speech recognition models when only limited speech and text data are available. Current approaches to this problem are discussed in detail, particularly the use of pre-trained multilingual models and data augmentation techniques. In this study, multilingual models based on Wav2Vec and Whisper were adapted to the Livvi dialect of the Karelian language, and the use of an external language model to improve recognition accuracy was investigated. The paper also describes a specially collected and prepared speech database and a baseline recognition system built with the Kaldi toolkit. Quantitative test results demonstrate the effectiveness of the chosen methods: Transformer-based models, particularly Wav2Vec, outperformed the baseline models trained with the Kaldi software tools. Fine-tuning the Wav2Vec models reduced the word error rate (WER) to 24.73% on the validation set and 25.25% on the test set, while combining the Wav2Vec-BERT 2.0-based model with an external language model further reduced it to 17.12% and 17.72%, respectively. The paper is primarily aimed at specialists in automatic speech recognition for low-resource and Balto-Finnic languages. The results can also be applied in field research involving Karelian text transcription. Future work includes expanding the database to improve model adaptation and performance in real-world scenarios.
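The results above are reported as word error rate (WER). As a hedged illustration, and not the authors' scoring code, the sketch below computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, plus the relative error reduction between two WER figures. The function names `word_error_rate` and `relative_reduction` and the toy English sentences are hypothetical; the paper's actual text normalization and scoring tool are not stated here, and this sketch assumes plain whitespace tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenization and no further text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words (one rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: fraction of baseline errors removed."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy English example (not Karelian data): 1 substitution + 1 deletion over 4 words.
    print(word_error_rate("the cat sat down", "the hat sat"))  # 0.5
    # WER figures (in %) quoted in the abstract.
    print(f"test: {relative_reduction(25.25, 17.72):.1%}")        # test: 29.8%
    print(f"validation: {relative_reduction(24.73, 17.12):.1%}")  # validation: 30.8%
```

With the figures quoted in the abstract, going from 25.25% (fine-tuned Wav2Vec) to 17.72% (Wav2Vec-BERT 2.0 with an external language model) on the test set is roughly a 29.8% relative reduction, and 24.73% to 17.12% on the validation set is roughly 30.8%. This comparison changes both the acoustic model and the decoding setup, so it does not isolate the language model's contribution.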

Authors (3)

I. Kipyatkova

I. Kagirov

Mikhail Dolgushin

Citation Format

Kipyatkova, I., Kagirov, I., Dolgushin, M. (2025). Use of Pre-Trained Multilingual Models for Karelian Speech Recognition. https://doi.org/10.15622/ia.24.2.9

Journal Information
Year of Publication: 2025
Language: English
Database Source: Semantic Scholar
DOI: 10.15622/ia.24.2.9
Access: Open Access