
Efficient Adaptation of Multilingual Models for Japanese ASR

Mark Bajo Haruka Fukukawa Ryuji Morita Yuma Ogasawara

Abstract

This study explores fine-tuning multilingual ASR (Automatic Speech Recognition) models, specifically OpenAI's Whisper-Tiny, to improve performance in Japanese. While multilingual models like Whisper offer versatility, they often lack precision in specific languages. Conversely, monolingual models like ReazonSpeech excel in language-specific tasks but are less adaptable. Using Japanese-specific datasets and Low-Rank Adaptation (LoRA) along with end-to-end (E2E) training, we fine-tuned Whisper-Tiny to bridge this gap. Our results show that fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8 with LoRA and to 14.7 with end-to-end fine-tuning, surpassing Whisper-Base's CER of 20.2. However, challenges with domain-specific terms remain, highlighting the need for specialized datasets. These findings demonstrate that fine-tuning multilingual models can achieve strong language-specific performance while retaining their flexibility. This approach provides a scalable solution for improving ASR in resource-constrained environments and languages with complex writing systems like Japanese.
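The paper itself does not ship code; the sketch below illustrates how the LoRA setup described in the abstract might be reproduced with the Hugging Face transformers, peft, and evaluate libraries. The rank, alpha, target modules, and example strings are illustrative assumptions, not the authors' reported configuration.

# Minimal sketch: LoRA adaptation of Whisper-Tiny for Japanese ASR.
# Assumes Hugging Face transformers, peft, and evaluate are installed;
# hyperparameters below are assumptions, not the paper's settings.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model
import evaluate

# Load the multilingual base model and a Japanese-configured processor.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-tiny", language="japanese", task="transcribe"
)

# Attach low-rank adapters to the attention projections only, so the
# vast majority of the base model's parameters stay frozen.
lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed target layers
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total params

# Character Error Rate, the metric reported in the paper, can be
# computed with the `cer` metric from the evaluate library.
cer_metric = evaluate.load("cer")
example_cer = cer_metric.compute(
    predictions=["今日は良い天気です"], references=["今日はいい天気です"]
)
print(f"CER: {example_cer:.3f}")  # 1 substitution over 9 chars ≈ 0.111

Under a setup like this, only the small adapter matrices receive gradient updates during fine-tuning, which is what makes LoRA attractive in the resource-constrained settings the abstract highlights.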

Authors (4)

Mark Bajo
Haruka Fukukawa
Ryuji Morita
Yuma Ogasawara

Citation Format

Bajo, M., Fukukawa, H., Morita, R., & Ogasawara, Y. (2024). Efficient Adaptation of Multilingual Models for Japanese ASR. https://arxiv.org/abs/2412.10705

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓