arXiv Open Access 2024

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, +6 others

Abstract

Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition. These include: a) input feature fusion between standard acoustic frontends and domain fine-tuned SSL speech representations; b) frame-level joint decoding between TDNN systems separately trained using standard acoustic features alone and those with additional domain fine-tuned SSL features; and c) multi-pass decoding in which the TDNN/Conformer system outputs are rescored using domain fine-tuned pre-trained ASR models. In addition, fine-tuned SSL speech features are used in acoustic-to-articulatory (A2A) inversion to construct multi-modal ASR systems. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora, and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models and their features consistently outperform the standalone fine-tuned SSL pre-trained models. These systems produced statistically significant WER or CER reductions of 6.53%, 1.90%, 2.04% and 7.97% absolute (24.10%, 23.84%, 10.14% and 31.39% relative) on the four tasks respectively. Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
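Approaches a) and b) above can be illustrated with a minimal sketch. The abstract does not specify the fusion or combination rules, so the code below assumes per-frame concatenation for input feature fusion and a weighted linear interpolation of per-frame log-probabilities for frame-level joint decoding. The function names `fuse_features` and `joint_decode_scores`, the `weight` parameter, and the toy values are hypothetical, and both feature streams are assumed to be already aligned to the same frame rate.

```python
from typing import List

Frame = List[float]


def fuse_features(acoustic: List[Frame], ssl: List[Frame]) -> List[Frame]:
    """Concatenate acoustic and SSL feature vectors frame by frame.

    Assumption: fusion is plain concatenation and both streams have the
    same number of frames; the paper may use a different scheme.
    """
    if len(acoustic) != len(ssl):
        raise ValueError("feature streams must have the same number of frames")
    return [a + s for a, s in zip(acoustic, ssl)]


def joint_decode_scores(
    log_probs_a: List[Frame],
    log_probs_b: List[Frame],
    weight: float = 0.5,
) -> List[Frame]:
    """Interpolate per-frame log-probabilities from two systems.

    Assumption: system A uses standard acoustic features only, system B
    adds SSL features, and the combined score for each frame and output
    unit is weight * A + (1 - weight) * B.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(log_probs_a) != len(log_probs_b):
        raise ValueError("score streams must have the same number of frames")
    return [
        [weight * pa + (1.0 - weight) * pb for pa, pb in zip(fa, fb)]
        for fa, fb in zip(log_probs_a, log_probs_b)
    ]


# Toy usage: 2 frames, 2-dim acoustic features and 3-dim SSL features.
acoustic = [[0.1, 0.2], [0.3, 0.4]]
ssl = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
fused = fuse_features(acoustic, ssl)  # each fused frame has 2 + 3 = 5 dims

# Toy usage: one frame, two output units, equal weighting of both systems.
combined = joint_decode_scores([[-1.0, -2.0]], [[-3.0, -4.0]], weight=0.5)
```

In practice the interpolation weight would be tuned on held-out data, and the combined scores would be passed to the decoder in place of either system's individual scores.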

Topics & Keywords

eess.AS cs.AI cs.SD

Authors (11)

Shujie Hu

Xurong Xie

Mengzhe Geng

Zengrui Jin

Jiajun Deng

Guinan Li

Yi Wang

Mingyu Cui

Tianzi Wang

Helen Meng

Xunying Liu

Citation Format

Hu, S., Xie, X., Geng, M., Jin, Z., Deng, J., Li, G. et al. (2024). Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition. https://arxiv.org/abs/2407.13782

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access