arXiv Open Access 2024

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, +2 others

Abstract

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.
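The "linear modulation" the abstract refers to is in the spirit of feature-wise scale-and-shift (FiLM-style) conditioning: a language or speaker embedding produces per-channel scale and shift terms applied to the SSL model's intermediate representations. The sketch below is illustrative only (function names, shapes, and the zero initialization are assumptions, not the authors' implementation); it shows why such modulation can start as an identity map and so leave the base SSLR's behavior intact.

```python
import numpy as np

def film_modulate(hidden, cond, W_gamma, W_beta):
    """Scale-and-shift (FiLM-style) modulation of SSL hidden states.

    hidden:  (T, D) frame-level SSL representations
    cond:    (C,)   language/speaker conditioning embedding
    W_gamma: (C, D) projection producing per-channel scales
    W_beta:  (C, D) projection producing per-channel shifts
    """
    gamma = 1.0 + cond @ W_gamma  # scale stays near identity
    beta = cond @ W_beta          # additive per-channel shift
    return hidden * gamma + beta

# Toy example: zero-initialized projections give an exact identity,
# so the base representations are untouched before any tuning.
rng = np.random.default_rng(0)
T, D, C = 4, 8, 3
h = rng.standard_normal((T, D))   # dummy SSL hidden states
c = rng.standard_normal(C)        # dummy conditioning embedding
W_gamma = np.zeros((C, D))
W_beta = np.zeros((C, D))
out = film_modulate(h, c, W_gamma, W_beta)
assert np.allclose(out, h)  # identity at initialization
```

Zero-initializing the modulation projections is a common design choice for adapters of this kind: training can then move the model away from the frozen base only as far as the conditioning signal warrants.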

Authors (7)

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba

Citation Format

Lu, Y., Liu, J., Thebaud, T., Moro-Velazquez, L., Rastrow, A., Dehak, N., & Villalba, J. (2024). CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing. https://arxiv.org/abs/2412.04425

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓