
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models


Abstract

Current large speech-language models (Speech-LLMs) often exhibit limitations in empathetic reasoning, primarily due to the absence of training datasets that integrate both contextual content and paralinguistic cues. In this work, we propose two approaches to incorporate contextual paralinguistic information into model training: (1) an explicit method that provides paralinguistic metadata (e.g., emotion annotations) directly to the LLM, and (2) an implicit method that automatically generates novel training question-answer (QA) pairs using both categorical and dimensional emotion annotations alongside speech transcriptions. Our implicit method improves LLM-judged performance by 38.41% on a human-annotated QA benchmark, reaching 46.02% when combined with the explicit approach, demonstrating its effectiveness for contextual paralinguistic understanding. We also validate the LLM judge by demonstrating its correlation with classification metrics, providing support for its reliability.
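To make the implicit method concrete: the idea is to turn each annotated utterance (categorical emotion label plus dimensional scores, alongside its transcript) into a training QA pair. Below is a minimal sketch of what such a generator could look like, assuming a dataset schema with a transcript, one categorical label, and valence/arousal scores; all field names, the arousal threshold, and the question/answer templates are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of implicit QA-pair generation from emotion
# annotations and a transcript. Names and templates are assumptions.
from dataclasses import dataclass


@dataclass
class AnnotatedUtterance:
    transcript: str     # speech transcription
    emotion_label: str  # categorical annotation, e.g. "angry"
    valence: float      # dimensional annotation (assumed 1-5 scale)
    arousal: float      # dimensional annotation (assumed 1-5 scale)


def make_qa_pair(utt: AnnotatedUtterance) -> dict:
    """Generate one contextual-paralinguistic QA pair from annotations."""
    question = (
        "How does the speaker feel while saying this, and what in the "
        "content supports that reading?"
    )
    # Fold the dimensional annotation into the wording of the answer.
    intensity = "strongly" if utt.arousal >= 4.0 else "mildly"
    answer = (
        f"The speaker sounds {intensity} {utt.emotion_label} "
        f"(valence {utt.valence:.1f}, arousal {utt.arousal:.1f}) "
        f"while saying: \"{utt.transcript}\""
    )
    return {"question": question, "answer": answer}


if __name__ == "__main__":
    utt = AnnotatedUtterance(
        transcript="I can't believe you did that again.",
        emotion_label="angry", valence=1.8, arousal=4.3,
    )
    print(make_qa_pair(utt))
```

Pairs like these couple the paralinguistic annotations to the spoken content itself, which is what lets the model learn contextual paralinguistic reasoning without being handed the metadata at inference time.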


Authors (9)

Qiongqiong Wang
Hardik B. Sailor
Jeremy H. M. Wong
Tianchi Liu
Shuo Sun
Wenyu Zhang
Muhammad Huzaifah
Nancy Chen
Ai Ti Aw

Citation

Wang, Q., Sailor, H. B., Wong, J. H. M., Liu, T., Sun, S., Zhang, W., et al. (2025). Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models. arXiv:2508.07273. https://arxiv.org/abs/2508.07273

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓