arXiv Open Access 2024

Seal: Advancing Speech Language Models to be Few-Shot Learners

Shuyu Lei, Lingen Liu, Jiaolong Yang, Yasen Jiao, Yuxiang Yang, +2 others

Abstract

Existing auto-regressive language models have demonstrated a remarkable capability to perform a new task with just a few examples in the prompt, without requiring any additional training. To extend this capability to a multi-modal setting (i.e., speech and language), this paper introduces the Seal model, short for speech language model. It incorporates a novel alignment method in which a Kullback-Leibler divergence loss is used to train a projector that bridges a frozen speech encoder and a frozen language model decoder. The resulting Seal model exhibits robust performance as a few-shot learner on two speech understanding tasks. Additionally, consistency experiments are conducted to validate its robustness across different pre-trained language models.
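
The alignment objective named in the abstract can be illustrated with a small sketch. The abstract does not say which distributions the Kullback-Leibler loss compares, so the setup below is an assumption: the frozen language model is imagined to produce next-token logits once from text input and once from projected speech features, and the projector would be trained to make the second distribution match the first. The names `alignment_loss`, `text_logits`, and `speech_logits` are hypothetical and the logits are toy values, not data from the paper.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def alignment_loss(text_logits, speech_logits):
    """Mean per-position KL between text-conditioned and speech-conditioned
    next-token distributions (an assumed formulation, not the paper's)."""
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(text_logits, speech_logits)
    ]
    return sum(losses) / len(losses)


# Toy logits over a 3-token vocabulary at two positions (hypothetical values).
text_logits = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
speech_logits_matched = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
speech_logits_mismatched = [[-1.0, 0.5, 2.0], [1.2, 0.1, 0.3]]

loss_matched = alignment_loss(text_logits, speech_logits_matched)
loss_mismatched = alignment_loss(text_logits, speech_logits_mismatched)
```

Under this assumed formulation, the loss is zero when the speech-conditioned distribution equals the text-conditioned one and grows as they diverge, which is the signal a projector between two frozen components could be trained on.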

Topics & Keywords

cs.CL

Authors (7)

Shuyu Lei

Lingen Liu

Jiaolong Yang

Yasen Jiao

Yuxiang Yang

Yushu Yang

Xiang Guo

Citation Format

APA

Lei, S., Liu, L., Yang, J., Jiao, Y., Yang, Y., Yang, Y. et al. (2024). Seal: Advancing Speech Language Models to be Few-Shot Learners. https://arxiv.org/abs/2407.14875

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓