arXiv Open Access 2023

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Zihao Deng Yinghao Ma Yudong Liu Rongchen Guo Ge Zhang +3 others

Abstract

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the frozen pre-trained music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate MusiLingo's competitive performance in generating music captions and composing music-related Q&A pairs. The dataset we introduce enables notable advancements beyond previous ones.
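
The alignment idea in the abstract (a single trainable projection between a frozen audio encoder and a frozen LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the function names (`make_projection`, `project`, `build_llm_input`), the random initialization, and the choice to place the projected music frames before the prompt embeddings are all assumptions made for illustration, and the frozen models are stood in for by plain lists of numbers.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Create the weights of one linear layer (the only trainable part in this sketch)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def project(frames, weight, bias):
    """Map each audio-encoder frame into the LLM embedding space: y = W x + b."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]

def build_llm_input(music_frames, prompt_embeddings, weight, bias):
    """Concatenate projected music frames with text prompt embeddings (assumed order)."""
    return project(music_frames, weight, bias) + prompt_embeddings

# Hypothetical sizes; the real encoder and LLM dimensions are much larger.
AUDIO_DIM, LLM_DIM = 8, 16
weight, bias = make_projection(AUDIO_DIM, LLM_DIM)

# Stand-ins for frozen audio-encoder outputs (3 frames) and prompt token embeddings (4 tokens).
music_frames = [[0.5] * AUDIO_DIM for _ in range(3)]
prompt_embeddings = [[0.0] * LLM_DIM for _ in range(4)]

sequence = build_llm_input(music_frames, prompt_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))
```

In such a setup only `weight` and `bias` would be updated during training, while the audio encoder and the LLM stay fixed.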

Topics & Keywords

eess.AS cs.AI cs.CL cs.MM cs.SD

Authors (8)

Zihao Deng

Yinghao Ma

Yudong Liu

Rongchen Guo

Ge Zhang

Wenhu Chen

Wenhao Huang

Emmanouil Benetos

Citation Format

Deng, Z., Ma, Y., Liu, Y., Guo, R., Zhang, G., Chen, W. et al. (2023). MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response. https://arxiv.org/abs/2309.08730

Journal Information

Publication Year: 2023
Language: en
Database Source: arXiv
Access: Open Access ✓