arXiv Open Access 2025

MQAD: A Large-Scale Question Answering Dataset for Training Music Large Language Models

Zhihao Ouyang, Ju-Chiang Wang, Daiyu Zhang, Bin Chen, Shangjie Li, +1 more

Abstract

Question answering (QA) is a natural way for humans to understand a piece of music audio. For machines, however, access to a large-scale dataset covering diverse aspects of music is crucial yet challenging, because publicly available music data of this type is scarce. This paper introduces MQAD, a music QA dataset built on the Million Song Dataset (MSD). It covers a rich array of musical features, including beat, chord, key, structure, instrument, and genre, across 270,000 tracks, with nearly 3 million diverse questions and captions. MQAD distinguishes itself by offering detailed time-varying musical information such as chords and sections, enabling exploration of the inherent structure of music within a song. To compile MQAD, our methodology leverages specialized Music Information Retrieval (MIR) models to extract higher-level musical features and Large Language Models (LLMs) to generate natural language QA pairs. We then use a multimodal LLM that integrates the LLaMA2 and Whisper architectures, along with novel subjective metrics, to assess the performance of MQAD. In experiments, our model trained on MQAD demonstrates advancements over conventional music audio captioning approaches. The dataset and code are available at https://github.com/oyzh888/MQAD.
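
To make the idea of time-varying QA pairs concrete, the following is a minimal sketch and not the authors' pipeline. It assumes chord annotations arrive as (start, end, label) segments, as an MIR chord model might produce, and it uses fixed question templates in place of the LLM generation step named in the abstract. The names `ChordSegment`, `chord_at`, and `make_qa_pairs` are hypothetical, and the actual MQAD schema is not described here.

```python
from dataclasses import dataclass


@dataclass
class ChordSegment:
    """One time-stamped chord annotation (hypothetical structure)."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds (exclusive)
    label: str    # chord label, e.g. "C:maj"


def chord_at(segments, t):
    """Return the chord label active at time t, or None if t is not covered."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.label
    return None


def make_qa_pairs(segments):
    """Turn chord segments into templated question-answer pairs.

    Assumption: a fixed template stands in for LLM-generated questions.
    """
    pairs = []
    for seg in segments:
        question = (
            f"Which chord is played between {seg.start:.1f}s and {seg.end:.1f}s?"
        )
        pairs.append({"question": question, "answer": seg.label})
    return pairs


# Illustrative, made-up annotations for a six-second excerpt.
segments = [
    ChordSegment(0.0, 2.0, "C:maj"),
    ChordSegment(2.0, 4.0, "G:maj"),
    ChordSegment(4.0, 6.0, "A:min"),
]
qa_pairs = make_qa_pairs(segments)
```

Each resulting pair ties a question to a specific time span, which is the property that separates time-varying annotations such as chords and sections from track-level labels such as genre.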

Topics & Keywords

cs.SD

Authors (6)

Zhihao Ouyang

Ju-Chiang Wang

Daiyu Zhang

Bin Chen

Shangjie Li

Quan Lin

Citation Format

Ouyang, Z., Wang, J.-C., Zhang, D., Chen, B., Li, S., & Lin, Q. (2025). MQAD: A Large-Scale Question Answering Dataset for Training Music Large Language Models. arXiv. https://arxiv.org/abs/2508.19514

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access