Semantic Scholar, Open Access, 2022, 176 citations

MuLan: A Joint Embedding of Music Audio and Natural Language

Qingqing Huang, A. Jansen, Joonseok Lee, R. Ganti, Judith Yue Li, +1 more

Abstract

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
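The zero-shot tagging idea in the abstract can be illustrated with a minimal sketch. It assumes that an audio tower and a text tower have already mapped their inputs into the same embedding space, and it ranks free-form text labels by cosine similarity to one audio embedding. The function names, the three-dimensional vectors, and the label strings are all hypothetical placeholders for illustration; they are not outputs of MuLan and not part of any published API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_tags(audio_embedding, text_embeddings):
    """Rank text labels by similarity to a single audio embedding.

    text_embeddings maps each label string to its embedding vector.
    Returns (label, score) pairs, highest score first.
    """
    scores = {
        label: cosine_similarity(audio_embedding, vector)
        for label, vector in text_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors standing in for the outputs of the two towers (assumed values).
audio_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "upbeat rock with electric guitar": [0.8, 0.2, 0.1],
    "slow solo piano": [0.1, 0.9, 0.3],
    "ambient electronic": [0.2, 0.3, 0.9],
}

ranking = zero_shot_tags(audio_embedding, text_embeddings)
for label, score in ranking:
    print(f"{score:.3f}  {label}")
```

Because the labels are arbitrary strings rather than entries in a fixed ontology, new tags can be scored without retraining, provided the text tower can embed them. The same similarity ranking, run over many audio embeddings for one text query, would give a simple form of cross-modal retrieval.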

Topics & Keywords

Computer Science, Engineering, Mathematics

Authors (6)

Qingqing Huang

A. Jansen

Joonseok Lee

R. Ganti

Judith Yue Li

D. Ellis

Citation

Huang, Q., Jansen, A., Lee, J., Ganti, R., Li, J. Y., & Ellis, D. (2022). MuLan: A Joint Embedding of Music Audio and Natural Language. https://doi.org/10.48550/arXiv.2208.12415

Journal Information

Publication Year: 2022
Language: en
Total Citations: 176
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2208.12415
Access: Open Access ✓