Semantic Scholar Open Access 2022 176 citations

MuLan: A Joint Embedding of Music Audio and Natural Language

Qingqing Huang, A. Jansen, Joonseok Lee, R. Ganti, Judith Yue Li, +1 other

Abstract

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
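The zero-shot music tagging mentioned in the abstract can be illustrated with a minimal sketch, assuming both towers map their inputs into one shared vector space and that tags are ranked by cosine similarity. The function names (`l2_normalize`, `cosine_similarity`, `zero_shot_tags`) and the three-dimensional toy vectors are hypothetical stand-ins, not the authors' code or real MuLan outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_tags(audio_embedding, text_embeddings, top_k=2):
    """Rank free-form text descriptions against one audio embedding.

    text_embeddings maps each description to its (assumed) text-tower output.
    No tag-specific classifier is trained; ranking relies only on the shared space.
    """
    scored = [
        (label, cosine_similarity(audio_embedding, emb))
        for label, emb in text_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for the two towers' outputs.
text_embeddings = {
    "upbeat jazz piano": [0.9, 0.1, 0.0],
    "slow ambient drone": [0.0, 0.2, 0.9],
    "heavy metal guitar": [0.1, 0.9, 0.1],
}
audio_embedding = [0.8, 0.2, 0.1]

ranked = zero_shot_tags(audio_embedding, text_embeddings, top_k=2)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

The same ranking run in the other direction, scoring many audio embeddings against one text embedding, would give a sketch of text-to-music retrieval under the same assumptions.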

Authors (6)

Qingqing Huang

A. Jansen

Joonseok Lee

R. Ganti

Judith Yue Li

D. Ellis

Citation Format

Huang, Q., Jansen, A., Lee, J., Ganti, R., Li, J.Y., Ellis, D. (2022). MuLan: A Joint Embedding of Music Audio and Natural Language. https://doi.org/10.48550/arXiv.2208.12415

Quick Access

View at source: doi.org/10.48550/arXiv.2208.12415
Journal Information
Year Published: 2022
Language: English
Total Citations: 176
Database Source: Semantic Scholar
DOI: 10.48550/arXiv.2208.12415
Access: Open Access