arXiv Open Access 2024

Music Grounding by Short Video

Zijie Xin, Minquan Wang, Jingyu Liu, Ye Ma, Quan Chen, and 2 others

Abstract

Adding proper background music helps complete a short video to be shared. Previous work tackles the task by video-to-music retrieval (V2MR), aiming to find the most suitable music track from a collection to match the content of a given query video. In practice, however, music tracks are typically much longer than the query video, necessitating (manual) trimming of the retrieved music to a shorter segment that matches the video duration. In order to bridge the gap between the practical need for music moment localization and V2MR, we propose a new task termed Music Grounding by Short Video (MGSV). To tackle the new task, we introduce a new benchmark, MGSV-EC, which comprises a diverse set of 53k short videos associated with 35k different music moments from 4k unique music tracks. Furthermore, we develop a new baseline method, MaDe, which performs both video-to-music matching and music moment detection within a unified end-to-end deep network. Extensive experiments on MGSV-EC not only highlight the challenging nature of MGSV but also set MaDe as a strong baseline.
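The following is a minimal brute-force sketch of what an MGSV query takes in and returns. It is not the authors' MaDe model. It assumes precomputed embeddings from some hypothetical encoder, one music feature vector per second, mean pooling over each window, and cosine similarity as the matching score. It then scores every window whose length equals the video duration, across every track. The function `ground_music` and its parameters are illustrative names only.

```python
import numpy as np


def _normalize(x):
    """L2-normalize along the last axis (epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def ground_music(video_emb, music_feats, video_len):
    """Hypothetical exhaustive MGSV baseline.

    video_emb:   (d,) embedding of the query video (assumed encoder).
    music_feats: list of (T_i, d) arrays, assumed one feature per second of each track.
    video_len:   query video duration in whole seconds; the moment is assumed
                 to have exactly this length.

    Returns (track_index, start_sec, end_sec, score); track_index is -1 if
    no track is long enough to cover the video.
    """
    assert video_len >= 1, "video_len must be a positive number of seconds"
    v = _normalize(np.asarray(video_emb, dtype=float))
    best = (-1, 0, 0, -np.inf)
    for idx, feats in enumerate(music_feats):
        feats = np.asarray(feats, dtype=float)
        if feats.shape[0] < video_len:
            continue  # track too short to supply a moment of the video's length
        # Prefix sums give every window mean in O(T) rather than O(T * video_len).
        csum = np.vstack([np.zeros((1, feats.shape[1])), np.cumsum(feats, axis=0)])
        window_means = (csum[video_len:] - csum[:-video_len]) / video_len
        # Cosine similarity between each candidate moment and the video.
        scores = _normalize(window_means) @ v
        start = int(np.argmax(scores))
        if scores[start] > best[3]:
            best = (idx, start, start + video_len, float(scores[start]))
    return best
```

For example, with a 2-dimensional toy feature space, a query embedding `[1, 0]`, and `video_len=3`, a track whose seconds 2 to 4 carry features `[1, 0]` is returned as `(1, 2, 5, ~1.0)` if it is the second track in the list. Fixing the window length to the video duration and searching exhaustively keeps the sketch simple. A learned model such as MaDe, which per the abstract handles video-to-music matching and music moment detection in one end-to-end network, avoids this exhaustive scan.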


Authors (7)

Zijie Xin
Minquan Wang
Jingyu Liu
Ye Ma
Quan Chen
Peng Jiang
Xirong Li

Citation Format

Xin, Z., Wang, M., Liu, J., Ma, Y., Chen, Q., Jiang, P. et al. (2024). Music Grounding by Short Video. https://arxiv.org/abs/2408.16990

Publication Information
Year of Publication: 2024
Language: English
Source Database: arXiv
Access: Open Access