arXiv Open Access 2021

Semi-Supervised Music Tagging Transformer

Minz Won, Keunwoo Choi, Xavier Serra

Abstract

We present the Music Tagging Transformer, a model trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of extracted features using stacked self-attention layers. Through a careful model assessment, we first show that, under a supervised scheme, the proposed architecture outperforms previous state-of-the-art music tagging models based on convolutional neural networks. The Music Tagging Transformer is further improved by noisy student training, a semi-supervised approach that leverages both labeled and unlabeled data combined with data augmentation. To the best of our knowledge, this is the first attempt to utilize the entire audio of the Million Song Dataset.
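The noisy student training mentioned in the abstract can be illustrated with a toy, standard-library-only sketch. This is not the authors' implementation: `TinyTagger`, `augment`, and `noisy_student_round` are hypothetical names, a per-tag logistic unit stands in for the transformer, Gaussian input noise stands in for audio data augmentation, and the use of soft pseudo-labels in a single teacher-to-student round is an assumption made for brevity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyTagger:
    """Hypothetical stand-in for a tagging model: one logistic unit per tag."""

    def __init__(self, n_features, n_tags):
        self.w = [[0.0] * n_features for _ in range(n_tags)]
        self.b = [0.0] * n_tags

    def predict(self, x):
        # One independent probability per tag (multi-label output).
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys, epochs=200, lr=0.5):
        # Plain SGD on binary cross-entropy; targets may be soft, in [0, 1].
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                p = self.predict(x)
                for t in range(len(self.b)):
                    g = p[t] - y[t]
                    self.b[t] -= lr * g
                    for j in range(len(x)):
                        self.w[t][j] -= lr * g * x[j]
        return self


def augment(x, rng, scale=0.1):
    # Assumption: Gaussian input noise as a placeholder for audio augmentation.
    return [xi + rng.gauss(0.0, scale) for xi in x]


def noisy_student_round(labeled_x, labeled_y, unlabeled_x, n_tags, rng):
    n_features = len(labeled_x[0])
    # 1. Train the teacher on labeled data only.
    teacher = TinyTagger(n_features, n_tags).fit(labeled_x, labeled_y)
    # 2. The teacher pseudo-labels the unlabeled data from clean inputs.
    pseudo_y = [teacher.predict(x) for x in unlabeled_x]
    # 3. The student trains on augmented inputs, using both true and pseudo labels.
    xs = [augment(x, rng) for x in labeled_x + unlabeled_x]
    ys = labeled_y + pseudo_y
    student = TinyTagger(n_features, n_tags).fit(xs, ys)
    return teacher, student


rng = random.Random(0)
labeled_x = [[1.0, 0.0], [0.0, 1.0]]
labeled_y = [[1.0, 0.0], [0.0, 1.0]]
unlabeled_x = [[0.9, 0.1], [0.1, 0.9]]
teacher, student = noisy_student_round(labeled_x, labeled_y, unlabeled_x, 2, rng)
```

In this sketch the teacher sees only clean labeled inputs, while the student sees noised inputs and the larger combined set, which is the asymmetry that gives the method its name. In practice the student could then serve as the teacher for a further round.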

Topics & Keywords

cs.SD eess.AS

Citation Format

Won, M., Choi, K., & Serra, X. (2021). Semi-Supervised Music Tagging Transformer. arXiv. https://arxiv.org/abs/2111.13457

Journal Information

Publication Year: 2021
Language: en
Source Database: arXiv
Access: Open Access ✓