Semantic Scholar | Open Access | 2023 | 87 citations

VampNet: Music Generation via Masked Acoustic Token Modeling

H. F. García Prem Seetharaman Rithesh Kumar Bryan Pardo


Abstract

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
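The prompting and multi-pass sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `MASK` sentinel, the `periodic_prompt` and `iterative_decode` helpers, the cosine unmasking schedule, and the random `toy_model` standing in for the bidirectional transformer are all assumptions made for illustration. Only the default of 36 sampling passes comes from the abstract.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a masked token position


def periodic_prompt(tokens, period):
    """Build a prompt by keeping every `period`-th token and masking the rest."""
    return [t if i % period == 0 else MASK for i, t in enumerate(tokens)]


def toy_model(tokens, vocab_size, rng):
    """Stand-in for the transformer (assumption): returns a random
    (token, confidence) guess per position. A real model would attend
    to all tokens, masked and unmasked, in one forward pass."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_decode(tokens, vocab_size, num_steps=36, seed=0):
    """Fill masked positions over several passes, committing the most
    confident guesses first (cosine schedule is an assumption)."""
    rng = random.Random(seed)
    tokens = list(tokens)
    total_masked = sum(t == MASK for t in tokens)
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_model(tokens, vocab_size, rng)
        # Number of positions that should still be masked after this pass.
        keep_masked = int(total_masked * math.cos(math.pi / 2 * step / num_steps))
        # Commit the highest-confidence guesses; the rest stay masked.
        by_conf = sorted(masked, key=lambda i: guesses[i][1], reverse=True)
        n_commit = max(len(masked) - keep_masked, 0)
        for i in by_conf[:n_commit]:
            tokens[i] = guesses[i][0]
    return tokens


original = list(range(100, 116))        # pretend acoustic token ids
prompt = periodic_prompt(original, 4)   # keep 1 token in 4 as the prompt
variation = iterative_decode(prompt, vocab_size=1024)
```

In this sketch the unmasked prompt tokens are never overwritten, which is how a prompt can constrain the output while the masked positions are free to vary.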

Topics & Keywords

Computer Science, Engineering

Authors (4)

H. F. García

Prem Seetharaman

Rithesh Kumar

Bryan Pardo

Citation Format (APA)

García, H. F., Seetharaman, P., Kumar, R., & Pardo, B. (2023). VampNet: Music Generation via Masked Acoustic Token Modeling. https://doi.org/10.48550/arXiv.2307.04686

Quick Access

View at source: doi.org/10.48550/arXiv.2307.04686

Journal Information

Publication Year: 2023
Language: en
Total Citations: 87
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2307.04686
Access: Open Access