Semantic Scholar · Open Access · 2024 · 102 citations

Long-form music generation with latent diffusion

Zach Evans, Julian Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract

Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure from text prompts. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
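The abstract gives two numbers, a maximum length of 4m45s and a latent rate of 21.5 Hz. Multiplying them shows roughly how long a sequence the diffusion-transformer has to handle. The sketch below does only that arithmetic. The 44.1 kHz sample rate is an assumption, since the abstract does not state one, and `latent_frames` and `downsampling_factor` are hypothetical helper names used for illustration.

```python
import math

# Values stated in the abstract.
LATENT_RATE_HZ = 21.5          # latent frames per second
MAX_DURATION_S = 4 * 60 + 45   # 4m45s = 285 seconds

# Assumption (not stated in the abstract): audio sampled at 44.1 kHz.
ASSUMED_SAMPLE_RATE_HZ = 44_100


def latent_frames(duration_s: float, latent_rate_hz: float = LATENT_RATE_HZ) -> int:
    """Hypothetical helper: latent frames needed to cover `duration_s` seconds, rounded up."""
    return math.ceil(duration_s * latent_rate_hz)


def downsampling_factor(sample_rate_hz: float, latent_rate_hz: float = LATENT_RATE_HZ) -> float:
    """Hypothetical helper: audio samples summarized by each latent frame."""
    return sample_rate_hz / latent_rate_hz


if __name__ == "__main__":
    print("duration (s):", MAX_DURATION_S)
    print("latent frames:", latent_frames(MAX_DURATION_S))
    print("samples per latent frame (assumed 44.1 kHz):",
          round(downsampling_factor(ASSUMED_SAMPLE_RATE_HZ)))
```

Under these assumptions, a 285-second track corresponds to about 6,128 latent frames. Each frame stands in for about 2,051 audio samples per channel. This gives a sense of why such a highly downsampled latent representation matters for long-context generation.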

Topics & Keywords

Computer Science, Engineering

Authors (6)

Zach Evans

Julian Parker

CJ Carr

Zack Zukowski

Josiah Taylor

Jordi Pons

Citation (APA)

Evans, Z., Parker, J., Carr, C., Zukowski, Z., Taylor, J., & Pons, J. (2024). Long-form music generation with latent diffusion. arXiv. https://doi.org/10.48550/arXiv.2404.10301


Publication Information

Year of publication: 2024
Language: English
Total citations: 102
Source database: Semantic Scholar
DOI: 10.48550/arXiv.2404.10301
Access: Open Access