SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Abstrak
Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation. Existing transformer-based models, however, face a trade-off between inference speed and musical quality. Traditional acceleration techniques such as embedding pooling significantly degrade quality, while recently proposed Byte Pair Encoding (BPE) methods - though effective on single-track piano data - suffer large performance drops in multi-track settings, as revealed by our analysis. We propose Attribute-Specialized Key-Value Head Sharing (AS-KVHS), adapted to music's structured symbolic representation, achieving about 30% inference speedup with only a negligible (about 0.4%) quality drop in objective evaluations and slight improvements in subjective listening tests. Our main contributions are (1) the first systematic study of BPE's generalizability in multi-track symbolic music, and (2) the introduction of AS-KVHS for low-latency symbolic music generation. Beyond these, we also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.
Penulis (13)
Jiaye Tan
Haonan Luo
Linfeng Song
Shuaiqi Chen
Yishan Lyu
Zian Zhong
Roujia Wang
Daniel Jiang
Haoran Zhang
Jiaming Bai
Haoran Cheng
Q. Vera Liao
Hao-Wen Dong
Akses Cepat
- Tahun Terbit
- 2025
- Bahasa
- en
- Sumber Database
- arXiv
- Akses
- Open Access ✓