arXiv Open Access 2025

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing

Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, +8 others

Abstract

Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation. Existing transformer-based models, however, face a trade-off between inference speed and musical quality. Traditional acceleration techniques such as embedding pooling significantly degrade quality, while recently proposed Byte Pair Encoding (BPE) methods, though effective on single-track piano data, suffer large performance drops in multi-track settings, as our analysis reveals. We propose Attribute-Specialized Key-Value Head Sharing (AS-KVHS), which is adapted to music's structured symbolic representation and achieves about a 30% inference speedup with only a negligible (about 0.4%) quality drop in objective evaluations and slight improvements in subjective listening tests. Our main contributions are (1) the first systematic study of BPE's generalizability in multi-track symbolic music, and (2) the introduction of AS-KVHS for low-latency symbolic music generation. We also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.

Authors (13)

Jiaye Tan
Haonan Luo
Linfeng Song
Shuaiqi Chen
Yishan Lyu
Zian Zhong
Roujia Wang
Daniel Jiang
Haoran Zhang
Jiaming Bai
Haoran Cheng
Q. Vera Liao
Hao-Wen Dong

Citation Format

Tan, J., Luo, H., Song, L., Chen, S., Lyu, Y., Zhong, Z. et al. (2025). SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing. https://arxiv.org/abs/2510.00395

Journal Information

Publication year: 2025
Language: English
Source database: arXiv
Access: Open Access