arXiv Open Access 2026

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo

Abstract

Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. This All-Gather communication pattern creates massive KV Cache redundancy, because every agent's prompt contains the same shared output blocks, yet existing reuse methods fail to exploit this redundancy efficiently. We present TokenDance, a system that scales the number of concurrent agents by exploiting the All-Gather pattern for collective KV Cache sharing. TokenDance's KV Collector performs KV Cache reuse over the full round in one collective step, so the cost of reusing a shared block is paid once regardless of agent count. Its Diff-Aware Storage encodes sibling caches as block-sparse diffs against a single master copy, achieving 11-17x compression on representative workloads. Evaluation on GenerativeAgents and AgentSociety shows that TokenDance supports up to 2.7x more concurrent agents than vLLM with prefix caching under SLO requirements, reduces per-agent KV Cache storage by up to 17.5x, and achieves up to 1.9x prefill speedup over per-request position-independent caching.
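The Diff-Aware Storage idea described above, encoding each sibling agent's cache as a block-sparse diff against a single master copy, can be illustrated with a toy sketch. The helpers below (`encode_diff`, `decode_diff`) and the block shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def encode_diff(master, sibling):
    """Encode a sibling cache as a block-sparse diff against the master.

    Only blocks that differ from the master are stored, keyed by block
    index. Assumes sibling and master have the same number of blocks.
    """
    return {i: blk.copy() for i, blk in enumerate(sibling)
            if not np.array_equal(blk, master[i])}

def decode_diff(master, diff):
    """Reconstruct the sibling cache from the master and its diff."""
    return [diff.get(i, blk) for i, blk in enumerate(master)]

# Toy example: 8 KV blocks per agent; a sibling differs in only one block,
# so storing the diff instead of the full cache gives ~8x compression here.
master = [np.full((4,), i, dtype=np.float32) for i in range(8)]
sibling = [blk.copy() for blk in master]
sibling[3] = np.full((4,), 99.0, dtype=np.float32)

diff = encode_diff(master, sibling)
reconstructed = decode_diff(master, diff)
```

When most blocks in a round are shared output gathered from all agents, each sibling's diff stays small, which is the intuition behind the 11-17x compression figures reported in the abstract.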


Authors (6)

Zhuohang Bian

Feiyang Wu

Chengrui Zhang

Hangcheng Dong

Yun Liang

Youwei Zhuo

Citation

Bian, Z., Wu, F., Zhang, C., Dong, H., Liang, Y., & Zhuo, Y. (2026). TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing. arXiv. https://arxiv.org/abs/2604.03143

Journal Information
Year Published
2026
Language
en
Source Database
arXiv
Access
Open Access ✓