arXiv Open Access 2026

SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

Shenggui Li, Chao Wang, Yikai Zhu, Yubo Wang, Fan Yin, +12 others

Abstract

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B. In addition, we release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs. Through a systematic study of speculative decoding training recipes, SpecBundle addresses the scarcity of high-quality drafts in the community, and our draft models achieve up to 4.48x end-to-end inference speedup on SGLang, establishing SpecForge as a practical foundation for real-world speculative decoding deployment.
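The propose-then-verify mechanism the abstract describes can be illustrated with a toy sketch. The `draft_next` and `target_next` functions below are hypothetical stand-ins for real draft and target LLMs (simple arithmetic rules, not actual models); the point is only the control flow: the cheap draft proposes `k` tokens, the target checks them in one pass, and the longest agreeing prefix is accepted plus one corrected token.

```python
# Toy sketch of speculative decoding's propose-then-verify loop.
# draft_next / target_next are hypothetical stand-ins for real models.

def draft_next(tokens):
    # Cheap "draft model": toy rule (sum of tokens mod 10).
    return sum(tokens) % 10

def target_next(tokens):
    # Expensive "target model": agrees with the draft most of the time,
    # but diverges when the context length is a multiple of 4.
    guess = draft_next(tokens)
    return (guess + 1) % 10 if len(tokens) % 4 == 0 else guess

def speculative_step(tokens, k=4):
    """Propose k draft tokens, verify them against the target,
    and accept the longest agreeing prefix plus one target token."""
    # 1) Draft proposes k tokens autoregressively (cheap).
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    drafted = proposal[len(tokens):]

    # 2) Target verifies all k positions (one batched forward pass
    #    in a real engine; simulated position by position here).
    accepted = list(tokens)
    for tok in drafted:
        expected = target_next(accepted)
        if tok == expected:
            accepted.append(tok)       # draft token accepted
        else:
            accepted.append(expected)  # mismatch: take the target's token
            break
    else:
        # All k accepted: the verification pass yields one bonus token.
        accepted.append(target_next(accepted))
    return accepted

out = speculative_step([3, 1], k=4)  # → [3, 1, 4, 8, 7]
```

Because every emitted token is either verified or produced by the target, the output matches what target-only decoding would generate, while each step can advance by up to `k + 1` tokens instead of one.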


Authors (17)

Shenggui Li
Chao Wang
Yikai Zhu
Yubo Wang
Fan Yin
Shuai Shi
Yefei Chen
Xiaomin Dong
Qiaoling Chen
Jin Pan
Ji Li
Laixin Xie
Yineng Zhang
Lei Yu
Yonggang Wen
Ivor Tsang
Tianwei Zhang

Citation Format

Li, S., Wang, C., Zhu, Y., Wang, Y., Yin, F., Shi, S. et al. (2026). SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding. https://arxiv.org/abs/2603.18567

Journal Information
Publication Year
2026
Language
English
Source Database
arXiv
Access
Open Access ✓