arXiv Open Access 2025

Speculative Decoding for Multi-Sample Inference

Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, and 6 others

Abstract

We propose a novel speculative decoding method tailored for multi-sample reasoning scenarios, such as self-consistency and Best-of-N sampling. Our method exploits the intrinsic consensus of parallel generation paths to synthesize high-quality draft tokens without requiring auxiliary models or external databases. By dynamically analyzing structural patterns across parallel reasoning paths through a probabilistic aggregation mechanism, it identifies consensus token sequences that align with the decoding distribution. Evaluations on mathematical reasoning benchmarks demonstrate a substantial improvement in draft acceptance rates over baselines, while reducing the latency in draft token construction. This work establishes a paradigm shift for efficient multi-sample inference, enabling seamless integration of speculative decoding with sampling-based reasoning techniques.
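The abstract does not spell out the aggregation rule, so the following is only a rough illustration of the general idea of drafting from parallel paths. It matches the last few tokens of one path against its sibling paths and greedily proposes the continuation that enough siblings agree on. The function name `consensus_draft`, its parameters (`context_len`, `max_draft`, `min_votes`), and the plain vote count are hypothetical assumptions standing in for the paper's probabilistic aggregation mechanism; this is not the authors' method.

```python
from collections import Counter
from typing import Sequence


def consensus_draft(
    current: Sequence[int],
    others: Sequence[Sequence[int]],
    context_len: int = 2,
    max_draft: int = 4,
    min_votes: int = 2,
) -> list[int]:
    """Propose draft tokens for `current` from sibling paths (hypothetical sketch).

    Assumption: a simple vote over n-gram continuations replaces the paper's
    probabilistic aggregation, which the abstract does not specify.
    """
    if context_len < 1:
        raise ValueError("context_len must be at least 1")
    draft: list[int] = []
    # Suffix of the current path used as the lookup key.
    ctx = list(current[-context_len:])
    for _ in range(max_draft):
        votes: Counter = Counter()
        for path in others:
            # Count the token that follows every occurrence of ctx in this path.
            for i in range(len(path) - context_len):
                if list(path[i:i + context_len]) == ctx:
                    votes[path[i + context_len]] += 1
        if not votes:
            break  # no sibling path contains the current context
        token, count = votes.most_common(1)[0]
        if count < min_votes:
            break  # not enough agreement to keep drafting
        draft.append(token)
        # Slide the context window forward over the drafted token.
        ctx = ctx[1:] + [token]
    return draft


if __name__ == "__main__":
    others = [[5, 1, 2, 3, 4, 9], [7, 1, 2, 3, 4, 8], [1, 2, 3, 0]]
    print(consensus_draft([5, 1, 2], others))  # [3, 4]
```

In the example, all three siblings continue `[1, 2]` with `3`, and two of them continue `[2, 3]` with `4`; drafting stops when the siblings disagree on what follows `[3, 4]`. In a full decoder, the target model would then verify these drafted tokens in a single forward pass and keep the longest accepted prefix, as in standard speculative decoding.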


Authors (11)

Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xinglin Wang
Yueqi Zhang
Ji Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li

Citation

Li, Y., Shi, J., Feng, S., Yuan, P., Wang, X., Zhang, Y. et al. (2025). Speculative Decoding for Multi-Sample Inference. https://arxiv.org/abs/2503.05330

Publication Information

Year: 2025
Language: English
Database: arXiv
Access: Open Access