arXiv Open Access 2025

Speculative Sampling via Exponential Races

Szymon Kobus, Deniz Gündüz

Abstract

Speculative decoding accelerates large language model inference using a smaller draft model. In this paper, we establish a surprising connection between speculative decoding and channel simulation, which aims to simulate a noisy channel using as few bits as possible. This connection allows us to provide an information-theoretic analysis of the speed-up that can be achieved by speculative decoding. Leveraging this link, we derive an explicit relation between the generation speed-up and the number of tokens $k$ generated by the draft model for large $k$, which serves as an upper bound for all $k$. We also propose a novel speculative decoding method based on exponential races, ERSD, that matches state-of-the-art performance.
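
The abstract does not spell out how ERSD works, so the following is only a minimal sketch of the generic exponential-race sampling primitive that the name refers to, not the authors' algorithm. It relies on a standard fact: if $E_i$ are independent Exp(1) variables, then $\arg\min_i E_i / p_i$ is distributed according to $p$. Reusing the same $E_i$ for a draft distribution and a target distribution couples the two draws so that they tend to agree when the distributions are close. The function names `race_sample` and `coupled_draw` and the example distributions are hypothetical, chosen for illustration.

```python
import math
import random


def race_sample(probs, exps):
    """Return the index that wins the exponential race.

    Token i "arrives" at time exps[i] / probs[i]; the earliest arrival wins.
    Tokens with zero probability never arrive.
    """
    best_index, best_time = None, math.inf
    for i, (p, e) in enumerate(zip(probs, exps)):
        if p <= 0.0:
            continue
        arrival = e / p
        if arrival < best_time:
            best_index, best_time = i, arrival
    return best_index


def coupled_draw(p_target, q_draft, rng):
    """Draw one token from each distribution using shared Exp(1) noise.

    Assumption for illustration: both lists are probability vectors over the
    same vocabulary. Returns (draft_token, target_token).
    """
    exps = [rng.expovariate(1.0) for _ in p_target]
    return race_sample(q_draft, exps), race_sample(p_target, exps)


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]   # hypothetical target distribution
    q = [0.4, 0.4, 0.2]   # hypothetical draft distribution
    trials = 10000
    agree = sum(d == t for d, t in (coupled_draw(p, q, rng) for _ in range(trials)))
    print("fraction of trials where draft and target tokens agree:", agree / trials)
```

Each marginal draw is exact for its own distribution; only the joint behaviour depends on the shared noise. How such agreement is turned into accepted draft tokens, and how it relates to the speed-up bound, is described in the paper itself rather than in this sketch.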

Topics & Keywords

cs.CL cs.IT

Authors (2)

Szymon Kobus

Deniz Gündüz

Citation Format

Kobus, S., & Gündüz, D. (2025). Speculative Sampling via Exponential Races. https://arxiv.org/abs/2504.15475

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓