arXiv Open Access 2025

Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Jonathan Geuter, Youssef Mroueh, David Alvarez-Melis

Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $π_S(y\mid x)$. We provably approximate both the optimal tilted policy $π_{β,B}(y\mid x) \propto π_B(y\mid x)\exp(β\,r(x,y))$ of soft best-of-$n$ under the base model $π_B$ and the expected reward under that optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K), our method achieves higher accuracy than both standard soft best-of-$n$ with $π_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings it even outperforms soft best-of-$n$ with $π_B$. The code is available at https://github.com/j-geuter/GSI.
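To make the soft best-of-$n$ selection step mentioned in the abstract concrete, here is a minimal sketch, not the GSI algorithm itself. It assumes the $n$ candidate responses have already been sampled from some policy and scored by a reward model; the function name `soft_best_of_n` and its arguments are hypothetical and are not taken from the authors' repository. One candidate is drawn with probability proportional to $\exp(β\,r(x,y_i))$.

```python
import math
import random


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Draw one candidate with probability proportional to exp(beta * reward).

    candidates: list of n sampled responses (assumed already generated).
    rewards:    list of n reward-model scores r(x, y_i), one per candidate.
    beta:       inverse temperature; 0 gives a uniform pick, large values
                approach hard best-of-n (argmax of the reward).
    rng:        any object with a `choices` method (the `random` module
                or a `random.Random` instance).
    """
    if not candidates or len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must be non-empty and equal in length")

    # Softmax over beta * reward, shifted by the max for numerical stability.
    scores = [beta * r for r in rewards]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Sample a single index according to the softmax probabilities.
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], probs
```

Calling `soft_best_of_n(responses, scores, beta=1.0, rng=random.Random(0))` returns the selected response together with the selection probabilities. Passing a seeded `random.Random` instance makes the draw reproducible.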

Authors (3)

Jonathan Geuter

Youssef Mroueh

David Alvarez-Melis

Citation Format

Geuter, J., Mroueh, Y., Alvarez-Melis, D. (2025). Guided Speculative Inference for Efficient Test-Time Alignment of LLMs. https://arxiv.org/abs/2506.04118

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓