arXiv Open Access 2025

Flexible and Efficient Grammar-Constrained Decoding

Kanghee Park, Timothy Zhou, Loris D'Antoni

Abstract

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that would provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can align with the tokens used by a given context-free grammar, and then compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm, together with an implementation, that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.
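
The masking idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a hypothetical six-token subword vocabulary and uses the balanced-parentheses language as a stand-in CFG, and it checks every token naively at each step instead of using any offline preprocessing. The names `VOCAB`, `advance`, `token_mask`, and `constrained_argmax` are made up for illustration. It shows only why a subword token spanning several grammar symbols must be checked as a whole, and how a mask restricts the model's choice.

```python
import math

# Hypothetical subword vocabulary: some tokens span several grammar symbols.
VOCAB = ["(", ")", "()", "))", ")(", "(()"]


def advance(depth, text):
    """Consume `text` from nesting depth `depth`.

    Returns the new depth, or None if `text` closes a parenthesis that was
    never opened (no continuation could then yield a balanced string).
    """
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
        else:
            return None  # character outside the toy grammar's alphabet
    return depth


def token_mask(prefix, vocab=VOCAB):
    """Return one boolean per token: True if appending it keeps the output
    a prefix of some balanced-parentheses string."""
    depth = advance(0, prefix)
    if depth is None:
        raise ValueError("prefix is already outside the grammar")
    return [advance(depth, tok) is not None for tok in vocab]


def constrained_argmax(logits, mask):
    """Greedy step: pick the highest-scoring token among the allowed ones."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    # Made-up scores; the model "prefers" the invalid token "))".
    logits = [0.1, 2.0, 0.3, 5.0, 0.2, 0.0]
    mask = token_mask("")
    print(mask)
    print(VOCAB[constrained_argmax(logits, mask)])
```

Recomputing the mask token by token like this costs time proportional to the vocabulary size at every decoding step, which is the kind of online cost that GCD implementations try to reduce by analysing the tokenizer and the grammar ahead of time.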

Topics & Keywords

cs.CL cs.AI

Authors (3)

Kanghee Park

Timothy Zhou

Loris D'Antoni

Citation Format

Park, K., Zhou, T., & D'Antoni, L. (2025). Flexible and Efficient Grammar-Constrained Decoding. https://arxiv.org/abs/2502.05111

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓