arXiv Open Access 2022

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui

Abstract

We propose Speculative Decoding (SpecDec), the first formal study of exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding. Speculative Decoding has two innovations: Spec-Drafter, an independent model specially optimized for efficient and accurate drafting, and Spec-Verification, a reliable method for verifying the drafted tokens efficiently in the decoding paradigm. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show that our approach can achieve around $5\times$ speedup for the popular Transformer architectures with generation quality comparable to beam search decoding, challenging the impression that the draft-then-verify paradigm introduces only $1.4\times \sim 2\times$ speedup. In addition to the remarkable speedup, we also demonstrate three additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.
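The draft-then-verify paradigm named in the abstract can be illustrated with a minimal sketch. This is not the authors' SpecDec implementation: the abstract does not specify how Spec-Drafter or Spec-Verification work internally, so the sketch assumes a generic variant in which a cheap drafter proposes a block of `k` tokens and the target model accepts the longest prefix that matches its own greedy choice. The names `draft_then_verify`, `greedy_decode`, `target`, `drafter`, and `k` are hypothetical, and both "models" are plain Python functions standing in for real networks.

```python
from typing import Callable, List

# Hypothetical toy interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def draft_then_verify(target: NextToken, drafter: NextToken, prompt: List[int],
                      max_new_tokens: int, k: int = 4) -> List[int]:
    """Generic draft-then-verify loop (an assumption, not the paper's exact method)."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # Draft: the cheap model proposes up to k tokens.
        block = min(k, limit - len(out))
        draft: List[int] = []
        for _ in range(block):
            draft.append(drafter(out + draft))

        # Verify: compare each drafted token with the target's greedy choice.
        # A real Transformer would score all drafted positions in one parallel
        # forward pass; this toy version checks them one by one.
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out
```

Under this exact-match assumption the output is identical to greedy decoding with the target model alone; any speedup comes from verifying several drafted tokens per target pass. A usage sketch with toy models:

```python
target = lambda prefix: (prefix[-1] + 1) % 10
# The toy drafter agrees with the target except after tokens divisible by 3.
drafter = lambda prefix: (prefix[-1] + (1 if prefix[-1] % 3 else 2)) % 10
assert draft_then_verify(target, drafter, [0], 8) == greedy_decode(target, [0], 8)
```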

Topics & Keywords

cs.CL cs.LG

Authors (6)

Heming Xia

Tao Ge

Peiyi Wang

Si-Qing Chen

Furu Wei

Zhifang Sui

Citation

Xia, H., Ge, T., Wang, P., Chen, S., Wei, F., Sui, Z. (2022). Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation. https://arxiv.org/abs/2203.16487

Journal Information

Publication Year: 2022
Language: en
Database Source: arXiv
Access: Open Access ✓