arXiv Open Access 2024

Dynamic Depth Decoding: Faster Speculative Decoding for LLMs

Oscar Brown, Zhengjie Wang, Andrea Do, Nikhil Mathew, Cheng Yu

Abstract

The acceleration of Large Language Models (LLMs) with speculative decoding provides a significant runtime improvement without any loss of accuracy. Currently, EAGLE-2 is the state-of-the-art speculative decoding method, improving on EAGLE with a dynamic draft tree. We introduce Dynamic Depth Decoding (DDD), which optimises EAGLE-2's tree drafting method using a dynamic depth. This extends the average speedup that EAGLE-2 achieves over EAGLE by $44\%$, giving DDD an average speedup of $3.16\times$.

Topics & Keywords

cs.CL cs.AI

Citation Format

Brown, O., Wang, Z., Do, A., Mathew, N., & Yu, C. (2024). Dynamic Depth Decoding: Faster Speculative Decoding for LLMs. https://arxiv.org/abs/2409.00142

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓