arXiv Open Access 2025

S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models

Tao He, Guang Huang, Yu Yang, Tianshi Xu, Sicheng Zhao, +3 others

Abstract

Large language models (LLMs) exhibit remarkable reasoning capabilities across diverse downstream tasks. However, their autoregressive nature leads to substantial inference latency, posing challenges for real-time applications. Speculative sampling mitigates this issue by introducing a drafting phase followed by a parallel validation phase, enabling faster token generation and verification. Existing approaches, however, overlook the inherent coherence in text generation, limiting their efficiency. To address this gap, we propose a Speculative Sampling with Syntactic and Semantic Coherence (S$^4$C) framework, which extends speculative sampling by leveraging multi-head drafting for rapid token generation and a continuous verification tree for efficient candidate validation and feature reuse. Experimental results demonstrate that S$^4$C surpasses baseline methods across mainstream tasks, offering enhanced efficiency, parallelism, and the ability to generate more valid tokens with fewer computational resources. On Spec-bench benchmarks, S$^4$C achieves an acceleration ratio of 2.26x-2.60x, outperforming state-of-the-art methods.
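
The draft-then-verify loop that the abstract refers to can be illustrated with a minimal sketch. This is a generic, greedy variant of speculative decoding, not the S$^4$C method: it has no multi-head drafting and no continuous verification tree. The functions `target_next` and `draft_next` are hypothetical toy stand-ins for a large model and a cheap draft model, and `k` (the draft length) is an assumed parameter.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(ctx: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next token for a context."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Hypothetical cheap draft model: agrees with the target except
    when the context length is a multiple of 4."""
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % VOCAB


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens, then verify them against the target.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round would be one batched forward pass of the
    target model over all drafted positions; here it is simulated by a loop.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # Drafting phase: the cheap model proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # Verification phase: keep the longest prefix the target agrees with.
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every drafted token matched: the target yields one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode([1, 2, 3], 20)
    print(tokens == greedy_decode([1, 2, 3], 20), rounds)
```

Under these assumptions the output matches plain greedy decoding of the target model, while the number of verification rounds is smaller than the number of generated tokens. Any speedup in practice depends on how often the draft is accepted and on the relative cost of the two models.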

Authors (8)

Tao He
Guang Huang
Yu Yang
Tianshi Xu
Sicheng Zhao
Guiguang Ding
Pengyang Wang
Feng Tian

Citation Format

He, T., Huang, G., Yang, Y., Xu, T., Zhao, S., Ding, G. et al. (2025). S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models. https://arxiv.org/abs/2506.14158

Journal Information
Publication year: 2025
Language: en
Database source: arXiv
Access: Open Access