arXiv Open Access 2025

S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models

Tao He, Guang Huang, Yu Yang, Tianshi Xu, Sicheng Zhao, +3 others

Abstract

Large language models (LLMs) exhibit remarkable reasoning capabilities across diverse downstream tasks. However, their autoregressive nature leads to substantial inference latency, posing challenges for real-time applications. Speculative sampling mitigates this issue by introducing a drafting phase followed by a parallel validation phase, enabling faster token generation and verification. Existing approaches, however, overlook the inherent coherence in text generation, limiting their efficiency. To address this gap, we propose a Speculative Sampling with Syntactic and Semantic Coherence (S$^4$C) framework, which extends speculative sampling by leveraging multi-head drafting for rapid token generation and a continuous verification tree for efficient candidate validation and feature reuse. Experimental results demonstrate that S$^4$C surpasses baseline methods across mainstream tasks, offering enhanced efficiency, parallelism, and the ability to generate more valid tokens with fewer computational resources. On Spec-bench benchmarks, S$^4$C achieves an acceleration ratio of 2.26x-2.60x, outperforming state-of-the-art methods.
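
The draft-then-verify loop that the abstract builds on can be illustrated with the standard speculative sampling acceptance rule: a drafted token is accepted with probability min(1, p/q), where q is the draft probability and p the target probability, and the first rejected token is replaced by a sample from the normalized residual max(0, p - q). The sketch below shows only this generic baseline, not the S$^4$C method itself; multi-head drafting and the continuous verification tree are not modeled. The names `speculative_step` and `_sample`, and the use of plain dictionaries as toy distributions, are assumptions made for illustration.

```python
import random


def _sample(weights, rng):
    """Draw one token from a dict of non-negative weights (hypothetical helper)."""
    tokens = list(weights.keys())
    return rng.choices(tokens, weights=[weights[t] for t in tokens])[0]


def speculative_step(draft_probs, target_probs, drafted, rng):
    """One verification pass of generic speculative sampling (illustrative sketch).

    draft_probs[i]  : dict token -> probability under the draft model at position i
    target_probs[i] : dict token -> probability under the target model at position i;
                      must contain len(drafted) + 1 entries
    drafted         : tokens proposed by the draft model
    Returns the tokens emitted in this step.
    """
    emitted = []
    for i, tok in enumerate(drafted):
        p = target_probs[i].get(tok, 0.0)
        q = draft_probs[i][tok]  # > 0, since tok was sampled from the draft model
        if rng.random() < min(1.0, p / q):
            emitted.append(tok)  # target model agrees often enough: keep the token
            continue
        # Rejection: resample from the residual max(0, p - q) and stop.
        residual = {
            t: max(0.0, target_probs[i].get(t, 0.0) - draft_probs[i].get(t, 0.0))
            for t in target_probs[i]
        }
        emitted.append(_sample(residual, rng))
        return emitted
    # Every drafted token was accepted: take one extra token from the target model.
    emitted.append(_sample(target_probs[len(drafted)], rng))
    return emitted
```

When the draft and target distributions coincide, every drafted token is accepted and the step emits `len(drafted) + 1` tokens for a single target-model pass, which is the source of the speedup that acceleration ratios such as those reported above measure.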

Topics & Keywords

cs.CL cs.AI

Authors (8)

Tao He

Guang Huang

Yu Yang

Tianshi Xu

Sicheng Zhao

Guiguang Ding

Pengyang Wang

Feng Tian

Citation Format

He, T., Huang, G., Yang, Y., Xu, T., Zhao, S., Ding, G. et al. (2025). S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models. https://arxiv.org/abs/2506.14158

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓