arXiv Open Access 2024

When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs

Jiankun Wei, Abdulrahman Abdulrazzag, Tianchen Zhang, Adel Muursepp, Gururaj Saileshwar

Abstract

Deployed large language models (LLMs) often rely on speculative decoding, a technique that generates and verifies multiple candidate tokens in parallel, to improve throughput and latency. In this work, we reveal a new side channel whereby input-dependent patterns of correct and incorrect speculations can be inferred by monitoring per-iteration token counts or packet sizes. In evaluations using research prototypes and the production-grade vLLM serving framework, we show that an adversary monitoring these patterns can fingerprint user queries (from a set of 50 prompts) with over 75% accuracy across four speculative-decoding schemes at temperature 0.3: REST (100%), LADE (91.6%), BiLD (95.2%), and EAGLE (77.6%). Even at temperature 1.0, accuracy remains far above the 2% random baseline: REST (99.6%), LADE (61.2%), BiLD (63.6%), and EAGLE (24%). We also show that an attacker can leak confidential datastore contents used for prediction at rates exceeding 25 tokens/sec. To defend against these attacks, we propose and evaluate a suite of mitigations, including packet padding and iteration-wise token aggregation.
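The fingerprinting idea in the abstract can be illustrated with a minimal sketch: each speculative-decoding iteration emits one verified token plus however many draft tokens were accepted, so an observer who sees only per-iteration emitted-token counts (e.g. via response packet sizes) obtains an input-dependent trace that can be matched against recorded traces of known prompts. The trace values, prompt names, and nearest-neighbor matcher below are illustrative assumptions, not the paper's actual classifier.

```python
# Hypothetical illustration of the token-count side channel: match an
# observed per-iteration emitted-token trace against reference traces
# recorded for known prompts (all names and numbers are made up).

def match_query(observed, reference_traces):
    """Return the prompt whose recorded trace is closest to `observed`."""
    def distance(a, b):
        # Compare traces position-wise, zero-padding the shorter one.
        n = max(len(a), len(b))
        a = a + [0] * (n - len(a))
        b = b + [0] * (n - len(b))
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(reference_traces, key=lambda q: distance(observed, reference_traces[q]))

# Reference traces: emitted tokens per decoding iteration for known prompts.
traces = {
    "prompt_A": [4, 4, 1, 3, 4],   # mostly correct speculations
    "prompt_B": [1, 1, 2, 1, 1],   # mostly rejected speculations
    "prompt_C": [3, 1, 4, 2, 2],
}

observed = [4, 3, 1, 3, 4]         # noisy observation of prompt_A's pattern
print(match_query(observed, traces))  # prints "prompt_A"
```

The same view also suggests the mitigations named in the abstract: padding every iteration's output to a fixed size, or aggregating tokens across iterations before sending, flattens exactly the trace this matcher relies on.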


Citation Format

Wei, J., Abdulrazzag, A., Zhang, T., Muursepp, A., & Saileshwar, G. (2024). When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs. https://arxiv.org/abs/2411.01076

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓