arXiv
Open Access
2026
Accelerating OpenPangu Inference on NPU via Speculative Decoding
Yuntao Dai
Jing Wu
Hang Gu
Teng Wang
Abstract
To mitigate the memory-wall bottleneck encountered by Large Language Models (LLMs) during inference on NPU hardware, and to address the scarcity of native support for mainstream speculative decoding algorithms on domestic infrastructure, this study presents an end-to-end speculative inference acceleration scheme for OpenPangu-7B.
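The abstract names speculative decoding as the acceleration technique. As background, the core draft-and-verify loop can be sketched as below; this is a toy greedy variant for illustration only, not the paper's OpenPangu/NPU implementation, and `target`/`draft` stand in for any cheap-to-call next-token predictors.

```python
def greedy_speculative_decode(target, draft, prompt, k=4, max_new=16):
    """Toy greedy draft-and-verify loop (illustrative sketch, not the
    OpenPangu implementation): a cheap draft model proposes k tokens,
    the target model checks them, and the longest agreeing prefix is
    accepted plus one corrected (or bonus) target token."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft phase: propose k tokens autoregressively with the cheap model.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify phase: keep the prefix where the target's greedy
        #    choice matches the draft; correct the first mismatch.
        accepted, ctx = [], list(seq)
        for t in proposal:
            want = target(ctx)
            if want == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(want)
                break
        else:
            # All k drafts accepted: the target yields one bonus token.
            accepted.append(target(ctx))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]
```

With this acceptance rule the output is token-for-token identical to plain greedy decoding with the target model; the speedup comes from verifying several drafted positions per target pass instead of one.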