arXiv Open Access 2026

Accelerating OpenPangu Inference on NPU via Speculative Decoding

Yuntao Dai, Jing Wu, Hang Gu, Teng Wang

Abstract

To mitigate the memory-wall bottleneck encountered by Large Language Models (LLMs) during inference on NPU hardware, and to address the scarcity of native support for mainstream speculative decoding algorithms on domestic infrastructure, this study presents an end-to-end speculative inference acceleration scheme for OpenPangu-7B.
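The abstract's core idea, speculative decoding, can be illustrated with a minimal draft-and-verify loop. The sketch below uses toy token predictors (`draft_next`, `target_next` are hypothetical stand-ins, not the paper's OpenPangu/NPU implementation): a cheap draft model proposes `k` tokens, the target model verifies them, the longest agreeing prefix is accepted, and the first disagreement is replaced by the target's own token, so the output matches pure target decoding exactly.

```python
# Minimal sketch of greedy speculative decoding (toy models, hypothetical
# names; the real scheme runs a small draft LLM and a large target LLM).

def draft_next(tokens):
    # Toy "draft model": cheap heuristic, predicts last token + 1.
    return (tokens[-1] + 1) % 100

def target_next(tokens):
    # Toy "target model": the reference predictor the output must match.
    # Here it agrees with the draft except at every 5th position.
    nxt = (tokens[-1] + 1) % 100
    return nxt if len(tokens) % 5 else (nxt + 7) % 100

def speculative_decode(prompt, steps=20, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        draft = list(tokens)
        for _ in range(k):
            draft.append(draft_next(draft))
        proposed = draft[len(tokens):]
        # 2) Target model verifies all k positions (one batched pass in
        #    practice; sequential here for clarity).
        accepted = 0
        for i, tok in enumerate(proposed):
            if target_next(tokens + proposed[:i]) == tok:
                accepted += 1
            else:
                break
        tokens += proposed[:accepted]
        if accepted < len(proposed):
            # 3) First rejected position: emit the target's own token.
            tokens.append(target_next(tokens))
    return tokens
```

Because every accepted token is verified against the target and every rejection is repaired with the target's token, the decoded sequence is identical to plain target-only decoding; the speedup comes from verifying several draft tokens per target pass.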

Citation

Dai, Y., Wu, J., Gu, H., & Wang, T. (2026). Accelerating OpenPangu Inference on NPU via Speculative Decoding. arXiv. https://arxiv.org/abs/2603.03383

Journal Information
Year Published: 2026
Language: en
Source Database: arXiv
Access: Open Access ✓