arXiv Open Access 2025

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, +2 more

Abstract

Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6× speedup over the original Whisper Medium with superior performance.
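The speedup mechanism the abstract refers to builds on speculative sampling, where a small draft model proposes several tokens that a large target model then verifies in one pass. The paper's KVSPN module and acceptance rule are not described here, so the following is only a minimal generic sketch of greedy speculative decoding; `draft_next`, `target_next`, and the single-token verification loop are illustrative assumptions, not the authors' implementation.

```python
def speculative_decode_step(draft_next, target_next, prefix, k):
    """One round of greedy speculative decoding (illustrative sketch).

    draft_next(seq)  -> next token from the small draft model
    target_next(seq) -> next token from the large target model
    Draft tokens are accepted while the target model agrees; on the
    first disagreement the target's token is substituted and the
    round ends.
    """
    # The draft model proposes k tokens autoregressively (cheap).
    proposal = []
    seq = list(prefix)
    for _ in range(k):
        token = draft_next(seq)
        proposal.append(token)
        seq.append(token)

    # The target model verifies the proposal. In a real system all k
    # positions are scored in a single batched forward pass, which is
    # where the speedup comes from; here we verify one token at a time
    # for clarity.
    accepted = list(prefix)
    for token in proposal:
        verified = target_next(accepted)
        if verified == token:
            accepted.append(token)     # agreement: keep the draft token
        else:
            accepted.append(verified)  # disagreement: take target token, stop
            break
    return accepted
```

With toy models that agree on the first few positions, one round accepts the agreeing draft tokens plus the target's correction, so several tokens are emitted per expensive target-model pass.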

Topics & Keywords

Authors (7)

Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian

Citation Format

Le, C., Xia, Y., Li, H., Wang, M., Sun, Y., Ma, X., & Qian, Y. (2025). Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation. https://arxiv.org/abs/2508.11189

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓