arXiv Open Access 2025

TF-MLPNet: Tiny Real-Time Neural Speech Separation

Malek Itani, Tuochao Chen, Shyamnath Gollakota

Abstract

Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.
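The block structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The tensor layout (channels, time, frequency), the ReLU activations, the residual connections, the layer sizes, the kernel length, the sharing of convolution weights across frequency bins, and every function name below are assumptions made purely for illustration.

```python
import numpy as np

def channel_mix(x, w_c):
    # x: (C, T, F); w_c: (C, C). Fully connected layer along the channel axis,
    # applied to every time frame and frequency bin (ReLU + residual assumed).
    return x + np.maximum(np.einsum("dc,ctf->dtf", w_c, x), 0.0)

def freq_mix(x, w_f):
    # w_f: (F, F). Fully connected layer along the frequency axis,
    # applied to every channel and time frame (ReLU + residual assumed).
    return x + np.maximum(np.einsum("gf,ctf->ctg", w_f, x), 0.0)

def causal_time_conv(x, k):
    # k: (C, C, K). 1-D convolution along time, run separately at each
    # frequency bin. Only past frames are used, so it can stream.
    C, T, F = x.shape
    K = k.shape[-1]
    padded = np.pad(x, ((0, 0), (K - 1, 0), (0, 0)))  # left-pad the time axis
    out = np.zeros_like(x)
    for i in range(K):
        # tap i reads the frame (K - 1 - i) steps in the past
        out += np.einsum("dc,ctf->dtf", k[:, :, i], padded[:, i:i + T, :])
    return x + out

def block(x, w_c, w_f, k):
    # One hypothetical block: channel mixing, frequency mixing, then the
    # per-bin temporal convolution.
    return causal_time_conv(freq_mix(channel_mix(x, w_c), w_f), k)

rng = np.random.default_rng(0)
C, T, F, K = 4, 10, 8, 3  # illustrative sizes, not taken from the paper
x = rng.standard_normal((C, T, F))
w_c = 0.1 * rng.standard_normal((C, C))
w_f = 0.1 * rng.standard_normal((F, F))
k = 0.1 * rng.standard_normal((C, C, K))
y = block(x, w_c, w_f, k)
```

In this sketch the two mixing layers act within a single time frame and the convolution only looks backward, so each output frame depends only on current and past input frames, which is the property a streaming model needs.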

Authors (3)

Malek Itani

Tuochao Chen

Shyamnath Gollakota

Citation Format

Itani, M., Chen, T., Gollakota, S. (2025). TF-MLPNet: Tiny Real-Time Neural Speech Separation. https://arxiv.org/abs/2508.03047

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access