arXiv Open Access 2025

TF-MLPNet: Tiny Real-Time Neural Speech Separation

Malek Itani, Tuochao Chen, Shyamnath Gollakota

Abstract

Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.
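The layer pattern described in the abstract can be illustrated with a small NumPy sketch: per-frame fully connected layers that alternate along the channel and frequency axes, followed by a causal convolution over time applied at each frequency bin. This is not the authors' implementation. The tensor layout `(T, C, F)`, the two-layer MLPs with ReLU, the residual connections, the depthwise kernel shared across frequency bins, and all sizes and function names are assumptions made for illustration, and quantization is omitted entirely.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP applied along the last axis (ReLU is an assumed choice)."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def mix_frame(x, params):
    """x: (T, C, F). Mix along channels, then along frequency, frame by frame."""
    xc = np.swapaxes(x, 1, 2)                 # (T, F, C): channels on the last axis
    xc = xc + mlp(xc, *params["channel"])     # assumed residual connection
    x = np.swapaxes(xc, 1, 2)                 # back to (T, C, F)
    return x + mlp(x, *params["freq"])        # frequency is already the last axis

def causal_time_conv(x, kernel):
    """Depthwise causal convolution along time.

    x: (T, C, F); kernel: (K, C). The same kernel is reused for every
    frequency bin (an assumption), and each bin is filtered independently.
    """
    K, T = kernel.shape[0], x.shape[0]
    pad = np.zeros((K - 1,) + x.shape[1:])
    padded = np.concatenate([pad, x], axis=0)  # left-pad so no future frame is used
    out = np.zeros_like(x)
    for k in range(K):
        # kernel[k] weights the frame that is (K - 1 - k) steps in the past
        out += padded[k:k + T] * kernel[k][None, :, None]
    return out

def init_params(C, F, hidden, K, seed=0):
    """Random weights for one hypothetical block."""
    rng = np.random.default_rng(seed)
    def two_layer(d):
        return (rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden),
                rng.normal(0, 0.1, (hidden, d)), np.zeros(d))
    return {"channel": two_layer(C), "freq": two_layer(F),
            "time": rng.normal(0, 0.1, (K, C))}

def block(x, params):
    """One block: frequency-sequence mixing, then time modeling per bin."""
    x = mix_frame(x, params)
    return x + causal_time_conv(x, params["time"])
```

Because the mixing step treats every frame separately and the time convolution only looks backward, the output at a given frame depends only on that frame and earlier ones, which is the property a streaming model needs in order to process short chunks as they arrive.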

Topics & Keywords

cs.SD cs.LG eess.AS

Citation

Itani, M., Chen, T., & Gollakota, S. (2025). TF-MLPNet: Tiny Real-Time Neural Speech Separation. https://arxiv.org/abs/2508.03047

Publication Information

Year Published: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓