arXiv Open Access 2022

SG-VAD: Stochastic Gates Based Speech Activity Detection

Jonathan Svirsky, Ofir Lindenbaum

Abstract

We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features and predict the type of speech event. The model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and provides comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components. The code and models are available at https://www.github.com/jsvir/vad.
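
As a rough illustration of the idea in the title, the sketch below shows a generic stochastic gate: a clipped Gaussian relaxation of a binary mask that multiplies each input feature, plus a penalty on the expected number of open gates. It is not the authors' implementation. The abstract does not describe the gating network, so the function names, the noise scale `SIGMA`, and the use of fixed per-feature parameters `mu` are all assumptions made for this example.

```python
import math
import random

SIGMA = 0.5  # assumed noise scale; not taken from the paper


def sample_gates(mu, sigma=SIGMA, rng=None, training=True):
    """Return relaxed binary gates z_i = clip(mu_i + sigma * eps_i, 0, 1).

    During training eps_i is standard Gaussian noise; at inference the
    noise is dropped and the gate is simply clip(mu_i, 0, 1).
    """
    rng = rng or random.Random()
    gates = []
    for m in mu:
        noise = sigma * rng.gauss(0.0, 1.0) if training else 0.0
        gates.append(min(1.0, max(0.0, m + noise)))
    return gates


def apply_gates(features, gates):
    """Scale each feature by its gate; a gate at 0 removes the feature."""
    return [x * z for x, z in zip(features, gates)]


def open_gate_penalty(mu, sigma=SIGMA):
    """Expected number of open gates: sum of P(z_i > 0) = Phi(mu_i / sigma)."""
    return sum(
        0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu
    )


if __name__ == "__main__":
    mu = [1.5, -1.0, 0.5]           # hypothetical gate parameters
    features = [2.0, 3.0, 4.0]      # hypothetical input features
    gates = sample_gates(mu, training=False)
    print(gates)                    # [1.0, 0.0, 0.5]
    print(apply_gates(features, gates))  # [2.0, 0.0, 2.0]
```

In a setup of this kind, the penalty would be added to the classification loss so that gates on uninformative features are pushed toward zero while the classifier is trained on the gated input.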


Citation Format

Svirsky, J., Lindenbaum, O. (2022). SG-VAD: Stochastic Gates Based Speech Activity Detection. https://arxiv.org/abs/2210.16022

Publication Information

Year of publication: 2022
Language: en
Source database: arXiv
Access: Open Access