arXiv Open Access 2023

Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features

Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

Abstract

Speaker diarization is the task of answering "Who spoke and when?" in an audio stream. Pipeline systems rely on speech segmentation to extract speakers' segments and achieve robust speaker diarization. This paper proposes a common framework to solve three segmentation tasks in the distant speech scenario: Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), and Speaker Change Detection (SCD). In the literature, only a few studies investigate the multi-microphone distant speech scenario. In this work, we propose a new set of spatial features based on direction-of-arrival estimations in the circular harmonic domain (CH-DOA). These spatial features are extracted from multi-microphone audio data and combined with standard acoustic features. Experiments on the AMI meeting corpus show that CH-DOA can improve segmentation while remaining robust when some microphones are deactivated.

Authors (4)

Théo Mariotte

Anthony Larcher

Silvio Montrésor

Jean-Hugh Thomas

Citation Format

Mariotte, T., Larcher, A., Montrésor, S., & Thomas, J.-H. (2023). Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features. https://arxiv.org/abs/2306.04268

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access