arXiv Open Access 2023

A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds

Tanmay Khandelwal, Rohan Kumar Das

Abstract

Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals. These events are uniquely characterized by their spatio-temporal features, which are determined by the way they are produced. In this study, we leverage some distinctive high-level acoustic characteristics of various sound events to assist the SED model training, without requiring additional labeled data. Specifically, we use the DCASE Task 4 2022 dataset and categorize the 10 classes into four subcategories based on their high-level acoustic characteristics. We then introduce a novel multi-task learning framework that jointly trains the SED and high-level acoustic characteristics classification tasks, using shared layers and weighted loss. Our method significantly improves the performance of the SED system, achieving a 36.3% improvement in terms of the polyphonic sound event detection score compared to the baseline on the DCASE 2022 Task 4 validation set.
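The joint training described in the abstract can be illustrated with a minimal sketch. It assumes that the auxiliary labels are derived from the existing class labels through a fixed class-to-group mapping, and that the two task losses are combined as a weighted sum of binary cross-entropies. The mapping `CLASS_TO_GROUP`, the weight `aux_weight`, and all function names are hypothetical placeholders; the paper's actual grouping, loss weights, and network are not reproduced here.

```python
import math

# Hypothetical mapping from the 10 event classes to 4 acoustic-characteristic
# groups. The real grouping is defined in the paper; these indices are
# placeholders for illustration only.
CLASS_TO_GROUP = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
NUM_GROUPS = 4


def group_targets(class_targets):
    """Derive multi-hot group labels from multi-hot class labels.

    No extra annotation is needed: a group is active whenever at least
    one of its member classes is active.
    """
    groups = [0.0] * NUM_GROUPS
    for class_idx, active in enumerate(class_targets):
        if active:
            groups[CLASS_TO_GROUP[class_idx]] = 1.0
    return groups


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all outputs."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(sed_probs, group_probs, class_targets, aux_weight=0.5):
    """Weighted sum of the SED loss and the auxiliary group loss.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    sed_loss = binary_cross_entropy(sed_probs, class_targets)
    aux_loss = binary_cross_entropy(group_probs, group_targets(class_targets))
    return sed_loss + aux_weight * aux_loss
```

In a real system, `sed_probs` and `group_probs` would come from two output heads on top of shared layers, so that gradients from both losses update the shared parameters.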

Authors (2)

Tanmay Khandelwal

Rohan Kumar Das

Citation Format

Khandelwal, T., Das, R.K. (2023). A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds. https://arxiv.org/abs/2305.10729

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access