arXiv Open Access 2023

Multi-View Spectrogram Transformer for Respiratory Sound Classification

Wentao He Yuchen Yan Jianfeng Ren Ruibin Bai Xudong Jiang

Abstract

Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into patches of different sizes, each size representing one view of the acoustic elements of a respiratory sound. These patches and their positional embeddings are then fed into transformer encoders, which extract attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features and highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
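
The multi-view patching and gated fusion described in the abstract can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 16 x 32 spectrogram, the three patch shapes (chosen with equal area so that flattened patches share one dimension), mean pooling as a stand-in for the transformer encoder, and a softmax gate driven by a single hypothetical weight vector `gate_weights`.

```python
import math
import random


def split_into_patches(spec, patch_h, patch_w):
    """Split a 2-D spectrogram (rows = frequency bins, columns = time frames)
    into non-overlapping patch_h x patch_w patches, each flattened row by row."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_freq - patch_h + 1, patch_h):
        for t0 in range(0, n_time - patch_w + 1, patch_w):
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch_h)
                            for t in range(t0, t0 + patch_w)])
    return patches


def mean_pool(patches):
    """Stand-in for a transformer encoder (assumption): average all patches."""
    dim = len(patches[0])
    return [sum(p[i] for p in patches) / len(patches) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(view_features, gate_weights):
    """Score each view by a dot product with gate_weights (hypothetical gate),
    normalise the scores with softmax, and return the weighted sum of views."""
    scores = [sum(w * x for w, x in zip(gate_weights, feat))
              for feat in view_features]
    gates = softmax(scores)
    dim = len(view_features[0])
    fused = [sum(g * feat[i] for g, feat in zip(gates, view_features))
             for i in range(dim)]
    return fused, gates


random.seed(0)
# Toy "mel-spectrogram": 16 frequency bins x 32 time frames of random values.
spec = [[random.random() for _ in range(32)] for _ in range(16)]

# Three hypothetical views: tall/narrow, square-ish, and short/wide patches.
patch_shapes = [(16, 2), (8, 4), (4, 8)]
view_features = [mean_pool(split_into_patches(spec, h, w))
                 for h, w in patch_shapes]

# Hypothetical gate parameters; in a trained model these would be learned.
gate_weights = [random.uniform(-1.0, 1.0) for _ in range(32)]
fused, gates = gated_fusion(view_features, gate_weights)
print("gate weights per view:", [round(g, 3) for g in gates])
```

Tall, narrow patches cover many frequency bins over a short time span, while short, wide patches cover few bins over a longer span, which is one plausible reading of "different views of time-frequency characteristics". In a real model each view would pass through its own encoder with positional embeddings rather than a mean pool.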

Topics & Keywords

cs.SD cs.CV eess.AS

Authors (5)

Wentao He

Yuchen Yan

Jianfeng Ren

Ruibin Bai

Xudong Jiang

Citation Format

He, W., Yan, Y., Ren, J., Bai, R., & Jiang, X. (2023). Multi-View Spectrogram Transformer for Respiratory Sound Classification. https://arxiv.org/abs/2311.09655

Journal Information

Year Published: 2023
Language: en
Database Source: arXiv
Access: Open Access ✓