arXiv Open Access 2023

Multi-View Spectrogram Transformer for Respiratory Sound Classification

Wentao He, Yuchen Yan, Jianfeng Ren, Ruibin Bai, Xudong Jiang

Abstract

Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into patches of different sizes, each size representing one view of the acoustic elements of a respiratory sound. These patches and their positional embeddings are then fed into transformer encoders, which extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features and highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
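
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch shapes, the mean pooling that stands in for the transformer encoders, the softmax form of the gate, and the fixed gate scores are all assumptions made for illustration, and every function name here is hypothetical. The sketch only shows how one spectrogram can be cut into several patch sizes and how per-view features can be combined by normalised gate weights.

```python
import math
import random

def split_into_patches(spec, patch_h, patch_w):
    """Split a 2-D spectrogram (rows = frequency bins, columns = time frames)
    into non-overlapping patch_h x patch_w patches, each flattened to a vector."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_freq - patch_h + 1, patch_h):
        for t0 in range(0, n_time - patch_w + 1, patch_w):
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch_h)
                            for t in range(t0, t0 + patch_w)])
    return patches

def mean_pool(patches):
    """Average patch vectors element-wise (a stand-in for a transformer encoder)."""
    dim = len(patches[0])
    return [sum(p[i] for p in patches) / len(patches) for i in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(view_features, gate_scores):
    """Weigh each view's feature vector by its softmax gate weight and sum them."""
    weights = softmax(gate_scores)
    dim = len(view_features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, view_features))
             for i in range(dim)]
    return fused, weights

# Toy "mel-spectrogram": 16 frequency bins x 32 time frames of random values.
random.seed(0)
spec = [[random.random() for _ in range(32)] for _ in range(16)]

# Hypothetical patch shapes (height, width): tall-narrow, square, short-wide.
view_shapes = [(16, 4), (8, 8), (4, 16)]
views = [split_into_patches(spec, h, w) for h, w in view_shapes]
view_features = [mean_pool(patches) for patches in views]

# Placeholder gate scores; in a trained model these would be learned.
gate_scores = [0.2, 1.0, -0.5]
fused, weights = gated_fusion(view_features, gate_scores)
```

With these assumed shapes every view produces 8 patches of 64 values, so the pooled features share one dimension and can be summed directly. Tall, narrow patches span all frequency bins over a short time window, while short, wide patches cover a narrow frequency band over a longer window, which is one way to read the "multi-view" idea.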

Authors (5)

Wentao He
Yuchen Yan
Jianfeng Ren
Ruibin Bai
Xudong Jiang

Citation Format

He, W., Yan, Y., Ren, J., Bai, R., Jiang, X. (2023). Multi-View Spectrogram Transformer for Respiratory Sound Classification. https://arxiv.org/abs/2311.09655

Journal Information
Publication year: 2023
Language: en
Source database: arXiv
Access: Open Access ✓