Semantic Scholar · Open Access · 2022 · 41 citations

iQuery: Instruments as Queries for Audio-Visual Sound Separation

Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, +1 more

Abstract

Current audiovisual separation methods share a standard architecture design where an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck. This design confounds the learning of multimodal feature encoding with robust sound decoding for audio separation. To generalize to a new instrument, one must finetune the entire visual and audio network for all musical instruments. We re-formulate the visual-sound separation task and propose Instruments as Queries (iQuery) with a flexible query expansion mechanism. Our approach ensures cross-modal consistency and cross-instrument disentanglement. We utilize “visually named” queries to initiate the learning of audio queries and use cross-modal attention to remove potential sound source interference at the estimated waveforms. To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert additional queries as audio prompts while freezing the attention mechanism. Experimental results on three benchmarks demonstrate that our iQuery improves audiovisual sound source separation performance. Code is available at https://github.com/JiabenChen/iQuery.
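
The query expansion idea described in the abstract can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the authors' implementation: it uses a single cross-attention step with no learned projections and no interaction between queries, and the feature values, dimensions, and instrument names are all hypothetical. It only shows the general pattern of adding one query for a new class while the attention function and the existing queries stay untouched.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, features):
    """Each query attends over all feature vectors (keys and values are the same here)."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, f) / scale for f in features])
        # Attention-weighted sum of the feature vectors for this query.
        out = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        outputs.append(out)
    return outputs


rng = random.Random(0)
dim = 4

# Hypothetical audio encoder features at 8 time-frequency positions.
audio_features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(8)]

# One query vector per known instrument class (names are illustrative only).
queries = {
    "guitar": [rng.gauss(0, 1) for _ in range(dim)],
    "violin": [rng.gauss(0, 1) for _ in range(dim)],
}
before = cross_attend(list(queries.values()), audio_features)

# Query expansion: append a query for a new class. The attention function and
# the existing queries are not modified; only the new vector would be trained.
queries["flute"] = [rng.gauss(0, 1) for _ in range(dim)]
after = cross_attend(list(queries.values()), audio_features)
```

In this simplified setting the outputs for the existing queries are identical before and after the expansion, because each query attends to the audio features independently. A full transformer decoder with self-attention between queries would not give that guarantee automatically.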

Authors (6)

Jiaben Chen

Renrui Zhang

Dongze Lian

Jiaqi Yang

Ziyao Zeng

Jianbo Shi

Citation Format

Chen, J., Zhang, R., Lian, D., Yang, J., Zeng, Z., Shi, J. (2022). iQuery: Instruments as Queries for Audio-Visual Sound Separation. https://doi.org/10.1109/CVPR52729.2023.01410

Journal Information
Publication Year: 2022
Language: en
Total Citations: 41
Source Database: Semantic Scholar
DOI: 10.1109/CVPR52729.2023.01410
Access: Open Access