Visual Acoustic Matching
Abstract
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal transformer model that uses audio-visual attention to inject visual properties into the audio and generate realistic audio output. In addition, we devise a self-supervised training objective that can learn acoustic matching from in-the-wild Web videos, despite their lack of acoustically mismatched audio. We demonstrate that our approach successfully translates human speech to a variety of real-world environments depicted in images, outperforming both traditional acoustic matching and more heavily supervised baselines.
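To make the "audio-visual attention" idea concrete, below is a minimal sketch of a cross-modal attention block in which audio tokens query visual tokens, so room cues from the image condition the audio stream. This is not the authors' released implementation: the class name `CrossModalAttentionBlock`, the dimensions, and the use of PyTorch's `nn.MultiheadAttention` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalAttentionBlock(nn.Module):
    """Hypothetical sketch: audio tokens attend to visual tokens."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, audio_tokens, visual_tokens):
        # Query = audio, Key/Value = visual: injects visual properties
        # (geometry, materials) into the audio representation.
        attended, _ = self.attn(audio_tokens, visual_tokens, visual_tokens)
        x = self.norm1(audio_tokens + attended)
        x = self.norm2(x + self.ff(x))
        return x

# Usage: 2 clips, 100 audio frames, 49 image patches, 512-d features.
audio = torch.randn(2, 100, 512)
visual = torch.randn(2, 49, 512)
out = CrossModalAttentionBlock()(audio, visual)  # (2, 100, 512)
```

In the paper's full pipeline, the conditioned audio representation would then be decoded back into a waveform matching the target room acoustics; the block above only sketches the conditioning step.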
Authors (4)
Changan Chen
Ruohan Gao
P. Calamia
K. Grauman
Quick Access
- Publication Year: 2022
- Language: en
- Total Citations: 66
- Database Source: Semantic Scholar
- DOI: 10.1109/CVPR52688.2022.01829
- Access: Open Access ✓