Semantic Scholar | Open Access | 2025 | 121 citations

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, +8 others

Abstract

The quantification of audio aesthetics remains a complex challenge in audio processing, primarily due to its subjective nature, which is influenced by human perception and cultural context. Traditional methods often depend on human listeners for evaluation, leading to inconsistencies and high resource demands. This paper addresses the growing need for automated systems capable of predicting audio aesthetics without human intervention. Such systems are crucial for applications like data filtering, pseudo-labeling large datasets, and evaluating generative audio models, especially as these models become more sophisticated. In this work, we introduce a novel approach to audio aesthetic evaluation by proposing new annotation guidelines that decompose human listening perspectives into four distinct axes. We develop and train no-reference, per-item prediction models that offer a more nuanced assessment of audio quality. Our models are evaluated against human mean opinion scores (MOS) and existing methods, demonstrating comparable or superior performance. This research not only advances the field of audio aesthetics but also provides open-source models and datasets to facilitate future work and benchmarking. We release our code and pre-trained model at: https://github.com/facebookresearch/audiobox-aesthetics
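The abstract states that the per-item predictions are evaluated against human mean opinion scores (MOS). A common way to quantify that kind of agreement is to compute linear (Pearson) and rank (Spearman) correlation between predicted and human scores, separately for each annotation axis. The sketch below illustrates this in plain Python. It is an assumption-laden illustration, not the authors' evaluation code: the exact metrics and evaluation levels used in the paper are not given in this abstract, and the function names (`pearson`, `spearman`, `evaluate_axes`) and axis labels are hypothetical.

```python
import math
from collections.abc import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate_axes(
    predicted: dict[str, list[float]], mos: dict[str, list[float]]
) -> dict[str, tuple[float, float]]:
    """Per-axis (Pearson, Spearman) agreement between model scores and MOS.

    The axis names used as dictionary keys are hypothetical placeholders.
    """
    return {
        axis: (pearson(predicted[axis], mos[axis]), spearman(predicted[axis], mos[axis]))
        for axis in predicted
    }


if __name__ == "__main__":
    # Toy, made-up scores for two hypothetical axes; not data from the paper.
    preds = {"axis_a": [3.1, 4.2, 2.0, 4.8], "axis_b": [5.5, 6.0, 7.1, 6.4]}
    human = {"axis_a": [3.0, 4.0, 2.5, 4.5], "axis_b": [5.0, 6.5, 7.0, 6.0]}
    for axis, (lcc, srcc) in evaluate_axes(preds, human).items():
        print(f"{axis}: Pearson={lcc:.3f} Spearman={srcc:.3f}")
```

Spearman is implemented as Pearson on average ranks so that tied scores, which are common with discrete listener ratings, are handled consistently. In practice a library routine such as SciPy's `pearsonr`/`spearmanr` would typically be used instead.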

Authors (13)

Andros Tjandra
Yi-Chiao Wu
Baishan Guo
John Hoffman
Brian Ellis
Apoorv Vyas
Bowen Shi
Sanyuan Chen
Matt Le
N. Zacharov
Carleigh Wood
Ann Lee
Wei-Ning Hsu

Citation Format

Tjandra, A., Wu, Y., Guo, B., Hoffman, J., Ellis, B., Vyas, A. et al. (2025). Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound. https://doi.org/10.48550/arXiv.2502.05139

Publication Information
Publication Year: 2025
Language: English
Total Citations: 121
Database Source: Semantic Scholar
DOI: 10.48550/arXiv.2502.05139
Access: Open Access ✓