arXiv Open Access 2025

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

Junyou Wang, Zehua Chen, Binjie Yuan, Kaiwen Zheng, Chang Li, and 2 others
Abstract

The design of diffusion-based audio generation systems has been investigated from diverse perspectives, such as data space, network architecture, and conditioning techniques, while most of these innovations require model re-training. In sampling, classifier-free guidance (CFG) has been uniformly adopted to enhance generation quality by strengthening condition alignment. However, CFG often compromises diversity, resulting in suboptimal performance. Although the recent autoguidance (AG) method proposes another direction of guidance that maintains diversity, its direct application in audio generation has so far underperformed CFG. In this work, we introduce AudioMoG, an improved sampling method that enhances text-to-audio (T2A) and video-to-audio (V2A) generation quality without requiring extensive training resources. We start with an analysis of both CFG and AG, examining their respective advantages and limitations for guiding diffusion models. Building upon our insights, we introduce a mixture-of-guidance framework that integrates diverse guidance signals with their interaction terms (e.g., the unconditional bad version of the model) to maximize cumulative advantages. Experiments show that, given the same inference speed, our approach consistently outperforms single guidance in T2A generation across sampling steps, concurrently showing advantages in V2A, text-to-music, and image generation. Demo samples are available at: https://audiomog.github.io.
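As a rough illustration of the two guidance directions the abstract contrasts, the sketch below writes CFG and AG as the same extrapolation rule applied to different "weaker" predictions. The CFG and AG updates follow their standard published forms. The `mixed_guidance` function is only a hypothetical linear combination, included to show how several guidance signals, including an unconditional bad-model term, could be summed. It is an assumption for illustration and not the formula used by AudioMoG. All function and parameter names are invented here, and plain Python lists stand in for the model's noise predictions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def guide(base: Vector, target: Vector, scale: float) -> List[float]:
    """Extrapolate from `base` toward `target`: base + scale * (target - base)."""
    return [b + scale * (t - b) for b, t in zip(base, target)]


def cfg(eps_cond: Vector, eps_uncond: Vector, w: float) -> List[float]:
    """Classifier-free guidance: push away from the unconditional prediction."""
    return guide(eps_uncond, eps_cond, w)


def autoguidance(eps_cond: Vector, eps_bad_cond: Vector, w: float) -> List[float]:
    """Autoguidance: push away from a weaker ("bad") conditional model."""
    return guide(eps_bad_cond, eps_cond, w)


def mixed_guidance(
    eps_cond: Vector,
    eps_uncond: Vector,
    eps_bad_cond: Vector,
    eps_bad_uncond: Vector,
    a_cfg: float,
    a_ag: float,
    a_cross: float,
) -> List[float]:
    """HYPOTHETICAL mixture (not the paper's formula): add three guidance
    directions to the conditional prediction, each with its own weight."""
    return [
        c + a_cfg * (c - u) + a_ag * (c - b) + a_cross * (c - bu)
        for c, u, b, bu in zip(eps_cond, eps_uncond, eps_bad_cond, eps_bad_uncond)
    ]


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    bad_cond, bad_uncond = [0.5, 1.5], [0.0, 0.5]
    print(cfg(cond, uncond, 2.0))
    print(autoguidance(cond, bad_cond, 2.0))
    print(mixed_guidance(cond, uncond, bad_cond, bad_uncond, 0.5, 0.5, 0.0))
```

In this parameterization, setting `a_ag` and `a_cross` to zero makes the mixture equal to CFG at scale `1 + a_cfg`, so single guidance appears as a special case of the weighted sum.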

Authors (7)

Junyou Wang
Zehua Chen
Binjie Yuan
Kaiwen Zheng
Chang Li
Yuxuan Jiang
Jun Zhu

Citation Format

Wang, J., Chen, Z., Yuan, B., Zheng, K., Li, C., Jiang, Y., & Zhu, J. (2025). AudioMoG: Guiding Audio Generation with Mixture-of-Guidance. https://arxiv.org/abs/2509.23727

Journal Information
Year of publication: 2025
Language: en
Source database: arXiv
Access: Open Access