arXiv Open Access 2025

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

Junyou Wang, Zehua Chen, Binjie Yuan, Kaiwen Zheng, Chang Li, +2 others

Abstract

The design of diffusion-based audio generation systems has been investigated from diverse perspectives, such as data space, network architecture, and conditioning techniques, while most of these innovations require model re-training. In sampling, classifier-free guidance (CFG) has been uniformly adopted to enhance generation quality by strengthening condition alignment. However, CFG often compromises diversity, resulting in suboptimal performance. Although the recent autoguidance (AG) method proposes another direction of guidance that maintains diversity, its direct application in audio generation has so far underperformed CFG. In this work, we introduce AudioMoG, an improved sampling method that enhances text-to-audio (T2A) and video-to-audio (V2A) generation quality without requiring extensive training resources. We start with an analysis of both CFG and AG, examining their respective advantages and limitations for guiding diffusion models. Building upon our insights, we introduce a mixture-of-guidance framework that integrates diverse guidance signals with their interaction terms (e.g., the unconditional bad version of the model) to maximize cumulative advantages. Experiments show that, given the same inference speed, our approach consistently outperforms single guidance in T2A generation across sampling steps, concurrently showing advantages in V2A, text-to-music, and image generation. Demo samples are available at: https://audiomog.github.io.
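The two guidance rules the abstract compares can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. CFG extrapolates from the unconditional prediction toward the conditional one, and AG extrapolates from a weaker ("bad") conditional model toward the main one. The `mixed_guidance` function is a hypothetical linear combination written only for illustration: the abstract does not state the AudioMoG formula, and the interaction terms it mentions are omitted here. All function names, parameter names, and weights are assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def guide(good: Vector, reference: Vector, scale: float) -> List[float]:
    """Extrapolate from `reference` toward `good`: ref + scale * (good - ref)."""
    return [r + scale * (g - r) for g, r in zip(good, reference)]


def cfg(cond: Vector, uncond: Vector, w: float) -> List[float]:
    """Classifier-free guidance: the reference is the unconditional prediction."""
    return guide(cond, uncond, w)


def autoguidance(cond: Vector, bad_cond: Vector, w: float) -> List[float]:
    """Autoguidance: the reference is a weaker model given the same condition."""
    return guide(cond, bad_cond, w)


def mixed_guidance(
    cond: Vector,
    uncond: Vector,
    bad_cond: Vector,
    w_cfg: float,
    w_ag: float,
) -> List[float]:
    """Hypothetical mixture (an assumption, not the paper's formula).

    Adds both guidance directions to the conditional prediction:
    cond + (w_cfg - 1) * (cond - uncond) + (w_ag - 1) * (cond - bad_cond).
    Setting w_ag = 1 recovers CFG; setting w_cfg = 1 recovers AG.
    """
    return [
        c + (w_cfg - 1.0) * (c - u) + (w_ag - 1.0) * (c - b)
        for c, u, b in zip(cond, uncond, bad_cond)
    ]


if __name__ == "__main__":
    # Toy per-element model outputs; a real sampler would pass network predictions.
    cond, uncond, bad_cond = [2.0, 0.0], [1.0, 0.0], [1.5, 0.0]
    print(cfg(cond, uncond, 3.0))
    print(autoguidance(cond, bad_cond, 2.0))
    print(mixed_guidance(cond, uncond, bad_cond, 3.0, 2.0))
```

In this sketch each guidance scale is a separate knob, so either single-guidance method falls out as a special case of the mixture. Note that combining three predictions per step costs more network evaluations than CFG alone, which is presumably why the abstract compares methods at equal inference speed.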

Topics & Keywords

cs.SD cs.AI

Authors (7)

Junyou Wang

Zehua Chen

Binjie Yuan

Kaiwen Zheng

Chang Li

Yuxuan Jiang

Jun Zhu

Citation

Wang, J., Chen, Z., Yuan, B., Zheng, K., Li, C., Jiang, Y., & Zhu, J. (2025). AudioMoG: Guiding Audio Generation with Mixture-of-Guidance. arXiv. https://arxiv.org/abs/2509.23727

Publication Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access