arXiv Open Access 2025

CoD: A Diffusion Foundation Model for Image Compression

Zhaoyang Jia, Zihan Zheng, Naifu Xue, Jiahao Li, Bin Li, +4 others

Abstract

Existing diffusion codecs typically build on text-to-image diffusion foundation models such as Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, which limits the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce CoD, the first Compression-oriented Diffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages. High compression efficiency: replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp). Low-cost and reproducible training: about 300× faster training than Stable Diffusion (~20 vs. ~6,250 A100 GPU days) on entirely open image-only datasets. New insights: for example, we find that pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Code is released at https://github.com/microsoft/GenCodec/tree/main/CoD.
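
To give a sense of scale for the two figures quoted in the abstract, the arithmetic can be sketched in a few lines of Python. The 768×512 image size is an assumption chosen only for illustration (the abstract does not state a resolution), and the helper names are hypothetical.

```python
def compressed_size_bytes(width: int, height: int, bpp: float) -> float:
    """Size in bytes of an image coded at `bpp` bits per pixel."""
    return width * height * bpp / 8


def speedup(baseline_gpu_days: float, new_gpu_days: float) -> float:
    """Ratio of baseline training cost to new training cost."""
    return baseline_gpu_days / new_gpu_days


# Assumed example resolution (not from the abstract): 768 x 512 pixels.
size = compressed_size_bytes(768, 512, 0.0039)

# Approximate A100 GPU-day figures quoted in the abstract.
ratio = speedup(6250, 20)

print(f"{size:.1f} bytes")  # prints "191.7 bytes"
print(f"{ratio:.1f}x")      # prints "312.5x"
```

Under that assumed resolution, 0.0039 bpp corresponds to under 200 bytes for the whole image, and the quoted GPU-day figures give a ratio of roughly 312, consistent with the "about 300×" claim.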

Authors (9)

Zhaoyang Jia
Zihan Zheng
Naifu Xue
Jiahao Li
Bin Li
Zongyu Guo
Xiaoyi Zhang
Houqiang Li
Yan Lu

Citation Format

Jia, Z., Zheng, Z., Xue, N., Li, J., Li, B., Guo, Z. et al. (2025). CoD: A Diffusion Foundation Model for Image Compression. https://arxiv.org/abs/2511.18706

Journal Information
Year Published: 2025
Language: en
Source Database: arXiv
Access: Open Access