arXiv Open Access 2024

aMUSEd: An Open MUSE Reproduction

Suraj Patil, William Berman, Robin Rombach, Patrick von Platen

Abstract

We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.

Topics & Keywords

cs.CV

Citation Format

Patil, S., Berman, W., Rombach, R., & von Platen, P. (2024). aMUSEd: An Open MUSE Reproduction. arXiv. https://arxiv.org/abs/2401.01808

Journal Information

Year Published: 2024
Language: en
Database Source: arXiv
Access: Open Access