arXiv Open Access 2024

User-Friendly Customized Generation with Multi-Modal Prompts

Linhao Zhong, Yan Hong, Wentao Chen, Binglin Zhou, Yiyi Zhang, +2 more

Abstract

Text-to-image generation models have advanced considerably, catering to the growing interest in personalized image creation. Current customization techniques often require users to provide multiple images (typically 3-5) for each customized object, along with the classification of these objects and descriptive textual prompts for scenes. This paper asks whether the process can be made more user-friendly and the customization more intricate. We propose a method in which users need only provide an image with accompanying text for each customization concept, requiring just a single image per visual concept. We introduce the ``multi-modal prompt'', a novel integration of text and images tailored to each customization concept, which simplifies user interaction and enables precise customization of both objects and scenes. Our proposed paradigm for customized text-to-image generation surpasses existing finetuning-based methods in user-friendliness and in its ability to customize complex objects from simple inputs. Our code is available at https://github.com/zhongzero/Multi-Modal-Prompt.


Authors (7)

Linhao Zhong
Yan Hong
Wentao Chen
Binglin Zhou
Yiyi Zhang
Jianfu Zhang
Liqing Zhang

Citation Format

Zhong, L., Hong, Y., Chen, W., Zhou, B., Zhang, Y., Zhang, J. et al. (2024). User-Friendly Customized Generation with Multi-Modal Prompts. https://arxiv.org/abs/2405.16501

Quick Access

View at Source
Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓