arXiv Open Access 2023

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, +3 others

Abstract

In the multimedia era, images are an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module that performs ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data and from insufficient cross-modal interaction. In addition, optimizing the retrieval and relevance models separately hurts overall performance. To address these issues, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representations. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework has been implemented in the Baidu search advertising system "Phoenix Nest". Online evaluation shows that it improves cost per mille (CPM) and click-through rate (CTR) by 1.04% and 1.865%, respectively.
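
The abstract does not specify the training objectives, so the following is only a minimal sketch of what "unifying relevance modeling and retrieval through multi-objective learning" could look like. It assumes an in-batch contrastive (InfoNCE) loss for retrieval and a binary cross-entropy loss for relevance, combined with a weight `alpha`. All function names, the temperature value, and the weighting scheme are hypothetical and are not taken from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def contrastive_loss(query_embs, image_embs, temperature=0.07):
    """Assumed retrieval objective: in-batch InfoNCE, where query i
    should score highest against image i among all images in the batch."""
    q = [normalize(v) for v in query_embs]
    m = [normalize(v) for v in image_embs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, mj) / temperature for mj in m]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        total += log_denom - logits[i]
    return total / len(q)

def relevance_loss(scores, labels):
    """Assumed relevance objective: binary cross-entropy on raw logits,
    with label 1 for a relevant query-image pair and 0 otherwise."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

def multi_objective_loss(query_embs, image_embs, scores, labels, alpha=0.5):
    """Hypothetical weighted sum of the two objectives; alpha is an
    illustrative hyperparameter, not a value reported by the authors."""
    retrieval = contrastive_loss(query_embs, image_embs)
    relevance = relevance_loss(scores, labels)
    return alpha * retrieval + (1.0 - alpha) * relevance
```

Under these assumptions, a single shared encoder would receive gradients from both terms, which is one plausible way to avoid optimizing retrieval and relevance models separately.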

Topics & Keywords

cs.IR

Authors (8)

Zhoufutu Wen

Xinyu Zhao

Zhipeng Jin

Yi Yang

Wei Jia

Xiaodong Chen

Shuanglong Li

Lin Liu

Citation Format

Wen, Z., Zhao, X., Jin, Z., Yang, Y., Jia, W., Chen, X. et al. (2023). Enhancing Dynamic Image Advertising with Vision-Language Pre-training. https://arxiv.org/abs/2306.14112

Journal Information

Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access