arXiv Open Access 2023

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, +3 others

Abstract

In the multimedia era, images are an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module that performs ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data and from insufficient cross-modal interaction. In addition, optimizing the retrieval and relevance models separately hurts overall performance. To address these issues, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representations. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework has been deployed in the Baidu search advertising system "Phoenix Nest". Online evaluation shows that it improves cost per mille (CPM) and click-through rate (CTR) by 1.04% and 1.865%, respectively.
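
The abstract does not say which objectives are combined during fine-tuning, so the following is only a minimal sketch of what "unifying relevance modeling and retrieval through multi-objective learning" could look like. It assumes an in-batch contrastive loss for retrieval and a binary cross-entropy loss for relevance, mixed by a single weight. All function names, the temperature, and the weighting scheme are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def contrastive_retrieval_loss(query_embs, image_embs, temperature=0.07):
    """In-batch contrastive loss (assumed): query i should match image i."""
    queries = [normalize(q) for q in query_embs]
    images = [normalize(m) for m in image_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, m) / temperature for m in images]
        # Numerically stable log-sum-exp over all images in the batch.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]
    return total / len(queries)


def relevance_loss(scores, labels):
    """Binary cross-entropy on raw relevance scores (logits), assumed form."""
    total = 0.0
    for s, y in zip(scores, labels):
        # Stable form of -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(scores)


def multi_objective_loss(query_embs, image_embs, scores, labels, retrieval_weight=0.5):
    """Weighted sum of the two objectives; the weight is a hypothetical knob."""
    retrieval = contrastive_retrieval_loss(query_embs, image_embs)
    relevance = relevance_loss(scores, labels)
    return retrieval_weight * retrieval + (1.0 - retrieval_weight) * relevance
```

In this sketch, a shared encoder would produce both the embeddings and the relevance scores, so a single backward pass through the combined loss updates the model for both tasks instead of optimizing two separate models.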

Authors (8)

Zhoufutu Wen
Xinyu Zhao
Zhipeng Jin
Yi Yang
Wei Jia
Xiaodong Chen
Shuanglong Li
Lin Liu

Citation Format

Wen, Z., Zhao, X., Jin, Z., Yang, Y., Jia, W., Chen, X. et al. (2023). Enhancing Dynamic Image Advertising with Vision-Language Pre-training. https://arxiv.org/abs/2306.14112

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access