arXiv Open Access 2024

VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, +6 more

Abstract

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible, and are not generalizable for multimodal recommendation. We hypothesize that a user's visual history -- comprising images from daily life -- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.
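The abstract reports gains on Hit@3. As a minimal sketch, assuming the common definition of Hit@k (a query counts as a hit when at least one ground-truth item appears among the top k ranked candidates), the metric could be computed as below. The function names `hit_at_k` and `mean_hit_at_k` and the sample data are hypothetical illustrations, not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Sequence[Sequence[str]],
    relevants: Sequence[Set[str]],
    k: int = 3,
) -> float:
    """Average Hit@k over all queries (0.0 when there are no queries)."""
    hits = [hit_at_k(r, g, k) for r, g in zip(rankings, relevants)]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical example: two users, each with a ranked candidate list
# and a set of items the user actually liked.
rankings = [
    ["cafe_a", "bar_b", "park_c", "museum_d"],
    ["gym_x", "diner_y", "club_z", "bakery_w"],
]
relevants = [{"park_c"}, {"bakery_w"}]
score = mean_hit_at_k(rankings, relevants, k=3)
```

Here the first user's liked item falls within the top three and the second user's does not, so `score` is the mean of one hit and one miss.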


Authors (11)

Wang Bill Zhu
Deqing Fu
Kai Sun
Yi Lu
Zhaojiang Lin
Seungwhan Moon
Kanika Narang
Mustafa Canim
Yue Liu
Anuj Kumar
Xin Luna Dong

Citation Format

Zhu, W.B., Fu, D., Sun, K., Lu, Y., Lin, Z., Moon, S. et al. (2024). VisualLens: Personalization through Task-Agnostic Visual History. https://arxiv.org/abs/2411.16034

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access