DOAJ Open Access 2025

Transforming Product Discovery and Interpretation Using Vision–Language Models

Simona-Vasilica Oprea, Adela Bâra

Abstract

This work investigates the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce, focusing on two complementary models: ColQwen2 (vidore/colqwen2-v1.0) and ColPali (vidore/colpali-v1.2-hf). The models are integrated into two architectures and evaluated on several product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval from natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, performs strongly and supports end-to-end image querying and text response synthesis. It identifies attributes such as brand, color or usage from product images alone and answers user questions fluently. ColPali, built on the PaliGemma backbone, is instead optimized for explainability: it produces detailed visual-token alignment maps that show how specific image regions contribute to retrieval decisions, a transparency well suited to diagnostics or educational applications. Comparative experiments on footwear imagery show that ColQwen2 is highly effective at generating accurate answers to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability.
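The abstract links ColPali's alignment maps to its retrieval decisions. As background not stated in the abstract itself: ColPali-family retrievers (ColQwen2 included) embed each query token and each image patch as separate vectors and score a query against an image with ColBERT-style late interaction (MaxSim). The per-token, per-patch similarities behind that score are what an alignment map visualizes. The sketch below is a minimal pure-Python illustration under that assumption, using toy 2-dimensional vectors in place of real model embeddings; all function names are hypothetical, and it is neither the authors' code nor the colpali-engine API.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_map(query_tokens: List[Vector], image_patches: List[Vector]) -> List[List[float]]:
    """Similarity of every query token to every image patch (rows = tokens, cols = patches).

    Reshaping one row onto the image's patch grid gives a heat map of the
    regions that query token aligns with (the idea behind alignment maps)."""
    return [[dot(q, p) for p in image_patches] for q in query_tokens]


def late_interaction_score(query_tokens: List[Vector], image_patches: List[Vector]) -> float:
    """ColBERT-style MaxSim: keep each query token's best-matching patch score, then sum."""
    return sum(max(row) for row in similarity_map(query_tokens, image_patches))


def rank_images(query_tokens: List[Vector], catalog: Dict[str, List[Vector]]) -> List[str]:
    """Hypothetical retrieval step: catalog image names sorted by descending score."""
    return sorted(
        catalog,
        key=lambda name: late_interaction_score(query_tokens, catalog[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy data: two query-token embeddings and two "product images" of two patches each.
    query = [[1.0, 0.0], [0.0, 1.0]]
    catalog = {
        "sneaker": [[0.9, 0.1], [0.2, 0.8]],
        "boot": [[0.5, 0.5], [0.4, 0.4]],
    }
    print(rank_images(query, catalog))  # ['sneaker', 'boot']
```

Summing each token's best patch match lets every part of the query find its own region of the image, which is why the same similarity matrix can serve both ranking and a per-region explanation.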

Topics & Keywords

Business

Authors (2)

Simona-Vasilica Oprea

Adela Bâra

Citation

Oprea, S.-V., & Bâra, A. (2025). Transforming Product Discovery and Interpretation Using Vision–Language Models. https://doi.org/10.3390/jtaer20030191

Journal Information

Year of Publication: 2025
Database Source: DOAJ
DOI: 10.3390/jtaer20030191
Access: Open Access