arXiv Open Access 2026

CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity

Xuefeng Wei, Zhixuan Wang, Xuan Zhou, Zhi Qu, Hongyao Li, +3 others

Abstract

We introduce CARTBENCH, a museum-grounded benchmark for evaluating vision-language models (VLMs) on Chinese artworks beyond short-form recognition and QA. CARTBENCH comprises four subtasks: CURATORQA for evidence-grounded recognition and reasoning, CATALOGCAPTION for structured four-section expert-style appreciation, REINTERPRET for defensible reinterpretation with expert ratings, and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds. CARTBENCH is built by aligning image-bearing Palace Museum objects from Wikidata with authoritative catalog pages, spanning five art categories across multiple dynasties. Across nine representative VLMs, we find that high overall CURATORQA accuracy can mask sharp drops on hard evidence linking and style-to-period inference; long-form appreciation remains far from expert references; and authenticity-oriented diagnostic discrimination stays near chance, underscoring the difficulty of connoisseur-level reasoning for current models.

Topics & Keywords

cs.CL

Authors (8)

Xuefeng Wei

Zhixuan Wang

Xuan Zhou

Zhi Qu

Hongyao Li

Yusuke Sakai

Hidetaka Kamigaito

Taro Watanabe

Citation Format

Wei, X., Wang, Z., Zhou, X., Qu, Z., Li, H., Sakai, Y. et al. (2026). CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity. https://arxiv.org/abs/2604.11632

Journal Information

Publication Year: 2026
Language: en
Database Source: arXiv
Access: Open Access ✓