arXiv Open Access 2025

SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports

Haotian Xia, Haonan Ge, Junbo Zou, Hyun Woo Choi, Xuebin Zhang, +14 others

Abstract

Deeply understanding sports requires an intricate blend of fine-grained visual perception and rule-based reasoning, a challenge that pushes the limits of current multimodal models. To succeed, models must master three critical capabilities: perceiving nuanced visual details, applying abstract sport rule knowledge, and grounding that knowledge in specific visual evidence. Current sports benchmarks either cover a single sport or lack the detailed reasoning chains and precise visual grounding needed to robustly evaluate these core capabilities in a multi-sport context. To address this gap, we introduce SportR, the first large-scale multi-sport benchmark designed to train and evaluate multimodal large language models (MLLMs) on the fundamental reasoning required for sports intelligence. Our benchmark provides a dataset of 4,789 images and 2,052 videos. To enable granular evaluation, we structure the benchmark around a progressive hierarchy of question-answer pairs designed to probe reasoning at increasing depths, from simple infraction identification to complex penalty prediction. For the most advanced tasks requiring multi-step reasoning, such as determining penalties or explaining tactics, we provide 6,841 high-quality, human-authored Chain of Thought annotations. In addition, the benchmark covers both image and video modalities and provides manual bounding box annotations to test visual grounding directly on the image subset. Extensive experiments demonstrate the difficulty of our benchmark: state-of-the-art baseline models perform poorly on our most challenging tasks. Training on our data via Supervised Fine-Tuning and Reinforcement Learning improves these scores, but they remain relatively low, highlighting a significant gap in current model capabilities. SportR presents a new challenge for the community, providing a critical resource to drive future research in multimodal sports reasoning.

Topics & Keywords

cs.CV

Authors (19)

Haotian Xia

Haonan Ge

Junbo Zou

Hyun Woo Choi

Xuebin Zhang

Danny Suradja

Botao Rui

Ethan Tran

Wendy Jin

Zhen Ye

Xiyang Lin

Christopher Lai

Shengjie Zhang

Junwen Miao

Shichao Chen

Rhys Tracy

Vicente Ordonez

Weining Shen

Hanjie Chen

Citation Format

Xia, H., Ge, H., Zou, J., Choi, H.W., Zhang, X., Suradja, D. et al. (2025). SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports. https://arxiv.org/abs/2511.06499

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓