arXiv Open Access 2025

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?

Haoxin Wang, Xianhan Peng, Xucheng Huang, Yizhe Huang, Ming Gong, +3 others

Abstract

In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions, and a realistic task dataset derived from authentic e-commerce dialogues. These tasks cover a wide range of business scenarios and are designed to reflect real-world complexity, which makes ECom-Bench highly challenging. For instance, even advanced models such as GPT-4o achieve only 10-20% on the pass^3 metric in our benchmark, highlighting the substantial difficulty of complex e-commerce scenarios. The code and data are publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.
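The abstract reports results using the pass^3 metric without defining it. Below is a minimal sketch, assuming ECom-Bench follows the pass^k definition popularized by τ-bench: run each task n times, count the c successful trials, estimate the probability that k independent trials all succeed as C(c, k) / C(n, k), and average over tasks. The function name `pass_hat_k` and the (trials, successes) input format are hypothetical and are not taken from the ECom-Bench repository.

```python
from math import comb


def pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Estimate pass^k from per-task trial outcomes.

    Hypothetical input format: one (n_trials, n_successes) pair per task.
    Assumes the tau-bench-style estimator C(c, k) / C(n, k), averaged over tasks.
    """
    if not results:
        raise ValueError("need at least one task")
    total = 0.0
    for n, c in results:
        if not 0 <= c <= n or n < k:
            raise ValueError("each task needs 0 <= successes <= trials and trials >= k")
        # comb(c, k) is 0 when c < k, so a task with fewer than k successes contributes 0.
        total += comb(c, k) / comb(n, k)
    return total / len(results)


# Three illustrative tasks, each run 4 times, with 4, 3 and 1 successes respectively.
trials = [(4, 4), (4, 3), (4, 1)]
print(f"pass^1 = {pass_hat_k(trials, 1):.3f}")  # → pass^1 = 0.667 (mean success rate)
print(f"pass^3 = {pass_hat_k(trials, 3):.3f}")  # → pass^3 = 0.417
```

With k = 1 the estimator reduces to the ordinary average success rate; larger k penalizes agents that succeed only intermittently on the same task, which is why pass^3 can sit well below single-trial accuracy.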


Authors (8)

Haoxin Wang
Xianhan Peng
Xucheng Huang
Yizhe Huang
Ming Gong
Chenghan Yang
Yang Liu
Ling Jiang

Citation Format

Wang, H., Peng, X., Huang, X., Huang, Y., Gong, M., Yang, C. et al. (2025). ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? https://arxiv.org/abs/2507.05639

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓