arXiv Open Access 2025

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?

Haoxin Wang, Xianhan Peng, Xucheng Huang, Yizhe Huang, Ming Gong, +3 others

Abstract

In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions and a realistic task dataset derived from authentic e-commerce dialogues. These tasks cover a wide range of business scenarios and are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o achieve only 10-20% on the pass^3 metric in our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. The code and data have been made publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.

Topics & Keywords

cs.CL

Authors (8)

Haoxin Wang

Xianhan Peng

Xucheng Huang

Yizhe Huang

Ming Gong

Chenghan Yang

Yang Liu

Ling Jiang

Citation Format

Wang, H., Peng, X., Huang, X., Huang, Y., Gong, M., Yang, C., et al. (2025). ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? https://arxiv.org/abs/2507.05639

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓