Semantic Scholar · Open Access · 2024 · 37 citations

Benchmarking Data Science Agents

Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren

Abstract

In the era of data-driven decision-making, the complexity of data analysis demands advanced data science expertise and tools, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes. In this paper, we introduce DSEval, a novel evaluation paradigm, together with a series of innovative benchmarks tailored to assess the performance of these agents throughout the entire data science lifecycle. By incorporating a novel bootstrapped annotation method, we streamline dataset preparation, improve evaluation coverage, and expand benchmarking comprehensiveness. Our findings uncover prevalent obstacles and provide critical insights to inform future advancements in the field.


Authors (6)

Yuge Zhang

Qiyang Jiang

Xingyu Han

Nan Chen

Yuqing Yang

Kan Ren

Citation Format

Zhang, Y., Jiang, Q., Han, X., Chen, N., Yang, Y., & Ren, K. (2024). Benchmarking Data Science Agents. https://doi.org/10.48550/arXiv.2402.17168

Publication Information

Year Published: 2024
Language: en
Total Citations: 37
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2402.17168
Access: Open Access ✓