arXiv Open Access 2024

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, +5 others


Abstract

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.

Topics & Keywords

cs.CL cs.AI cs.LG

Authors (10)

Bodhisattwa Prasad Majumder

Harshit Surana

Dhruv Agarwal

Bhavana Dalvi Mishra

Abhijeetsingh Meena

Aryan Prakhar

Tirth Vora

Tushar Khot

Ashish Sabharwal

Peter Clark

Citation (APA)

Majumder, B.P., Surana, H., Agarwal, D., Mishra, B.D., Meena, A., Prakhar, A. et al. (2024). DiscoveryBench: Towards Data-Driven Discovery with Large Language Models. https://arxiv.org/abs/2407.01725


Publication Information

Year Published: 2024
Language: English
Database Source: arXiv
Access: Open Access