arXiv Open Access 2026

Generating Literature-Driven Scientific Theories at Scale

Peter Jansen, Peter Clark, Doug Downey, Daniel S. Weld

Abstract

Contemporary automated scientific discovery has focused on agents that generate scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored. In this work, we formulate the problem of synthesizing theories, consisting of qualitative and quantitative laws, from large corpora of scientific literature. We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, and examine how two choices change theory properties: literature-grounded versus parametric-knowledge generation, and accuracy-focused versus novelty-focused generation objectives. Our experiments show that, compared to generation from parametric LLM memory, our literature-supported method creates theories that are significantly better both at matching existing evidence and at predicting future results from 4.6k subsequently written papers.

Topics & Keywords

cs.CL cs.AI

Authors (4)

Peter Jansen

Peter Clark

Doug Downey

Daniel S. Weld

Citation Format

Jansen, P., Clark, P., Downey, D., & Weld, D. S. (2026). Generating Literature-Driven Scientific Theories at Scale. https://arxiv.org/abs/2601.16282

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access