arXiv Open Access 2024

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas Jasper Lee John Jennings Jimmy Xin Michelle Ding +3 lainnya

Lihat Sumber

Abstrak

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

Topik & Kata Kunci

cs.AI cs.CL cs.LG cs.LO cs.PL

Penulis (8)

George Tsoukalas

Jasper Lee

John Jennings

Jimmy Xin

Michelle Ding

Michael Jennings

Amitayush Thakur

Swarat Chaudhuri

Format Sitasi

APA MLA BibTeX

Tsoukalas, G., Lee, J., Jennings, J., Xin, J., Ding, M., Jennings, M. et al. (2024). PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition. https://arxiv.org/abs/2407.11214

Akses Cepat

Lihat di Sumber

Informasi Jurnal

Tahun Terbit: 2024
Bahasa: en
Sumber Database: arXiv
Akses: Open Access ✓