arXiv Open Access 2024

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas Jasper Lee John Jennings Jimmy Xin Michelle Ding +3 lainnya
Lihat Sumber

Abstrak

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

Penulis (8)

G

George Tsoukalas

J

Jasper Lee

J

John Jennings

J

Jimmy Xin

M

Michelle Ding

M

Michael Jennings

A

Amitayush Thakur

S

Swarat Chaudhuri

Format Sitasi

Tsoukalas, G., Lee, J., Jennings, J., Xin, J., Ding, M., Jennings, M. et al. (2024). PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition. https://arxiv.org/abs/2407.11214

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2024
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓