Semantic Scholar · Open Access · 2023 · 420 citations

Llemma: An Open Language Model For Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, S. McAleer, and 4 others

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Authors (9)

Zhangir Azerbayev

Hailey Schoelkopf

Keiran Paster

Marco Dos Santos

S. McAleer

Albert Q. Jiang

Jia Deng

Stella Biderman

S. Welleck

Citation Format

Azerbayev, Z., Schoelkopf, H., Paster, K., Dos Santos, M., McAleer, S., Jiang, A. Q., et al. (2023). Llemma: An Open Language Model For Mathematics. https://doi.org/10.48550/arXiv.2310.10631

Publication Information
Year Published: 2023
Language: English
Total Citations: 420
Database Source: Semantic Scholar
DOI: 10.48550/arXiv.2310.10631
Access: Open Access