Semantic Scholar
Open Access
2023
420 citations
Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev
Hailey Schoelkopf
Keiran Paster
Marco Dos Santos
S. McAleer
+4 others
Abstract
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Authors (9)
Zhangir Azerbayev
Hailey Schoelkopf
Keiran Paster
Marco Dos Santos
S. McAleer
Albert Q. Jiang
Jia Deng
Stella Biderman
S. Welleck
Journal Information
- Year Published: 2023
- Language: en
- Total Citations: 420
- Source Database: Semantic Scholar
- DOI: 10.48550/arXiv.2310.10631
- Access: Open Access ✓