GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, +12 others
Abstract

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
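Since the weights are openly released, the checkpoints can be loaded with standard open-source tooling. Below is a minimal sketch using Hugging Face transformers; the model ID EleutherAI/gpt-neox-20b and the use of the accelerate package for device placement are assumptions, not details from this record, which only cites the GitHub repository.

    # Hedged sketch: load GPT-NeoX-20B and run a short generation.
    # Assumes the weights are mirrored on the Hugging Face Hub as
    # "EleutherAI/gpt-neox-20b" (not stated in this record) and that
    # torch, transformers, and accelerate are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "EleutherAI/gpt-neox-20b"  # assumed Hub mirror of the release

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # roughly 40 GB of weights in fp16
        device_map="auto",          # shard across available GPUs (accelerate)
    )

    prompt = "GPT-NeoX-20B is a 20 billion parameter"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

For the five-shot evaluations described in the abstract, the authors' open-sourced evaluation code in the linked repository is the reference implementation.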

Authors (17)

Sid Black
Stella Biderman
Eric Hallahan
Quentin Anthony
Leo Gao
Laurence Golding
Horace He
Connor Leahy
Kyle McDonell
Jason Phang
Michael Pieler
USVSN Sai Prashanth
Shivanshu Purohit
Laria Reynolds
Jonathan Tow
Ben Wang
Samuel Weinbach

Citation Format

Black, S., Biderman, S., Hallahan, E., Anthony, Q., Gao, L., Golding, L., et al. (2022). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arXiv. https://arxiv.org/abs/2204.06745

Journal Information
Publication Year
2022
Language
English
Source Database
arXiv
Access
Open Access ✓