arXiv Open Access 2024

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Preferred Elements: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, +16 others

Abstract

We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, with architectural techniques such as QK Normalization and Z-Loss used to keep training stable. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4. The base model is available at https://huggingface.co/pfnet/plamo-100b.
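The abstract names two training-stability techniques, QK Normalization and Z-Loss. As a rough orientation only, the minimal PyTorch sketch below shows one common way these techniques are implemented; it assumes a standard attention and language-modeling setup, and the function names, tensor shapes, epsilon, and loss weight are illustrative placeholders rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def qk_normalized_attention(q, k, v, eps=1e-6):
    # QK Normalization: L2-normalize queries and keys per head before the
    # dot product, which bounds attention logit magnitudes and is commonly
    # used to stabilize training of large models.
    q = F.normalize(q, dim=-1, eps=eps)
    k = F.normalize(k, dim=-1, eps=eps)
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def z_loss(logits, weight=1e-4):
    # Z-Loss: auxiliary penalty on the squared log of the softmax
    # normalizer Z = sum(exp(logits)); it discourages output logits from
    # drifting to extreme values during training.
    log_z = torch.logsumexp(logits, dim=-1)
    return weight * (log_z ** 2).mean()

# Illustrative usage with placeholder shapes (batch, heads, seq, head_dim):
q = k = v = torch.randn(2, 8, 16, 64)
out = qk_normalized_attention(q, k, v)
logits = torch.randn(2, 16, 32000)   # (batch, seq, vocab)
aux = z_loss(logits)                 # added to the main LM loss
```

In this sketch the normalization keeps the query-key dot products bounded regardless of parameter scale, while the Z-Loss term is simply summed with the cross-entropy loss during training; both are standard stabilization tricks and stand in here for whatever exact formulation the paper uses.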


Authors

Preferred Elements: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase

Citation Format

Preferred Elements, Abe, K., Chubachi, K., Fujita, Y., Hirokawa, Y., et al. (2024). PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency. https://arxiv.org/abs/2410.07563

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓