
How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM

Shaoxiong Ji Pinzhen Chen

Abstract

Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often receives a significant boost if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of languages, although different languages benefit to varying degrees.
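The fine-tuning setup the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' actual pipeline: it assumes hypothetical per-language instruction files (data/<lang>.json with "instruction"/"output" fields), a simple prompt template, and the small bigscience/bloom-560m checkpoint for tractability. Varying NUM_LANGUAGES mirrors the 1-to-52-language axis the study explores.

```python
# Minimal sketch of multilingual instruction tuning on a BLOOM checkpoint.
# Assumptions (not from the paper): per-language JSON files with
# "instruction"/"output" fields, and the small bloom-560m model.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

NUM_LANGUAGES = 8  # vary from 1 to 52 to sweep the study's main axis
LANGS = ["en", "zh", "fr", "es", "ar", "hi", "id", "vi"][:NUM_LANGUAGES]
DATA_FILES = [f"data/{lang}.json" for lang in LANGS]  # hypothetical paths

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Pool instruction data across the selected languages into one mixture.
dataset = load_dataset("json", data_files=DATA_FILES, split="train")

def to_features(example):
    # Simple prompt template; the paper's exact template may differ.
    text = (f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['output']}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bloom-multilingual-it",
                           num_train_epochs=3,
                           per_device_train_batch_size=4,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    # Causal LM collator: labels are the input ids shifted internally.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Evaluating the resulting checkpoints on test languages both inside and outside the instruction mixture is what lets the study separate the effects of language count, exposure, and genetic similarity.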


Authors (2)

Shaoxiong Ji

Pinzhen Chen

Citation Format

Ji, S., & Chen, P. (2024). How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM. arXiv preprint arXiv:2404.04850. https://arxiv.org/abs/2404.04850

Journal Information
Year of Publication: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓