arXiv Open Access 2023

Measuring The Impact Of Programming Language Distribution

Gabriel Orlanski, Kefan Xiao, Xavier Garcia, Jeffrey Hui, Joshua Howland, +4 others
Abstract

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3), derived from the Python Programming Puzzles (Schuster et al. 2021) benchmark, that involves translating expert-level Python functions to any language. With both BabelCode and the TP3 benchmark, we investigate whether balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher $pass@k$ across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better $pass@k$ on low-resource languages at the cost of only a 12.94% decrease on high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource $pass@k$ while having 19.58% worse high-resource $pass@k$.
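The abstract reports results in terms of $pass@k$ without defining it. As a rough illustration only, the sketch below uses the unbiased estimator commonly used for execution-based code evaluation (1 - C(n-c, k) / C(n, k) for n samples of which c pass all tests). It is an assumption that the paper computes $pass@k$ this way, and a further assumption that the reported percentages are relative changes over the baseline. The function names `pass_at_k` and `relative_change` are hypothetical and not part of BabelCode.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Assumption: this is the standard unbiased estimator, not necessarily
    the exact procedure used in the paper.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset
    # must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_change(new: float, baseline: float) -> float:
    """Percent change of `new` relative to `baseline` (assumed reporting style)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline
```

For example, with 10 samples and 3 correct, `pass_at_k(10, 3, 1)` is 1 - 7/10. A benchmark-level score would then average this value over all problems.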

Authors (9)

Gabriel Orlanski
Kefan Xiao
Xavier Garcia
Jeffrey Hui
Joshua Howland
Jonathan Malmaud
Jacob Austin
Rishabh Singh
Michele Catasta

Citation Format

Orlanski, G., Xiao, K., Garcia, X., Hui, J., Howland, J., Malmaud, J. et al. (2023). Measuring The Impact Of Programming Language Distribution. https://arxiv.org/abs/2302.01973

Journal Information
Year Published: 2023
Language: en
Database Source: arXiv
Access: Open Access