arXiv Open Access 2023

Can Programming Languages Boost Each Other via Instruction Tuning?

Daoguang Zan, Ailun Yu, Bo Shen, Jiaxin Zhang, Taihong Chen, and 6 others

Abstract

When human programmers have mastered one programming language, it is easier for them to learn a new one. In this report, we explore whether programming languages can similarly boost each other during the instruction fine-tuning phase of code large language models. We conduct extensive experiments on 8 popular programming languages (Python, JavaScript, TypeScript, C, C++, Java, Go, HTML) using StarCoder. The results demonstrate that programming languages can significantly improve each other. For example, CodeM-Python 15B, trained on Python, improves Java by an absolute 17.95% pass@1 on HumanEval-X. More surprisingly, we find that CodeM-HTML 7B, trained on the HTML corpus, improves Java by an absolute 15.24% pass@1. Our training data is released at https://github.com/NL2Code/CodeM.
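The pass@1 gains reported above follow the HumanEval evaluation convention: generate n samples per problem, count how many pass the unit tests, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch of that standard unbiased estimator (the concrete sample counts below are illustrative, not from this paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples is correct, given n total samples of which c passed."""
    if n - c < k:
        # Fewer failing samples than k: every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative: 10 generations per problem, 3 passed the tests.
print(pass_at_k(10, 3, 1))  # 0.3
```

With k=1 the estimator reduces to the fraction of passing samples, c/n, which is why pass@1 differences can be read directly as absolute percentage-point gains in solve rate.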

Authors (11)

Daoguang Zan, Ailun Yu, Bo Shen, Jiaxin Zhang, Taihong Chen, Bing Geng, Bei Chen, Jichuan Ji, Yafen Yao, Yongji Wang, Qianxiang Wang

Citation

Zan, D., Yu, A., Shen, B., Zhang, J., Chen, T., Geng, B. et al. (2023). Can Programming Languages Boost Each Other via Instruction Tuning?. https://arxiv.org/abs/2308.16824

Journal Information
Publication Year
2023
Language
en
Source Database
arXiv
Access
Open Access ✓