arXiv Open Access 2024

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh, Ali Jannesari

Abstract

Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.

Authors (6)

Ali TehraniJamsaz
Arijit Bhattacharjee
Le Chen
Nesreen K. Ahmed
Amir Yazdanbakhsh
Ali Jannesari

Citation Format

TehraniJamsaz, A., Bhattacharjee, A., Chen, L., Ahmed, N.K., Yazdanbakhsh, A., Jannesari, A. (2024). CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming. https://arxiv.org/abs/2410.20527

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access