arXiv Open Access 2024

From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation

Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, +2 others

Abstract

Large Language Models (LLMs) have demonstrated promising capabilities for code generation. While existing benchmarks evaluate the correctness and efficiency of LLM-generated code, potential linguistic bias, where code quality varies with the natural language used to describe a programming task, remains underexplored. In this paper, we investigate this linguistic bias through the lens of English and Chinese. To facilitate our investigation, we present a unified evaluation framework comprising a curated dataset of 52 Python programming questions with parallel bilingual task descriptions, automated correctness verification, and efficiency quantification tools based on runtime complexity estimation. Based on this framework, we conduct the first empirical study of linguistic bias in LLM-generated code on eight popular large code generation models (LCGMs), as well as GPT-3.5-Turbo and GPT-4. We observe that the code generated by these models differs in correctness on an average of 12% of the bilingual programming tasks, where 39% also exhibit different efficiency. Our findings indicate that LLMs commonly exhibit linguistic bias in code generation.

Topics & Keywords

cs.SE cs.PL

Authors (7)

Weipeng Jiang

Xuanqi Gao

Juan Zhai

Shiqing Ma

Xiaoyu Zhang

Ziyan Lei

Chao Shen

Citation Format

Jiang, W., Gao, X., Zhai, J., Ma, S., Zhang, X., Lei, Z. et al. (2024). From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation. https://arxiv.org/abs/2406.00602

Journal Information

Publication Year: 2024
Language: en
Database Source: arXiv
Access: Open Access ✓