arXiv Open Access 2024

From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation

Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, +2 others

Abstract

Large Language Models (LLMs) have demonstrated promising capabilities for code generation. While existing benchmarks evaluate the correctness and efficiency of LLM-generated code, potential linguistic bias, where code quality varies with the natural language used to describe a programming task, remains underexplored. In this paper, we investigate this linguistic bias through the lens of English and Chinese. To facilitate the investigation, we present a unified evaluation framework comprising a curated dataset of 52 Python programming questions with parallel bilingual task descriptions, automated correctness verification, and efficiency quantification tools based on runtime complexity estimation. Using this framework, we conduct the first empirical study of linguistic bias in LLM-generated code on eight popular LLM-based code generation models (LCGMs), as well as GPT-3.5-Turbo and GPT-4. We observe that LCGM-generated code differs in correctness on an average of 12% of the bilingual programming tasks, where 39% also exhibit differing efficiency. Our findings indicate that LLMs commonly exhibit linguistic bias in code generation.
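
The two headline rates in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TaskResult` record, the function names, the toy data, and in particular the choice to measure efficiency differences only over tasks solved correctly in both languages (the abstract does not state the denominator). It is not the authors' framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    """Hypothetical per-task outcome for one model (not the paper's format)."""
    task_id: str
    correct_en: bool                      # passed verification with the English prompt
    correct_zh: bool                      # passed verification with the Chinese prompt
    complexity_en: Optional[str] = None   # estimated complexity class, e.g. "O(n)"
    complexity_zh: Optional[str] = None


def correctness_gap_rate(results: List[TaskResult]) -> float:
    """Fraction of tasks whose correctness differs between the two languages."""
    differing = sum(r.correct_en != r.correct_zh for r in results)
    return differing / len(results)


def efficiency_gap_rate(results: List[TaskResult]) -> float:
    """Among tasks correct in both languages (assumed denominator),
    fraction whose estimated complexity class differs."""
    both_correct = [r for r in results if r.correct_en and r.correct_zh]
    if not both_correct:
        return 0.0
    differing = sum(r.complexity_en != r.complexity_zh for r in both_correct)
    return differing / len(both_correct)


# Toy data, invented purely to exercise the functions.
toy = [
    TaskResult("t1", True, True, "O(n)", "O(n)"),
    TaskResult("t2", True, False),
    TaskResult("t3", True, True, "O(n)", "O(n^2)"),
    TaskResult("t4", True, True, "O(1)", "O(1)"),
]

print(correctness_gap_rate(toy))
print(efficiency_gap_rate(toy))
```

Averaging `correctness_gap_rate` over several models would give a figure comparable in spirit to the reported average, though the paper's exact aggregation may differ.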

Authors (7)

Weipeng Jiang
Xuanqi Gao
Juan Zhai
Shiqing Ma
Xiaoyu Zhang
Ziyan Lei
Chao Shen

Citation Format

Jiang, W., Gao, X., Zhai, J., Ma, S., Zhang, X., Lei, Z., & Shen, C. (2024). From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation. https://arxiv.org/abs/2406.00602

Journal Information
Publication Year: 2024
Language: en
Database Source: arXiv
Access: Open Access