DOAJ Open Access 2026

A Multi-Objective Statistical Framework for Evaluating LLM-Based Code Modernization: Transformation Pattern Analysis and Effect Size Validation

Bashair Althani

Abstract

Automated legacy code modernization using Large Language Models lacks rigorous evaluation frameworks and multi-objective quality assessment methodologies. Existing research suffers from three critical deficiencies: single-metric evaluation paradigms that create pathological optimization incentives, statistical validation limited to p-values without effect size analysis, and the absence of systematic transformation pattern taxonomies explaining what works and why. We present a novel multi-objective statistical framework that jointly assesses Cyclomatic Complexity (CC) and Maintainability Index (MI) while providing comprehensive effect size analysis addressing software engineering research gaps. Applied to 47 legacy Java samples from Apache Ant (version 1.10.x, commit rel/1.10.14), our framework achieves a 97.9% metric-level improvement rate with very large practical effects (Cohen’s d = 1.86, 95% CI [1.36, 2.35], p < 0.0001) for maintainability, substantially exceeding prior work and conventional significance thresholds. We note that this success rate reflects quality metric improvement; functional equivalence was verified through syntactic validation and manual inspection of a 20% random sample, while comprehensive automated test-based verification remains a limitation addressed in future work.
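To make the reported effect size concrete, the following is a minimal sketch of how a Cohen's d and an approximate 95% confidence interval could be computed from before/after Maintainability Index scores. The abstract does not state which variant of d or which interval procedure the paper uses, so the pooled-standard-deviation form and the large-sample standard error below are assumptions, and the function names `cohens_d_pooled` and `approx_ci` are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d_pooled(before, after):
    """Cohen's d using the pooled sample standard deviation.

    Assumption: the two score lists are treated as independent groups.
    A paired design might instead divide by the SD of the differences.
    """
    n1, n2 = len(before), len(after)
    s1, s2 = stdev(before), stdev(after)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(after) - mean(before)) / pooled


def approx_ci(d, n1, n2, z=1.959964):
    """Approximate 95% CI for d from the common large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Toy scores (illustrative only, not data from the paper).
toy_before = [1.0, 2.0, 3.0]
toy_after = [2.0, 3.0, 4.0]
toy_d = cohens_d_pooled(toy_before, toy_after)

# Plugging in the abstract's d and sample count under the assumptions above.
low, high = approx_ci(1.86, 47, 47)
```

Under these assumptions, d = 1.86 with 47 samples per group gives an interval of roughly 1.38 to 2.34, which is close to but not identical to the reported [1.36, 2.35]; the paper's exact procedure may differ.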
We contribute: (1) the first multi-objective quality assessment framework for code modernization with weighted composite scoring and sensitivity analysis, (2) a rigorous statistical methodology with effect size analysis beyond p-values, (3) a systematic transformation pattern taxonomy identifying four successful patterns and three failure modes with predictive value (inter-rater agreement κ = 0.82), and (4) a negative result showing that iterative refinement provides no benefit (d = 0.08, p = 0.179), saving community resources. Our transformation taxonomy enables practitioners to predict success likelihood from code characteristics, while our statistical framework provides a replicable methodology for evaluating LLM-based software engineering tools. The very large effect size indicates that metric-level improvements are materially meaningful for real-world software maintenance, not merely statistically detectable.
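The inter-rater agreement figure is a Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch of the standard two-rater formula is shown below; the function name `cohens_kappa` and the pattern labels are hypothetical and are not taken from the paper's taxonomy.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four code samples (illustrative only).
labels_a = ["extract", "extract", "inline", "inline"]
labels_b = ["extract", "extract", "inline", "extract"]
kappa = cohens_kappa(labels_a, labels_b)
```

In this toy case the raters agree on 3 of 4 items (0.75) against a chance agreement of 0.5, so kappa is 0.5. By the commonly cited Landis and Koch benchmarks, values above 0.80, such as the reported 0.82, are usually read as almost perfect agreement.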

Topics & Keywords

Electronic computers. Computer science

Authors (1)

Bashair Althani

Citation Format

Althani, B. (2026). A Multi-Objective Statistical Framework for Evaluating LLM-Based Code Modernization: Transformation Pattern Analysis and Effect Size Validation. https://doi.org/10.3390/computers15030148

Journal Information

Publication Year: 2026
Source Database: DOAJ
DOI: 10.3390/computers15030148
Access: Open Access