DOAJ Open Access 2026

Cloud-assisted LLM-enhanced datasets for AST hierarchy-aware code summarization model

Junsan Zhang, Yudie Yan, Junxiao Han, Ao Lu, Juncai Guo, +1 more

Abstract

Code summarization is an important software engineering task that helps developers understand and maintain code by generating natural language summaries. Existing approaches predominantly rely on a single model and therefore face a dilemma: directly deploying large language models (LLMs) incurs high training costs, while lightweight models specialized for summarization are constrained by the quality of their training data and by their limited ability to capture the complex structural semantics of code. This highlights the need for collaboration between large and small models in cloud computing environments. To address these issues, this paper proposes a cloud-assisted code summarization framework. First, we enhance code by invoking cloud-deployed LLM services: preset prompt templates guide the model to evaluate code quality and automatically repair defects based on its feedback, which yields the high-quality datasets Java-QE and Python-QE. Second, for efficient edge deployment, we introduce HiSum, a lightweight AST hierarchy-aware code summarization model. HiSum transforms the abstract syntax tree (AST) of the code into Directed Syntax Graphs (DSG) to preserve structural semantics, encodes them with a directed graph convolutional network, and then decodes the result to improve summary quality. Experimental results show that our framework significantly enhances code summarization performance. On the constructed Java-QE and Python-QE datasets, HiSum improves over state-of-the-art baselines in BLEU, METEOR, and ROUGE-L by 1.06%, 1.98%, and 3.12% on Java-QE, and by 1.46%, 3.24%, and 2.20% on Python-QE, respectively. This research provides a solution in which cloud LLM-assisted data enhancement empowers a lightweight hierarchy-aware model.
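The abstract does not specify how HiSum builds its Directed Syntax Graphs, so the following is only a minimal sketch of the general idea of turning an AST into a directed graph that keeps hierarchy information. It assumes Python source and the standard-library `ast` module; the function name `ast_to_directed_graph`, the parent-to-child edge direction, and the per-node depth label are illustrative assumptions, not the authors' construction.

```python
import ast


def ast_to_directed_graph(source):
    """Hypothetical helper: parse Python source and return (nodes, edges).

    nodes[i] is a (node_type_name, depth) pair, where depth is the node's
    level in the AST hierarchy (root = 0).
    edges is a list of (parent_index, child_index) pairs, i.e. directed
    edges pointing from each AST node to its children.
    """
    tree = ast.parse(source)
    nodes = []
    edges = []

    def visit(node, depth, parent_index):
        index = len(nodes)
        nodes.append((type(node).__name__, depth))
        if parent_index is not None:
            edges.append((parent_index, index))
        # ast.iter_child_nodes yields the direct children of the node.
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1, index)

    visit(tree, 0, None)
    return nodes, edges


nodes, edges = ast_to_directed_graph("def add(a, b):\n    return a + b\n")
for parent, child in edges:
    print(nodes[parent][0], "->", nodes[child][0])
```

A graph encoder such as a directed graph convolutional network could then consume the edge list as an adjacency structure and the node types and depths as input features. How the paper actually encodes these is not stated in the abstract.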

Authors (6)

Junsan Zhang

Yudie Yan

Junxiao Han

Ao Lu

Juncai Guo

Javad Pourzamani

Citation Format

Zhang, J., Yan, Y., Han, J., Lu, A., Guo, J., Pourzamani, J. (2026). Cloud-assisted LLM-enhanced datasets for AST hierarchy-aware code summarization model. https://doi.org/10.1186/s13677-026-00852-2

Journal Information

Publication Year: 2026
Source Database: DOAJ
DOI: 10.1186/s13677-026-00852-2
Access: Open Access