Rubric-Guided Evaluation Framework for Consistent Scoring with Large Language Models
Abstract
Large language models (LLMs) are increasingly integrated into educational contexts, particularly for the automated assessment of problem-solving and reasoning tasks. Their capacity to generate answers and explanatory feedback at scale makes them attractive for engineering education, but inconsistency, bias, and limited reproducibility remain major concerns. To address these limitations, this paper reports on the application of CourseEvalAI, a rubric-guided evaluation framework designed to ensure transparency and comparability in automated scoring. The framework was applied to a dataset derived from a university-level course in artificial intelligence. Two configurations of the Mistral-7B model were investigated: the baseline version and a LoRA-adapted variant fine-tuned on course-specific data. Model outputs, consisting of answers and explanations, were evaluated by GPT-4 using rubric-based criteria across technical, argumentative, and explanation dimensions. Prior work has shown that GPT-4 achieves results comparable to human evaluators, supporting its use as an expert proxy in this context. Experimental validation demonstrates that CourseEvalAI enables fine-grained analysis of model behavior, detecting scoring drift, rubric-specific improvements, and inter-model performance differences. By integrating structured rubrics, expert evaluation, and graph-based storage, the framework enhances transparency and reproducibility. The approach shows direct applicability in electrical and electronics engineering, automation, and computer science, offering a robust methodology for reliable and interpretable automated assessment.
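The scoring protocol described in the abstract (a GPT-4 judge applying a fixed rubric across technical, argumentative, and explanation dimensions) can be sketched as follows. This is a minimal illustration, not the paper's CourseEvalAI implementation: the rubric wording, the 0-5 score range, the JSON reply schema, and the score_response helper are all assumptions; only the three rubric dimensions and the use of GPT-4 as judge come from the abstract.

```python
# Minimal sketch of rubric-guided scoring with an LLM judge.
# NOT the paper's CourseEvalAI code: rubric text, score range, and
# JSON schema below are illustrative assumptions.
import json

from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric mirroring the three dimensions named in the abstract.
RUBRIC = {
    "technical": "Correctness of the answer with respect to course material (0-5).",
    "argumentative": "Logical structure and justification of the reasoning (0-5).",
    "explanation": "Clarity and completeness of the explanatory feedback (0-5).",
}


def score_response(question: str, answer: str, explanation: str) -> dict:
    """Ask the judge model for one integer score per rubric dimension."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    prompt = (
        "You are grading a model-generated response against a rubric.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Explanation: {explanation}\n\n"
        f"Rubric:\n{criteria}\n\n"
        'Reply with JSON only, e.g. {"technical": 4, "argumentative": 3, "explanation": 5}.'
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic-as-possible scoring aids reproducibility
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the judge complied with the JSON-only instruction.
    return json.loads(completion.choices[0].message.content)


if __name__ == "__main__":
    scores = score_response(
        question="Explain how A* search guarantees optimality.",
        answer="A* is optimal when its heuristic is admissible.",
        explanation="An admissible heuristic never overestimates the true cost to the goal.",
    )
    print(scores)  # e.g. {"technical": 4, "argumentative": 3, "explanation": 4}
```

Fixing temperature and demanding a structured JSON reply are common ways to reduce the scoring drift the paper measures; per-dimension scores stored alongside each output are what enable the rubric-specific comparisons between the baseline and LoRA-adapted models.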
Authors (6)
Catalin Anghel
M. Craciun
Emilia Pecheanu
A. Cocu
A. Istrate
A. Anghel
Publication Details
- Year Published: 2025
- Language: en
- Source Database: Semantic Scholar
- DOI: 10.1109/ISEEE67817.2025.11304842
- Access: Open Access ✓