arXiv Open Access 2025

On the effectiveness of LLMs for automatic grading of open-ended questions in Spanish

Germán Capdehourat, Isabel Amigo, Brian Lorenzo, Joaquín Trigo

Abstract

Grading is a time-consuming and laborious task that educators must face. It is also an important one: it provides feedback signals to learners, and timely feedback has been shown to improve the learning process. In recent years, the rise of LLMs has drawn new attention to the effectiveness of automatic grading. In this paper, we explore the performance of different LLMs and prompting techniques in automatically grading short-text answers to open-ended questions. Unlike most of the literature, our study focuses on a use case where the questions, answers, and prompts are all in Spanish. Experimental results comparing automatic scores to those of human-expert evaluators show good outcomes in terms of accuracy, precision, and consistency for advanced LLMs, both open and proprietary. Results are notably sensitive to prompt styles, suggesting biases toward certain words or content in the prompt. However, the best combinations of models and prompt strategies consistently surpass an accuracy of 95% in a three-level grading task, rising to more than 98% when the task is simplified to a binary right-or-wrong rating problem. This demonstrates the potential of LLMs to implement this type of automation in education applications.
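
To make the two accuracy figures concrete, the following is a minimal sketch of how agreement with human graders could be computed for a three-level task and for its binary simplification. The grade encoding (0, 1, 2), the `pass_threshold` rule used to collapse three levels into two, and the sample grades are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def accuracy(human: Sequence[int], model: Sequence[int]) -> float:
    """Fraction of answers where the model grade equals the human grade."""
    if len(human) != len(model):
        raise ValueError("grade lists must have the same length")
    if not human:
        raise ValueError("grade lists must not be empty")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


def to_binary(grades: Sequence[int], pass_threshold: int = 2) -> List[int]:
    """Collapse three-level grades into right (1) or wrong (0).

    Assumption: only the top level counts as right. The paper may use a
    different rule for its binary task.
    """
    return [1 if g >= pass_threshold else 0 for g in grades]


# Made-up grades for eight answers (hypothetical encoding:
# 0 = incorrect, 1 = partially correct, 2 = correct).
human_grades = [2, 1, 0, 2, 1, 0, 2, 2]
model_grades = [2, 0, 0, 2, 1, 0, 2, 1]

three_level_acc = accuracy(human_grades, model_grades)
binary_acc = accuracy(to_binary(human_grades), to_binary(model_grades))

print(f"three-level accuracy: {three_level_acc:.3f}")
print(f"binary accuracy:      {binary_acc:.3f}")
```

In this toy data the binary accuracy is higher than the three-level accuracy because one disagreement (a 1 graded as 0) falls on the same side of the threshold once the grades are collapsed, which is one plausible reason a binary task is easier to score well on.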

Citation Format

Capdehourat, G., Amigo, I., Lorenzo, B., Trigo, J. (2025). On the effectiveness of LLMs for automatic grading of open-ended questions in Spanish. https://arxiv.org/abs/2503.18072

Journal Information
Publication year: 2025
Language: en
Source database: arXiv
Access: Open Access