Semantic Scholar · Open Access · 2025 · 9 citations

An astronomical question answering dataset for evaluating large language models

Jie Li, Fuyong Zhao, Panfeng Chen, Jiafu Xie, Xiangrui Zhang, +4 others

Abstract

Large language models (LLMs) have recently demonstrated exceptional capabilities across a variety of linguistic tasks, including question answering (QA). However, it remains challenging to assess their performance in astronomical QA due to the lack of comprehensive benchmark datasets. To bridge this gap, we construct Astro-QA, the first benchmark dataset specifically for QA in astronomy. The dataset contains a collection of 3,082 questions of six types in both English and Chinese, along with standard (reference) answers and related material. These questions encompass several core branches of astronomy, including astrophysics, astrometry, celestial mechanics, history of astronomy, and astronomical techniques and methods. Furthermore, we propose a new measure called DGscore that integrates different measures for objective and subjective questions and incorporates a weighting scheme based on type- and question-specific difficulty coefficients to accurately assess the QA performance of each LLM. We validate the Astro-QA dataset through extensive experimentation with 27 open-source and commercial LLMs. The results show that it can serve as a reliable benchmark dataset to evaluate the capacity of LLMs in terms of instruction following, knowledge reasoning, and natural language generation in the astronomical domain, which can calibrate current progress and facilitate future research on astronomical LLMs.
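The abstract describes DGscore only in outline, so the following is a minimal sketch of the general idea of difficulty-weighted scoring, not the paper's actual formula. It assumes each question already has a score in [0, 1] (for example, exact-match accuracy for objective questions or a text-similarity measure for subjective ones) and that the weight of a question is the product of a type-level and a question-level difficulty coefficient. The names `ScoredQuestion`, `type_weight`, `question_weight`, and `weighted_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredQuestion:
    score: float            # per-question score in [0, 1], computed elsewhere
    type_weight: float      # assumed difficulty coefficient of the question type
    question_weight: float  # assumed difficulty coefficient of this question


def weighted_score(items: Sequence[ScoredQuestion]) -> float:
    """Difficulty-weighted mean of per-question scores (illustrative only)."""
    # Assumption: a question's weight is type coefficient x question coefficient.
    weights = [q.type_weight * q.question_weight for q in items]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one question with non-zero weight is required")
    weighted_sum = sum(q.score * w for q, w in zip(items, weights))
    return weighted_sum / total_weight
```

Under this weighting, a model that answers only easy questions correctly scores lower than its plain accuracy would suggest, which is the kind of effect a difficulty-aware measure is meant to capture.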

Authors (9)

Jie Li

Fuyong Zhao

Panfeng Chen

Jiafu Xie

Xiangrui Zhang

Hui Li

Mei Chen

Yanhao Wang

Ming-yi Zhu

Citation Format

Li, J., Zhao, F., Chen, P., Xie, J., Zhang, X., Li, H. et al. (2025). An astronomical question answering dataset for evaluating large language models. https://doi.org/10.1038/s41597-025-04613-9

Quick Access

View at Source: doi.org/10.1038/s41597-025-04613-9
Journal Information
Publication Year
2025
Language
en
Total Citations
9
Source Database
Semantic Scholar
DOI
10.1038/s41597-025-04613-9
Access
Open Access ✓