arXiv Open Access 2025

MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models

Kangkun Mao, Jinru Ding, Jiayuan Chen, Mouxiao Bian, Ruiyao Chen, +4 others

Abstract

As large language models (LLMs) enter the medical domain, most benchmarks evaluate them on question answering or descriptive reasoning, overlooking quantitative reasoning critical to clinical decision-making. Existing datasets like MedCalc-Bench cover few calculation tasks and fail to reflect real-world computational scenarios. We introduce MedCalc-Eval, the largest benchmark for assessing LLMs' medical calculation abilities, comprising 700+ tasks across two types: equation-based (e.g., Cockcroft-Gault, BMI, BSA) and rule-based scoring systems (e.g., Apgar, Glasgow Coma Scale). These tasks span diverse specialties including internal medicine, surgery, pediatrics, and cardiology, offering a broader and more challenging evaluation setting. To improve performance, we further develop MedCalc-Env, a reinforcement learning environment built on the InternBootcamp framework, enabling multi-step clinical reasoning and planning. Fine-tuning a Qwen2.5-32B model within this environment achieves state-of-the-art results on MedCalc-Eval, with notable gains in numerical sensitivity, formula selection, and reasoning robustness. Remaining challenges include unit conversion, multi-condition logic, and contextual understanding. Code and datasets are available at https://github.com/maokangkun/MedCalc-Eval.
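To make the two task types concrete, here is a minimal Python sketch of one equation-based calculator pair (BMI and Cockcroft-Gault) and one rule-based score (Glasgow Coma Scale). It is illustrative only and is not taken from the MedCalc-Eval repository; the function names, parameter names, and input units are assumptions chosen for this example. The formulas are the standard textbook ones.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 (hypothetical helper, height given in cm)."""
    height_m = height_cm / 100.0  # unit conversion: cm -> m
    return weight_kg / height_m ** 2


def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, is_female: bool) -> float:
    """Estimated creatinine clearance in mL/min (standard Cockcroft-Gault form)."""
    crcl = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    # The standard formula scales the result by 0.85 for female patients.
    return crcl * 0.85 if is_female else crcl


def glasgow_coma_scale(eye: int, verbal: int, motor: int) -> int:
    """Rule-based score: sum of three component scores, total range 3 to 15."""
    limits = {"eye": (eye, 4), "verbal": (verbal, 5), "motor": (motor, 6)}
    for name, (value, upper) in limits.items():
        if not 1 <= value <= upper:
            raise ValueError(f"{name} score must be between 1 and {upper}")
    return eye + verbal + motor


if __name__ == "__main__":
    print(round(bmi(70, 175), 1))
    print(cockcroft_gault(60, 72, 1.0, is_female=False))
    print(glasgow_coma_scale(eye=4, verbal=5, motor=6))
```

The equation-based functions take continuous inputs where a unit slip (cm versus m, for instance) changes the answer silently, while the rule-based score depends on mapping findings to discrete component values and validating their ranges. These correspond loosely to the unit conversion and multi-condition logic challenges the abstract names.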

Authors (9)

Kangkun Mao
Jinru Ding
Jiayuan Chen
Mouxiao Bian
Ruiyao Chen
Xinwei Peng
Sijie Ren
Linyang Li
Jie Xu

Citation Format

Mao, K., Ding, J., Chen, J., Bian, M., Chen, R., Peng, X. et al. (2025). MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models. https://arxiv.org/abs/2510.27267

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓