arXiv Open Access 2025

Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs

Chenqian Le, Ziheng Gong, Chihang Wang, Haowei Ni, Panfeng Li, Xupeng Chen

Abstract

Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies, standard instruction prompts and Chain-of-Thought (CoT) prompts, and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These findings suggest that reasoning-aware prompts are useful, but their benefits are model- and scale-dependent. Our study offers practical insights into combining prompt engineering with efficient fine-tuning for medical QA applications.
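The two prompting strategies named in the abstract can be illustrated with a minimal sketch. The prompt wording, the `Example` class, and the helper names below are hypothetical and are not taken from the paper; the sketch assumes PubMedQA's yes/no/maybe answer format and only shows how a standard instruction prompt might differ from a CoT prompt, and how a final label could be pulled from free-form model output.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed answer options for a PubMedQA-style question.
LABELS = ("yes", "no", "maybe")


@dataclass
class Example:
    """A hypothetical container for one contextual QA item."""
    question: str
    context: str


def build_instruction_prompt(ex: Example) -> str:
    """Standard instruction prompt: ask directly for the label."""
    return (
        "Answer the biomedical question using the context.\n"
        f"Context: {ex.context}\n"
        f"Question: {ex.question}\n"
        f"Reply with one of: {', '.join(LABELS)}.\n"
        "Final answer:"
    )


def build_cot_prompt(ex: Example) -> str:
    """CoT prompt: ask for reasoning before the label."""
    return (
        "Answer the biomedical question using the context.\n"
        f"Context: {ex.context}\n"
        f"Question: {ex.question}\n"
        "Think step by step, then finish with a line of the form "
        f"'Final answer: <{'/'.join(LABELS)}>'."
    )


def parse_label(output: str) -> Optional[str]:
    """Extract the predicted label from model output.

    Prefers an explicit 'Final answer:' marker; otherwise falls back
    to the last label word that appears in the text (a heuristic).
    """
    marked = re.search(r"final answer\s*:\s*(yes|no|maybe)\b", output, re.IGNORECASE)
    if marked:
        return marked.group(1).lower()
    found = re.findall(r"\b(yes|no|maybe)\b", output.lower())
    return found[-1] if found else None
```

Either prompt string would be passed to the model under evaluation; `parse_label` is then applied to the generated text so that both strategies are scored against the same label set.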

Authors (6)

Chenqian Le
Ziheng Gong
Chihang Wang
Haowei Ni
Panfeng Li
Xupeng Chen

Citation Format

Le, C., Gong, Z., Wang, C., Ni, H., Li, P., & Chen, X. (2025). Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs. https://arxiv.org/abs/2506.12182

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access