arXiv Open Access 2025

Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

Teague McMillan Gabriele Dominici Martin Gjoreski Marc Langheinrich

Abstract

Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference- and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets, BBQ (social bias) and MedQA (medical licensing questions), and manipulate the number and type of few-shot examples, prompting strategies, and training procedure. Our results show: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insights into strategies for enhancing the interpretability and trustworthiness of LLMs in sensitive domains.

Topics & Keywords

Authors (4)

Teague McMillan
Gabriele Dominici
Martin Gjoreski
Marc Langheinrich

Citation

McMillan, T., Dominici, G., Gjoreski, M., & Langheinrich, M. (2025). Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models? arXiv. https://arxiv.org/abs/2510.24236

Journal Information
Year Published: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓