arXiv Open Access 2025

YpathRAG: A Retrieval-Augmented Generation Framework and Benchmark for Pathology

Deshui Yu, Yizhi Wang, Saihui Jin, Taojie Zhu, Fanyi Zeng, +7 others

Abstract

Large language models (LLMs) excel on general tasks yet still hallucinate in high-barrier domains such as pathology. Prior work often relies on domain fine-tuning, which neither expands the knowledge boundary nor enforces evidence-grounded constraints. We therefore build a pathology vector database covering 28 subfields and 1.53 million paragraphs, and present YpathRAG, a pathology-oriented RAG framework with dual-channel hybrid retrieval (BGE-M3 dense retrieval coupled with vocabulary-guided sparse retrieval) and an LLM-based supportive-evidence judgment module that closes the retrieval-judgment-generation loop. We also release two evaluation benchmarks, YpathR and YpathQA-M. On YpathR, YpathRAG attains Recall@5 of 98.64%, a gain of 23 percentage points over the baseline; on YpathQA-M, a set of the 300 most challenging questions, it increases the accuracies of both general and medical LLMs by 9.0% on average and up to 15.6%. These results demonstrate improved retrieval quality and factual reliability, providing a scalable construction paradigm and interpretable evaluation for pathology-oriented RAG.
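The abstract names two mechanisms that a small example can make concrete: combining a dense and a sparse retrieval channel, and scoring the result with Recall@k. The sketch below is illustrative only. The abstract does not say how YpathRAG merges its two channels or exactly how Recall@5 is computed, so the weighted-sum fusion rule, the `alpha` weight, the function names, and the toy scores are all assumptions, and Recall@k is taken here to mean the fraction of relevant passages found in the top k results.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(dense: Dict[str, float], sparse: Dict[str, float],
         alpha: float = 0.5) -> List[str]:
    """Rank passages by alpha * dense + (1 - alpha) * sparse.

    This weighted sum is a common hybrid-retrieval heuristic, assumed here
    for illustration; it is not taken from the paper.
    """
    d, s = min_max(dense), min_max(sparse)
    docs = set(d) | set(s)
    combined = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in docs
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(docs, key=lambda doc: (-combined[doc], doc))


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Toy scores, invented for illustration only.
dense_scores = {"p1": 0.82, "p2": 0.40, "p3": 0.75}
sparse_scores = {"p2": 7.1, "p3": 3.0, "p4": 5.5}

ranking = fuse(dense_scores, sparse_scores)
score = recall_at_k(ranking, relevant={"p1", "p4"}, k=2)
print(ranking, score)
```

Averaging `recall_at_k` over every query in a benchmark gives a single figure comparable in spirit to the reported Recall@5, under the definition assumed above.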

Topics & Keywords

cs.CL

Authors (12)

Deshui Yu

Yizhi Wang

Saihui Jin

Taojie Zhu

Fanyi Zeng

Wen Qian

Zirui Huang

Jingli Ouyang

Jiameng Li

Zhen Song

Tian Guan

Yonghong He

Citation Format

Yu, D., Wang, Y., Jin, S., Zhu, T., Zeng, F., Qian, W., et al. (2025). YpathRAG: A Retrieval-Augmented Generation Framework and Benchmark for Pathology. https://arxiv.org/abs/2510.08603

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓