arXiv Open Access 2024

BianCang: A Traditional Chinese Medicine Large Language Model

Sibo Wei, Xueping Peng, Yi-Fei Wang, Tao Shen, Jiasheng Si, and 6 others
Abstract

The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation, both because TCM theory differs substantially from modern medical theory and because specialized, high-quality corpora are scarce. To address this, we propose BianCang, a TCM-specific LLM trained in two stages: the first injects domain-specific knowledge, and the second aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available at https://github.com/QLU-NLP/BianCang.

Authors (11)

Sibo Wei

Xueping Peng

Yi-Fei Wang

Tao Shen

Jiasheng Si

Weiyu Zhang

Fa Zhu

Athanasios V. Vasilakos

Wenpeng Lu

Xiaoming Wu

Yinglong Wang

Citation Format

Wei, S., Peng, X., Wang, Y., Shen, T., Si, J., Zhang, W. et al. (2024). BianCang: A Traditional Chinese Medicine Large Language Model. https://arxiv.org/abs/2411.11027

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access