arXiv Open Access 2024

Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting

Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic M. Schwab, Christina Kiriakou, Nicolas Geis, +2 others

Abstract

Automatic extraction of medical information from clinical documents poses several challenges: the high cost of the required clinical expertise, limited interpretability of model predictions, restricted computational resources, and privacy regulations. Recent advances in domain adaptation and prompting methods have shown promising results with minimal training data using lightweight masked language models, which are suited to well-established interpretability methods. We are the first to present a systematic evaluation of these methods in a low-resource setting, performing multi-class section classification on German doctor's letters. We conduct extensive class-wise evaluations supported by Shapley values to validate the quality of our small training data set and to ensure the interpretability of model predictions. We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% accuracy. Our results serve as a process-oriented guideline for clinical information extraction projects working in low-resource settings.

Topics & Keywords

cs.CL cs.AI cs.LG

Authors (7)

Phillip Richter-Pechanski

Philipp Wiesenbach

Dominic M. Schwab

Christina Kiriakou

Nicolas Geis

Christoph Dieterich

Anette Frank

Citation (APA)

Richter-Pechanski, P., Wiesenbach, P., Schwab, D. M., Kiriakou, C., Geis, N., Dieterich, C., & Frank, A. (2024). Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting. https://arxiv.org/abs/2403.13369

Journal Information

Publication Year: 2024
Language: en
Database Source: arXiv
Access: Open Access ✓