arXiv Open Access 2024

NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text

Prajwal Kailas Max Homilius Rahul C. Deo Calum A. MacRae

Abstract

Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient's status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.
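The abstract does not detail the contrastive objective, but it references the same family of losses used to jointly pre-train language and image models (e.g. CLIP). A minimal sketch of such a symmetric contrastive loss between note embeddings and diagnostic-code embeddings is shown below; the function name, the temperature value, and the use of NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def contrastive_loss(text_emb, code_emb, temperature=0.07):
    """Symmetric (CLIP-style) contrastive loss between medical-note
    embeddings and ICD-10 code-sequence embeddings. Row i of each
    matrix is assumed to come from the same record (a positive pair);
    all other rows in the batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)

    logits = t @ c.T / temperature     # (batch, batch) similarity matrix
    labels = np.arange(len(logits))    # positives lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the note-to-code and code-to-note directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each note embedding toward its matching code-sequence embedding while pushing it away from the other codes in the batch, which is what lets a single joint model score both modalities.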


Authors (4)

Prajwal Kailas

Max Homilius

Rahul C. Deo

Calum A. MacRae

Citation Format

Kailas, P., Homilius, M., Deo, R.C., MacRae, C.A. (2024). NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text. https://arxiv.org/abs/2412.11477

Journal Information
Year Published
2024
Language
en
Source Database
arXiv
Access
Open Access ✓