Semantic Scholar · Open Access · 2023 · 27 citations

Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models

Aleksa Bisercic, Mladen Nikolic, M. Schaar, Boris Delibasic, Pietro Lio, A. Petrović

Abstract

Tabular data is often hidden in text, particularly in medical diagnostic reports. Traditional machine learning (ML) models designed to work with tabular data cannot effectively process information in such form. On the other hand, large language models (LLMs), which excel at textual tasks, are probably not the best tool for modeling tabular data. Therefore, we propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. Drawing upon the reasoning capabilities of LLMs, TEMED-LLM goes beyond traditional extraction techniques, accurately inferring tabular features even when their names are not explicitly mentioned in the text. This is achieved by combining domain-specific reasoning guidelines with a proposed data validation and reasoning correction feedback loop. By applying interpretable ML models such as decision trees and logistic regression to the extracted and validated data, we obtain end-to-end interpretable predictions. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics. Given its predictive performance, simplicity, and interpretability, TEMED-LLM underscores the potential of leveraging LLMs to improve the performance and trustworthiness of ML models in medical applications.
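The validation and correction feedback loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the schema, the feature names (`age`, `systolic_bp`), the prompt wording, and the `llm` callable are all hypothetical, and a stub stands in for a real model so the example runs offline. It shows only the general idea of re-prompting with validation errors until the extracted row passes the checks.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical schema (not from the paper): feature -> (type, min, max).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "age": (int, 0, 120),
    "systolic_bp": (int, 50, 300),
}

def validate(row: dict) -> List[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    for name, (typ, lo, hi) in SCHEMA.items():
        if name not in row:
            errors.append(f"missing feature '{name}'")
        elif isinstance(row[name], bool) or not isinstance(row[name], typ):
            errors.append(f"'{name}' must be {typ.__name__}")
        elif not lo <= row[name] <= hi:
            errors.append(f"'{name}'={row[name]} outside [{lo}, {hi}]")
    return errors

def extract_row(report: str, llm: Callable[[str], str], max_rounds: int = 3) -> dict:
    """Ask the model for a JSON row; feed validation errors back until it passes."""
    prompt = f"Extract {list(SCHEMA)} as JSON from this report:\n{report}"
    errors: List[str] = []
    for _ in range(max_rounds):
        reply = llm(prompt)
        try:
            row = json.loads(reply)
            errors = validate(row) if isinstance(row, dict) else ["reply is not a JSON object"]
        except json.JSONDecodeError:
            row, errors = None, ["reply is not valid JSON"]
        if not errors:
            return row
        # Correction step: append the errors so the next attempt can fix them.
        prompt += f"\nYour previous answer {reply!r} had problems: {errors}. Please correct it."
    raise ValueError(f"no valid extraction after {max_rounds} rounds: {errors}")

def make_stub_llm() -> Callable[[str], str]:
    """Stand-in for a real LLM: the first reply is invalid, the second is corrected."""
    replies = iter(['{"age": 640, "systolic_bp": 150}', '{"age": 64, "systolic_bp": 150}'])
    return lambda prompt: next(replies)

row = extract_row("A 64-year-old patient with blood pressure 150/95.", make_stub_llm())
print(row)  # {'age': 64, 'systolic_bp': 150}
```

Rows that pass validation form an ordinary table, which could then be passed to an interpretable model such as a decision tree or logistic regression, as the abstract describes.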

Authors (6)

Aleksa Bisercic
Mladen Nikolic
M. Schaar
Boris Delibasic
Pietro Lio
A. Petrović

Citation Format

Bisercic, A., Nikolic, M., Schaar, M., Delibasic, B., Lio, P., Petrović, A. (2023). Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models. https://doi.org/10.48550/arXiv.2306.05052

Quick Access

View at source: doi.org/10.48550/arXiv.2306.05052
Journal Information
Publication Year: 2023
Language: en
Total Citations: 27
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2306.05052
Access: Open Access ✓