arXiv Open Access 2025

Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Aditya Nagori, Ayush Gautam, Matthew O. Wiens, Vuong Nguyen, Nathan Kenya Mugisha, +4 others

Abstract

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Cluster quality and distinctiveness were evaluated with Silhouette scores and statistical tests. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
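The abstract does not give the serialization template or the evaluation code, so the following is only a minimal sketch of two steps it names: turning a patient record into text with or without a clustering objective, and computing a Silhouette score. The field names, the objective wording, and the helper names `serialize_record` and `silhouette_score` are hypothetical and chosen for illustration. The embedding and K-means steps are omitted, and the toy 2-D points stand in for real embeddings.

```python
from math import dist


def serialize_record(record, objective=None):
    """Turn one patient record (a dict) into a single text string.

    Missing values (None) are skipped. If an objective is given, it is
    prepended so the embedding model sees the clustering goal first.
    The "name: value" template is an assumption, not the paper's format.
    """
    parts = [f"{name}: {value}" for name, value in record.items() if value is not None]
    text = "; ".join(parts)
    if objective:
        text = f"{objective} {text}"
    return text


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    For each point: a = mean distance to the other points in its cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Requires at least two clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # dist(point, point) is 0, so dividing by len(own) - 1 excludes self.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical record; real variable names in the dataset may differ.
record = {"age_months": 14, "muac_mm": 118, "sex": "female", "hiv_status": None}
plain_text = serialize_record(record)
objective_text = serialize_record(
    record, objective="Group this patient with clinically similar patients."
)

# Toy 2-D "embeddings": two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])
```

Here `plain_text` is `"age_months: 14; muac_mm: 118; sex: female"`, and `objective_text` is the same string with the objective sentence in front. A labeling that matches the two separated pairs scores close to 1, while a labeling that mixes them scores below 0, which is the sense in which a higher Silhouette score indicates more distinct clusters.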

Authors (9)

Aditya Nagori
Ayush Gautam
Matthew O. Wiens
Vuong Nguyen
Nathan Kenya Mugisha
Jerome Kabakyenga
Niranjan Kissoon
John Mark Ansermino
Rishikesan Kamaleswaran

Citation Format

Nagori, A., Gautam, A., Wiens, M.O., Nguyen, V., Mugisha, N.K., Kabakyenga, J. et al. (2025). Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models. https://arxiv.org/abs/2505.09805

Journal Information
Year of Publication
2025
Language
en
Source Database
arXiv
Access
Open Access ✓