Unsupervised clustering reveals a tri-phenotype model of hospitalized COVID-19 patients: Beirut cohort study and literature synthesis
Abstract
Introduction COVID-19, caused by severe acute respiratory syndrome coronavirus 2, has posed unprecedented global challenges, with clinical manifestations ranging from asymptomatic or mild infection to severe and fatal illness. Identifying patient subgroups with distinct clinical profiles could support more individualized treatment strategies. Clustering mixed (continuous and categorical) clinical data is a promising way to uncover such patterns; however, few algorithms handle heterogeneous data types effectively. This study applied two established mixed-data clustering algorithms, KAMILA and K-prototypes, to group COVID-19 patients on the basis of their medical history and biochemical and radiological data.
Methods A retrospective cohort study was conducted on 556 COVID-19 patients admitted to Hôtel Dieu de France Hospital in Beirut between March 2020 and October 2021. Only data collected within the first 24 hours of admission were used for clustering, to ensure early prognostic relevance. After data cleaning, missing values were handled by multiple imputation, producing 30 imputed datasets. The KAMILA and K-prototypes algorithms were applied to each dataset, generating solutions with two to six clusters. The optimal solution was selected using the silhouette, Calinski–Harabasz, and Dunn indices, and statistical analyses were then performed to characterize the patient profile and outcomes of each cluster.
Results Clustering identified three distinct patient groups, with the KAMILA algorithm providing the best fit. Cluster 1 consisted mainly of middle-aged men with elevated inflammatory markers, consistent oxygen requirements, and prolonged hospital stays. Cluster 2 comprised elderly patients with multiple comorbidities and high rates of intensive care unit (ICU) admission, who required cautious anticoagulation and early antibiotic therapy. Cluster 3 comprised younger, generally healthier individuals who required minimal intervention and had low mortality.
Conclusions Mixed-data clustering revealed three COVID-19 patient clusters that are clinically meaningful, appear reproducible across populations worldwide, and carry prognostic and therapeutic implications. This unsupervised approach may inform early triage and resource allocation. Further prospective validation in diverse, vaccinated populations is warranted.
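The abstract does not describe how the clustering was implemented. Purely as an illustration, the following minimal pure-Python sketch shows the classic K-prototypes idea for mixed data and the Dunn index used for model selection. K-prototypes measures dissimilarity as the squared Euclidean distance on numeric variables plus a weight gamma times the number of categorical mismatches. Prototypes are updated as means for numeric variables and modes for categorical ones. The function names (`kproto_dissimilarity`, `k_prototypes`, `dunn_index`), the toy data, gamma = 1.0, and the fixed starting prototypes are all hypothetical. This is not the authors' pipeline and does not implement KAMILA.

```python
from statistics import mean, mode


def kproto_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """K-prototypes dissimilarity: squared Euclidean distance on numeric
    features plus gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * mismatches


def k_prototypes(num, cat, init_idx, gamma=1.0, max_iter=100):
    """Assign each row to its nearest prototype, then recompute prototypes
    (mean for numeric features, mode for categorical ones) until assignments
    stop changing. init_idx (hypothetical) fixes the starting rows."""
    protos = [(list(num[i]), list(cat[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for i in range(len(num)):
            dists = [kproto_dissimilarity(num[i], cat[i], p_num, p_cat, gamma)
                     for p_num, p_cat in protos]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the previous prototype if a cluster empties
            protos[j] = (
                [mean(num[i][d] for i in members) for d in range(len(num[0]))],
                [mode(cat[i][d] for i in members) for d in range(len(cat[0]))],
            )
    return labels, protos


def dunn_index(num, cat, labels, gamma=1.0):
    """Smallest between-cluster dissimilarity divided by the largest
    within-cluster diameter; higher values mean compact, well-separated clusters."""
    min_between, max_within = float("inf"), 0.0
    for i in range(len(labels)):
        for k in range(i + 1, len(labels)):
            d = kproto_dissimilarity(num[i], cat[i], num[k], cat[k], gamma)
            if labels[i] == labels[k]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    return min_between / max_within if max_within > 0 else float("inf")


# Hypothetical toy data: two numeric features (assumed already standardized)
# and two categorical features per patient.
num = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
cat = [["M", "yes"], ["M", "yes"], ["F", "yes"], ["F", "no"], ["F", "no"], ["M", "no"]]

labels, protos = k_prototypes(num, cat, init_idx=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(dunn_index(num, cat, labels))
```

In a real analysis, one would typically standardize numeric variables, tune gamma, and try several random initializations. To mirror the design summarized above, one would also compare candidate numbers of clusters with several validity indices (silhouette, Calinski–Harabasz, Dunn) across all imputed datasets. This sketch does none of that.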
Authors (5)
Christopher El Hadi
Rindala Saliba
Georges Maalouly
Moussa Riachy
Ghassan Sleilaty
Quick Access
- Publication year: 2025
- Database source: DOAJ
- DOI: 10.1177/20552076251394299
- Access: Open Access