Abstract The respiratory tract microbiome, a complex ecosystem of microorganisms colonizing the respiratory mucous layers and epithelial surfaces along with their associated microenvironment, plays a vital role in maintaining respiratory function and promoting the maturation of the respiratory immune system. Current research suggests that environmental changes can disrupt the respiratory microbiota, potentially leading to disease. This review summarizes existing research on the impact of environmental factors on the respiratory microbiome and associated diseases, aiming to offer new insights into the prevention and treatment of respiratory disease.
ABSTRACT This systematic review and meta‐analysis evaluated the performance of machine learning (ML) models in predicting mortality among pulmonary embolism (PE) patients, synthesizing data from 17 studies encompassing 844,071 cases. Logistic Regression was the most commonly used algorithm, followed by advanced models like Random Forests, Support Vector Machines, XGBoost, and Neural Networks. Pooled performance metrics from 12 studies demonstrated a sensitivity of 0.88 (95% CI: 0.78–0.94, I² = 90.43%), specificity of 0.79 (95% CI: 0.62–0.89, I² = 99.53%), positive likelihood ratio of 4.1 (95% CI: 2.2–7.7), negative likelihood ratio of 0.16 (95% CI: 0.08–0.29), diagnostic odds ratio of 26 (95% CI: 10–71), and an AUROC of 0.91 (95% CI: 0.88–0.93), indicating excellent discriminative ability. Subgroup analyses revealed higher sensitivity in advanced ML models (89.7%) and non‐USA studies (97.2%), with advanced ML showing lower specificity heterogeneity (I² = 0%). Significant heterogeneity was observed, particularly in specificity (I² = 99%), driven by traditional ML and USA‐based studies. Minimal publication bias was noted for sensitivity (Egger's p = 0.942), but specificity showed potential bias (Egger's p = 0.038 after outlier exclusion). These findings suggest that ML models outperform traditional risk stratification tools in predicting PE mortality, offering robust potential for clinical decision‐making, though heterogeneity and retrospective study designs warrant cautious interpretation. Trial Registration: PROSPERO: CRD420251026696
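As a rough consistency check on the pooled metrics above, the likelihood ratios and diagnostic odds ratio can be derived from a single sensitivity/specificity pair using their standard definitions. This is a minimal sketch, not the meta-analytic model used in the review: the function name `likelihood_ratios` is hypothetical, and pooled estimates from a meta-analysis generally will not match these point calculations exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> dict:
    """Derive LR+, LR- and the diagnostic odds ratio from one
    sensitivity/specificity pair (standard textbook definitions)."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # LR+ = sens / (1 - spec)
    lr_neg = (1.0 - sensitivity) / specificity   # LR- = (1 - sens) / spec
    dor = lr_pos / lr_neg                        # DOR = LR+ / LR-
    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}


# Pooled point estimates reported in the abstract.
result = likelihood_ratios(sensitivity=0.88, specificity=0.79)
print({name: round(value, 2) for name, value in result.items()})
```

The derived values land close to the reported pooled figures (4.1, 0.16 and 26); the small differences are expected because the review pools each metric across studies rather than computing it from the pooled sensitivity and specificity.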
Diseases of the circulatory (Cardiovascular) system, Diseases of the respiratory system
Sarah Cataldi, Elena Maria Ticozzi, Federica Morani
et al.
Background: This article examines the infectious disease surveillance system in the Lombardy region of Italy, with a focus on its response mechanisms to respiratory syndromes. This study aims to describe the alert system and the organizational procedures in place, assessing their effectiveness in managing health crises. Methods: This study is based on the analysis of Lombardy’s regional resolution No. 1125, developed by regional public health experts. Surveillance levels were categorized based on incidence thresholds and healthcare system impacts, establishing specific indicators and activation protocols. Information flows are managed through data portals that enable real-time monitoring of COVID-19, influenza, and other infectious respiratory diseases. Results: A multi-level response system was established, with levels ranging from ordinary regimes to critical epidemic activation. Each level includes specific actions, such as resource reallocation, emergency department support, and the suspension of elective procedures. The use of technological tools, such as electronic health records, streamlined reporting processes, and real-time data flow management, has strengthened the region’s response capabilities. Conclusions: This study underscores the value of a structured, multi-level response system for infectious disease management, showing that a unified regional approach improves crisis response efficiency. It suggests that sharing activation indicators and protocols within the scientific community can help harmonize national and international responses to future pandemics. The system, while effective in its current context, may require adaptation for future health challenges.
The airway epithelial barrier (AEB) is a dynamic interface that maintains respiratory homeostasis. Complex networks of epithelial cells, intercellular junctions, and immune constituents support the structural and functional integrity of the AEB. This review synthesizes how the respiratory exposome components disrupt AEB physiology by compromising junctional integrity, triggering oxidative stress, and inducing inflammation. The review further analyzes how these perturbations lead to maladaptive responses in chronic respiratory diseases (CRDs) and the effectiveness of emerging biologics targeting epithelial-derived alarmins in treating CRDs. By integrating exposome science with epithelial physiology, we provide a unified framework for understanding environmental impacts on airway health.
The evolutionary origins of ageing and age-associated diseases continue to pose a fundamental question in biology. This study is concerned with a recently proposed framework, which conceptualises development and ageing as a continuous process, driven by genetically encoded epigenetic changes in target sets of cells. According to the Evolvable Soma Theory of Ageing (ESTA), ageing reflects the cumulative manifestation of epigenetic changes that are predominantly expressed during the post-reproductive phase. These late-acting modifications are not yet evolutionarily optimised but are instead subject to ongoing selection, functioning as somatic "experiments" through which evolution explores novel phenotypic variation. These experiments are often detrimental, leading to progressive physical decline and eventual death, while a small subset may produce beneficial adaptations that evolution can exploit to shape future developmental trajectories. According to ESTA, ageing can be understood as evolution in action, yet old age is also the strongest risk factor for major diseases such as cardiovascular diseases, cancer, neurodegenerative disorders, and metabolic syndrome. We argue that this association is not merely correlational but causal: the same epigenetic processes that drive development and ageing also underlie age-associated diseases. Growing evidence points to epigenetic regulation as a central factor in these pathologies, since no consistent patterns of genetic mutations have been identified, whereas widespread regulatory and epigenetic disruptions are observed. From this perspective, evolution is not only the driver of ageing but also the ultimate source of the diseases that accompany it, making it the root cause of most age-related pathologies.
Abdullah Al Shafi, Rowzatul Zannat, Abdul Muntakim
et al.
Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered by analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only verified medical sources were included in the dataset; non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases, and the remaining columns represent symptoms. Each symptom cell contains a binary value, indicating whether a symptom is associated with a disease. This structured representation makes the dataset useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there have been some advances in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance.
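The tabular layout described above (first column for the disease, one binary column per symptom) can be illustrated with a short sketch. Everything in it is an assumption for illustration only: the column names, the placeholder rows `disease_a`/`disease_b`, and the helper functions are hypothetical and are not taken from the published dataset, whose labels are in Bangla.

```python
import csv
import io

# Hypothetical two-row sample in the described layout:
# first column = disease, remaining columns = binary symptom flags.
SAMPLE_CSV = """disease,symptom_1,symptom_2,symptom_3
disease_a,1,0,1
disease_b,0,1,1
"""


def load_table(text: str) -> dict:
    """Map each disease to the set of symptoms whose cell is 1."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        disease = row.pop("disease")
        table[disease] = {name for name, flag in row.items() if flag == "1"}
    return table


def diseases_with(table: dict, symptom: str) -> list:
    """List the diseases associated with a given symptom, sorted by name."""
    return sorted(d for d, symptoms in table.items() if symptom in symptoms)


table = load_table(SAMPLE_CSV)
print(table["disease_a"])
print(diseases_with(table, "symptom_3"))
```

Storing each row as a set of active symptoms keeps lookups simple; a model-training pipeline would more likely keep the raw 0/1 matrix as features.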
Medical artificial intelligence (AI) systems frequently lack systematic domain expertise integration, potentially compromising diagnostic reliability. This study presents an ontology-based framework for bone disease diagnosis, developed in collaboration with Ho Chi Minh City Hospital for Traumatology and Orthopedics. The framework introduces three theoretical contributions: (1) a hierarchical neural network architecture guided by bone disease ontology for segmentation-classification tasks, incorporating Visual Language Models (VLMs) through prompts, (2) an ontology-enhanced Visual Question Answering (VQA) system for clinical reasoning, and (3) a multimodal deep learning model that integrates imaging, clinical, and laboratory data through ontological relationships. The methodology maintains clinical interpretability through systematic knowledge digitization, standardized medical terminology mapping, and modular architecture design. The framework demonstrates potential for extension beyond bone diseases through its standardized structure and reusable components. While theoretical foundations are established, experimental validation remains pending due to current dataset and computational resource limitations. Future work will focus on expanding the clinical dataset and conducting comprehensive system validation.
Aniketh Garikaparthi, Manasi Patwardhan, Lovekesh Vig
et al.
The rapid advancement in capabilities of large language models (LLMs) raises a pivotal question: How can LLMs accelerate scientific discovery? This work tackles the crucial first stage of research, generating novel hypotheses. While recent work on automated hypothesis generation focuses on multi-agent frameworks and extending test-time compute, none of the approaches effectively incorporate transparency and steerability through a synergistic Human-in-the-loop (HITL) approach. To address this gap, we introduce IRIS: Interactive Research Ideation System, an open-source platform designed for researchers to leverage LLM-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), a fine-grained feedback mechanism, and query-based literature synthesis. It is designed to give researchers greater control and insight throughout the ideation process. We additionally conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation. We open-source our code at https://github.com/Anikethh/IRIS-Interactive-Research-Ideation-System
Modeling disease progression through multiple stages is critical for clinical decision-making for chronic diseases such as cancer, diabetes, and chronic kidney disease. Existing approaches often model disease progression as a uniform trajectory pattern at the population level. However, chronic diseases are highly heterogeneous and often have multiple progression patterns depending on a patient's individual genetics and lifestyle-related environmental effects. We propose a personalized disease progression model to jointly learn the heterogeneous progression patterns and groups of genetic profiles. In particular, an end-to-end pipeline is designed to simultaneously infer the characteristics of patients from genetic markers using a variational autoencoder and model how those characteristics drive disease progression using an RNN-based state-space model based on clinical observations. Our proposed model shows improvement on real-world and synthetic clinical data.
Abstract Background Chronic thromboembolic pulmonary hypertension (CTEPH) is a progressive pulmonary vascular disorder with substantial morbidity and mortality that remains underdiagnosed and undertreated. It is potentially curable by pulmonary endarterectomy (PEA) in patients with surgically accessible thrombi. Balloon pulmonary angioplasty (BPA) and targeted medical therapy are options for patients with distal lesions or persistent/recurrent pulmonary hypertension after PEA. There is an urgent need to increase awareness of CTEPH, and qualified CTEPH centers are still quite limited. The baseline characteristics, management patterns and clinical outcomes of CTEPH in China need to be reported. Methods and design The CHinese reAl-world study to iNvestigate the manaGEment pattern and outcomes of chronic thromboembolic pulmonary hypertension (CHANGE) study is designed to describe the multimodality treatment pattern and clinical outcomes of CTEPH in China. Consecutive patients who are ≥ 14 years old and diagnosed with CTEPH are enrolled. The diagnosis of CTEPH is confirmed by right heart catheterization and imaging examinations. The multimodality therapeutic strategy, which consists of PEA, BPA and targeted medical therapy, is decided by a multidisciplinary team. Blood samples and tissue from PEA are stored in the central biobank for further research. The patients receive regular follow-up every 3 or 6 months for at least 3 years. The primary outcomes include all-cause mortality and changes in functional and hemodynamic parameters from baseline. The secondary outcomes include the proportion of patients experiencing lung transplantation, the proportion of patients experiencing heart and lung transplantation, and changes in health-related quality of life. Up to 31 December 2023, the study has enrolled 1500 eligible patients from 18 expert centers.
Conclusions As a real-world study, the CHANGE study is expected to increase our understanding of CTEPH, and to fill the gap between guidelines and the clinical practice in the diagnosis, assessment and treatment of patients with CTEPH. Registration Number in ClinicalTrials.gov NCT05311072.
Mohammad G.A. Khalaf, Raafat T.I. El-Sokkary, Mariam L.A. Sourial
et al.
Background Pulmonary embolism (PE) is one of the most fatal emergencies with a high risk of mortality. Multiple risk stratification scores have been developed to assess a patient’s overall mortality risk. Objective This study aimed to validate modified FAST and modified Bova scores for risk stratification and predicting the risk of early mortality in patients presenting with acute PE. Patients and methods Patients admitted to Assiut University Hospital with PE were sequentially included. Pulmonary Embolism Severity Index (PESI), modified Bova, and modified FAST scores were calculated for all included patients. Results A total of 100 patients with PE were sequentially included. The predictors of in-hospital mortality in patients with PE were chronic heart failure [odds ratio (OR) = 1.87], chronic respiratory disease (OR = 1.99), chronic kidney disease (OR = 2.01), hypotension (OR = 2.99), intermediate-high risk PESI (simplified version; OR = 2.76), intermediate-high risk modified Bova score (OR = 3.01) and intermediate-high risk modified FAST score (OR = 3.90). The modified FAST score had the best diagnostic accuracy (89.2%) with an area under the curve (AUC) of 0.962, followed by the modified Bova score with an accuracy of 76.8% and an AUC of 0.761. The two scores had higher accuracy than the PESI score (53.4%, AUC = 0.627). Conclusion Modified FAST and modified Bova scores are simple and reliable tools for risk stratification of patients with acute PE.
Since 2000, efforts to develop new treatments for TB have been promising, but diagnosing TB, especially in children, remains a challenge. This issue of the Journal includes the first in a series of articles related to TB in children, highlighting new diagnostic tests that do not rely on sputum, and have great potential for improving diagnosis and treatment initiation. Key to a reduction in TB prevalence, experts are engaging with communities to find undiagnosed cases and combat the stigma of TB. International collaborations are also central to integrating novel diagnostics and providing support for these vulnerable populations.
The medical domain is vast and diverse, with many existing embedding models focused on general healthcare applications. However, these models often struggle to capture a deep understanding of diseases due to their broad generalization across the entire medical field. To address this gap, I present DisEmbed, a disease-focused embedding model. DisEmbed is trained on a synthetic dataset specifically curated to include disease descriptions, symptoms, and disease-related Q&A pairs, making it uniquely suited for disease-related tasks. For evaluation, I benchmarked DisEmbed against existing medical models using disease-specific datasets and the triplet evaluation method. My results demonstrate that DisEmbed outperforms other models, particularly in identifying disease-related contexts and distinguishing between similar diseases. This makes DisEmbed highly valuable for disease-specific use cases, including retrieval-augmented generation (RAG) tasks, where its performance is particularly robust.
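The abstract does not spell out the triplet evaluation method, so the sketch below assumes its common form: for each (anchor, positive, negative) triplet of embeddings, the model scores a hit when the anchor is more similar to the positive than to the negative, and accuracy is the fraction of hits. The toy vectors, the use of cosine similarity, and the function names are assumptions for illustration, not DisEmbed's actual evaluation code.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets) -> float:
    """Fraction of (anchor, positive, negative) triplets in which the anchor
    is closer to the positive than to the negative."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)


# Hypothetical 2-D embeddings; a real evaluation would embed texts with the model.
toy_triplets = [
    ((1.0, 0.0), (0.9, 0.1), (0.0, 1.0)),   # positive is closer: hit
    ((0.0, 1.0), (0.1, 0.9), (1.0, 0.0)),   # positive is closer: hit
    ((1.0, 1.0), (1.0, -1.0), (1.0, 0.9)),  # negative is closer: miss
]
accuracy = triplet_accuracy(toy_triplets)
print(f"triplet accuracy: {accuracy:.3f}")
```

In a disease-focused setting, the negative would typically be a description of a similar but different disease, which is what makes the metric sensitive to fine-grained distinctions.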
Objective: Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMR) for personalized precision management of chronic disease progression. Methods: We first perform requirement analysis with clinicians and data scientists to determine the visual analytics tasks of the TrajVis system as well as its design and functionalities. A graph AI model for chronic kidney disease (CKD) trajectory inference named DEPOT is used for system development and demonstration. TrajVis is implemented as a full-stack web application with synthetic EMR data derived from the Atrium Health Wake Forest Baptist Translational Data Warehouse and the Indiana Network for Patient Care research database. A case study with a nephrologist and a user experience survey of clinicians and data scientists are conducted to evaluate the TrajVis system. Results: The TrajVis clinical information system is composed of four panels: the Patient View for demographic and clinical information, the Trajectory View to visualize the DEPOT-derived CKD trajectories in latent space, the Clinical Indicator View to elucidate longitudinal patterns of clinical features and interpret DEPOT predictions, and the Analysis View to demonstrate personal CKD progression trajectories. System evaluations suggest that TrajVis supports clinicians in summarizing clinical data, identifying individualized risk predictors, and visualizing patient disease progression trajectories, overcoming the barriers of AI implementation in healthcare. Conclusion: TrajVis bridges the gap between the fast-growing AI/ML modeling and the clinical use of such models for personalized and precision management of chronic diseases.
The rising prevalence of vision-threatening retinal diseases poses a significant burden on global healthcare systems. Deep learning (DL) offers a promising solution for automatic disease screening but demands substantial data. Collecting and labeling large volumes of ophthalmic images across various modalities encounters several real-world challenges, especially for rare diseases. Here, we introduce EyeDiff, a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts and evaluate its applicability in diagnosing common and rare diseases. EyeDiff is trained on eight large-scale datasets using the advanced latent diffusion model, covering 14 ophthalmic image modalities and over 80 ocular diseases, and is adapted to ten multi-country external datasets. The generated images accurately capture essential lesional characteristics, achieving high alignment with text prompts as evaluated by objective metrics and human experts. Furthermore, integrating generated images significantly enhances the accuracy of detecting minority classes and rare eye diseases, surpassing traditional oversampling methods in addressing data imbalance. EyeDiff effectively tackles the issue of data imbalance and insufficiency typically encountered in rare diseases and addresses the challenges of collecting large-scale annotated images, offering a transformative solution to enhance the development of expert-level disease diagnosis models in the ophthalmic field.
Abstract Background Ventilator-induced lung injury (VILI) is a clinical complication of mechanical ventilation observed in patients with acute respiratory distress syndrome. It is characterized by inflammation mediated by inflammatory cells and their secreted mediators. Methods To investigate the mechanisms underlying VILI, a C57BL/6J mouse model was induced using high tidal volume (HTV) mechanical ventilation. Mice were pretreated with Clodronate liposomes to deplete alveolar macrophages or administered normal bone marrow-derived macrophages or Group V phospholipase A2 (gVPLA2) intratracheally to inhibit bone marrow-derived macrophages. Lung tissue and bronchoalveolar lavage fluid (BALF) were collected to assess lung injury and measure Ca2+ concentration, gVPLA2, downstream phosphorylated cytoplasmic phospholipase A2 (p-cPLA2), prostaglandin E2 (PGE2), protein expression related to mitochondrial dynamics and mitochondrial damage. Cellular experiments were performed to complement the animal studies. Results Depletion of alveolar macrophages attenuated HTV-induced lung injury and reduced gVPLA2 levels in alveolar lavage fluid. Inhibition of alveolar macrophage-derived gVPLA2 had a similar effect. Activation of the cPLA2/PGE2/Ca2+ pathway in alveolar epithelial cells by gVPLA2 derived from alveolar macrophages led to disturbances in mitochondrial dynamics and mitochondrial dysfunction. The findings from cellular experiments were consistent with those of animal experiments. Conclusions HTV mechanical ventilation induces the secretion of gVPLA2 by alveolar macrophages, which activates the cPLA2/PGE2/Ca2+ pathway, resulting in mitochondrial dysfunction. These findings provide insights into the pathogenesis of VILI and may contribute to the development of therapeutic strategies for preventing or treating VILI.
Abstract Background Lung ultrasound (LUS) is a useful tool for assessing the severity of lung disease, without radiation exposure. However, there is little data on the practicality of LUS in assessing the severity of bronchopulmonary dysplasia (BPD) and evaluating short-term clinical outcomes. We adapted a LUS score into a modified LUS (mLUS) score to evaluate BPD severity and assessed how reliably the mLUS score correlates with short-term clinical outcomes. Methods A prospective diagnostic accuracy study was designed to enroll preterm infants with gestational age < 34 weeks. Lung ultrasonography was performed at 36 weeks postmenstrual age. The diagnostic and predictive values of the new mLUS scores based on eight standard sections were compared with classic lung ultrasound (cLUS) scores. Results A total of 128 infants were enrolled in this cohort, including 30 without BPD; 31 with mild BPD; 23 with moderate BPD and 44 with severe BPD. The mLUS score was significantly correlated with the short-term clinical outcomes and was superior to the cLUS score. The mLUS score correlated well with moderate and severe BPD (AUC = 0.813, 95% CI 0.739–0.888) and severe BPD (AUC = 0.801, 95% CI 0.728–0.875), and was superior to the cLUS score in both cases. The ROC analysis of the mLUS score for the other short-term outcomes also showed significant superiority to the cLUS score. The optimal cutoff points for the mLUS score were 14 for moderate and severe BPD and 16 for severe BPD. Conclusions The mLUS score correlates significantly with short-term clinical outcomes and evaluates these outcomes well in preterm infants.
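The abstract reports optimal cutoff points but does not say how they were chosen. One common approach in ROC analysis is to pick the threshold that maximizes Youden's J (sensitivity + specificity − 1); the sketch below shows that approach under this assumption. The scores and labels are invented toy values, not study data, and `youden_cutoff` is a hypothetical helper name.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) for the threshold that
    maximizes Youden's J, treating score >= cutoff as a positive test.
    Ties keep the lowest cutoff. Requires at least one positive and one
    negative label."""
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Invented toy data: 1 = outcome present, 0 = outcome absent.
toy_scores = [6, 8, 9, 11, 12, 13, 15, 16]
toy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(toy_scores, toy_labels)
print(cutoff, round(j, 2), round(sens, 2), round(spec, 2))
```

Other criteria, such as the point closest to the top-left corner of the ROC curve or a cost-weighted threshold, can give different cutoffs on the same data.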
Pablo Francisco Oliva-Sánchez, Felipe Vadillo-Ortega, Rafael Bojalil-Parra
et al.
Summary: Objectives: To describe the association of chronic noncommunicable diseases and age with hospitalization, severe clinical outcomes, and deaths from COVID-19 among confirmed cases in the Mexican population, comparing the first three epidemiological waves of the pandemic in Mexico. Design: A cross-sectional analysis was performed using the Epidemiological Surveillance System for Viral Respiratory Disease for COVID-19. Setting: Epidemiological Surveillance System for Viral Respiratory Disease in Mexico (SISVER). Participants: Mexican population with confirmed SARS-CoV-2 infection registered in SISVER. Main measurements: The severe outcomes analyzed were hospitalization, pneumonia, need for mechanical ventilation, ICU admission, and death. The association (odds ratio [OR]) between the outcomes and the clinical variables was evaluated, comparing the three epidemiological waves in Mexico. Results: Age over 65 years is associated with a higher percentage of hospitalization and pneumonia and, notably, with total deaths, independently of the effect of chronic comorbidities. There is an interaction between age and obesity, which is associated with hospitalization and pneumonia. These findings were consistent across the three epidemiological waves. Conclusion: Obesity, COPD, and diabetes in interaction with age are associated with worse clinical outcomes, primarily with deaths in patients with COVID-19. Abstract: Objectives: To describe the association between chronic noncommunicable diseases and age with hospitalization, death and severe clinical outcomes for COVID-19 in confirmed cases within the Mexican population, comparing the first three epidemiological waves of the pandemic in Mexico. Design: We performed an analysis using Mexico's Government Epidemiological Surveillance System database for COVID-19.
Setting: Mexico's Epidemiological Surveillance System for Respiratory Diseases. Participants: Mexican population confirmed with SARS-CoV-2 registered on Mexico's Epidemiological Surveillance System for Respiratory Diseases. Primary measurements: The analysed severe outcomes were hospitalization, pneumonia, use of mechanical ventilation, intensive care unit admission and death. The association (odds ratio) between the outcomes and clinical variables was evaluated, comparing the three epidemiological waves in Mexico. Results: Age over 65 is associated with a higher rate of hospitalization and pneumonia, independent of the effect of chronic comorbidities. There is an interaction between age and obesity, which is associated with hospitalization and pneumonia and is highly associated with death. These findings were consistent throughout the three epidemiological waves. Conclusion: Obesity, COPD and diabetes, in interaction with age, are associated with worse clinical outcomes and, more importantly, death in patients with COVID-19.
Patients with multiple myeloma (MM), an age-dependent neoplasm of antibody-producing plasma cells, have compromised immune systems and might be at increased risk for severe COVID-19 outcomes. This study characterizes risk factors associated with clinical indicators of COVID-19 severity and all-cause mortality in myeloma patients utilizing NCATS' National COVID Cohort Collaborative (N3C) database. The N3C consortium is a large, centralized data resource representing the largest multi-center cohort of COVID-19 cases and controls nationwide (>16 million total patients, and >6 million confirmed COVID-19+ cases to date). Our cohort included myeloma patients (both inpatients and outpatients) within the N3C consortium who have been diagnosed with COVID-19 based on positive PCR or antigen tests or ICD-10-CM diagnosis code. The outcomes of interest include all-cause mortality (including discharge to hospice) during the index encounter and clinical indicators of severity (i.e., hospitalization/emergency department/ED visit, use of mechanical ventilation, or extracorporeal membrane oxygenation (ECMO)). Finally, causal inference analysis was performed using the propensity score matching (PSM) method. As of 05/16/2022, the N3C consortium included 1,061,748 cancer patients, out of which 26,064 were MM patients (8,588 were COVID-19 positive). The mean age at COVID-19 diagnosis was 65.89 years, 46.8% were females, and 20.2% were of black race. 4.47% of patients died within 30 days of COVID-19 hospitalization. Overall, the survival probability was 90.7% across the course of the study. Multivariate logistic regression analysis showed histories of pulmonary and renal disease, dexamethasone, proteasome inhibitor/PI, immunomodulatory/IMiD therapies, and severe Charlson Comorbidity Index/CCI were significantly associated with higher risks of severe COVID-19 outcomes. Protective associations were observed with blood-or-marrow transplant/BMT and COVID-19 vaccination. 
Further, multivariate Cox proportional hazard analysis showed that high and moderate CCI levels, International Staging System (ISS) moderate or severe stage, and PI therapy were associated with worse survival, while BMT and COVID-19 vaccination were associated with lower risk of death. Finally, matched sample average treatment effect on the treated (SATT) confirmed the causal effect of BMT and vaccination status as top protective factors associated with COVID-19 risk among US patients suffering from multiple myeloma. To the best of our knowledge, this is the largest nationwide study on myeloma patients with COVID-19.
Cough audio signal classification is a potentially useful tool in screening for respiratory disorders, such as COVID-19. Since it is dangerous to collect data from patients with such contagious diseases, many research teams have turned to crowdsourcing to quickly gather cough sound data, as was done to generate the COUGHVID dataset. The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings. However, this approach suffers from potential mislabeling of the coughs, as well as notable disagreement between experts. In this work, we use a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification. First, we leverage existing SSL expert knowledge aggregation techniques to overcome the labeling inconsistencies and sparsity in the dataset. Next, our SSL approach is used to identify a subsample of re-labeled COUGHVID audio samples that can be used to train or augment future cough classification models. The consistency of the re-labeled data is demonstrated by its high degree of class separability, 3x higher than that of the user-labeled data, despite the expert label inconsistency present in the original dataset. Furthermore, the spectral differences in the user-labeled audio segments are amplified in the re-labeled data, resulting in significantly different power spectral densities between healthy and COVID-19 coughs, which demonstrates both the increased consistency of the new dataset and its explainability from an acoustic perspective. Finally, we demonstrate how the re-labeled dataset can be used to train a cough classifier. This SSL approach can be used to combine the medical knowledge of several experts to improve the database consistency for any diagnostic classification task.