Associations of Multimarkers of Metabolic Malnutrition and Inflammation With All‐Cause Mortality and Their Interplay With Thyroid Function
Setor K. Kunutsor, Reyhaneh Rikhtehgaran, Yanning Xu
et al.
ABSTRACT Introduction The metabolic vulnerability index (MVX)—a composite biomarker reflecting metabolic malnutrition and inflammation—has been linked to increased mortality risk in populations with cardiovascular disease. Thyroid function, a key regulator of metabolism and inflammation, may confound or modify this relationship, but evidence in the general population is limited. Objectives To evaluate the interplay between MVX and its subcomponents (inflammation vulnerability index, IVX and metabolic malnutrition index, MMX), thyroid function, and mortality risk in the general population. Methods In the PREVEND prospective study, which included 5446 participants (mean age 54 years; 49.9% male), both MVX (estimated using six metabolites measured simultaneously through nuclear magnetic resonance spectroscopy) and thyroid function (FT3, FT4, TSH) were evaluated at baseline. Hazard ratios (HRs) with 95% confidence intervals (CIs) for all‐cause mortality were estimated. Results During a median follow‐up of 14.1 years, 806 deaths were recorded. Spline analyses showed graded dose–response relationships of MVX, IVX and MMX with mortality risk. In separate analyses adjusted for several established risk factors, the HRs (95% CIs) of mortality were 1.28 (1.18–1.38), 1.23 (1.14–1.32) and 1.16 (1.07–1.25) per 1 standard deviation increment in MVX, IVX and MMX, respectively. The HRs remained consistent on further adjustment for FT3, FT4 and TSH. Sex as well as levels of FT3, FT4 and TSH did not significantly modify the associations. Conclusions The MVX and its subcomponents (IVX and MMX) are independently associated with all‐cause mortality, consistent with graded dose–response relationships. Thyroid function does not confound or modify these associations.
Diseases of the endocrine glands. Clinical endocrinology
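As a reading aid for the per-standard-deviation hazard ratios reported in the abstract above, the following minimal sketch rescales a hazard ratio to a different increment. It assumes the log-linear exposure effect of a standard Cox model; the function name is hypothetical and not taken from the study.

```python
import math


def rescale_hazard_ratio(hr_per_sd: float, n_sd: float) -> float:
    """Rescale a per-1-SD hazard ratio to an increment of n_sd SDs.

    Assumes the log hazard is linear in the exposure (standard Cox model),
    so log(HR) scales in proportion to the size of the increment.
    """
    if hr_per_sd <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(n_sd * math.log(hr_per_sd))


# Reported MVX estimate: HR 1.28 (95% CI 1.18 to 1.38) per 1 SD.
for label, value in [("HR", 1.28), ("lower", 1.18), ("upper", 1.38)]:
    print(label, round(rescale_hazard_ratio(value, 2), 2))
```

The same rescaling applies to the confidence limits because they are also symmetric on the log scale.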
Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent
Zhongzhen Huang, Yan Ling, Hong Chen
et al.
We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases. To evaluate its capabilities, we curated a benchmark of 82 authentic endocrinology case reports encompassing a broad spectrum of disease types and incidence levels. In controlled experiments, we compared PULSE's performance against physicians with varying levels of expertise, from residents to senior specialists, and examined how AI assistance influenced human diagnostic reasoning. PULSE attained expert-competitive accuracy, outperforming residents and junior specialists while matching senior specialist performance at both Top@1 and Top@4 thresholds. Unlike physicians, whose accuracy declined with disease rarity, PULSE maintained stable performance across incidence tiers. The agent also exhibited adaptive reasoning, increasing output length with case difficulty in a manner analogous to the longer deliberation observed among expert clinicians. When used collaboratively, PULSE enabled physicians to correct initial errors and broaden diagnostic hypotheses, but also introduced risks of automation bias. The study explores both serial and concurrent collaboration workflows, revealing that PULSE offers robust support across common and rare presentations. These findings underscore both the promise and the limitations of language model-based agents in clinical diagnosis, and offer a framework for evaluating their role in real-world decision-making.
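The Top@1 and Top@4 thresholds mentioned above can be illustrated with a minimal sketch of top-k diagnostic accuracy. It assumes each case yields a ranked list of candidate diagnoses and a single reference diagnosis; the function name and the toy diagnoses are hypothetical and are not drawn from the benchmark.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k candidates."""
    if len(ranked) != len(truth) or not truth:
        raise ValueError("need one ranked list per non-empty set of cases")
    hits = sum(1 for preds, ref in zip(ranked, truth) if ref in preds[:k])
    return hits / len(truth)


# Toy example with made-up labels (illustration only).
ranked_lists = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["C", "D", "B", "E"],
]
reference = ["A", "D", "F"]

print(top_k_accuracy(ranked_lists, reference, k=1))
print(top_k_accuracy(ranked_lists, reference, k=4))
```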
Desteroidization as a methodology for reducing the risk of adrenal insufficiency in adult non-endocrine patients: literature data and own experience
М.Л. Кирилюк, В.І. Паньків
The article presents literature data and personal experience of observing non-endocrine patients after discontinuation of glucocorticoids or their withdrawal. The use of a new term and methodology of desteroidization to prevent glucocorticoid dependence is justified. Glucocorticoid withdrawal syndrome is interpreted as glucocorticoid dependence syndrome. The clinical signs of iatrogenic Cushing’s syndrome and glucocorticoid-induced adrenal insufficiency in adult patients are described, as well as modern clinical guidelines for managing patients after discontinuation of glucocorticoid therapy. Our own experience of observing non-endocrine patients shows that after discontinuation of glucocorticoid therapy: a) some patients take glucocorticoids themselves, for example, dexamethasone injections after inpatient treatment for severe COVID-19 infection to improve well-being and physical performance, which led to a fatal outcome (generalized iatrogenic Cushing’s syndrome, multiple organ failure) in one case; b) others reach a certain minimum dose (for example, 5 mg of prednisolone per day) and do not want to refuse glucocorticoid therapy due to fear of relapse or exacerbation of the disease; c) the third group cannot refuse glucocorticoid therapy due to the underlying disease (our patient has been taking glucocorticoids after kidney transplantation for more than 15 years); d) the fourth group require repeated administration of low doses of glucocorticoids in the absence of the complete recovery of the hypothalamic-pituitary-adrenal system after glucocorticoid withdrawal at the first signs of adrenal insufficiency (due to stress, dental procedures, hyperthermia, dehydration, sunstroke, acute respiratory viral infections, etc.). Data of modern clinical guidelines (European Society of Endocrinology and Endocrine Society (USA) 2024) on medical management of patients with non-endocrine pathology are presented.
Diseases of the endocrine glands. Clinical endocrinology
Approach to signal loss in intraoperative nerve monitoring in thyroid surgery questionnaire: a Turkish surgical perspective
Yalin Iscan, Irem Karatas, Nurcihan Aygun
et al.
Purpose: This study aimed to evaluate surgeons’ use of intraoperative nerve monitoring (IONM) during thyroidectomy and their approach to loss of signal (LOS) in various clinical scenarios. Materials and Methods: A 16-question survey was conducted by the Turkish Endocrine Surgery Society among its members in February 2020. The practice of IONM use, rate of inclusion in informed consent texts, and attitudes of participants in case of signal loss were investigated. The study was conducted with 183 participants between February 4-12, 2020. Results: Most participants (58.2%) had more than 10 years of surgical experience and 36.6% performed more than 50 thyroidectomies annually. IONM was routinely used by 78.7% of the participants, whereas 16.4% reserved its use for difficult cases. Only 5.2% of the participants performed continuous monitoring. In case-based LOS scenarios, the majority of participants (approximately 60%) terminated the operation when the nerve was anatomically intact but LOS persisted, except in high-risk cancer cases. When the nerve anatomy was disrupted, most participants terminated the surgery, except for the high-risk cancer group. In cases of irreversible LOS with preserved nerve integrity, 58.9% of the participants preferred continuous vagus stimulation on the contralateral side, whereas 41.1% preferred intermittent nerve monitoring. Although 68.2% of the participants verbally informed the patients about the risks of LOS, only 24.4% provided this information on the consent form. Conclusion: The use of IONM in thyroid surgery is increasing in our country. However, there is still no consensus on the approach for staged thyroidectomy in cases of signal loss, and institutional and individual differences persist. Further studies are needed to determine the medical-legal implications and effects of these variations.
Diseases of the endocrine glands. Clinical endocrinology
The correlation between primary ovarian insufficiency, sex hormones and immune cells: a two-step Mendelian randomization study
Tongtong Hong, Danhua Pu, Jie Wu
Background: Primary ovarian insufficiency (POI), a cause of female infertility, is characterized by elevated gonadotropin levels and fluctuating estrogen reductions, accompanied by irregular menstruation, osteoporosis, cardiovascular disease, and genitourinary syndrome of menopause. Previous studies have shown an association between POI and immune cells, but the causal relationship remains unclear. Sex hormones play a crucial role in immune regulation by influencing the function and levels of immune cells, suggesting they may be key mediators between POI and immune cells. Methods: Utilizing genome-wide association studies (GWAS), we conducted a comprehensive bidirectional two-sample Mendelian randomization (MR) analysis to explore the causal relationship between 731 immune cell traits and POI. Furthermore, a two-step MR analysis was employed to examine the potential mediating effects of sex hormones between these two systems. To ensure the robustness of our findings, we performed extensive sensitivity analyses, evaluating heterogeneity and horizontal pleiotropy. Results: After FDR adjustment (PFDR < 0.05), ten immune cell phenotypes were significantly correlated with the risk of POI. Among these, one immune cell phenotype was identified as a risk factor for POI (OR > 1), while the other nine immune cell phenotypes were protective factors (OR < 1). In the reverse MR analysis, POI was positively correlated with seven immunocyte phenotypes (OR > 1) and negatively correlated with eleven immunocyte phenotypes (OR < 1). No potential mediating effects of ten sex hormones were found between POI and immune cell traits. Conclusions: Our study comprehensively assessed the correlation between immune cell phenotypes and POI in the European population, excluding the mediating role of sex hormones, thus providing valuable insights into the biological mechanisms of POI and informing early prevention and treatment strategies.
Diseases of the endocrine glands. Clinical endocrinology
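For readers unfamiliar with two-sample Mendelian randomization, the sketch below shows the fixed-effect inverse-variance weighted (IVW) estimator, a common default in such analyses. The abstract above does not state which estimator the authors used, so this is an assumption for illustration; the function name and the toy summary statistics are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each index is one genetic instrument with its SNP-exposure effect,
    SNP-outcome effect, and the standard error of the SNP-outcome effect.
    """
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    if denominator == 0:
        raise ValueError("instruments carry no exposure signal")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics for two instruments (illustration only).
beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
odds_ratio = math.exp(beta)  # for a binary outcome, effects are on the log-odds scale
print(round(beta, 3), round(se, 3), round(odds_ratio, 3))
```

An odds ratio above 1 would correspond to a risk factor and below 1 to a protective factor, matching how the results are reported.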
The relationship between cardiometabolic Index and diabetic kidney disease in people with diabetes
Jianping Kong, Wenting Tao, Yuhong Sun
et al.
Introduction: Studies have shown a strong correlation between the cardiometabolic index (CMI) and health issues such as diabetes, atherosclerosis, and decreased renal function. Nevertheless, the correlation between CMI and diabetic kidney disease (DKD) remains ambiguous. The objective of this study is to evaluate the correlation between CMI and DKD in patients with diabetes in the United States. Methods: The study involved individuals who were part of the National Health and Nutrition Examination Survey (NHANES) conducted between 2003 and 2018. A multivariable logistic regression analysis was employed for investigating the correlation between CMI and DKD. The study employed Generalized Additive Models (GAM) and smooth curve fitting methods for investigating the nonlinear relationship between CMI and DKD. Two-stage regression analysis was applied for investigating threshold effects in the connection between CMI and DKD. In addition, subgroup analysis and interaction tests were also carried out. Results: This analysis included a total of 6,540 adults with diabetes. After adjusting for variables including age, sex, race, education level, smoking status, household income and poverty rate, body mass index, hypertension status, aspartate aminotransferase, alanine aminotransferase, serum albumin, and serum globulin, we discovered a significant connection between CMI levels and the risk of DKD (OR=1.11, 95% CI: 1.05, 1.17, p<0.0001). Individuals with varying smoking statuses showed variations in this connection according to subgroup analysis and interaction tests (p for interaction=0.0216). Conversely, this correlation appeared similar across different genders, ages, races, BMI categories, hypertension statuses, and insulin usage among people with diabetes (all p for interaction >0.05). A nonlinear relationship existed between CMI and DKD, with threshold analysis indicating a turning point at CMI=1.7. A positive correlation was observed between CMI levels in people with diabetes and the risk of DKD when CMI exceeded 1.7. Conclusion: The risk of DKD was significantly positively correlated with the CMI levels of people with diabetes. Further larger prospective studies are required to confirm our results.
Diseases of the endocrine glands. Clinical endocrinology
A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada
et al.
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
TRACER: Transfer Learning based Real-time Adaptation for Clinical Evolving Risk
Mengying Yan, Ziye Tian, Siqi Li
et al.
Clinical decision support tools built on electronic health records often experience performance drift due to temporal population shifts, particularly when changes in the clinical environment initially affect only a subset of patients, resulting in a transition to mixed populations. Such case-mix changes commonly arise following system-level operational updates or the emergence of new diseases, such as COVID-19. We propose TRACER (Transfer Learning-based Real-time Adaptation for Clinical Evolving Risk), a framework that identifies encounter-level transition membership and adapts predictive models using transfer learning without full retraining. In simulation studies, TRACER outperformed static models trained on historical or contemporary data. In a real-world application predicting hospital admission following emergency department visits across the COVID-19 transition, TRACER improved both discrimination and calibration. TRACER provides a scalable approach for maintaining robust predictive performance under evolving and heterogeneous clinical conditions.
Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases
Valentina Carbonari, Pierangelo Veltri, Pietro Hiram Guzzi
Recent advances in artificial intelligence, particularly large language models (LLMs), have shown promising capabilities in transforming rare disease research. This survey paper explores the integration of LLMs in the analysis of rare diseases, highlighting significant strides and pivotal studies that leverage textual data to uncover insights and patterns critical for diagnosis, treatment, and patient care. While current research predominantly employs textual data, the potential for multimodal data integration combining genetic, imaging, and electronic health records stands as a promising frontier. We review foundational papers that demonstrate the application of LLMs in identifying and extracting relevant medical information, simulating intelligent conversational agents for patient interaction, and enabling the formulation of accurate and timely diagnoses. Furthermore, this paper discusses the challenges and ethical considerations inherent in deploying LLMs, including data privacy, model transparency, and the need for robust, inclusive data sets. As part of this exploration, we present a section on experimentation that utilizes multiple LLMs alongside structured questionnaires, specifically designed for diagnostic purposes in the context of different diseases. We conclude with future perspectives on the evolution of LLMs towards truly multimodal platforms, which would integrate diverse data types to provide a more comprehensive understanding of rare diseases, ultimately fostering better outcomes in clinical settings.
Evaluation of clinical utility in emulated clinical trials
Johannes Hruza, Arvid Sjölander, Erin Gabriel
et al.
Dynamic treatment regimes have been proposed to personalize treatment decisions by utilizing historical patient data, but they may not always improve on the current standard of care. It is thus meaningful to integrate the standard of care into the evaluation of treatment strategies, and previous works have suggested doing so through the concept of clinical utility. Here we will focus on the comparative component of clinical utility as the average outcome had the full population received treatment based on the proposed dynamic treatment regime in comparison to the full population receiving the "standard" treatment assignment mechanism, such as a physician's choice. Clinical trials to evaluate clinical utility are rarely conducted, and thus, previous works have proposed an emulated clinical trial framework using observational data. However, only one simple estimator was previously suggested, and the practical details of how one would conduct this emulated trial were not detailed. Here, we illuminate these details and propose several estimators of clinical utility based on estimators proposed in the dynamic treatment regime literature. We illustrate the considerations and the estimators in a real data example investigating treatment rules for rheumatoid arthritis, where we highlight that in addition to the standard of care, the current medical guidelines should also be compared to any estimated "optimal" decision rule.
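The comparative component described above can be sketched for a single decision point with a normalized inverse-probability-weighted estimate of the regime's value, compared against the observed mean outcome under the standard assignment mechanism. This is a generic textbook-style sketch, not one of the authors' estimators; it assumes known propensities and no unmeasured confounding, and all names and toy data are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (covariate x, observed treatment a, outcome y, propensity P(A = a | x))
Record = Tuple[float, int, float, float]


def clinical_utility_ipw(
    records: Sequence[Record], regime: Callable[[float], int]
) -> float:
    """Estimated mean outcome under the regime minus the observed mean outcome."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, a, y, propensity in records:
        # Only patients whose observed treatment agrees with the regime contribute.
        if a == regime(x):
            weight = 1.0 / propensity
            weighted_sum += weight * y
            weight_total += weight
    if weight_total == 0:
        raise ValueError("no patient followed the proposed regime")
    value_regime = weighted_sum / weight_total
    value_standard = sum(y for _, _, y, _ in records) / len(records)
    return value_regime - value_standard


def treat_if_positive(x: float) -> int:
    """Hypothetical rule: treat when the covariate is positive."""
    return 1 if x > 0 else 0


toy = [(1.0, 1, 2.0, 0.5), (1.0, 0, 1.0, 0.5), (2.0, 1, 4.0, 0.5), (-1.0, 0, 3.0, 0.5)]
print(clinical_utility_ipw(toy, treat_if_positive))
```

A positive value suggests the proposed rule would improve the average outcome relative to the standard assignment, provided higher outcomes are better.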
Social Determinants of Health and Cardiovascular Risk among Adults with Diabetes: The Reasons for Geographic and Racial Differences in Stroke (REGARDS) Study
Lisa Zhang, Evgeniya Reshetnyak, Joanna B. Ringel
et al.
Background Social determinants of health (SDOH) have been associated with diabetes risk; however, their association with cardiovascular disease (CVD) events in individuals with diabetes is poorly described. We hypothesized that a greater number of SDOH among individuals with diabetes would be associated with a higher risk of CVD events. Methods The REasons for Geographic and Racial Differences in Stroke (REGARDS) study is a national, biracial cohort of 30,239 individuals ≥45 years old recruited in 2003–2007. We included 6,322 participants with diabetes at baseline, defined as healthcare professional diagnosis, diabetes medication use, or blood glucose values. Seven SDOH that were individually associated with CVD events were included (P<0.20). The outcome was CVD events, a composite of expert-adjudicated myocardial infarction, stroke, or cardiovascular death. We estimated Cox proportional hazard models to examine associations between number of SDOH (0, 1, 2, ≥3) and CVD events. Results In an age and sex adjusted model, the presence of multiple SDOH significantly increased the risk of any CVD event (hazard ratio [HR], 1.48; 95% confidence interval [CI], 1.26 to 1.74 for two SDOH; HR, 1.68; 95% CI, 1.43 to 1.96 for ≥3 SDOH). This finding was attenuated but remained statistically significant in a fully adjusted model (HR, 1.19; 95% CI, 1.01 to 1.40 for two SDOH; HR, 1.27; 95% CI, 1.07 to 1.50 for ≥3 SDOH). Conclusion Having multiple SDOH was independently associated with an increased risk of CVD events, a finding driven by cardiovascular death. Identifying individuals with diabetes who have multiple SDOH may be helpful for detecting those at higher risk of experiencing or dying from CVD events.
Diseases of the endocrine glands. Clinical endocrinology
Experience of X-linked hypophosphatemic rickets in the Gulf Cooperation Council countries: case series
Fahad Al-Juraibah, Adnan Al Shaikh, Afaf Al-Sagheir
et al.
X-linked hypophosphatemic rickets (XLH), the most prevalent form of inherited hypophosphatemic rickets, is caused by loss-of-function mutations in the gene encoding phosphate-regulating endopeptidase homolog, X-linked (PHEX). This case series presents 14 cases of XLH from Gulf Cooperation Council (GCC) countries. The patients’ medical history, biochemical and radiological investigative findings, as well as treatment responses and side effects from both conventional and burosumab therapy, are described. Cases were aged 2–40 years at diagnosis. There were two male cases and 12 female cases. All cases were treated with conventional therapy which resulted in a lack of improvement in or worsening of the clinical signs and symptoms of rickets or biochemical parameters. Side effects of conventional therapy included nausea, diarrhea, abdominal pain, nephrocalcinosis, and hyperparathyroidism, which affected the patients’ quality of life and adherence to treatment. In the 10 patients treated with burosumab, there was a marked improvement in the biochemical markers of rickets, with a mean increase in serum phosphate of +0.56 mmol/L and tubular maximum phosphate reabsorption (TmP) to glomerular filtration rate (GFR) ratio (TmP/GFR) of +0.39 mmol/L at 12 months compared to baseline. Furthermore, a mean decrease in serum alkaline phosphatase (ALP) of −80.80 IU/L and parathyroid hormone (PTH) of −63.61 pmol/L at 12 months compared to baseline was observed in these patients. Additionally, patients treated with burosumab reported reduced pain, muscle weakness, and fatigue as well as the ability to lead more physically active lives with no significant side effects of treatment.
Diseases of the endocrine glands. Clinical endocrinology
Effectiveness of Neuromuscular Taping on Balance, Proprioception, Pain, and Nerve Conduction Parameters in Patients with Diabetic Peripheral Neuropathy: A Two-Group Pretest–Posttest Randomized Sham-Controlled Trial: A Pilot Study
Kanika Thakur, Manu Goyal
Aim:
Neuromuscular Taping (NMT) is the application of elastic adhesive tape to the skin without any tension in it. NMT creates wrinkles on the surface of the skin to help stretch the skin passively and this elongation force assists in muscle contraction and relaxation. Therefore, this study aimed to assess the effectiveness of NMT in the improvement of sensorimotor complications following diabetic peripheral neuropathy (DPN).
Setting and Design:
A randomized controlled trial performed at a tertiary health care center.
Materials and Methods:
A total of 20 participants were recruited and divided into two groups: the experimental group (EG; N = 10) and the control group (CG; N = 10). The EG received eight weeks of physiotherapy intervention, including NMT on the bilateral tibialis anterior, tibialis posterior, and peroneus longus muscles and the transverse arch of the foot, together with transcutaneous electrical nerve stimulation (TENS) along the course of the bilateral tibial and peroneal nerves. The CG received sham taping and the same TENS as the EG. Patients were assessed pre- and post-intervention using the following outcomes: Leeds Assessment of Neuropathic Signs and Symptoms; nerve conduction velocity of the tibial, peroneal, and sural nerves; H-reflex; Berg Balance Scale; and Pedalo-Sensamove Mini Board.
Results:
The results revealed that both groups showed significant improvements in all variables except for the H-reflex of the right and left sides (P > 0.05). The EG showed more clinical and symptomatic improvement than the CG. In the EG, most variables showed moderate to large effect sizes, ranging from 0.66 to 0.97, except for the bilateral H-reflex, for which effect sizes ranged from 0.15 to 0.33.
Conclusion:
This study concludes that NMT can be a trustworthy approach to treating DPN.
Diseases of the endocrine glands. Clinical endocrinology
Automated Clinical Data Extraction with Knowledge Conditioned LLMs
Diya Li, Asim Kadav, Aijing Gao
et al.
The extraction of lung lesion information from clinical and medical imaging reports is crucial for research on and clinical care of lung-related diseases. Large language models (LLMs) can be effective at interpreting unstructured text in reports, but they often hallucinate due to a lack of domain-specific knowledge, leading to reduced accuracy and posing challenges for use in clinical settings. To address this, we propose a novel framework that aligns generated internal knowledge with external knowledge through in-context learning (ICL). Our framework employs a retriever to identify relevant units of internal or external knowledge and a grader to evaluate the truthfulness and helpfulness of the retrieved internal-knowledge rules, to align and update the knowledge bases. Experiments with expert-curated test datasets demonstrate that this ICL approach can increase the F1 score for key fields (lesion size, margin and solidity) by an average of 12.9% over existing ICL methods.
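Because the result above is reported as an F1 score averaged over key fields, a minimal sketch of per-field F1 and its unweighted average may help. The counts below are made up for illustration and are not the paper's data; the function name is hypothetical.

```python
from typing import Dict, Tuple


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (true positive, false positive, false negative) counts per field.
field_counts: Dict[str, Tuple[int, int, int]] = {
    "lesion size": (8, 2, 2),
    "margin": (6, 2, 4),
    "solidity": (9, 1, 3),
}

per_field = {name: f1_score(*counts) for name, counts in field_counts.items()}
average_f1 = sum(per_field.values()) / len(per_field)
print(per_field, round(average_f1, 3))
```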
DALL-M: Context-Aware Clinical Data Augmentation with LLMs
Chihcheng Hsieh, Catarina Moreira, Isabel Blanco Nobre
et al.
X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating the integration of structured clinical features with radiology reports. To address this, we introduce DALL-M, a novel framework that enhances clinical datasets by generating contextual synthetic data. DALL-M augments structured patient data, including vital signs (e.g., heart rate, oxygen saturation), radiology findings (e.g., lesion presence), and demographic factors. It integrates this tabular data with contextual knowledge extracted from radiology reports and domain-specific resources (e.g., Radiopaedia, Wikipedia), ensuring clinical consistency and reliability. DALL-M follows a three-phase process: (i) clinical context storage, (ii) expert query generation, and (iii) context-aware feature augmentation. Using large language models (LLMs), it generates both contextual synthetic values for existing clinical features and entirely new, clinically relevant features. Applied to 799 cases from the MIMIC-IV dataset, DALL-M expanded the original 9 clinical features to 91. Empirical validation with machine learning models (including Decision Trees, Random Forests, XGBoost, and TabNET) demonstrated a 16.5% improvement in F1 score and a 25% increase in Precision and Recall. DALL-M bridges an important gap in clinical data augmentation by preserving data integrity while enhancing predictive modeling in healthcare. Our results show that integrating LLM-generated synthetic features significantly improves model performance, making DALL-M a scalable and practical approach for AI-driven medical diagnostics.
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan, Haitian Liu, Tengxiao Wu
et al.
LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct advancements of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.
Data-driven subgrouping of patient trajectories with chronic diseases: Evidence from low back pain
Christof Naumzik, Alice Kongsted, Werner Vach
et al.
Clinical data informs the personalization of health care with a potential for more effective disease management. In practice, this is achieved by subgrouping, whereby clusters with similar patient characteristics are identified and then receive customized treatment plans with the goal of targeting subgroup-specific disease dynamics. In this paper, we propose a novel mixture hidden Markov model for subgrouping patient trajectories from chronic diseases. Our model is probabilistic and carefully designed to capture different trajectory phases of chronic diseases (i.e., "severe", "moderate", and "mild") through tailored latent states. We demonstrate our subgrouping framework based on a longitudinal study across 847 patients with non-specific low back pain. Here, our subgrouping framework identifies 8 subgroups. Further, we show that our subgrouping framework outperforms common baselines in terms of cluster validity indices. Finally, we discuss the applicability of the model to other chronic and long-lasting diseases.
Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding
Bokang Bi, Leibo Liu, Sanja Lujic
et al.
The increasing volume and complexity of clinical documentation in Electronic Medical Records systems pose significant challenges for clinical coders, who must mentally process and summarise vast amounts of clinical text to extract essential information needed for coding tasks. While large language models have been successfully applied to shorter summarisation tasks in recent years, the challenge of summarising a hospital course remains an open area for further research and development. In this study, we adapted three pre-trained LLMs (Llama 3, BioMistral, and Mistral Instruct v0.1) for the hospital course summarisation task, using Quantized Low-Rank Adaptation fine-tuning. We created a free-text clinical dataset from MIMIC-III data by concatenating various clinical notes as the input clinical text, paired with ground-truth Brief Hospital Course sections extracted from the discharge summaries for model training. The fine-tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical domain fine-tuning. Additionally, we validated their practical utility using a novel hospital course summary assessment metric specifically tailored for clinical coding. Our findings indicate that fine-tuning pre-trained LLMs for the clinical domain can significantly enhance their performance in hospital course summarisation and suggest their potential as assistive tools for clinical coding. Future work should focus on refining data curation methods to create higher-quality clinical datasets tailored for hospital course summary tasks and adapting more advanced open-source LLMs comparable to proprietary models to further advance this research.
Beyond the exome: what's next in diagnostic testing for Mendelian conditions
Monica H. Wojcik, Chloe M. Reuter, Shruti Marwaha
et al.
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA or RNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after an inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to a consortium such as GREGoR, which is focused on elucidating the underlying cause of rare unsolved genetic disorders.
ChatGPT Assisting Diagnosis of Neuro-ophthalmology Diseases Based on Case Reports
Yeganeh Madadi, Mohammad Delsoz, Priscilla A. Lao
et al.
Objective: To evaluate the efficiency of large language models (LLMs) such as ChatGPT to assist in diagnosing neuro-ophthalmic diseases based on detailed case descriptions. Methods: We selected 22 different case reports of neuro-ophthalmic diseases from a publicly available online database. These cases included a wide range of chronic and acute diseases that are commonly seen by neuro-ophthalmic sub-specialists. We inserted the text from each case as a new prompt into both ChatGPT v3.5 and ChatGPT Plus v4.0 and asked for the most probable diagnosis. We then presented the same information to two neuro-ophthalmologists and recorded their diagnoses, followed by comparison to responses from both versions of ChatGPT. Results: ChatGPT v3.5, ChatGPT Plus v4.0, and the two neuro-ophthalmologists were correct in 13 (59%), 18 (82%), 19 (86%), and 19 (86%) out of 22 cases, respectively. The agreement between the various diagnostic sources was as follows: ChatGPT v3.5 and ChatGPT Plus v4.0, 13 (59%); ChatGPT v3.5 and the first neuro-ophthalmologist, 12 (55%); ChatGPT v3.5 and the second neuro-ophthalmologist, 12 (55%); ChatGPT Plus v4.0 and the first neuro-ophthalmologist, 17 (77%); ChatGPT Plus v4.0 and the second neuro-ophthalmologist, 16 (73%); and the first and second neuro-ophthalmologists, 17 (77%). Conclusions: The accuracy of ChatGPT v3.5 and ChatGPT Plus v4.0 in diagnosing patients with neuro-ophthalmic diseases was 59% and 82%, respectively. With further development, ChatGPT Plus v4.0 may have potential to be used in clinical care settings to assist clinicians in providing quick, accurate diagnoses of patients in neuro-ophthalmology. The applicability of using LLMs like ChatGPT in clinical settings that lack access to subspecialty-trained neuro-ophthalmologists deserves further research.
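The percentages in the Results follow directly from the reported counts out of 22 cases. The short sketch below reproduces that arithmetic; the counts are taken from the abstract, while the helper name `percent` and the dictionary labels are illustrative choices.

```python
TOTAL_CASES = 22  # number of case reports stated in the abstract

def percent(count: int, total: int = TOTAL_CASES) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)

# Correct diagnoses per source, as reported.
correct = {
    "ChatGPT v3.5": 13,
    "ChatGPT Plus v4.0": 18,
    "Neuro-ophthalmologist 1": 19,
    "Neuro-ophthalmologist 2": 19,
}

# Pairwise agreement counts, as reported.
agreement = {
    "v3.5 vs v4.0": 13,
    "v3.5 vs neuro-ophthalmologist 1": 12,
    "v3.5 vs neuro-ophthalmologist 2": 12,
    "v4.0 vs neuro-ophthalmologist 1": 17,
    "v4.0 vs neuro-ophthalmologist 2": 16,
    "neuro-ophthalmologist 1 vs 2": 17,
}

for label, count in {**correct, **agreement}.items():
    print(f"{label}: {count}/{TOTAL_CASES} = {percent(count)}%")
```

These are raw percent-agreement figures; they do not correct for agreement expected by chance.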