Lisa S. Rotenstein, Zili He, James Dziura et al.
Results for "Specialties of internal medicine"
Showing 20 of ~4,991,680 results · from arXiv, DOAJ, CrossRef, Semantic Scholar
Mahmoud Alwakeel, Aditya Nagori, Vijay Krishnamoorthy et al.
Objectives: To evaluate the current limitations of large language models (LLMs) in medical question answering, focusing on the quality of datasets used for their evaluation. Materials and Methods: Widely-used benchmark datasets, including MedQA, MedMCQA, PubMedQA, and MMLU, were reviewed for their rigor, transparency, and relevance to clinical scenarios. Alternatives, such as challenge questions in medical journals, were also analyzed to identify their potential as unbiased evaluation tools. Results: Most existing datasets lack clinical realism, transparency, and robust validation processes. Publicly available challenge questions offer some benefits but are limited by their small size, narrow scope, and exposure to LLM training. These gaps highlight the need for secure, comprehensive, and representative datasets. Conclusion: A standardized framework is critical for evaluating LLMs in medicine. Collaborative efforts among institutions and policymakers are needed to ensure datasets and methodologies are rigorous, unbiased, and reflective of clinical complexities.
Gina D'Angelo, Xiaowen Tian, Chuyu Deng et al.
Precision medicine is an evolving area in the medical field and relies on biomarkers to make patient enrichment decisions, thereby providing drug development direction. A traditional statistical approach is to find the cut-off that leads to the minimum p-value of the interaction between the biomarker dichotomized at that cut-off and treatment. Such an approach does not incorporate clinical significance, and the biomarker is not evaluated on a continuous scale. We propose to evaluate the biomarker in a continuous manner from a predicted risk standpoint, based on a model that includes the interaction between the biomarker and treatment. The predicted risk can be graphically displayed to explain the relationship between the outcome and biomarker, thereby suggesting a cut-off for biomarker positive/negative groups. We adapt the TreatmentSelection approach and extend it to account for covariates via G-computation. Other features include biomarker comparisons using net gain summary measures and calibration to assess the model fit. The PRIME (Predictive biomarker graphical approach) approach is flexible in the type of outcome and covariates considered. An R package is available and examples will be demonstrated.
Mohammad Amaan Sayeed, Mohammed Talha Alam, Raza Imam et al.
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.
Hyunjae Kim, Jiwoong Sohn, Aidan Gilson et al.
Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, evidence-grounded reasoning. Retrieval-augmented generation (RAG) has been widely adopted to address these limitations by supplementing model outputs with retrieved evidence. However, whether RAG reliably achieves these goals remains unclear. Here, we present the most comprehensive expert evaluation of RAG in medicine to date. Eighteen medical experts contributed a total of 80,502 annotations, assessing 800 model outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and USMLE-style queries. We systematically decomposed the RAG pipeline into three components: (i) evidence retrieval (relevance of retrieved passages), (ii) evidence selection (accuracy of evidence usage), and (iii) response generation (factuality and completeness of outputs). Contrary to expectation, standard RAG often degraded performance: only 22% of top-16 passages were relevant, evidence selection remained weak (precision 41-43%, recall 27-49%), and factuality and completeness dropped by up to 6% and 5%, respectively, compared with non-RAG variants. Retrieval and evidence selection remain key failure points for the model, contributing to the overall performance drop. We further show that simple yet effective strategies, including evidence filtering and query reformulation, substantially mitigate these issues, improving performance on MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call for re-examining RAG's role in medicine and highlight the importance of stage-aware evaluation and deliberate system design for reliable medical LLM applications.
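The evidence-selection figures above are reported as precision and recall. As a minimal sketch, assuming the standard set-based definitions (the abstract does not state how the scores were computed), the two quantities can be obtained as follows; the function name and passage identifiers are hypothetical.

```python
def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for evidence selection.

    selected: passage ids the model actually used (hypothetical ids).
    relevant: passage ids judged relevant by annotators (hypothetical ids).
    """
    true_positives = len(selected & relevant)
    # Precision: share of the selected passages that are relevant.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: share of the relevant passages that were selected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only, not taken from the study.
used = {"p1", "p2", "p3", "p4"}
gold = {"p1", "p2", "p5", "p6", "p7"}
print(precision_recall(used, gold))
```

In this toy case two of the four selected passages are relevant and two of the five relevant passages were selected, which is how a system can be weak on both measures at once.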
D. Alan Herbst, MD, Banafsheh Shakibajahromi, MD, Michael V. Genuardi, MD et al.
Advanced heart failure is associated with accelerated brain atrophy, largely related to chronic cerebral malperfusion. Both heart transplantation (HT) and left ventricular assist device (LVAD) implantation improve vital organ perfusion, but the comparative effect on brain atrophy remains unclear. Given the MR incompatibility of LVADs, we leveraged serial CT imaging in patients who underwent either HT or LVAD implantation. 58 patients were included in this single-center retrospective cohort (23 LVAD; 35 HT). LVAD patients experienced greater brain atrophy (median: 7.1 mL/year; IQR: 0.9–15.7) than transplant patients (median: 0.4 mL/year; IQR: −6.7–13.9), but this difference was non-significant (p=0.09). Temporal atrophy (expansion of the Sylvian fissure) was greater in LVAD patients (median: 0.91 mm/year; IQR: 0.14–2.27) than HT patients (median: 0.10 mm/year; IQR: 0.02–0.55), p=0.005. These observations reveal a need for future work to prospectively quantify brain atrophy after LVAD implantation and HT, while comparing with that of advanced heart failure.
Yanling Xiao, Lixia Liu, Xiaoying Peng et al.
Background: Gastrointestinal bleeding (GIB) is associated with high mortality rates among critically ill patients. The hemoglobin-to-red blood cell distribution width ratio (HRR) has recently emerged as a potential prognostic marker in various clinical settings. However, the association between HRR and prognosis in critically ill patients with GIB is unclear. Methods: We conducted a retrospective cohort study using the MIMIC-IV database (version 2.2). Patients diagnosed with GIB were included based on predefined criteria. The HRR was calculated as the ratio of hemoglobin to red blood cell distribution width. Kaplan-Meier curves and multivariate Cox regression models assessed the association between HRR and 180-day mortality. Restricted cubic spline curves were employed to evaluate the nonlinear relationship between HRR and mortality. Additionally, a segmented regression model was constructed to determine the threshold effect in nonlinearity. Subgroup analyses were performed to assess the consistency of the relationship between HRR and 180-day mortality across different patient populations. Results: A total of 2,346 patients met the inclusion criteria. Higher HRR was independently associated with reduced 180-day all-cause mortality (adjusted HR, 0.15; 95% CI, 0.07–0.31; P < 0.001). Non-linear associations were observed using restricted cubic splines (P for overall < 0.001, P for non-linearity = 0.002). When HRR was less than 0.81, each unit increase in HRR was associated with a 90% reduction in 180-day mortality among patients with GIB (HR, 0.10; 95% CI, 0.04–0.24; P < 0.001). Subgroup analyses demonstrated that the association between HRR and 180-day mortality was consistent across all subgroups. Conclusion: HRR exhibits a significant nonlinear negative association with 180-day mortality in critically ill patients with GIB. This association was consistent across multiple subgroups, suggesting that HRR may serve as a simple and effective prognostic biomarker in patients with GIB.
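The HRR definition above is a plain ratio, so it can be sketched in a few lines. The units (hemoglobin in g/dL, RDW in percent) and the function name are assumptions, since the abstract does not state them; the 0.81 value is the threshold reported in the Results.

```python
HRR_THRESHOLD = 0.81  # inflection point reported in the abstract


def hemoglobin_to_rdw_ratio(hemoglobin_g_dl: float, rdw_percent: float) -> float:
    """Return HRR = hemoglobin / RDW (units assumed: g/dL and %)."""
    if rdw_percent <= 0:
        raise ValueError("RDW must be positive")
    return hemoglobin_g_dl / rdw_percent


# Illustrative values only, not patient data from the study.
hrr = hemoglobin_to_rdw_ratio(10.0, 16.0)
print(hrr, hrr < HRR_THRESHOLD)
```

This only computes the ratio and compares it with the reported threshold; it is not a risk model and does not reproduce the Cox or spline analyses.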
Bongsu Kang, Jundong Kim, Tae-Rim Yun et al.
We propose a natural language prompt-based retrieval augmented generation (Prompt-RAG), a novel approach to enhance the performance of generative large language models (LLMs) in niche domains. Conventional RAG methods mostly require vector embeddings, yet the suitability of generic LLM-based embedding representations for specialized domains remains uncertain. To explore and exemplify this point, we compared vector embeddings from Korean Medicine (KM) and Conventional Medicine (CM) documents, finding that KM document embeddings correlated more with token overlaps and less with human-assessed document relatedness, in contrast to CM embeddings. Prompt-RAG, distinct from conventional RAG models, operates without the need for embedding vectors. Its performance was assessed through a Question-Answering (QA) chatbot application, where responses were evaluated for relevance, readability, and informativeness. The results showed that Prompt-RAG outperformed existing models, including ChatGPT and conventional vector embedding-based RAGs, in terms of relevance and informativeness. Despite challenges like content structuring and response latency, the advancements in LLMs are expected to encourage the use of Prompt-RAG, making it a promising tool for other domains in need of RAG methods.
Khadija Khatun, Chen Shen, Jun Tanimoto et al.
Understanding how cooperation emerges in public goods games is crucial for addressing societal challenges. While optional participation can establish cooperation without identifying cooperators, it relies on specific assumptions (that individuals abstain and receive a non-negative payoff, or that non-participants cause damage to public goods), which limits our understanding of its broader role. We generalize this mechanism by considering non-participants' payoffs and their potential direct influence on public goods, allowing us to examine how various strategic motives for non-participation affect cooperation. Using replicator dynamics, we find that cooperation thrives only when non-participants are motivated by individualistic or prosocial values, with individualistic motivations yielding optimal cooperation. These findings are robust to mutation, which slightly enlarges the region where cooperation can be maintained through cyclic dominance among strategies. Our results suggest that while optional participation can benefit cooperation, its effectiveness is limited, which highlights the limitations of bottom-up schemes in supporting public goods.
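For readers unfamiliar with replicator dynamics, the sketch below shows one forward-Euler step of the textbook matrix-game replicator equation, dx_i/dt = x_i (f_i - f_mean). It is not the authors' public goods model: the rock-paper-scissors payoff matrix is a hypothetical stand-in chosen only because it exhibits cyclic dominance, and the function name and step size are assumptions.

```python
def replicator_step(freqs: list[float], payoff: list[list[float]], dt: float = 0.01) -> list[float]:
    """Advance strategy frequencies by one Euler step of the replicator equation."""
    n = len(freqs)
    # Expected payoff of each strategy against the current population mix.
    fitness = [sum(payoff[i][j] * freqs[j] for j in range(n)) for i in range(n)]
    # Population-average payoff.
    mean_fitness = sum(x * f for x, f in zip(freqs, fitness))
    # Strategies doing better than average grow, the others shrink.
    return [x + dt * x * (f - mean_fitness) for x, f in zip(freqs, fitness)]


# Hypothetical cyclic-dominance payoffs (rock-paper-scissors).
RPS = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]

state = [0.5, 0.3, 0.2]
for _ in range(100):
    state = replicator_step(state, RPS)
print(state)
```

Under this update the frequencies keep summing to one up to rounding, and the uniform mix is a fixed point of the cyclic game.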
Candice P. Chu
ChatGPT, the most accessible generative artificial intelligence (AI) tool, offers considerable potential for veterinary medicine, yet a dedicated review of its specific applications is lacking. This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine. It intends to provide specific guidance and actionable examples of how generative AI can be directly utilized by veterinary professionals without a programming background. For practitioners, ChatGPT can extract patient data, generate progress notes, and potentially assist in diagnosing complex cases. Veterinary educators can create custom GPTs for student support, while students can utilize ChatGPT for exam preparation. ChatGPT can aid in academic writing tasks in research, but veterinary publishers have set specific requirements for authors to follow. Despite its transformative potential, careful use is essential to avoid pitfalls like hallucination. This review addresses ethical considerations, provides learning resources, and offers tangible examples to guide responsible implementation. Carefully selected, up-to-date links to platforms that host large language models are provided for advanced readers with programming capability. A table of key takeaways is provided to summarize this review. By highlighting potential benefits and limitations, this review equips veterinarians, educators, and researchers to harness the power of ChatGPT effectively.
Do Young Kim, Sung Hea Kim, Eung-Ju Kim et al.
Introduction: The ROsulord® sAfety for patients with Dyslipidemia study (ROAD study) in the Republic of Korea investigated the safety and efficacy of rosuvastatin in routine clinical practice. Methods: This non-interventional, multicenter, prospective, observational study was conducted over a period of approximately 4.6 years and involved 14,243 participants. During this study, we assessed the adverse events, changes in laboratory test results, and efficacy endpoints associated with rosuvastatin use. Results: The findings revealed a notably low adverse event rate of 1.63%, indicating a favorable safety profile for rosuvastatin in the management of dyslipidemia. Importantly, no clinically significant incidences of statin-associated myopathy, hepatotoxicity, or diabetes were observed during the study period. Moreover, this study demonstrated significant improvements in lipid profiles among patients receiving rosuvastatin treatment, with a reduction in total cholesterol, low-density lipoprotein cholesterol, and triglyceride levels. These improvements contributed to a lower cardiovascular risk in the study population. Conclusion: Overall, these findings suggest that rosuvastatin is safe and effective in managing dyslipidemia in real-world clinical settings, providing clinicians with valuable insights into the benefits and risks associated with statin therapy in this patient population.
Paul D. W. Kirk, Filippo Pagani, Sylvia Richardson
Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide information on a diverse array of different biomolecular processes and pathways. Different groups of variables (e.g. genes or proteins) will be implicated in different biomolecular processes, and hence analyses that are limited to identifying just a single clustering partition of the whole dataset are liable to conflate the multiple clustering structures that may arise from these distinct processes. To address this, we propose a multi-view Bayesian mixture model that identifies groups of variables ("views"), each of which defines a distinct clustering structure. We consider applications in stratified medicine, for which our principal goal is to identify clusters of patients that define distinct, clinically actionable disease subtypes. We adopt the semi-supervised, outcome-guided mixture modelling approach of Bayesian profile regression that makes use of a response variable in order to guide inference toward the clusterings that are most relevant in a stratified medicine context. We present the model, together with illustrative simulation examples, and examples from pan-cancer proteomics. We demonstrate how the approach can be used to perform integrative clustering, and consider an example in which different 'omics datasets are integrated in the context of breast cancer subtyping.
Evgeny S. Saveliev, Mihaela van der Schaar
TemporAI is an open source Python software library for machine learning (ML) tasks involving data with a time component, focused on medicine and healthcare use cases. It supports data in time series, static, and event modalities and provides an interface for prediction, causal inference, and time-to-event analysis, as well as common preprocessing utilities and model interpretability methods. The library aims to facilitate innovation in the medical ML space by offering a standardized temporal setting toolkit for model development, prototyping and benchmarking, bridging the gaps in the ML research, healthcare professional, medical/pharmacological industry, and data science communities. TemporAI is available on GitHub (https://github.com/vanderschaarlab/temporai) and we welcome community engagement through use, feedback, and code contributions.
Clemence J. Belle, James M. Lonie, Sandra Brosda et al.
The poor treatment response of oesophageal adenocarcinoma (OAC) leads to low survival rates. Its increasing incidence makes finding more effective treatment a priority. Recent treatment improvements can be attributed to the inclusion of the tumour microenvironment (TME) and immune infiltrates in treatment decisions. The OAC TME is largely immunosuppressed and reflects treatment resistance, as patients with an inflamed TME have better outcomes. Priming the tumour with the appropriate neoadjuvant chemoradiotherapy treatment could lead to higher immune infiltration and higher expression of immune checkpoints, such as PD-1/PD-L1, CTLA4 or emerging new targets: LAG-3, TIM-3, TIGIT or ICOS. Multiple trials support the addition of immune checkpoint inhibitors to the current standard of care. However, results vary, supporting the need for better response biomarkers based on TME composition. This review explores what is known about the OAC TME, the clinical significance of the various cell populations infiltrating it and the emerging therapeutic combinations, with a focus on immune checkpoint inhibitors.
Anisia-Iuliana Alexa (Department of Ophthalmology, “Grigore T. Popa” University of Medicine and Pharmacy, Iași, Romania), Alin Dumitru Ciubotaru et al.
Roberta Gaudiano, Marcello Trizzino, Salvatore Torre et al.
Enterococcus hirae is a rare pathogen in human infections, although its incidence may be underestimated due to its difficult isolation. We describe the first known case of E. hirae infective endocarditis (IE) involving the mitral valve alone, and the seventh case of E. hirae IE worldwide. Case presentation: A 62-year-old male was admitted to our department with a five-month history of intermittent fever unresponsive to antibiotic treatment. His medical history included mitral valve prolapse, recent pleurisy, and lumbar epidural steroid injections due to lumbar degenerative disc disease. Pre-admission transesophageal echocardiography (TEE) showed mitral valve vegetation, and Enterococcus faecium was isolated on blood cultures by MALDI-TOF VITEK MS. During hospitalization, intravenous (IV) therapy with ampicillin and ceftriaxone was initiated, and E. hirae was identified by MALDI-TOF Bruker Biotyper on three blood culture sets. A second TEE revealed mitral valve regurgitation, which worsened due to infection progression. The patient underwent mitral valve replacement with a bioprosthetic valve and had an uncomplicated postoperative course; he was discharged after six weeks of IV ampicillin and ceftriaxone treatment.
Dominique Vervoort, Ge Bai
Jinhua Chen, Zhenhua Yin, Wenping Song et al.
Accumulating evidence has shown that sushi-repeat-containing protein X-linked 2 (SRPX2) is abnormally expressed in a variety of cancers and is involved in carcinogenesis, chemosensitivity, and prognosis, mainly promoting cancer cell metastasis, invasion, and migration by regulating the uPAR/integrins/FAK signaling pathway, epithelial-mesenchymal transition (EMT), angiogenesis, and glycosylation. Inflammation is regarded as playing a key role in regulating cancer initiation, progression, EMT, and therapeutics. Furthermore, SRPX2 exhibits an excellent antifibrotic effect via the TGFβR1/SMAD3/SRPX2/AP1/SMAD7 signaling pathway. Therefore, this review provides compelling evidence that SRPX2 might be a therapeutic target for inflammation and cancer-related inflammation in future cancer therapeutics.
John Illman
Omar Vázquez-Estrada, Anays Acevedo-Barrera, Alexander Nahmad-Rohen et al.
Light's internal reflectivity near a critical angle is very sensitive to the angle of incidence and the optical properties of the external medium near the interface. Novel applications in biology and medicine of subcritical internal reflection are being pursued. In many practical situations the refractive index of the external medium may vary with respect to its bulk value due to different physical phenomena at surfaces. Thus, there is a pressing need to understand the effects of a refractive-index gradient at a surface for near-critical-angle reflection. In this work we investigate theoretically the reflectivity near the critical angle at an interface with glass assuming the external medium has a continuous depth-dependent refractive index. We present graphs of the internal reflectivity as a function of the angle of incidence, which exhibit the effects of a refractive-index gradient at the interface. We analyse the behaviour of the reflectivity curves before total internal reflection is achieved. Our results provide insight into how one can recognise the existence of a refractive-index gradient at the interface and shed light on the viability of characterising it.
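As a baseline for the near-critical-angle behaviour described above, the sketch below evaluates the textbook Fresnel reflectivity for a sharp (step-index) interface, which is the reference case that a refractive-index gradient would modify. It does not implement the depth-dependent profile studied in the paper. TE (s) polarization, the function names, and the index values 1.5 (glass) and 1.33 (external medium) are illustrative assumptions.

```python
import cmath
import math


def critical_angle_deg(n_glass: float, n_external: float) -> float:
    """Critical angle for internal reflection, from sin(theta_c) = n_external / n_glass."""
    return math.degrees(math.asin(n_external / n_glass))


def internal_reflectivity_te(n_glass: float, n_external: float, angle_deg: float) -> float:
    """Fresnel power reflectivity for TE light at a sharp interface (no index gradient)."""
    theta = math.radians(angle_deg)
    cos_i = math.cos(theta)
    # Snell's law; the complex square root keeps the formula valid past the critical angle.
    cos_t = cmath.sqrt(1 - (n_glass / n_external * math.sin(theta)) ** 2)
    r = (n_glass * cos_i - n_external * cos_t) / (n_glass * cos_i + n_external * cos_t)
    return abs(r) ** 2


# Assumed illustrative indices: glass 1.5, external medium 1.33.
theta_c = critical_angle_deg(1.5, 1.33)
for angle in (0.0, theta_c - 1.0, theta_c - 0.1, theta_c + 1.0):
    print(round(angle, 2), internal_reflectivity_te(1.5, 1.33, angle))
```

Printing the reflectivity at angles just below the critical angle shows the steep rise toward total internal reflection that makes this regime sensitive to the optical properties of the external medium.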