Converting pretrained attention modules such as grouped-query attention (GQA) into multi-head latent attention (MLA) can improve expressivity without increasing KV-cache cost, making it attractive for efficient inference. However, many practical conversion baselines rely on weight-only low-rank approximations (e.g., SVD-style initializations) and uniform rank allocation. They focus on minimizing the difference between weight matrices rather than on how those weights affect input activations, ignore the covariance structure of activations, and enforce uniform rank across layers, causing activation drift and degraded attention fidelity. To address these issues, we propose CARE, a Covariance-Aware, Rank-Enhanced MLA conversion pipeline under a fixed KV width. CARE introduces three key steps: (i) activation-preserving factorization, which aligns the approximation with the actual input activations rather than just the weights; (ii) adjusted-rank allocation, which spreads a fixed KV budget across layers by giving more capacity to layers that need it most; and (iii) KV-parity mapping, which reparameterizes the converted K and V to fit the MLA format while keeping the KV-cache size unchanged. Our method outperforms a uniform-rank SVD baseline on Qwen3-4B/30B-A3B-Instruct-2507 and Llama-3.1-8B/70B-Instruct, reducing one-shot perplexity by up to 215x and improving mean accuracy by up to 1.70x at matched KV budgets. With a brief post-SVD healing fine-tune, we fully recover the original model's accuracy.
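The contrast between weight-only and activation-preserving factorization can be illustrated with a small numerical sketch. This is not CARE's actual procedure, which the abstract does not specify in detail. It is a generic covariance-weighted truncated SVD, assuming a weight matrix `W` of shape (d_out, d_in) and calibration activations `X` of shape (n, d_in); the function names and the `eps` regularizer are hypothetical.

```python
import numpy as np

def plain_svd_lowrank(W, rank):
    """Weight-only baseline: best rank-r approximation of W in Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def covariance_aware_lowrank(W, X, rank, eps=1e-8):
    """Rank-r approximation of W that minimizes the output error ||X (W - W_hat)^T||_F.

    Assumption (illustrative, not taken from the paper): the input covariance
    C = X^T X / n is factored as C = L L^T, so that
    ||X A^T||_F^2 = n * ||A L||_F^2 for any matrix A (up to the eps term).
    """
    n, d_in = X.shape
    C = X.T @ X / n + eps * np.eye(d_in)   # regularized input covariance
    L = np.linalg.cholesky(C)              # C = L @ L.T
    # Truncate the SVD in the whitened space, then map back with L^{-1}.
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    WL_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Solve Y @ L = WL_r for Y without forming an explicit inverse.
    return np.linalg.solve(L.T, WL_r.T).T
```

When the activations are strongly anisotropic, the covariance-aware approximation yields a smaller output error at the same rank than the weight-only one, which is the effect the abstract describes as reduced activation drift.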
Large Language Models (LLMs) have been widely adopted across various domains, yet their application in the medical field poses unique challenges, particularly concerning the generation of hallucinations. Hallucinations in open-ended long medical text manifest as misleading critical claims, which are difficult to verify due to two reasons. First, critical claims are often deeply entangled within the text and cannot be extracted based solely on surface-level presentation. Second, verifying these claims is challenging because surface-level token-based retrieval often lacks precise or specific evidence, leaving the claims unverifiable without deeper mechanism-based analysis. In this paper, we introduce a novel method termed Iterative Tree Analysis (ITA) for medical critics. ITA is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process. This process involves a combination of top-down task decomposition and bottom-up evidence consolidation, enabling precise verification of complex medical claims through detailed mechanism-level reasoning. Our extensive experiments demonstrate that ITA significantly outperforms previous methods in detecting factual inaccuracies in complex medical text verification tasks by 10%. Additionally, we will release a comprehensive test set to the public, aiming to foster further advancements in research within this domain.
Multimodal large language models (MLLMs) have shown strong potential for medical image reasoning, yet fairness across demographic groups remains a major concern. Existing debiasing methods often rely on large labeled datasets or fine-tuning, which are impractical for foundation-scale models. We explore In-Context Learning (ICL) as a lightweight, tuning-free alternative for improving fairness. Through systematic analysis, we find that conventional demonstration selection (DS) strategies fail to ensure fairness due to demographic imbalance in selected exemplars. To address this, we propose Fairness-Aware Demonstration Selection (FADS), which builds demographically balanced and semantically relevant demonstrations via clustering-based sampling. Experiments on multiple medical imaging benchmarks show that FADS consistently reduces gender-, race-, and ethnicity-related disparities while maintaining strong accuracy. These results highlight the potential of fairness-aware in-context learning as a scalable and data-efficient solution for equitable medical image reasoning.
Vishalie Shah, Julia Hatamyar, Taufik Hidayat
et al.
This paper uses instrumental causal forests, a novel machine learning method, to explore the treatment effect heterogeneity of Indonesia's conditional cash transfer scheme on maternal health care utilisation. Using randomised programme assignment as an instrument for enrollment in the scheme, we estimate conditional local average treatment effects for four key outcomes: good assisted delivery, delivery in a health care facility, pre-natal visits, and post-natal visits. We find significant treatment effect heterogeneity by supply-side characteristics, even though supply-side readiness was taken into account during programme development. Mothers in areas with more doctors, nurses, and delivery assistants were more likely to benefit from the programme, in terms of increased rates of good assisted delivery outcome. We also find large differences in benefits according to indicators of household poverty and survey wave, reflecting the possible impact of changes in programme design in its later years. The impact on post-natal visits in 2013 displayed the largest heterogeneity among all outcomes, with some women less likely to attend post-natal check ups after receiving the cash transfer in the long term.
Johannes Kiechle, Stefan M. Fischer, Daniel M. Lang
et al.
The sharp rise in medical tomography examinations has created a demand for automated systems that can reliably extract informative features for downstream tasks such as tumor characterization. Although 3D volumes contain richer information than individual slices, effective 3D classification remains difficult: volumetric data encode complex spatial dependencies, and the scarcity of large-scale 3D datasets has constrained progress toward 3D foundation models. As a result, many recent approaches rely on 2D vision foundation models trained on natural images, repurposing them as feature extractors for medical scans with surprisingly strong performance. Despite their practical success, current methods that apply 2D foundation models to 3D scans via slice-based decomposition remain fundamentally limited. Standard slicing along axial, sagittal, and coronal planes often fails to capture the true spatial extent of a structure when its orientation does not align with these canonical views. More critically, most approaches aggregate slice features independently, ignoring the underlying 3D geometry and losing spatial coherence across slices. To overcome these limitations, we propose TomoGraphView, a novel framework that integrates omnidirectional volume slicing with spherical graph-based feature aggregation. Instead of restricting the model to axial, sagittal, or coronal planes, our method samples both canonical and non-canonical cross-sections generated from uniformly distributed points on a sphere enclosing the volume. We publicly share our accessible code base at http://github.com/compai-lab/2025-MedIA-kiechle and provide a user-friendly library for omnidirectional volume slicing at https://pypi.org/project/OmniSlicer.
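One common way to obtain approximately uniformly distributed points on a sphere is the Fibonacci (golden-angle) lattice, sketched below. The abstract does not state how TomoGraphView generates its points, so this construction is an assumption for illustration only. The function name is hypothetical and is not part of the OmniSlicer library. Each returned unit vector could serve as the normal of one cross-sectional plane through the volume.

```python
import math

def fibonacci_sphere(n):
    """Return n approximately evenly spaced unit vectors (x, y, z) on the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # z values are evenly spaced strictly inside (-1, 1).
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)      # radius of the circle at height z
        theta = golden_angle * i        # rotate by the golden angle at each step
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points
```

Because a plane with normal v and a plane with normal -v are the same cross-section, a slicing pipeline built on this sketch would likely keep only one hemisphere of the points.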
We investigate fine-tuning Vision-Language Models (VLMs) for multi-task medical image understanding, focusing on detection, localization, and counting of findings in medical images. Our objective is to evaluate whether instruction-tuned VLMs can simultaneously improve these tasks, with the goal of enhancing diagnostic accuracy and efficiency. Using MedMultiPoints, a multimodal dataset with annotations from endoscopy (polyps and instruments) and microscopy (sperm cells), we reformulate each task into instruction-based prompts suitable for vision-language reasoning. We fine-tune Qwen2.5-VL-7B-Instruct using Low-Rank Adaptation (LoRA) across multiple task combinations. Results show that multi-task training improves robustness and accuracy. For example, it reduces the Count Mean Absolute Error (MAE) and increases Matching Accuracy in the Counting + Pointing task. However, trade-offs emerge, such as more zero-case point predictions, indicating reduced reliability in edge cases despite overall performance gains. Our study highlights the potential of adapting general-purpose VLMs to specialized medical tasks via prompt-driven fine-tuning. This approach mirrors clinical workflows, where radiologists simultaneously localize, count, and describe findings, demonstrating how VLMs can learn composite diagnostic reasoning patterns. The model produces interpretable, structured outputs, offering a promising step toward explainable and versatile medical AI. Code, model weights, and scripts will be released for reproducibility at https://github.com/simula/PointDetectCount.
Saptarshi Purkayastha, Hrishikesh Bhagwat, Keerthika Sunchu
et al.
Premature infant mortality remains a critical challenge in low- and middle-income countries (LMICs), with continuous vital sign monitoring being essential for early detection of life-threatening conditions. This paper presents an integrated system combining NeoWarm, a novel biomedical device, with NeoRoo, a mobile application, and NeoSmartML, a machine learning infrastructure, to enable comprehensive vital sign monitoring during Kangaroo Mother Care (KMC). Our power-optimized device achieves 6-6.5 days of continuous operation on a single charge, while the mobile application implements an offline-first architecture with efficient data synchronization. The optical character recognition pipeline demonstrates promising accuracy (F1 scores 0.78-0.875) for automated vital sign extraction from existing NICU monitors. Experimental validation shows the system's feasibility for deployment in resource-constrained settings, though further optimization of heart rate and temperature detection, along with the risk classification foundation model, is needed.
Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches usually involve complex designs for multimodal fusion or multi-stage decoders. In this work, we propose NTP-MRISeg, a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations. This formulation streamlines model design by eliminating the need for modality-specific fusion and external segmentation models, and it supports a unified architecture for end-to-end training. It also enables the use of pretrained tokenizers from emerging large-scale multimodal models, enhancing generalization and adaptability. More importantly, to address challenges under this formulation, such as exposure bias, long-tail token distributions, and fine-grained lesion edges, we propose three novel strategies: (1) a Next-k Token Prediction (NkTP) scheme to reduce cumulative prediction errors, (2) Token-level Contrastive Learning (TCL) to enhance boundary sensitivity and mitigate long-tail distribution effects, and (3) a memory-based Hard Error Token (HET) optimization strategy that emphasizes difficult tokens during training. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets demonstrate that NTP-MRISeg achieves new state-of-the-art performance, offering a streamlined and effective alternative to traditional MRIS pipelines.
Kartik Prabhakaran, Joshua Klein, Bardiya Zangbar
et al.
Background: This study aims to compare outcomes of robotic cholecystectomy (RC) versus laparoscopic cholecystectomy (LC) in the setting of a level 1 trauma center. Methods: We performed a retrospective study of our hospital data (2021–2024) on patients who underwent LC or RC. Using a previously validated intraoperative grading system, four grades of cholecystitis were defined as mild (A), moderate (B), severe (C), and extreme (D). Outcomes were operative times and rates of conversion to open surgery. Results: In total, 260 patients (n=130 RC and n=130 LC) were included. Patients were primarily female (69.2%), with a mean age of 47±18.3 years. The largest share of cases had grade B cholecystitis (41.2%). Patients undergoing RC had lower operative times compared with LC in grade B (101.87±17.54 vs 114.96±29.44 min, p=0.003) and grade C (134.68±26.97 vs 152.06±31.3 min, p=0.038). Conversion rates to open cholecystectomy were similar in both groups (p=0.19). Conclusion: RC had results similar to those of LC in terms of operative time and in fact had significantly lower operative times in patients with grade B and grade C cholecystitis. Level of evidence: Level III (retrospective study).
Surgery, Medical emergencies. Critical care. Intensive care. First aid
Hayley Motowski, MD, Daniel Ilges, PharmD, Nicholas Hampton, PharmD
et al.
IMPORTANCE: Hospital-acquired pneumonia (HAP) is the most common hospital-acquired infection, accounting for 22% of all nosocomial infections. The available studies to date have not attempted to assess whether confounding factors may account for the observed difference in mortality for the two forms of nosocomial pneumonia associated with mechanical ventilation, namely ventilated HAP (vHAP) and ventilator-associated pneumonia (VAP).
OBJECTIVES: To determine if vHAP is an independent predictor of mortality among patients with nosocomial pneumonia.
DESIGN, SETTING, AND PARTICIPANTS: Single-center retrospective cohort study conducted at Barnes-Jewish Hospital, St. Louis, MO, between 2016 and 2019. Adult patients with a pneumonia discharge diagnosis were screened and patients diagnosed with vHAP and VAP were included. All patient data was extracted from the electronic health record.
MAIN OUTCOMES AND MEASURES: The primary outcome was 30-day all-cause mortality (ACM).
RESULTS: One thousand one-hundred twenty unique patient admissions were included (410 vHAP, 710 VAP). Thirty-day ACM was greater for patients with vHAP compared with VAP (37.1% vs 28.5%; p = 0.003). Logistic regression analysis identified vHAP (adjusted odds ratio [AOR], 1.77; 95% CI, 1.51–2.07), vasopressor use (AOR, 2.34; 95% CI, 1.94–2.82), Charlson Comorbidity Index (1-point increments) (AOR, 1.21; 95% CI, 1.18–1.24), total antibiotic treatment days (1-d increments) (AOR, 1.13; 95% CI, 1.11–1.14), and Acute Physiology and Chronic Health Evaluation II score (1-point increments) (AOR, 1.04; 95% CI, 1.03–1.06) as independent predictors of 30-day ACM. The most common bacterial pathogens identified as causes of vHAP and VAP were Staphylococcus aureus, Enterobacterales species, and Pseudomonas aeruginosa.
CONCLUSIONS AND RELEVANCE: In this single-center cohort study with low rates of initial inappropriate antibiotic therapy, vHAP had greater 30-day ACM compared with VAP after adjusting for potential confounding variables including disease severity and comorbidities. This finding suggests that clinical trials enrolling patients with vHAP need to account for this outcome difference in their trial design and data interpretation.
Medical emergencies. Critical care. Intensive care. First aid
Muriithi Eliud Kennedy, Nangole Ferdinand Wanjala, Mwangi Peter Wambugu
et al.
Objective: To correlate initial serum lactic acid and base deficit (BD) levels with early mortality in major thermal burns. Methods: This was a prospective descriptive study conducted over 6 months at Kenyatta National Hospital (KNH), Nairobi, Kenya. Ninety consecutive patients with major thermal burns exceeding 20% of total body surface area (TBSA) who met the other inclusion criteria participated. Biographic and clinical data were collected using a structured questionnaire. Blood samples were drawn at admission for arterial blood gas analysis (ABGA) to obtain serum lactic acid and BD levels. Patients were followed up for 7 days at the KNH Burns Unit. Results: The patients studied had burns ranging from 21% to 100% TBSA. The majority, 54 (60%), had burns between 21% and 50% TBSA. Fifty-five (61.1%) patients died within 7 days of admission, and 38 (69.1%) of these deaths occurred within the initial 48 h. Both mean serum lactic acid (P < 0.001) and mean BD (P < 0.001) levels were significantly associated with 7-day mortality. On average, patients who died had higher serum lactic acid (5.1 mmol/L versus 2.8 mmol/L) and lower BD (−15.5 mmol/L versus −9.8 mmol/L) than those who survived the initial 7 days. Conclusion: Initial serum lactic acid and BD were found to be good prognostic indicators of early mortality in major thermal burns.
Dermatology, Medical emergencies. Critical care. Intensive care. First aid
Giulia Fierro, Barbara Milan, Silvia Bettinelli
et al.
Background: Systemic infection has always been considered a relative contraindication to neuraxial anesthesia, despite the fact that infectious complications are relatively uncommon. Pregnancy-related physiological changes and coronavirus disease (COVID-19) neurotropic features may facilitate the virus’ entry into the central nervous system. The principal aim of this study was to test the safety of spinal anesthesia in “severe acute respiratory syndrome coronavirus 2” (SARS-CoV-2)-positive pregnant women and to examine cerebrospinal fluid (CSF) characteristics. Methods: We conducted a prospective observational single-center study in asymptomatic or paucisymptomatic consecutive pregnant SARS-CoV-2 patients who underwent spinal anesthesia for cesarean section. Women with severe infection were excluded because they underwent general anesthesia. At the time of spinal anesthesia, we collected CSF samples, and then we performed a chemical-physical analysis to look for signs of inflammation and for SARS-CoV-2 genome. Results: We included 26 women. No spinal anesthesia complications were reported in the perioperative period and after 2 months. All CSF samples were crystal clear, and all physical-chemical values were within physiological ranges: the median CSF/plasma glucose ratio was 0.66, IQR 0.5500 (0.6000–0.7100), and the average CSF protein concentration value was 23.2 mg/dl (SD 4.87). In all samples, genomes of SARS-CoV-2 and other neurotropic viruses were not detected. Conclusions: Spinal anesthesia was safe in SARS-CoV-2 pregnant women with mild disease; no clinical maternal complications were detected, and no CSF changes indicative of inflammatory or infectious diseases that would compromise the safety of the procedure were found.
Anesthesiology, Medical emergencies. Critical care. Intensive care. First aid
Xing Shen, Hengguan Huang, Brennan Nichyporuk
et al.
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. These unknown covariate shifts present significant challenges to deep learning based methods trained on "clean" images. This often results in unreliable predictions and poorly calibrated confidence, hence hindering clinical applicability. While recent methods have been developed to address specific issues such as confidence calibration or adversarial robustness, no single framework effectively tackles all these challenges simultaneously. To bridge this gap, we propose LaDiNE, a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models for improved reliability in medical image classification. Specifically, transformer encoder blocks are used as hierarchical feature extractors that learn invariant features from images for each ensemble member, resulting in features that are robust to input perturbations. In addition, diffusion models are used as flexible density estimators to estimate member densities conditioned on the invariant features, leading to improved modeling of complex data distributions while retaining properly calibrated confidence. Extensive experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods by simultaneously improving prediction accuracy and confidence calibration under unseen noise, adversarial perturbations, and resolution degradation.
This thesis addressed the home health care routing and scheduling problem (HHCRSP), which is a class of workforce scheduling problems. The HHCRSP is an extension of the vehicle routing problem with time windows (VRPTW) to which constraints related to the home health care (HHC) context are added. It aims to provide care services to patients at their homes instead of requiring them to go to the hospital. We dealt with three different problems from the optimization viewpoint. In the first one, we considered a deterministic model to tackle the HHCRSP with multiple time windows, multiple services, synchronization of services that must be simultaneous, and skill requirements. We proposed a new mathematical model to solve this problem, along with a heuristic based on general variable neighborhood search (GVNS) to solve large instances. In the second problem, we extended the deterministic model to cope with uncertainties in travel and service times. We proposed two stochastic programming with recourse (SPR) models. In the first SPR model, we defined the recourse as a penalty cost for the tardiness of services and a remuneration for caregivers' overtime. In the second SPR model, we defined the recourse as skipping patients whose time windows would otherwise be violated. We embedded Monte Carlo simulation, which is used to estimate the expected value of the recourse, into a heuristic based on a genetic algorithm (GA) to solve the SPR models. In the last problem, we kept the multi-objective aspect of the deterministic model without aggregating its objective functions, used algorithms based on Pareto dominance to find the non-dominated solutions, and then involved the decision-maker in selecting the preferred one. Two approaches based on multi-objective evolutionary algorithms, Pareto-based and decomposition-based, were adopted to solve the HHCRSP. Three algorithms were implemented: NSGA-II, MOEA/D, and a hybrid of NSGA-II and MOEA/D.
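The notion of Pareto dominance used to find non-dominated solutions can be made concrete with a short sketch. It is a minimal illustration, assuming every objective is minimized and each solution is represented only by its tuple of objective values; the function names and the example objectives (such as travel cost and tardiness) are hypothetical and are not taken from the thesis. NSGA-II and MOEA/D build far more machinery on top of this basic test.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(solutions):
    """Return the solutions that no other solution dominates (the Pareto front)."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]
```

For example, among the objective vectors (1, 5), (2, 2), (3, 3), (5, 1), and (4, 4), the vectors (3, 3) and (4, 4) are dominated by (2, 2), so the front presented to the decision-maker consists of the remaining three.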