Failure of early creatinine recovery predicts poor survival after emergency surgery for bowel perforation or infarction
Hyun Il Kim, Min Hong Lee, Byung Jun Jeon
et al.
Background Early postoperative recovery of kidney function is critical in emergency bowel surgery. This study evaluated the prognostic value of preoperative creatinine elevation (PCE) and early creatinine recovery (ECR). Methods A total of 424 patients underwent emergency surgery for bowel perforation or ischemia from January 2019 to December 2024. Sixteen trauma-related cases (including procedure-related injuries) were excluded, leaving 408 patients for analysis. Of these, 35 patients with end-stage renal disease or chronic kidney disease—either pre-existing or newly diagnosed during hospitalization—were excluded. ECR was defined as a decrease in serum creatinine to <1.3 mg/dl by postoperative day (POD) 3. PCE was defined as serum creatinine >1.3 mg/dl. Associations with postoperative complications and 30-day mortality were estimated using multivariable logistic regression and reported as adjusted odds ratios (aORs) with 95% CIs. Results PCE occurred in 18.5% (69/373) of the tested patients; among these, 58.0% (40/69) achieved ECR by POD 3. Failure of ECR was associated with severe complications (93.1% vs. 27.5%, P<0.001) and higher mortality (72.4% vs. 7.5%, P<0.001). In multivariable analysis, ECR failure independently predicted complications (aOR, 28.71; 95% CI, 5.44–151.57) and 30-day mortality (aOR, 32.37; 95% CI, 7.74–135.40; P<0.001 for both). Conclusions Failure to achieve ECR is independently associated with poor survival after emergency laparotomy for peritonitis. This finding supports the use of a creatinine-based checkpoint to trigger intensified monitoring and targeted rescue interventions.
Medical emergencies. Critical care. Intensive care. First aid
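The creatinine checkpoint defined in the abstract above can be written as a small classification rule. The following is a minimal sketch, not the authors' code: the function name, the labels, and the reading of "by POD 3" as "any measurement on POD 1 to POD 3" are assumptions. The abstract uses strict inequalities on both sides, so a value of exactly 1.3 mg/dl counts as neither PCE nor recovery here.

```python
PCE_THRESHOLD_MG_DL = 1.3  # cutoff stated in the abstract


def classify_creatinine(preop, postop_through_pod3):
    """Label one patient from serum creatinine values in mg/dl.

    preop: preoperative value.
    postop_through_pod3: values measured on POD 1 to POD 3 (assumed reading).
    """
    # PCE: preoperative creatinine strictly above the threshold.
    if preop <= PCE_THRESHOLD_MG_DL:
        return "no PCE"
    # ECR: creatinine falls strictly below the threshold by POD 3.
    if any(v < PCE_THRESHOLD_MG_DL for v in postop_through_pod3):
        return "ECR"
    return "ECR failure"


print(classify_creatinine(2.1, [1.8, 1.4, 1.2]))
print(classify_creatinine(2.1, [2.4, 2.0, 1.6]))
```

In a monitoring setting, the "ECR failure" label would be the point at which the intensified monitoring proposed in the conclusions is triggered.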
Association of Wet-Bulb Globe Temperature with heat-related illness hospitalizations in Japan: a time-stratified, case-crossover study
Yuka Yamamura, Takashi Hongo, Tetsuya Yumoto
et al.
Background Heat-related illnesses are a serious public health concern and are exacerbated by global warming. Wet-Bulb Globe Temperature (WBGT) is widely used as a heat stress indicator, but its clinical impact remains unclear. This study aimed to investigate the association between hourly variations in WBGT and the incidence of hospitalizations for heat-related illness in Japan using a nationwide database. By incorporating individual-level clinical data and performing stratified analyses, we sought to provide a more granular understanding of how heat exposure affects the risk of heat-related illness requiring hospitalization. Methods We conducted a time-stratified, case-crossover study using data collected from July to September in 2020 and 2021 in the Heatstroke STUDY registry. The inclusion criteria were patients registered in the Heatstroke STUDY registry, specifically hospitalized patients with heat-related illness who were transported to participating hospitals during the study period. Hourly WBGT values were assigned based on the nearest monitoring station to each hospital. Conditional logistic regression and distributed lag models were used to estimate associations between WBGT and the risk of hospitalization. Results A total of 1,653 heat-related illness hospitalizations were analyzed. The mean patient age was 67.9 years; 67.6% were male. Each 1 °C increase in WBGT at onset (hospital arrival) was associated with a significantly increased risk of hospitalization (OR 1.10, 95% CI: 1.05–1.15). The cumulative effect over the prior six hours was also significant (OR 1.56, 95% CI: 1.50–1.62). Compared with WBGT < 25 °C, adjusted ORs were 3.39 (25–27 °C), 8.81 (28–30 °C), and 22.10 (≥ 31 °C). Stratified analyses suggested stronger associations among several subgroups; however, only patients with mental disorders showed statistically significant effect modification, whereas elevated WBGT posed a risk across all groups.
Conclusions Higher WBGT levels were associated with an increased risk of heat-related hospitalization. Although the effect appeared greater in some subgroups, only patients with mental disorders demonstrated statistically significant effect modification, suggesting elevated WBGT confers risk broadly.
Medical emergencies. Critical care. Intensive care. First aid
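In a time-stratified case-crossover design such as the WBGT study above, each case serves as its own control: exposure at the event time is compared with exposure at referent times drawn from the same stratum. The sketch below illustrates referent selection only. The abstract does not state the strata used, so the common choice of same year, month, day of week, and hour is an assumption, and `referent_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def referent_times(case_time):
    """Return control times in the same year, month, weekday and hour as the case."""
    controls = []
    # Step backwards and forwards one week at a time while staying in the month.
    for direction in (-1, 1):
        t = case_time + timedelta(days=7 * direction)
        while t.year == case_time.year and t.month == case_time.month:
            controls.append(t)
            t += timedelta(days=7 * direction)
    return sorted(controls)


arrival = datetime(2020, 8, 12, 14)  # hypothetical hospital arrival time
for t in referent_times(arrival):
    print(t.isoformat())
```

The WBGT values at the case time and at each referent time would then enter a conditional logistic regression, with one stratum per patient.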
Pocket RAG: On-Device RAG for First Aid Guidance in Offline Mobile Environment
Dong Ho Kang, Hyunjoon Lee, Hyeonjeong Cha
et al.
In disaster scenarios or remote areas, first responders often lose network connectivity when providing first aid. In such situations, server-based AI systems fail to provide critical guidance. To address this issue, we present a lightweight, mobile-based retrieval-augmented generation system for small language models (SLMs) that can run directly on Android devices. Our system integrates a mobile-friendly optimized pipeline featuring Hybrid RAG, selective compression, batched prompt decoding, and quantization caching. Despite the model's small size, our RAG-based system achieves 94.5% accuracy for physical first aid and 97.0% for psychological first aid. Additionally, we reduce response time from 14.2s to 3.7s, achieving a nearly 4x speedup. These results prove that our system is practical and can deliver reliable first aid guidance even without internet connectivity.
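The abstract does not say how the Hybrid RAG stage combines its retrievers. One common way to merge a keyword ranking with an embedding ranking is reciprocal rank fusion, sketched below as an illustration only. The function name, the document IDs, and the constant `k = 60` are assumptions and are not taken from the paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) in every list that contains it;
    a higher total score means a better fused rank.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["burns", "bleeding", "choking"]  # hypothetical lexical ranking
dense_hits = ["bleeding", "choking", "burns"]    # hypothetical embedding ranking
print(reciprocal_rank_fusion([keyword_hits, dense_hits]))
```

Rank fusion needs no score calibration between retrievers, which keeps the merge step cheap on a phone.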
Tree-NET: Enhancing Medical Image Segmentation Through Efficient Low-Level Feature Training
Orhan Demirci, Bulent Yilmaz
This paper introduces Tree-NET, a novel framework for medical image segmentation that leverages bottleneck feature supervision to enhance both segmentation accuracy and computational efficiency. While previous studies have employed bottleneck feature supervision, their applications have largely been limited to the training phase, offering no computational benefits during training or evaluation. To the best of our knowledge, this study is the first to propose a framework that incorporates two additional training phases for segmentation models, utilizing bottleneck features at both input and output stages. This approach significantly improves computational performance by reducing input and output dimensions with a negligible addition to parameter count, without compromising accuracy. Tree-NET features a three-layer architecture comprising Encoder-Net and Decoder-Net, which are autoencoders designed to compress input and label data, respectively, and Bridge-Net, a segmentation framework that supervises the bottleneck features. By focusing on dense, compressed representations, Tree-NET enhances operational efficiency and can be seamlessly integrated into existing segmentation models without altering their internal structures or increasing model size. We evaluate Tree-NET on two critical segmentation tasks -- skin lesion and polyp segmentation -- using various backbone models, including U-NET variants and Polyp-PVT. Experimental results demonstrate that Tree-NET reduces FLOPs by a factor of 4 to 13 and decreases memory usage, while achieving comparable or superior accuracy compared to the original architectures. These findings underscore Tree-NET's potential as a robust and efficient solution for medical image segmentation.
Limits of trust in medical AI
Joshua Hatherley
Artificial intelligence (AI) is expected to revolutionize the practice of medicine. Recent advancements in the field of deep learning have demonstrated success in a variety of clinical tasks: detecting diabetic retinopathy from images, predicting hospital readmissions, aiding in the discovery of new drugs, etc. AI's progress in medicine, however, has led to concerns regarding the potential effects of this technology upon relationships of trust in clinical practice. In this paper, I will argue that there is merit to these concerns, since AI systems can be relied upon, and are capable of reliability, but cannot be trusted, and are not capable of trustworthiness. Insofar as patients are required to rely upon AI systems for their medical decision-making, there is potential for this to produce a deficit of trust in relationships in clinical practice.
Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning
Yogesh Thakku Suresh, Vishwajeet Shivaji Hogale, Luca-Alexandru Zamfira
et al.
We present a transformer-based multimodal framework for generating clinically relevant captions for MRI scans. Our system combines a DEiT-Small vision transformer as an image encoder, MediCareBERT for caption embedding, and a custom LSTM-based decoder. The architecture is designed to semantically align image and textual embeddings, using hybrid cosine-MSE loss and contrastive inference via vector similarity. We benchmark our method on the MultiCaRe dataset, comparing performance on filtered brain-only MRIs versus general MRI images against state-of-the-art medical image captioning methods including BLIP, R2GenGPT, and recent transformer-based approaches. Results show that focusing on domain-specific data improves caption accuracy and semantic alignment. Our work proposes a scalable, interpretable solution for automated medical image reporting.
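A hybrid cosine-MSE loss of the kind named in the abstract can be sketched as a weighted sum of a cosine-distance term and a mean-squared-error term between an image embedding and a caption embedding. The weighting `alpha` and the function name are assumptions; the abstract does not give the exact formulation.

```python
from math import sqrt


def hybrid_cosine_mse_loss(image_vec, text_vec, alpha=0.5):
    """Return alpha * (1 - cosine similarity) + (1 - alpha) * MSE for two vectors."""
    if len(image_vec) != len(text_vec) or not image_vec:
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(image_vec, text_vec))
    norm_image = sqrt(sum(a * a for a in image_vec))
    norm_text = sqrt(sum(b * b for b in text_vec))
    if norm_image == 0 or norm_text == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    cosine_distance = 1.0 - dot / (norm_image * norm_text)
    mse = sum((a - b) ** 2 for a, b in zip(image_vec, text_vec)) / len(image_vec)
    return alpha * cosine_distance + (1.0 - alpha) * mse


print(hybrid_cosine_mse_loss([1.0, 0.0], [0.0, 1.0]))
```

The cosine term penalizes differences in direction, while the MSE term also penalizes differences in magnitude, so the two terms constrain the alignment in complementary ways.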
EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services
Keshara Weerasinghe, Xueren Ge, Tessa Heick
et al.
Emergency Medical Services (EMS) are critical to patient survival in emergencies, but first responders often face intense cognitive demands in high-stakes situations. AI cognitive assistants, acting as virtual partners, have the potential to ease this burden by supporting real-time data collection and decision making. In pursuit of this vision, we introduce EgoEMS, the first end-to-end, high-fidelity, multimodal, multiperson dataset capturing over 20 hours of realistic, procedural EMS activities from an egocentric view in 233 simulated emergency scenarios performed by 62 participants, including 46 EMS professionals. Developed in collaboration with EMS experts and aligned with national standards, EgoEMS is captured using an open-source, low-cost, and replicable data collection system and is annotated with keysteps, timestamped audio transcripts with speaker diarization, action quality metrics, and bounding boxes with segmentation masks. Emphasizing realism, the dataset includes responder-patient interactions reflecting real-world emergency dynamics. We also present a suite of benchmarks for real-time multimodal keystep recognition and action quality estimation, essential for developing AI support tools for EMS. We hope EgoEMS inspires the research community to push the boundaries of intelligent EMS systems and ultimately contribute to improved patient outcomes.
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Gabriele Lozupone, Alessandro Bria, Francesco Fontanella
et al.
This study presents Latent Diffusion Autoencoder (LDAE), a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging, focusing on Alzheimer disease (AD) using brain MR from the ADNI database as a case study. Unlike conventional diffusion autoencoders operating in image space, LDAE applies the diffusion process in a compressed latent representation, improving computational efficiency and making 3D medical imaging representation learning tractable. To validate the proposed approach, we explore two key hypotheses: (i) LDAE effectively captures meaningful semantic representations on 3D brain MR associated with AD and ageing, and (ii) LDAE achieves high-quality image generation and reconstruction while being computationally efficient. Experimental results support both hypotheses: (i) linear-probe evaluations demonstrate promising diagnostic performance for AD (ROC-AUC: 90%, ACC: 84%) and age prediction (MAE: 4.1 years, RMSE: 5.2 years); (ii) the learned semantic representations enable attribute manipulation, yielding anatomically plausible modifications; (iii) semantic interpolation experiments show strong reconstruction of missing scans, with SSIM of 0.969 (MSE: 0.0019) for a 6-month gap. Even for longer gaps (24 months), the model maintains robust performance (SSIM > 0.93, MSE < 0.004), indicating an ability to capture temporal progression trends; (iv) compared to conventional diffusion autoencoders, LDAE significantly increases inference throughput (20x faster) while also enhancing reconstruction quality. These findings position LDAE as a promising framework for scalable medical imaging applications, with the potential to serve as a foundation model for medical image analysis. Code available at https://github.com/GabrieleLozupone/LDAE
CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention
Xiaomeng Hu, Fei Huang, Chenhan Yuan
et al.
As large language models (LLMs) are increasingly deployed in real-world applications, ensuring the safety of their outputs during decoding has become a critical challenge. However, existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality. In this work, we propose CARE, a novel framework for decoding-time safety alignment that integrates three key components: (1) a guard model for real-time safety monitoring, enabling detection of potentially unsafe content; (2) a rollback mechanism with a token buffer to correct unsafe outputs efficiently at an earlier stage without disrupting the user experience; and (3) a novel introspection-based intervention strategy, where the model generates self-reflective critiques of its previous outputs and incorporates these reflections into the context to guide subsequent decoding steps. The framework achieves a superior safety-quality trade-off by using its guard model for precise interventions, its rollback mechanism for timely corrections, and our novel introspection method for effective self-correction. Experimental results demonstrate that our framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience while maintaining high response quality.
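The interplay of guard model, token buffer, rollback, and introspection can be illustrated with a toy decoding loop. This is a sketch under stated assumptions and not the CARE implementation: `propose`, `is_unsafe`, and `reflect` are hypothetical stand-ins for the language model, the guard model, and the self-critique step, and the buffer policy shown is only one plausible reading of the abstract.

```python
def decode_with_rollback(propose, is_unsafe, reflect,
                         max_tokens=20, buffer_size=3, max_rollbacks=2):
    """Toy decoding loop: tokens are held in a buffer until the guard has seen them."""
    committed = []  # tokens already released to the user
    buffer = []     # recent tokens still held back for checking
    notes = []      # self-critiques fed back into the context
    rollbacks = 0
    while len(committed) + len(buffer) < max_tokens:
        token = propose(committed + buffer, notes)
        if token is None:  # the model signals end of sequence
            break
        buffer.append(token)
        if is_unsafe(buffer):
            if rollbacks >= max_rollbacks:
                buffer.clear()  # give up: never release flagged tokens
                break
            notes.append(reflect(buffer))  # introspection step
            buffer.clear()                 # rollback of the buffered tokens
            rollbacks += 1
            continue
        if len(buffer) > buffer_size:
            committed.append(buffer.pop(0))  # oldest checked token is released
    return committed + buffer


# Hypothetical scripted "model": it changes course once a critique exists.
RISKY = ["take", "double", "the", "dose"]
SAFE = ["ask", "a", "pharmacist", "first"]


def propose(prefix, notes):
    script = SAFE if notes else RISKY
    return script[len(prefix)] if len(prefix) < len(script) else None


result = decode_with_rollback(
    propose,
    is_unsafe=lambda buf: "double" in buf,
    reflect=lambda buf: "previous draft suggested an unsafe dose",
)
print(" ".join(result))
```

Because flagged tokens never leave the buffer, the user only sees text the guard has already passed, which is the sense in which a rollback need not disrupt the user experience.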
Successful Treatment of Pulmonary Embolism Causing Cardiac Arrests with Reteplase during Neurosurgery: A Case Report
Ehsan Yousefi-Mazhin, Mojtaba Mojtahedzadeh, Hossein Karballaei-Mirzahosseini
et al.
Pulmonary embolism can cause cardiac arrest. Fibrinolytic therapy and surgical embolectomy can be used to manage it. This case report presents the clinical course of a patient who experienced intraoperative cardiac arrest resulting from massive pulmonary embolism. The patient encountered three instances of cardiac arrest requiring 35 minutes of cardiopulmonary resuscitation. Subsequent treatment involved the administration of reteplase, a thrombolytic agent. Following resuscitation, the patient developed multiple organ dysfunction in the intensive care unit, necessitating the use of diverse medications. Successful resolution of organ dysfunction led to the patient's transfer to the neurosurgery department. This case highlights the complexities involved in managing pulmonary embolism-induced cardiac arrest and subsequent multiorgan dysfunction, emphasizing the significance of a multidisciplinary approach in the comprehensive care and treatment of these patients.
Anesthesiology, Medical emergencies. Critical care. Intensive care. First aid
Research agenda and priorities for Australian and New Zealand paramedicine: A Delphi consensus study
Robin Pap, Nigel Barr, Amy Hutchison
et al.
Introduction: The systematic development of a research agenda is essential for coordinated, collaborative, and efficient research endeavours in any discipline. The aim of this study was to create and prioritise a stakeholder-informed, consensus-derived paramedicine research agenda for Australia and New Zealand. Methods: The study utilised a modified Delphi consensus method consisting of three phases. Phase 1, the findings of which were previously published, consisted of a survey of Australian and New Zealand paramedicine stakeholders to inform the subsequent consensus process. Phase 2 contained three Delphi rounds involving key paramedicine profession stakeholders to generate a research agenda. Panellists were asked to rate their agreement with the inclusion of each item using a 5-point Likert scale. Consensus was defined as 80% agreement signalled by ‘Strongly Agree’ and ‘Agree’ responses. Phase 3 involved one additional round of voting to determine the importance and thus establish priorities amongst the final list of agenda items. Results: There were 341 responses to the survey in Phase 1 and thematic analysis produced a provisional agenda consisting of 109 perceived research priorities. Sixty-three key paramedicine profession stakeholders were invited to Phases 2 and 3, of whom 56 (88.9%) completed all three rounds in Phase 2, and 43 (68.3%) completed the final Phase 3. Thirty-seven items achieved consensus and were subsequently prioritised, constituting the final research agenda. Panellists gave the highest priority to ‘Paramedics role in broader healthcare system’, ‘New and emerging roles for paramedics’, ‘Patient safety’, ‘System improvement’, and ‘Clinical reasoning processes and models’. Conclusion: Using a modified Delphi consensus method and drawing from a broad range of stakeholders, a 37-item Australian and New Zealand paramedicine research agenda with item prioritisation was developed. The agenda serves to inform industry and other key stakeholders to guide their research endeavours, ultimately leading to meaningful and tangible impact within the paramedicine profession.
Medical emergencies. Critical care. Intensive care. First aid
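The consensus rule of the Delphi study above reduces to a simple proportion check per agenda item. The sketch below assumes ratings coded 1 to 5 with 4 = Agree and 5 = Strongly Agree, and treats the 80% threshold as inclusive; both are assumptions, since the abstract does not spell out the coding or the boundary case.

```python
CONSENSUS_THRESHOLD = 0.80  # share of Agree / Strongly Agree stated in the abstract


def agreement_share(ratings):
    """Fraction of panellists rating an item 4 (Agree) or 5 (Strongly Agree)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(1 for r in ratings if r >= 4) / len(ratings)


def reaches_consensus(ratings):
    """True if the item meets the consensus threshold (assumed inclusive)."""
    return agreement_share(ratings) >= CONSENSUS_THRESHOLD


item_ratings = [5, 4, 4, 5, 3]  # hypothetical panel of five
print(agreement_share(item_ratings), reaches_consensus(item_ratings))
```

The same arithmetic reproduces the reported completion rates: 56 of 63 invited panellists is 88.9%, and 43 of 63 is 68.3%.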
Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging
Sarah Müller, Louisa Fay, Lisa M. Koch
et al.
Medical imaging cohorts are often confounded by factors such as acquisition devices, hospital sites, patient backgrounds, and many more. As a result, deep learning models tend to learn spurious correlations instead of causally related features, limiting their generalizability to new and unseen data. This problem can be addressed by minimizing dependence measures between intermediate representations of task-related and non-task-related variables. These measures include mutual information, distance correlation, and the performance of adversarial classifiers. Here, we benchmark such dependence measures for the task of preventing shortcut learning. We study a simplified setting using Morpho-MNIST and a medical imaging task with CheXpert chest radiographs. Our results provide insights into how to mitigate confounding factors in medical imaging.
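Of the dependence measures listed in the abstract, distance correlation is the easiest to show in a few lines. The sketch below computes the standard sample distance correlation for two one-dimensional samples in plain Python. It illustrates the measure itself and is not the benchmark code of the paper; in the paper's setting the inputs would be learned representations rather than scalars.

```python
from math import sqrt


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand mean."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples, in [0, 1]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # a constant sample carries no dependence information
    return sqrt(max(dcov2, 0.0) / denom)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(distance_correlation(x, [2 * v + 1 for v in x]))
```

Used as a training penalty, a value near zero between the task representation and the confounder representation is the goal.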
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
Chenming Tang, Zhixiang Wang, Hao Sun
et al.
With the help of in-context learning (ICL), large language models (LLMs) have achieved impressive performance across various tasks. However, the function of descriptive instructions during ICL remains under-explored. In this work, we propose an ensemble prompt framework to describe the selection criteria of multiple in-context examples, and preliminary experiments on machine translation (MT) across six translation directions confirm that this framework boosts ICL performance. But to our surprise, LLMs might not care what the descriptions actually say, and the performance gain is primarily caused by the ensemble format, since it could lead to improvement even with random descriptive nouns. We further apply this new ensemble framework on a range of commonsense, math, logical reasoning and hallucination tasks with three LLMs and achieve promising results, suggesting again that designing a proper prompt format would be much more effective and efficient than paying effort into specific descriptions.
Dual-arm Motion Generation for Repositioning Care based on Deep Predictive Learning with Somatosensory Attention Mechanism
Tamon Miyake, Namiko Saito, Tetsuya Ogata
et al.
Caregiving is a vital role for domestic robots. Repositioning care in particular has immense societal value, critically improving the health and quality of life of individuals with limited mobility. However, the repositioning task is a challenging area of research, as it requires robots to adapt their motions while interacting flexibly with patients. The task involves several key challenges: (1) applying appropriate force to specific target areas; (2) performing multiple actions seamlessly, each requiring different force application policies; and (3) motion adaptation under uncertain positional conditions. To address these, we propose a deep neural network (DNN)-based architecture utilizing proprioceptive and visual attention mechanisms, along with impedance control to regulate the robot's movements. Using the dual-arm humanoid robot Dry-AIREC, the proposed model successfully generated motions to insert the robot's hand between the bed and a mannequin's back without applying excessive force, and it supported the transition from a supine to a lifted-up position. The project page is here: https://sites.google.com/view/caregiving-robot-airec/repositioning
Instruction-tuned Large Language Models for Machine Translation in the Medical Domain
Miguel Rios
Large Language Models (LLMs) have shown promising results on machine translation for high resource language pairs and domains. However, in specialised domains (e.g. medical) LLMs have shown lower performance compared to standard neural machine translation models. The consistency in the machine translation of terminology is crucial for users, researchers, and translators in specialised domains. In this study, we compare the performance between baseline LLMs and instruction-tuned LLMs in the medical domain. In addition, we introduce terminology from specialised medical dictionaries into the instruction formatted datasets for fine-tuning LLMs. The instruction-tuned LLMs significantly outperform the baseline models with automatic metrics.
ChatGPT for Us: Preserving Data Privacy in ChatGPT via Dialogue Text Ambiguation to Expand Mental Health Care Delivery
Anaelia Ovalle, Mehrab Beikzadeh, Parshan Teimouri
et al.
Large language models have been useful in expanding mental health care delivery. ChatGPT, in particular, has gained popularity for its ability to generate human-like dialogue. However, data-sensitive domains -- including but not limited to healthcare -- face challenges in using ChatGPT due to privacy and data-ownership concerns. To enable its utilization, we propose a text ambiguation framework that preserves user privacy. We ground this in the task of addressing stress prompted by user-provided texts to demonstrate the viability and helpfulness of privacy-preserved generations. Our results suggest that ChatGPT recommendations are still able to be moderately helpful and relevant, even when the original user text is not provided.
HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge
Haochun Wang, Chi Liu, Nuwa Xi
et al.
Large Language Models (LLMs), such as the LLaMA model, have demonstrated their effectiveness in various general-domain natural language processing (NLP) tasks. Nevertheless, LLMs have not yet performed optimally in biomedical domain tasks due to the need for medical expertise in the responses. In response to this challenge, we propose HuaTuo, a LLaMA-based model that has been supervised-fine-tuned with generated QA (Question-Answer) instances. The experimental results demonstrate that HuaTuo generates responses that possess more reliable medical knowledge. Our proposed HuaTuo model is accessible at https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese.
Jordi Mancebo, in memoriam (August 06 2022)
Laurent Brochard, Alain Mercat
Medical emergencies. Critical care. Intensive care. First aid
Symmetrical peripheral gangrene triggered by Escherichia coli sepsis
Shenoy Manjunath Mala, Bendigeri Mukhthar Ahmed
Rationale: Symmetrical peripheral gangrene is a rare acute condition triggered by several medical conditions. It needs to be recognized early and treated as an emergency.
Patient’s Concern: A 40-year-old patient without any comorbidities presented with sudden onset of pain, swelling, blistering and bluish discoloration of the fingers and toes associated with fever and constitutional symptoms. Later the fingers and toes turned dark and they were cold to touch.
Diagnosis: A diagnosis of symmetrical peripheral gangrene was made and investigated further. Blood culture isolated Escherichia coli, indicating a possible role in the causation of symmetrical peripheral gangrene.
Interventions: Patient was managed with antibiotics and anticoagulants along with supportive care.
Outcomes: There was improvement in the patient’s general condition along with development of a gangrene demarcation line on the fingers and toes.
Lessons: Awareness of this condition is needed, and early management is recommended, including recognizing the cause and providing supportive therapy to prevent complications.
Medical emergencies. Critical care. Intensive care. First aid
Damage control laparotomy in trauma: a pilot randomized controlled trial. The DCL trial
Jon E Tyson, Lillian S Kao, John B Holcomb
et al.
Background Although widely used in treating severe abdominal trauma, damage control laparotomy (DCL) has not been assessed in any randomized controlled trial. We conducted a pilot trial among patients for whom our surgeons had equipoise and hypothesized that definitive laparotomy (DEF) would reduce major abdominal complications (MAC) or death within 30 days compared with DCL. Methods Eligible patients undergoing emergency laparotomy were randomized during surgery to DCL or DEF from July 2016 to May 2019. The primary outcome was MAC or death within 30 days. Prespecified frequentist and Bayesian analyses were performed. Results Of 489 eligible patients, 39 patients were randomized (DCL 18, DEF 21) and included. Groups were similar in demographics and mechanism of injury. The DEF group had a higher Injury Severity Score (DEF median 34 (IQR 20, 43) vs DCL 29 (IQR 22, 41)) and received more prerandomization blood products (DEF median red blood cells 8 units (IQR 6, 11) vs DCL 6 units (IQR 2, 11)). In unadjusted analyses, the DEF group had more MAC or death within 30 days (risk ratio 1.71, 95% CI 0.81 to 3.63, p=0.159) due to more deaths within 30 days (DEF 33% vs DCL 0%, p=0.010). Adjustment for Injury Severity Score and prerandomization blood products reduced the risk ratio for MAC or death within 30 days to 1.54 (95% CI 0.71 to 3.32, p=0.274). The Bayesian probability that DEF increased MAC or death within 30 days was 85% in unadjusted analyses and 66% in adjusted analyses. Conclusion The findings of our single center pilot trial were inconclusive. Outcomes were not worse with DCL and, in fact, may have been better. A randomized clinical trial of DCL is feasible and a larger, multicenter trial is needed to compare DCL and DEF for patients with severe abdominal trauma. Level of evidence Level II.
Surgery, Medical emergencies. Critical care. Intensive care. First aid
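The unadjusted risk ratio reported in the DCL trial above can be illustrated with the usual log-scale Wald interval. The abstract gives the group sizes (DEF 21, DCL 18) but not the event counts for the primary outcome, so the counts used below (12 and 6) are hypothetical values chosen because they reproduce the reported ratio; they are not taken from the paper, and the function name is likewise illustrative.

```python
from math import exp, log, sqrt


def risk_ratio_with_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald interval on the log scale."""
    if min(events_a, events_b) <= 0:
        raise ValueError("both groups need at least one event for a log-scale interval")
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 12 of 21 DEF patients and 6 of 18 DCL patients.
rr, lower, upper = risk_ratio_with_ci(12, 21, 6, 18)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The interval spans 1, which is consistent with the authors' description of the pilot findings as inconclusive.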