A decade of unprecedented progress in artificial intelligence (AI) has demonstrated the potential for many fields, including medicine, to benefit from the insights that AI techniques can extract from data. Here we survey recent progress in the development of modern computer vision techniques, powered by deep learning, for medical applications, focusing on medical imaging, medical video, and clinical deployment. We start by briefly summarizing a decade of progress in convolutional neural networks, including the vision tasks they enable, in the context of healthcare. Next, we discuss several example medical imaging applications that stand to benefit, including cardiology, pathology, dermatology, and ophthalmology, and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles that must be overcome for real-world clinical deployment of these technologies.
Background: The use of artificial intelligence (AI) in medicine will generate numerous application possibilities to improve patient care, provide real-time data analytics, and enable continuous patient monitoring. Clinicians and health informaticians should become familiar with machine learning and deep learning. Additionally, they should have a strong background in data analytics and data visualization to use, evaluate, and develop AI applications in clinical practice. Objective: The main objective of this study was to evaluate the current state of AI training and the use of AI tools to enhance the learning experience. Methods: A comprehensive systematic review was conducted to analyze the use of AI in medical and health informatics education, and to evaluate existing AI training practices. PRISMA-P (Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols) guidelines were followed. The studies that focused on the use of AI tools to enhance medical education and the studies that investigated teaching AI as a new competency were categorized separately to evaluate recent developments. Results: This systematic review revealed that recent publications recommend the integration of AI training into medical and health informatics curricula. Conclusions: To the best of our knowledge, this is the first systematic review exploring the current state of AI education in both medicine and health informatics. Since AI curricula have not been standardized and competencies have not been determined, a framework for specialized AI training in medical and health informatics education is proposed.
Artificial intelligence (AI) is driving transformative changes in the field of medicine, with its successful application relying on accurate data and rigorous quality standards. By integrating clinical information, pathology, medical imaging, physiological signals, and omics data, AI significantly enhances the precision of research into disease mechanisms and patient prognoses. AI technologies also demonstrate exceptional potential in drug development, surgical automation, and brain-computer interface (BCI) research. Through the simulation of biological systems and prediction of intervention outcomes, AI enables researchers to rapidly translate innovations into practical clinical applications. While challenges such as computational demands, software development, and ethical considerations persist, the future of AI remains highly promising. AI plays a pivotal role in addressing societal issues like low birth rates and aging populations. AI can contribute to mitigating low birth rate issues through enhanced ovarian reserve evaluation, menopause forecasting, optimization of Assisted Reproductive Technologies (ART), sperm analysis and selection, endometrial receptivity evaluation, fertility forecasting, and remote consultations. In addressing the challenges posed by an aging population, AI can facilitate the development of dementia prediction models, cognitive health monitoring and intervention strategies, early disease screening and prediction systems, AI-driven telemedicine platforms, intelligent health monitoring systems, smart companion robots, and smart environments for aging-in-place. AI profoundly shapes the future of medicine.
Social biases in generative models have gained increasing attention. This paper proposes an automatic evaluation protocol for text-to-image generation, examining how gender bias originates and perpetuates in the generation process of Stable Diffusion. Using triplet prompts that vary by gender indicators, we trace presentations at several stages of the generation process and explore dependencies between prompts and images. Our findings reveal that the bias persists throughout all internal stages of the generation process and manifests across entire images. For instance, differences in object presence, such as different instruments and outfit preferences, are observed across genders and extend to overall image layouts. Moreover, our experiments demonstrate that neutral prompts tend to produce images more closely aligned with those from masculine prompts than with those from feminine prompts. We also investigate prompt-image dependencies to further understand how bias is embedded in the generated content. Finally, we offer recommendations for developers and users to mitigate this effect in text-to-image generation.
Photography, Computer applications to medicine. Medical informatics
Background: The internet and social media have become essential sources of health information for patients and citizens; however, they often disseminate misinformation that lacks scientific evidence. Health-related misinformation can undermine evidence-based treatment, weaken patient-provider relationships, and contribute to adverse health outcomes. Although narratives have been proposed as a promising approach to countering misinformation, their effectiveness remains inconsistent and influenced by various factors.
Objective: The aim of this study is to assess the effectiveness of narrative messages in correcting health-related misinformation compared to nonnarrative messages. It also seeks to identify message-, sender-, and recipient-related factors that influence the effectiveness of narrative-based corrections.
Methods: This systematic review will follow the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Comprehensive searches will be conducted across databases, including PubMed, MEDLINE, CINAHL, PsycINFO, and Web of Science, using keywords related to narratives and correction of health-related misinformation. This review will include quantitative studies evaluating narrative-based corrections for health-related misinformation in experimental and quasi-experimental studies. Studies unrelated to health misinformation or where the full text is unavailable will be excluded. No restrictions on publication year will apply. Only papers written in English will be included. Two independent reviewers will screen the papers using Rayyan QCRI software, with disagreements resolved by a third reviewer. Data extraction will cover health topics (eg, vaccination, tobacco), study characteristics (eg, author, publication year), narrative characteristics (eg, definition of narrative, theoretical foundation), participant characteristics (eg, sociodemographic), methodology (eg, study design, content of interventions and comparators, outcomes and measures, moderating and mediating factors), main results, and discussion. The quality of the eligible studies will be assessed using the Cochrane Risk of Bias 2 tool and the Risk of Bias In Non-randomized Studies - of Interventions tool.
Results: The results will be summarized in tables and presented as a descriptive review addressing the effectiveness of narrative corrections in health-related misinformation and the factors influencing their success. The implications of these results for future studies and practices will be elucidated. The findings of this review will be presented at a relevant conference and submitted to a peer-reviewed journal for publication. The aim is to complete the submission process by the northern summer of 2025.
Conclusions: Narrative messages represent a theoretically promising strategy for countering health-related misinformation; however, their effectiveness is context-dependent. This review will offer critical insights into the factors that influence the success of narrative corrections for health-related misinformation, contributing to the development of improved correction strategies and a theoretical understanding of narrative corrections.
International Registered Report Identifier (IRRID): DERR1-10.2196/69414
Medicine, Computer applications to medicine. Medical informatics
Abstract
Background: Health care systems are increasingly facing challenges posed by the aging of populations. In particular, hospitalization, both initial and subsequent, is often observed among older adult patients. However, research suggests that nearly 23% of all hospitalizations could be avoided. In this perspective, remote patient monitoring (RPM) systems are emerging as a promising solution, enabling professionals to detect and manage patient complexities early within home-based care settings.
Objective: This study aims to provide additional analyses regarding the impact of the EPOCA RPM system for polypathological older adult patients on the total number of unplanned hospitalization days and admissions, as well as emergency department (ED) visits. In a prior study, we evaluated the impact when the operator of the RPM system is a geriatrician. In this study, we assess the impact when the general practitioner is the operator.
Methods: We used a retrospective, before-and-after cohort design. Polypathological older adult patients aged 70 and older, who benefited from the EPOCA RPM system for at least 1 year (between February 2022 and August 2024), were included in the analysis. We compared the outcomes between the previous year (Y–1) and the follow-up year (Y) by the EPOCA RPM system. Statistical analyses were significant at P
Results: In total, 80 patients were included in the analysis, with an average age of 87. The results showed a significant reduction (PP
Conclusions: Our findings are consistent with our previous results regarding the potential benefits of the EPOCA RPM system in managing care for polypathological older adult patients, this time with general practitioners as system operators. They also support existing evidence on the promise of RPM in improving care and health outcomes for older adult patients while alleviating hospital burdens by reducing unplanned hospitalizations and ED visits. It is, therefore, essential to incorporate reimbursement policies for these RPM initiatives so as to facilitate their adoption within health care systems and enhance their impact on health outcomes.
Computer applications to medicine. Medical informatics, Public aspects of medicine
Background: Monitoring early childhood growth is vital, as growth faltering could indicate nutritional or health issues requiring prompt intervention. Our study’s aim was to assess the performance of a length-weight artificial intelligence (LWAI) tool for predicting children’s length and weight from smartphone images. Methods: This observational, single-centre study recruited children aged 0–18 months. Investigators measured length and weight in clinic using WHO standard recommendations and captured six images per child in a supine position, while parents took six similar images at home. Within each image, LWAI identifies specific body landmarks and a reference object, then extracts and uses image features to predict the child’s length and weight. The LWAI’s performance was assessed by comparing length/weight prediction versus actual measurements. User experience was collected through questionnaires. Results: A total of 215 participants (mean age 6.1 months) were included, and length/weight predictions were generated for 98% (2184/2224) of the images. The mean absolute error (MAE) and mean absolute percentage error (MAPE) for length were 2.47 cm (4.04%) for individual images and 1.89 cm (3.18%) for grouped images (participants with ≥9 images). The corresponding MAE/MAPE for weight were 0.69 kg (11.68%) and 0.56 kg (9.02%), respectively. Regarding usability, 97% of parents who reported not routinely measuring their child’s growth indicated that they would start doing so regularly if a digital tool was available to them. Conclusions: The LWAI tool can predict length and weight in children ≤18 months, offering a practical, convenient, artificial intelligence-powered alternative for growth monitoring in home and clinical settings. Trial registration number: NCT05079776.
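For readers unfamiliar with the two error metrics reported above, the following is a minimal sketch of how MAE and MAPE are conventionally computed. It is not the study's code, and the function names and sample values are hypothetical illustrations rather than study data.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error, in the same units as the measurements."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, relative to each actual value."""
    return 100 * mean(abs(a - p) / a for a, p in zip(actual, predicted))


# Hypothetical lengths in cm (illustrative only, not study data).
measured = [60.0, 70.0, 80.0]
predicted = [62.0, 68.0, 81.0]

print(f"MAE:  {mae(measured, predicted):.2f} cm")
print(f"MAPE: {mape(measured, predicted):.2f} %")
```

MAE is expressed in the measurement unit, while MAPE scales each error by the true value, which is why the abstract can report both an absolute error in cm or kg and a percentage for the same predictions.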
Computer applications to medicine. Medical informatics
Objective: To develop and evaluate a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets, addressing a key barrier to data interoperability. Materials and Methods: We designed a novel unit harmonization system combining BM25, sentence embeddings, Bayesian optimization, and a bidirectional transformer-based binary classifier for retrieving and matching laboratory test entries. The system was evaluated using the Optum Clinformatics Datamart dataset (7.5 billion entries). We implemented a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. Performance was assessed using Mean Reciprocal Rank (MRR) and other standard information retrieval metrics. Results: Our hybrid retrieval approach combining BM25 and sentence embeddings (MRR: 0.8833) significantly outperformed both lexical-only (MRR: 0.7985) and embedding-only (MRR: 0.5277) approaches. The transformer-based reranker further improved performance (absolute MRR improvement: 0.10), bringing the final system MRR to 0.9833. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5. Discussion: The hybrid architecture effectively leverages the complementary strengths of lexical and semantic approaches. The reranker addresses cases where initial retrieval components make errors due to complex semantic relationships in medical terminology. Conclusion: Our framework provides an efficient, scalable solution for unit harmonization in clinical datasets, reducing manual effort while improving accuracy. Once harmonized, data can be reused seamlessly in different analyses, ensuring consistency across healthcare systems and enabling more reliable multi-institutional studies and meta-analyses.
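As an illustration of the headline metric, the sketch below shows how Mean Reciprocal Rank is conventionally computed over ranked candidate lists. It assumes a single correct candidate per query; the function names, the `hit_rate_at_k` helper, and the unit strings are hypothetical and do not come from the paper's pipeline.

```python
def reciprocal_rank(ranked, correct):
    """Return 1/rank of the correct item, or 0.0 if it was not retrieved."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over (ranked_list, correct_item) pairs."""
    return sum(reciprocal_rank(r, c) for r, c in queries) / len(queries)


def hit_rate_at_k(queries, k):
    """Fraction of queries whose correct item appears in the top k."""
    return sum(c in r[:k] for r, c in queries) / len(queries)


# Hypothetical retrieval results: (ranked candidates, correct unit).
queries = [
    (["mg/dL", "g/L", "mmol/L"], "mg/dL"),   # correct at rank 1
    (["g/L", "mmol/L", "mg/dL"], "mmol/L"),  # correct at rank 2
    (["IU/L", "U/L", "g/L"], "mg/dL"),       # correct item not retrieved
]

print(mean_reciprocal_rank(queries))
print(hit_rate_at_k(queries, 1), hit_rate_at_k(queries, 2))
```

Because MRR rewards placing the correct candidate near the top, a reranker that moves correct matches from rank 2 or 3 to rank 1 can raise it substantially, which is consistent with the improvement the abstract attributes to the reranking stage.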
What does Artificial Intelligence (AI) have to contribute to health care? And what should we be looking out for if we are worried about its risks? In this paper we offer a survey, and initial evaluation, of hopes and fears about the applications of artificial intelligence in medicine. AI clearly has enormous potential as a research tool, in genomics and public health especially, as well as a diagnostic aid. It's also highly likely to impact on the organisational and business practices of healthcare systems in ways that are perhaps under-appreciated. Enthusiasts for AI have held out the prospect that it will free physicians up to spend more time attending to what really matters to them and their patients. We will argue that this claim depends upon implausible assumptions about the institutional and economic imperatives operating in contemporary healthcare settings. We will also highlight important concerns about privacy, surveillance, and bias in big data, as well as the risks of over trust in machines, the challenges of transparency, the deskilling of healthcare practitioners, the way AI reframes healthcare, and the implications of AI for the distribution of power in healthcare institutions. We will suggest that two questions, in particular, are deserving of further attention from philosophers and bioethicists. What does care look like when one is dealing with data as much as people? And, what weight should we give to the advice of machines in our own deliberations about medical decisions?
Mohammed Tahri Sqalli, Begali Aslonov, M. Gafurov
et al.
Eye tracking technology has emerged as a valuable tool in the field of medicine, offering a wide range of applications across various disciplines. This perspective article aims to provide a comprehensive overview of the diverse applications of eye tracking technology in medical practice. By summarizing the latest research findings, this article explores the potential of eye tracking technology in enhancing diagnostic accuracy, assessing and improving medical performance, as well as improving rehabilitation outcomes. Additionally, it highlights the role of eye tracking in neurology, cardiology, pathology, surgery, as well as rehabilitation, offering objective measures for various medical conditions. Furthermore, the article discusses the utility of eye tracking in autism spectrum disorders, attention-deficit/hyperactivity disorder (ADHD), and human-computer interaction in medical simulations and training. Ultimately, this perspective article underscores the transformative impact of eye tracking technology on medical practice and suggests future directions for its continued development and integration.
Siân Lowri Griffiths, Graham K Murray, Yanakan Logeswaran
et al.
Background: Early intervention in psychosis (EIP) services are nationally mandated in England to provide multidisciplinary care to people experiencing first-episode psychosis, which disproportionately affects deprived and ethnic minority youth. Quality of service provision varies by region, and people from historically underserved populations have unequal access. In other disease areas, including stroke and dementia, national digital registries coupled with clinical decision support systems (CDSSs) have revolutionized the delivery of equitable, evidence-based interventions to transform patient outcomes and reduce population-level disparities in care. Given psychosis is ranked the third most burdensome mental health condition by the World Health Organization, it is essential that we achieve the same parity of health improvements.
Objective: This paper reports the protocol for the program development phase of this study, in which we aimed to co-design and produce an evidence-based, stakeholder-informed framework for the building, implementation, piloting, and evaluation of a national integrated digital registry and CDSS for psychosis, known as EPICare (Early Psychosis Informatics into Care).
Methods: We conducted 3 concurrent work packages, with reciprocal knowledge exchange between each. In work package 1, using a participatory co-design framework, key stakeholders (clinicians, academics, policy makers, and patient and public contributors) engaged in 4 workshops to review, refine, and identify a core set of essential and desirable measures and features of the EPICare registry and CDSS. Using a modified Delphi approach, we then developed a consensus of data priorities. In work package 2, we collaborated with National Health Service (NHS) informatics teams to identify relevant data currently captured in electronic health records, understand data retrieval methods, and design the software architecture and data model to inform future implementation. In work package 3, observations of stakeholder workshops and individual interviews with representative stakeholders (n=10) were subject to interpretative qualitative analysis, guided by normalization process theory, to identify factors likely to influence the adoption and implementation of EPICare into routine practice.
Results: Stage 1 of the EPICare study took place between December 2021 and September 2022. The next steps include stage 2 building, piloting, implementation, and evaluation of EPICare in 5 demonstrator NHS Trusts serving underserved and diverse populations with substantial need for EIP care in England. If successful, this will be followed by stage 3, in which we will seek NHS adoption of EPICare for rollout to all EIP services in England.
Conclusions: By establishing a multistakeholder network and engaging them in an iterative co-design process, we have identified essential and desirable elements of the EPICare registry and CDSS; proactively identified and minimized potential challenges and barriers to uptake and implementation; and addressed key questions related to informatics architecture, infrastructure, governance, and integration in diverse NHS Trusts, enabling us to proceed with the building, piloting, implementation, and evaluation of EPICare.
International Registered Report Identifier (IRRID): DERR1-10.2196/50177
Medicine, Computer applications to medicine. Medical informatics
In healthcare intelligence, the ability to fuse heterogeneous, multi-intent information from diverse clinical sources is fundamental to building reliable decision-making systems. Large Language Model (LLM)-driven information interaction systems currently show promise in the healthcare domain. Nevertheless, they often suffer from information redundancy and coupling when dealing with complex medical intents, leading to severe hallucinations and performance bottlenecks. To this end, we propose MedAide, an LLM-based medical multi-agent collaboration framework designed to enable intent-aware information fusion and coordinated reasoning across specialized healthcare domains. Specifically, we introduce a regularization-guided module that combines syntactic constraints with retrieval augmented generation to decompose complex queries into structured representations, facilitating fine-grained clinical information fusion and intent resolution. Additionally, a dynamic intent prototype matching module is proposed to utilize dynamic prototype representation with a semantic similarity matching mechanism to achieve adaptive recognition and updating of the agent's intent in multi-round healthcare dialogues. Ultimately, we design a rotation agent collaboration mechanism that introduces dynamic role rotation and decision-level information fusion across specialized medical agents. Extensive experiments are conducted on four medical benchmarks with composite intents. Experimental results from automated metrics and expert doctor evaluations show that MedAide outperforms current LLMs and improves their medical proficiency and strategic reasoning.
There is a lack of benchmarks for evaluating large language models (LLMs) in long-form medical question answering (QA). Most existing medical QA evaluation benchmarks focus on automatic metrics and multiple-choice questions. While valuable, these benchmarks fail to fully capture or assess the complexities of real-world clinical applications where LLMs are being deployed. Furthermore, existing studies on evaluating long-form answer generation in medical QA are primarily closed-source, lacking access to human medical expert annotations, which makes it difficult to reproduce results and enhance existing baselines. In this work, we introduce a new publicly available benchmark featuring real-world consumer medical questions with long-form answer evaluations annotated by medical doctors. We performed pairwise comparisons of responses from various open and closed-source medical and general-purpose LLMs based on criteria such as correctness, helpfulness, harmfulness, and bias. Additionally, we performed a comprehensive LLM-as-a-judge analysis to study the alignment between human judgments and LLMs. Our preliminary results highlight the strong potential of open LLMs in medical QA compared to leading closed models. Code & Data: https://github.com/lavita-ai/medical-eval-sphere
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Their applications are versatile and have the potential to improve diagnostic accuracy and decision-making for individual patients while also contributing to enhanced public health monitoring, disease surveillance, and policy-making through more efficient analysis of large data sets. MVLMs integrate natural language processing with medical images to enable a more comprehensive and contextual understanding of medical images alongside their corresponding textual information. Unlike general vision-and-language models trained on diverse, non-specialized datasets, MVLMs are purpose-built for the medical domain, automatically extracting and interpreting critical information from medical images and textual reports to support clinical decision-making. Popular clinical applications of MVLMs include automated medical report generation, medical visual question answering, medical multimodal segmentation, diagnosis and prognosis, and medical image-text retrieval. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We conduct a detailed analysis of various vision-and-language model architectures, focusing on their distinct strategies for cross-modal integration/exploitation of medical visual and textual features. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics. Furthermore, we highlight potential challenges and summarize future research trends and directions. The full collection of papers and codes is available at: https://github.com/YtongXie/Medical-Vision-and-Language-Tasks-and-Methodologies-A-Survey.
Medical image analysis is essential to clinical diagnosis and treatment, which is increasingly supported by multi-modal large language models (MLLMs). However, previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information. This paper aims to advance 3D medical image analysis with MLLMs. To this end, we present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs specifically tailored for various 3D medical tasks, such as image-text retrieval, report generation, visual question answering, positioning, and segmentation. Additionally, we propose M3D-LaMed, a versatile multi-modal large language model for 3D medical image analysis. Furthermore, we introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks. Through comprehensive evaluation, our method proves to be a robust model for 3D medical image analysis, outperforming existing solutions. All code, data, and models are publicly available at: https://github.com/BAAI-DCAI/M3D.
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominently in image comprehension (27.2%). Regardless of GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
# Background
Closed claims are frequently used in outcomes research studies. Lately, the availability of open claims has increased the possibility of obtaining information faster and on a larger scale. However, because of the possibility of missing claims and duplications, these data sets have not been highly utilized in medical research.
# Objective
To compare frequently used healthcare utilization measures between closed claims and open claims and to assess whether the possibility of missing claims in open claims data creates a downward bias in the estimates.
# Methods
We identified 18 different diseases using 2022 data from 2 closed claims data sets (MarketScan® and PharMetrics® Plus) and 1 open claims database (Kythera). After applying an algorithm that removes possible duplications from open claims data, we compared healthcare utilizations such as inpatient, emergency department, and outpatient use and length of stay among these 3 data sets. We applied standardized differences to compare the medians for each outcome.
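The abstract does not state the formula behind its standardized differences, nor how it was adapted to medians. As a point of reference only, the sketch below shows one commonly used definition for a continuous measure: the difference in group means divided by the pooled standard deviation. The function name and sample values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation.

    This is a common textbook form; the study's exact formula is not given.
    """
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical lengths of stay in days (illustrative only, not study data).
closed_claims_los = [2.0, 4.0, 6.0]
open_claims_los = [1.0, 3.0, 5.0]

print(standardized_difference(closed_claims_los, open_claims_los))
```

Unlike a p-value, this quantity does not shrink mechanically as the sample grows, which makes it a natural choice when the data sets being compared differ greatly in size.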
# Results
The sample size of the open claims data set was 10 to 65 times larger than that of the closed claims data sets, depending on disease type. For each disease, the estimates of healthcare utilization were similar between the open claims and closed claims data. The difference was not statistically significant.
# Conclusions
Open claims data with a bigger sample size and more current available information provide essential advantages for healthcare outcomes research studies. Therefore, especially for new medications and rare diseases, open claims data can provide information much earlier than closed claims, which usually have a time lag of 6 to 8 months.
Computer applications to medicine. Medical informatics
Healthcare applications built on the Internet of Things (IoT) are often safety-critical and thus require extensive testing. Such applications are often connected to smart medical devices from various vendors. System-level testing of such applications requires test infrastructures that physically integrate medical devices, which is expensive in both time and money. Moreover, applications continuously evolve, e.g., by introducing new devices and users and updating software. Nevertheless, a test infrastructure that enables testing with only a few devices is insufficient for testing healthcare IoT systems, hence compromising their dependability. In this paper, we propose a model-based approach for the creation and operation of digital twins (DTs) of medicine dispensers as a replacement for physical devices to support the automated testing of IoT applications at scale. We evaluate our approach with an industrial IoT system with medicine dispensers in the context of Oslo City and its industrial partners, providing healthcare services to its residents. We study the fidelity of DTs in terms of their functional similarities with their physical counterparts: medicine dispensers. Results show that the DTs behave more than 92% similarly to the physical medicine dispensers, providing a faithful replacement for the dispenser.
Christos Matsoukas, Johan Fredin Haslum, Moein Sorkhei
et al.
Convolutional Neural Networks (CNNs) have reigned for a decade as the de facto approach to automated medical image diagnosis, pushing the state-of-the-art in classification, detection and segmentation tasks. In recent years, vision transformers (ViTs) have appeared as a competitive alternative to CNNs, yielding impressive levels of performance in the natural image domain, while possessing several interesting properties that could prove beneficial for medical imaging tasks. In this work, we explore the benefits and drawbacks of transformer-based models for medical image classification. We conduct a series of experiments on several standard 2D medical image benchmark datasets and tasks. Our findings show that, while CNNs perform better if trained from scratch, off-the-shelf vision transformers can perform on par with CNNs when pretrained on ImageNet, both in a supervised and self-supervised setting, making them a viable alternative to CNNs.