Paula Romina Putallaz, Myrna Marti, Lucas Gago-Galvagno
et al.
Abstract: This paper presents a critical analysis of the impact of artificial intelligence on education, with emphasis on its capacity to optimize the educational process through personalized experiences and real-time data processing. It also considers the relationship between artificial intelligence and the “Pygmalion effect”, the phenomenon whereby expectations influence students' performance. By providing personalized real-time feedback, this technological tool can shape those expectations and thereby influence both academic performance and interaction with teachers. We therefore reflect on its use in light of the ethical challenges it raises, such as privacy, algorithmic bias, and unequal access to technology, which can widen existing gaps. The paper highlights the importance of training in the use of artificial intelligence and the need for an ethical approach to avoid inequality and discrimination.
Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further supports evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates in three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Our system also achieves strong alignment between the proposed methods and their implementations, together with significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality and consistently surpass manuscripts from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
Chongcong Jiang, Tianxingjian Ding, Chuhan Song
et al.
Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicability to medical image segmentation remains limited by severe domain shifts, the absence of privileged spatial prompts, and the need to reason over complex anatomical and volumetric structures. Here we present Medical SAM3, a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D and 3D medical imaging datasets with paired segmentation masks and text prompts. Through a systematic analysis of vanilla SAM3, we observe that its performance degrades substantially on medical data, with its apparent competitiveness largely relying on strong geometric priors such as ground-truth-derived bounding boxes. These findings motivate full model adaptation beyond prompt engineering alone. By fine-tuning SAM3's model parameters on 33 datasets spanning 10 medical imaging modalities, Medical SAM3 acquires robust domain-specific representations while preserving prompt-driven flexibility. Extensive experiments across organs, imaging modalities, and dimensionalities demonstrate consistent and significant performance gains, particularly in challenging scenarios characterized by semantic ambiguity, complex morphology, and long-range 3D context. Our results establish Medical SAM3 as a universal, text-guided segmentation foundation model for medical imaging and highlight the importance of holistic model adaptation for achieving robust prompt-driven segmentation under severe domain shift. Code and model will be made available at https://github.com/AIM-Research-Lab/Medical-SAM3.
Liquid xenon time projection chambers offer a homogeneous detection medium with excellent intrinsic energy resolution, fast scintillation, and true three-dimensional position sensitivity, making them an attractive alternative to crystal-based detectors for positron emission tomography (PET). In this work, we present a new single-phase liquid xenon time projection chamber (TPC) concept optimized for medical imaging, employing combined scintillation and electroluminescence-based ionization readout to enable low-noise signal amplification and intrinsic depth-of-interaction measurement. We evaluate the system-level performance of this detector concept using Monte Carlo simulations based on OpenGATE and Geant4, with direct comparison to conventional LYSO-based PET systems. The study focuses on detection sensitivity, energy-based event selection efficiency, and reconstructed spatial resolution. While LYSO detectors provide higher absolute stopping efficiency due to their higher density, liquid xenon detectors exhibit improved photopeak purity as a result of superior intrinsic energy resolution, leading to enhanced rejection of scattered events. Point-source reconstruction studies demonstrate that the intrinsic three-dimensional position sensitivity of the liquid xenon TPC translates into a reconstructed spatial resolution of approximately 1 mm full width at half maximum (FWHM) at the system level, compared to approximately 4 mm for LYSO-based systems under comparable conditions. These results indicate that liquid-xenon-based PET detectors can achieve competitive or superior imaging performance, particularly for applications requiring high spatial resolution, large axial acceptance, and scalable detector geometries.
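A back-of-the-envelope calculation makes the link between intrinsic energy resolution and energy-based event selection concrete. The sketch below is not the study's OpenGATE/Geant4 pipeline. It assumes a purely Gaussian 511 keV photopeak and a symmetric energy window. The 5 % and 12 % resolutions and the ±25 keV window are hypothetical values chosen only for illustration. Compton-scattered photons deposit less than 511 keV, so a narrower window rejects more scatter. A detector with finer resolution can use such a narrow window and still keep most true photopeak events.

```python
import math

# For a Gaussian peak, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 sigma)
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def photopeak_acceptance(resolution: float, half_window_kev: float,
                         peak_kev: float = 511.0) -> float:
    """Fraction of photopeak events that fall inside peak_kev +/- half_window_kev.

    resolution is the relative energy resolution, FWHM / peak energy
    (e.g. 0.10 for 10 % FWHM). Assumes a purely Gaussian photopeak.
    """
    sigma = resolution * peak_kev / FWHM_PER_SIGMA
    # P(|E - peak| <= w) for a Gaussian is erf(w / (sigma * sqrt(2)))
    return math.erf(half_window_kev / (sigma * math.sqrt(2.0)))


# Hypothetical comparison: the same narrow +/-25 keV window at two resolutions
for label, res in [("finer resolution (5 %)", 0.05), ("coarser resolution (12 %)", 0.12)]:
    print(label, round(photopeak_acceptance(res, 25.0), 3))
```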
In the early days of the People's Republic of China (PRC), faced with a severe epidemic situation, the Communist Party of China (CPC) actively carried out a series of mass mobilizations for vaccination. These included top-down organizational mobilization, flexible and diverse publicity mobilization, and emotional mobilization through comparisons between the old and the new. Owing to the CPC's effective leadership and appropriate methods of mass mobilization, large-scale vaccination was successfully implemented in the early years of the PRC. This strengthened the people's political identification and enriched the CPC's practice of mass mobilization. These experiences offer valuable lessons for carrying out mass mobilization in the new era, raising vaccination rates across the entire population, and modernizing national health governance capabilities.
This letter responds to a recent study on violence among schizophrenic inpatients at Al-Rashad Training Hospital. While commending the authors’ important contribution, the letter highlights key areas needing further attention to strengthen the findings. It discusses the exclusion of female patients, the limitations of a cross-sectional design, and the need to analyze systemic factors such as staffing, training, and ward environment. The letter advocates for the use of comprehensive diagnostic and risk assessment tools beyond standard criteria to better understand symptom patterns and predict aggression. Suggestions include using validated scales like PANSS, BPRS, and HCR-20. The author calls for broader, longitudinal research that considers gender differences, symptom severity, and institutional variables to improve patient care and safety. By addressing these gaps, future studies can provide stronger evidence to guide clinical practice and policy in psychiatric inpatient settings.
History of medicine. Medical expeditions, General works
Brain imaging technologies have become increasingly popular among neuroscientists and clinicians for aiding diagnosis and prognosis and for guiding the treatment of brain diseases. However, with the widespread application of these technologies, the ethical issues raised by incidental findings in brain imaging research have gradually emerged and become a central topic in bioethical discourse. Key concerns include inadequate informed consent, ambiguous responsibilities for disclosure, and the complexity of subsequent follow-up actions. Addressing incidental findings requires adherence to medical ethical principles and the implementation of a comprehensive, multidimensional strategy. This strategy should encompass informed consent processes, disclosure guidelines, post-disclosure protocols, and interdisciplinary collaboration, with the aim of promoting the responsible and sustainable development of brain imaging research.
Medical benchmarks are indispensable for evaluating the capabilities of language models in healthcare for non-English-speaking communities, and they therefore help ensure the quality of real-life applications. However, not every community has sufficient resources and standardized methods to build and design such benchmarks effectively, and available non-English medical data are often fragmented and difficult to verify. We developed an approach to tackle this problem and applied it to create the first Vietnamese medical question benchmark, featuring 14,000 multiple-choice questions across 34 medical specialties. Our benchmark was constructed from various verifiable sources, including carefully curated medical exams and clinical records, and was then annotated by medical experts. The benchmark includes four difficulty levels, ranging from foundational biological knowledge commonly found in textbooks to typical clinical case studies that require advanced reasoning. Its extensive coverage and in-depth subject-specific expertise enable assessment of both the breadth and the depth of language models' medical understanding in the target language. We release the benchmark in three parts: a sample public set (4k questions), a full public set (10k questions), and a private set (2k questions) used for leaderboard evaluation. Each set contains all medical subfields and difficulty levels. Our approach is scalable to other languages, and we open-source our data construction pipeline to support the development of future multilingual benchmarks in the medical domain.
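A stratified split is one way to guarantee that every released set covers all specialties and difficulty levels. The following is a minimal sketch under stated assumptions. The field names "specialty" and "level", the function name, and the split fractions are hypothetical. The sketch does not reproduce the benchmark's actual 4k/10k/2k release or its open-source pipeline.

```python
import random
from collections import defaultdict


def stratified_split(items, fractions, seed=0):
    """Split items into len(fractions) disjoint sets, stratified on (specialty, level).

    items: list of dicts with the hypothetical keys "specialty" and "level".
    fractions: relative sizes of the sets, e.g. (0.30, 0.55, 0.15).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["specialty"], item["level"])].append(item)

    total = sum(fractions)
    splits = [[] for _ in fractions]
    for group in strata.values():
        rng.shuffle(group)
        start = 0
        for i, frac in enumerate(fractions):
            # The last set takes the remainder, so no item is lost to rounding.
            if i == len(fractions) - 1:
                end = len(group)
            else:
                end = min(len(group), start + round(len(group) * frac / total))
            splits[i].extend(group[start:end])
            start = end
    return splits
```

A stratum with fewer items than there are sets cannot appear in every set. Very small specialty-level combinations would therefore need to be merged or handled separately.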
Medical consultation dialogues contain critical clinical information, yet their unstructured nature hinders effective use in diagnosis and treatment. Traditional methods rely on rule-based or shallow machine learning techniques and struggle to capture deep and implicit semantics. Recently, large pre-trained language models and Low-Rank Adaptation (LoRA), a lightweight fine-tuning method, have shown promise for structured information extraction. We propose EMRModel, a novel approach that integrates LoRA-based fine-tuning with code-style prompt design to efficiently convert medical consultation dialogues into structured electronic medical records (EMRs). Additionally, we construct a high-quality, realistically grounded dataset of medical consultation dialogues with detailed annotations. Furthermore, we introduce a fine-grained evaluation benchmark for medical consultation information extraction and provide a systematic evaluation methodology, advancing the optimization of medical natural language processing (NLP) models. Experimental results show that EMRModel achieves an F1 score of 88.1%, an improvement of 49.5% over standard pre-trained models. Compared with traditional LoRA fine-tuning methods, our model shows superior performance, highlighting its effectiveness in structured medical record extraction tasks.
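For readers unfamiliar with LoRA, the core idea is to freeze a pre-trained weight matrix W and learn only a low-rank update. The adapted layer then computes x(W + (α/r)BA)ᵀ. The NumPy sketch below shows this standard formulation. The rank r = 8, the scaling α = 16, and the layer size are hypothetical. The sketch does not reproduce EMRModel's code-style prompts or its training setup.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=8, alpha=16.0, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen pre-trained weight (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); equals x @ (W + scale * B @ A).T
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size
```

With a 512 × 512 layer and r = 8, only the 8,192 entries of A and B are trained. The 262,144 entries of W stay frozen. Initializing B to zero makes the adapted layer start out identical to the pre-trained one.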
Mahrokh Javaherforooshzadeh, Parvin Ehteshamzadeh, Farzaneh Hooman
et al.
Background and Objectives: Type 2 diabetes is a chronic condition that can significantly impact emotional well-being. Individuals often struggle with emotional regulation, leading to decreased life satisfaction. Therefore, this study aimed to determine the effect of spiritual therapy on cognitive emotion regulation (CER) and life satisfaction in individuals with type 2 diabetes.
Methods: This study employed a quasi-experimental design utilizing a pre-test and post-test approach with a control group. The target population encompassed all individuals diagnosed with type 2 diabetes working within the Mashhad education sector in 2023. A convenience sampling method was employed to recruit a sample of 24 participants (equally divided into two groups) who met the study’s inclusion criteria. Random assignment allocated participants to either the experimental or control group. Both groups completed the CER questionnaire and the satisfaction with life scale at both the pre-test and post-test stages. The experimental group received spiritual therapy delivered in eight sessions, each lasting 90 minutes. Data analysis was conducted using analysis of covariance (ANCOVA).
Results: The findings revealed a significant reduction in maladaptive CER scores in the post-test stage for the group receiving spiritual therapy compared to the control group (P<0.001). The results also showed that spiritual therapy led to an increase in adaptive CER and life satisfaction at post-test (P<0.001).
Conclusion: The findings demonstrated that spiritual therapy yielded significant benefits for the intervention group. These results suggest that spiritual therapy can be a valuable complementary approach for managing emotional challenges and improving overall well-being in individuals with type 2 diabetes.
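The ANCOVA described in the Methods compares post-test scores between groups while adjusting for pre-test scores. The sketch below estimates that adjusted group effect under two assumptions: a single covariate and an ordinary least squares fit. The function name and data are hypothetical. The sketch omits the F-test that yields P values. It also omits the check for homogeneity of regression slopes that usually precedes an ANCOVA.

```python
import numpy as np


def ancova_group_effect(pre, post, group):
    """Covariate-adjusted difference in post-test scores between groups.

    Fits post = b0 + b1 * group + b2 * pre by ordinary least squares and
    returns b1. group is coded 1 for the intervention group, 0 for controls.
    """
    pre, post, group = map(np.asarray, (pre, post, group))
    X = np.column_stack([np.ones_like(pre, dtype=float), group, pre])
    coef, *_ = np.linalg.lstsq(X, post, rcond=None)
    return coef[1]
```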
Although most of the world's medical knowledge is recorded in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs across the six most widely spoken languages, covering a global population of 6.1 billion. This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark. On the multilingual medical benchmark, the released Apollo models, at various relatively small sizes (i.e., 0.5B, 1.8B, 2B, 6B, and 7B), achieve the best performance among models of equivalent size. Notably, Apollo-7B achieves state-of-the-art performance among multilingual medical LLMs of up to 70B parameters. Additionally, these lightweight models can be used to improve the multilingual medical capabilities of larger models without fine-tuning them, in a proxy-tuning fashion. We will open-source the training corpora, code, model weights, and evaluation benchmark.
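Proxy-tuning, mentioned above, steers a large model at decoding time without updating its weights. At each step, the difference between the logits of a small tuned model and its untuned counterpart is added to the large model's logits. The sketch below shows that logit arithmetic for a single next-token step, assuming all three models share one vocabulary. The function names and toy logits are hypothetical. Which Apollo checkpoints serve as the tuned and untuned pair is not specified here.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def proxy_tuned_distribution(large_base_logits, small_tuned_logits, small_base_logits):
    """Next-token distribution of a large base model steered by a small tuned/untuned pair.

    All three arrays are logits over the same vocabulary for the same prefix.
    The large model's weights are never changed.
    """
    offset = small_tuned_logits - small_base_logits  # what tuning changed in the small model
    return softmax(large_base_logits + offset)
```

If the small tuned and untuned models agree, the offset is zero and the large model's own distribution is returned unchanged.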
Accurate prediction of medical conditions directly from past clinical evidence is a long-sought goal in medical management and health insurance. Although machine learning algorithms have made great progress, the medical community remains skeptical about model accuracy and interpretability. This paper presents an innovative hierarchical attention deep learning model that achieves better prediction with clear interpretability that medical professionals can easily understand. The Interpretable Hierarchical Attention Network (IHAN) uses a hierarchical attention structure that naturally matches the structure of medical history data and reflects the sequence of patients' encounters (dates of service). The model's attention structure consists of three levels: (1) attention over medical code types (diagnosis codes, procedure codes, lab test results, and prescription drugs), (2) attention over the sequential medical encounters within a type, and (3) attention over the individual medical codes within an encounter and type. The model is applied to predict the occurrence of stage 3 chronic kidney disease (CKD), using three years of medical history of Medicare Advantage (MA) members from a nationwide American health insurance company. It takes members' medical events, from both claims and Electronic Medical Records (EMR) data, as input, predicts stage 3 CKD, and calculates the contribution of individual events to the predicted outcome.
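A toy forward pass can illustrate the three attention levels. This is a sketch, not IHAN itself, and it makes several assumptions. Embeddings and query vectors are random stand-ins for learned parameters. Simple dot-product attention pooling is used at every level. Encounter order is not modeled, whereas the actual network reflects the date-of-service sequence. Scoring a code's contribution as the product of its three attention weights is a hypothetical attribution rule.

```python
import numpy as np


def attention_pool(H, q):
    """Dot-product attention pooling: H is (n, d), q is (d,); returns a (d,) vector and (n,) weights."""
    scores = H @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ H, w


def hierarchical_forward(history, q_code, q_enc, q_type, w_out):
    """history maps a code type (e.g. "diagnosis") to a list of encounters, each an
    (n_codes, d) array of code embeddings. Encounter order is ignored in this sketch."""
    type_vecs, per_type = [], []
    for ctype, encounters in history.items():
        enc_vecs, code_ws = [], []
        for codes in encounters:
            v, w = attention_pool(codes, q_code)              # level 3: codes within an encounter
            enc_vecs.append(v)
            code_ws.append(w)
        t, enc_w = attention_pool(np.stack(enc_vecs), q_enc)  # level 2: encounters within a type
        type_vecs.append(t)
        per_type.append((ctype, enc_w, code_ws))
    patient, type_w = attention_pool(np.stack(type_vecs), q_type)  # level 1: code types
    risk = 1.0 / (1.0 + np.exp(-patient @ w_out))                  # predicted probability

    # Hypothetical attribution: a code's contribution is the product of its
    # attention weights at the three levels, so all contributions sum to 1.
    contrib = {}
    for (ctype, enc_w, code_ws), tw in zip(per_type, type_w):
        for e, (ew, cw) in enumerate(zip(enc_w, code_ws)):
            for c, w in enumerate(cw):
                contrib[(ctype, e, c)] = float(tw * ew * w)
    return risk, contrib
```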
Visual task adaptation has been shown to be effective for adapting pre-trained Vision Transformers (ViTs) to general downstream visual tasks using specialized learnable layers or tokens. However, there is not yet a large-scale benchmark that fully explores the effect of visual task adaptation in the realistic and important medical domain, particularly across diverse medical visual modalities such as color images, X-ray, and CT. To close this gap, we present Med-VTAB, a large-scale Medical Visual Task Adaptation Benchmark consisting of 1.68 million medical images covering diverse organs, modalities, and adaptation approaches. Based on Med-VTAB, we explore two questions. The first is the scaling law of medical prompt tuning with respect to the number of tunable parameters. The second is the generalizability of medical visual adaptation when using non-medical versus medical pre-trained weights. In addition, we study the impact of patient-ID out-of-distribution shift on medical visual adaptation, a realistic and challenging scenario. Furthermore, results from Med-VTAB indicate that a single pre-trained model falls short in medical task adaptation. We therefore introduce GMoE-Adapter, a novel method that combines medical and general pre-trained weights through a gated mixture-of-experts adapter, achieving state-of-the-art results in medical visual task adaptation.
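A gated mixture-of-experts adapter can be sketched with one residual bottleneck adapter (expert) per backbone. A per-token softmax gate then weighs the medical expert against the general one. The function names, shapes, and random weights are hypothetical. The published GMoE-Adapter may route, place, and train its experts differently.

```python
import numpy as np


def make_adapter(d, r, rng):
    """Residual bottleneck adapter x + relu(x @ down) @ up with small random weights."""
    down = rng.normal(0.0, 0.02, size=(d, r))
    up = rng.normal(0.0, 0.02, size=(r, d))
    return lambda x: x + np.maximum(x @ down, 0.0) @ up


def gated_moe_fuse(h_med, h_gen, adapters, W_gate):
    """Fuse per-token features from a medical and a general pre-trained backbone.

    h_med, h_gen: (tokens, d); adapters: one adapter (expert) per backbone;
    W_gate: (2 * d, 2) projection producing one gate logit per expert.
    """
    logits = np.concatenate([h_med, h_gen], axis=-1) @ W_gate             # (tokens, 2)
    g = np.exp(logits - logits.max(axis=-1, keepdims=True))
    g /= g.sum(axis=-1, keepdims=True)                                    # softmax gate per token
    experts = np.stack([adapters[0](h_med), adapters[1](h_gen)], axis=1)  # (tokens, 2, d)
    return (g[..., None] * experts).sum(axis=1), g
```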
There is increasing interest in the application of large language models (LLMs) to the medical field, in part because of their impressive performance on medical exam questions. While promising, exam questions do not reflect the complexity of real patient-doctor interactions. In reality, physicians' decisions are shaped by many complex factors, such as patient compliance, personal experience, ethical beliefs, and cognitive bias. Taking a step toward understanding this, we hypothesize that LLMs will give significantly less accurate responses to clinical questions containing cognitive biases than to the same questions presented without such biases. In this study, we developed BiasMedQA, a benchmark for evaluating cognitive biases in LLMs applied to medical tasks. Using BiasMedQA, we evaluated six LLMs: GPT-4, Mixtral-8x7B, GPT-3.5, PaLM-2, Llama 2 70B-chat, and the medically specialized PMC Llama 13B. We tested these models on 1,273 questions from the US Medical Licensing Exam (USMLE) Steps 1, 2, and 3, modified to replicate common clinically relevant cognitive biases. Our analysis revealed varying effects of the biases on these LLMs. GPT-4 stood out for its resilience to bias, whereas Llama 2 70B-chat and PMC Llama 13B were disproportionately affected by cognitive bias. Our findings highlight the critical need for bias mitigation in the development of medical LLMs, pointing toward safer and more reliable applications in healthcare.
Objective: To investigate GPT-3.5 for generating and coding medical documents with ICD-10 codes as data augmentation for low-resource labels. Materials and Methods: Using GPT-3.5, we generated and coded 9,606 discharge summaries. They were based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, these formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, the generation codes, and their families. Weak Hierarchical Confusion Matrices were employed to determine within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated both on prompt-guided self-generated data and on real MIMIC-IV data. Clinical professionals evaluated the clinical acceptability of the generated documents. Results: Augmentation slightly hinders the overall performance of the models. However, it improves performance for the generation candidate codes and their families, including one unseen in the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 can identify ICD-10 codes from the prompted descriptions but performs poorly on real data. Evaluators note that the generated concepts are correct, but that the documents are weak in variety, supporting information, and narrative. Discussion and Conclusion: GPT-3.5 alone is unsuitable for ICD-10 coding. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Discharge summaries generated by GPT-3.5 state prompted concepts correctly but lack variety and narrative authenticity. They are unsuitable for clinical practice.
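The two averaging schemes reported above weigh codes differently. Micro-F1 pools true positives, false positives, and false negatives over all codes, so frequent codes dominate. Macro-F1 averages per-code F1, so rare generation codes count equally. The sketch below computes both and adds a simplified within-family versus out-of-family label for false positives. It assumes that a code's family is its three-character ICD-10 category (e.g. E119 → E11), which may differ from the study's definition. It is not an implementation of Weak Hierarchical Confusion Matrices.

```python
from collections import Counter


def micro_macro_f1(gold, pred, codes):
    """gold, pred: per-document sets of ICD-10 codes; codes: the label set to score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in codes:
            if c in g and c in p:
                tp[c] += 1
            elif c in p:
                fp[c] += 1
            elif c in g:
                fn[c] += 1

    def f1(t, f_pos, f_neg):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when a code never occurs
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in codes) / len(codes)
    return micro, macro


def family(code):
    """Assumed family: the 3-character ICD-10 category (dots removed), e.g. E11.9 -> E11."""
    return code.replace(".", "")[:3]


def false_positive_kind(pred_code, gold_codes):
    """Label a wrong prediction as within-family or out-of-family relative to the gold codes."""
    fam = family(pred_code)
    return "within-family" if any(family(g) == fam for g in gold_codes) else "out-of-family"
```

Codes with no gold or predicted occurrences receive an F1 of 0 here. Libraries differ in how they handle this case.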
Academics (should) strive to submit to journals that are academically sound and scholarly. To achieve this, they could either submit only to journals that appear on safelists (occasionally referred to as whitelists, although this term tends to be avoided), or avoid submitting to journals on watchlists (occasionally referred to as blacklists, although this term tends to be avoided). The best-known of these lists was curated by Jeffrey Beall. Beall’s Lists (there are two, one for stand-alone journals and one for publishers) were taken offline by Beall himself in January 2017. Prior to 2017, Beall’s Lists were widely cited and used, including to make quantitative claims about scholarly publishing. Even after Beall’s Lists became obsolete (they have not been maintained for the past six years), they continue to be widely cited and used. This paper argues that the use of Beall’s Lists, pre- and post-2017, may constitute a methodological error. It further argues that the conclusions of such papers cannot always be relied upon, even if they carry a disclaimer or limitations section noting this weakness. This paper also argues for a detailed post-publication assessment of reports in the literature that used Beall’s Lists to validate their findings and conclusions, assuming it becomes accepted that Beall’s Lists are not a reliable resource for scientific investigation. Finally, this paper contends that any papers found to contain methodological errors should be corrected. Several lists cloned from Beall’s Lists have also emerged and are also being cited. These should also be included in any post-publication investigation that is conducted.
Edinaldo Rodrigues da Silva Júnior, Rossana Karla Gois Ferreira, Priscilla Alves Nobrega Gambarra Souto
Abstract: This study investigates the communication of bad news in the pediatric context through an integrative literature review. Searches of scientific databases covered works published from 2015 to 2022. The results indicated that bad news should be delivered empathetically, objectively, and frankly, involving both the child and their companions. In the case of children, however, communication should be partial, with the content adapted to the child's understanding or maturity. Finally, this study aimed to provide suggestions and scientific evidence on communicating bad news in childhood. It also contributes to knowledge on the subject, especially for health professionals who deal directly with this type of situation.
Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models make limited use of multi-modality images (e.g., PET-CT). They are also weak at extracting region-specific information, such as the prognostic information in the Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions. To address this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to select the features related to the lesion regions. We demonstrate our framework on survival prediction from PET-CT images in Head and Neck (H&N) cancer by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in the PT and MLN regions. Extensive experiments use the public dataset of the HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022). They demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.
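The cross-attention component of a block such as HPCA can be sketched with single-head scaled dot-product attention. In this sketch, PET tokens query CT tokens and vice versa. It assumes equal feature width for both modalities, residual connections, and concatenation as the final merge. The parallel convolutional branch of HPCA is omitted, and all names and weights are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens of one modality; context: (n_kv, d) tokens of the other.
    """
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n_q, n_kv), rows sum to 1
    return weights @ V


def bidirectional_fuse(pet, ct, params):
    """PET tokens attend to CT and CT tokens attend to PET; residual outputs are concatenated."""
    pet_out = pet + cross_attention(pet, ct, *params["pet_from_ct"])
    ct_out = ct + cross_attention(ct, pet, *params["ct_from_pet"])
    return np.concatenate([pet_out, ct_out], axis=-1)
```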
Tim J. Adler, Jan-Hinrich Nölke, Annika Reinke
et al.
Current deep learning-based solutions for image analysis tasks are commonly incapable of handling problems to which multiple different plausible solutions exist. In response, posterior-based methods such as conditional Diffusion Models and Invertible Neural Networks have emerged; however, their translation is hampered by a lack of research on adequate validation. In other words, the way progress is measured often does not reflect the needs of the driving practical application. Closing this gap in the literature, we present the first systematic framework for the application-driven validation of posterior-based methods in inverse problems. As a methodological novelty, it adopts key principles from the field of object detection validation, which has a long history of addressing the question of how to locate and match multiple object instances in an image. Treating modes as instances enables us to perform mode-centric validation, using well-interpretable metrics from the application perspective. We demonstrate the value of our framework through instantiations for a synthetic toy example and two medical vision use cases: pose estimation in surgery and imaging-based quantification of functional tissue parameters for diagnostics. Our framework offers key advantages over common approaches to posterior validation in all three examples and could thus revolutionize performance assessment in inverse problems.
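Treating modes as instances means borrowing the match-then-count logic of object detection. The sketch below assumes that posterior modes and reference modes are points in parameter space. It matches them greedily by Euclidean distance under a hypothetical threshold and reports precision and recall. The framework's actual localization criteria and assignment strategy may differ.

```python
import math


def match_modes(pred, ref, threshold):
    """Greedily match predicted posterior modes to reference modes by Euclidean distance.

    Each mode is a tuple of coordinates. A pair counts as a match only if its
    distance is <= threshold, and each mode is matched at most once.
    Returns (true positives, false positives, false negatives).
    """
    pairs = sorted(
        (math.dist(p, r), i, j) for i, p in enumerate(pred) for j, r in enumerate(ref)
    )
    used_p, used_r = set(), set()
    for dist, i, j in pairs:
        if dist > threshold:
            break  # pairs are sorted, so every remaining pair is farther still
        if i not in used_p and j not in used_r:
            used_p.add(i)
            used_r.add(j)
    tp = len(used_p)
    return tp, len(pred) - tp, len(ref) - tp


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Unmatched predicted modes count as false positives (spurious solutions). Unmatched reference modes count as false negatives (missed plausible solutions).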