Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan
et al.
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.
4050 citations
en
Computer Science
Supporting best practice in reflexive thematic analysis reporting in Palliative Medicine: A review of published research and introduction to the Reflexive Thematic Analysis Reporting Guidelines (RTARG)
Virginia Braun, Victoria Clarke
Background: Reflexive thematic analysis is widely used in qualitative research published in Palliative Medicine, and in the broader field of health research. However, this approach is often not used well. Common problems in published reflexive thematic analysis in general include assuming thematic analysis is a singular approach, rather than a family of methods, confusing themes and topics, and treating and reporting reflexive thematic analysis as if it is atheoretical. Purpose: We reviewed 20 papers published in Palliative Medicine between 2014 and 2022 that cited Braun and Clarke, identified using the search term ‘thematic analysis’ and the default ‘relevance’ setting on the journal webpage. The aim of the review was to identify common problems and instances of good practice. Problems centred around a lack of methodological coherence, and a lack of reflexive openness, clarity and detail in reporting. We considered contributors to these common problems, including the use of reporting checklists that are not coherent with the values of reflexive thematic analysis. To support qualitative researchers in producing coherent and reflexively open reports of reflexive thematic analysis we have developed the Reflexive Thematic Analysis Reporting Guidelines (the RTARG; in Supplemental Materials) informed by this review, other reviews we have done and our values and experience as qualitative researchers. The RTARG is also intended for use by peer reviewers to encourage methodologically coherent reviewing. Key learning points: Methodological incoherence and a lack of transparency are common problems in reflexive thematic analysis research published in Palliative Medicine. Coherence can be facilitated by researchers and reviewers striving to be knowing – thoughtful, deliberative, reflexive and theoretically aware – practitioners and appraisers of reflexive thematic analysis and developing an understanding of the diversity within the thematic analysis family of methods.
Dermatology in general medicine
T. Fitzpatrick
Introduction; biology and pathophysiology of skin; disorders presenting in the skin and mucous membranes; dermatology and internal medicine; diseases due to microbial agents; therapeutics; paediatric and geriatric dermatology.
Capabilities of Gemini Models in Medicine
Khaled Saab, Tao Tu, Wei-Hung Weng
et al.
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health&medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
329 citations
en
Computer Science
Artificial intelligence and machine learning overview in pathology & laboratory medicine: A general review of data preprocessing and basic supervised concepts.
Samer Albahra, Tom Gorbett, Scott A. Robertson
et al.
Machine learning (ML) is becoming an integral aspect of several domains in medicine. Yet, most pathologists and laboratory professionals remain unfamiliar with such tools and are unprepared for their inevitable integration. To bridge this knowledge gap, we present an overview of key elements within this emerging data science discipline. First, we will cover general, well-established concepts within ML, such as data type concepts, data preprocessing methods, and ML study design. We will describe common supervised and unsupervised learning algorithms and their associated common machine learning terms (provided within a comprehensive glossary of terms that are discussed within this review). Overall, this review will offer a broad overview of the key concepts and algorithms in machine learning, with a focus on pathology and laboratory medicine. The objective is to provide an updated useful reference for those new to this field or those who require a refresher.
Abstracts from the 2023 Annual Meeting of the Society of General Internal Medicine
Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models
Jihoon Jeong
Model Medicine is the science of understanding, diagnosing, treating, and preventing disorders in AI models, grounded in the principle that AI models -- like biological organisms -- have internal structures, dynamic processes, heritable traits, observable symptoms, classifiable conditions, and treatable states. This paper introduces Model Medicine as a research program, bridging the gap between current AI interpretability research (anatomical observation) and the systematic clinical practice that complex AI systems increasingly require. We present five contributions: (1) a discipline taxonomy organizing 15 subdisciplines across four divisions -- Basic Model Sciences, Clinical Model Sciences, Model Public Health, and Model Architectural Medicine; (2) the Four Shell Model (v3.3), a behavioral genetics framework empirically grounded in 720 agents and 24,923 decisions from the Agora-12 program, explaining how model behavior emerges from Core--Shell interaction; (3) Neural MRI (Model Resonance Imaging), a working open-source diagnostic tool mapping five medical neuroimaging modalities to AI interpretability techniques, validated through four clinical cases demonstrating imaging, comparison, localization, and predictive capability; (4) a five-layer diagnostic framework for comprehensive model assessment; and (5) clinical model sciences including the Model Temperament Index for behavioral profiling, Model Semiology for symptom description, and M-CARE for standardized case reporting. We additionally propose the Layered Core Hypothesis -- a biologically-inspired three-layer parameter architecture -- and a therapeutic framework connecting diagnosis to treatment.
Prevalence of Use of Traditional, Complementary and Alternative Medicine by the General Population: A Systematic Review of National Studies Published from 2010 to 2019
E. L. Lee, N. Richards, J. Harrison
et al.
Traditional, complementary and alternative medicine (TCAM) refers to a broad range of health practices and products typically not part of the 'conventional medicine' system, and its use is substantial among the general population. TCAM products and therapies may be used in addition to, or instead of, conventional medicine approaches, and some have been associated with adverse reactions or other harms. The aims of this systematic review were to identify and examine recently published national studies globally on the prevalence of TCAM use in the general population, to review the research methods used in these studies and to propose best practices for future studies exploring prevalence of use of TCAM. MEDLINE, Embase, CINAHL, PsycINFO and AMED were searched to identify relevant studies published since 2010. Articles/reports describing the prevalence of TCAM use in a national study among the general population were included. The quality of included studies was assessed using a risk of bias tool developed by Hoy et al. Relevant data were extracted and summarised. Forty studies from 14 countries, comprising 21 national surveys and one cross-national survey, were included. Studies explored the use of TCAM products (e.g. herbal medicines), TCAM practitioners/therapies, or both. Included studies used different TCAM definitions, prevalence time frames and data collection tools, methods and analyses, thereby limiting comparability across studies. The reported prevalence of use of TCAM (products and/or practitioners/therapies) over the previous 12 months was 24–71.3%. The reported prevalence of use of TCAM (products and/or practitioners/therapies) is high, but may underestimate use. Published prevalence data varied considerably, at least in part because studies utilise different data collection tools, methods and operational definitions, limiting cross-study comparisons and study reproducibility. 
For best practice, comprehensive, detailed data on TCAM exposures are needed, and studies should report an operational definition (including the context of TCAM use, products/practices/therapies included and excluded), publish survey questions and describe the data-coding criteria and analysis approach used. Traditional, complementary and alternative medicine (TCAM) includes products (e.g. herbal medicines, dietary supplements) and therapies/practices (e.g. chiropractic, acupuncture), and is a popular healthcare choice for many people. This study systematically reviewed national surveys of TCAM use around the world. We identified studies carried out in 14 different countries and one continent (Europe) on the extent of use of TCAM in the general population. TCAM use was found to be substantial, ranging from 24 to 71.3% in different countries. National surveys use different methods and different survey questionnaires. Some studies did not publish the survey questionnaire that they used and/or did not describe the types of TCAM included in the study. This means that it is not possible to compare the results between countries or to do further data analysis. For example, the survey questions from different countries asked people if they had ‘used’ or ‘seen a practitioner’ for a specific therapy, such as homeopathy. These questions look similar, but could elicit different answers from people. This means that the answers to these questions cannot be pooled together or compared directly. Also, some studies collected information on use of a category of TCAM products, such as herbal medicines, but other studies collected information on use of specific herbal medicines, such as St John’s wort. New surveys of the extent of use of TCAM should provide full information on the types of TCAM products, practices and therapies included in the study and consider collecting comprehensive information on use of specific TCAM products, practices and therapies.
Holistic Artificial Intelligence in Medicine; improved performance and explainability
Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou
et al.
With the increasing interest in deploying Artificial Intelligence in medicine, we previously introduced HAIM (Holistic AI in Medicine), a framework that fuses multimodal data to solve downstream clinical tasks. However, HAIM uses data in a task-agnostic manner and lacks explainability. To address these limitations, we introduce xHAIM (Explainable HAIM), a novel framework leveraging Generative AI to enhance both prediction and explainability through four structured steps: (1) automatically identifying task-relevant patient data across modalities, (2) generating comprehensive patient summaries, (3) using these summaries for improved predictive modeling, and (4) providing clinical explanations by linking predictions to patient-specific medical knowledge. Evaluated on the HAIM-MIMIC-MM dataset, xHAIM improves average AUC from 79.9% to 90.3% across chest pathology and operative tasks. Importantly, xHAIM transforms AI from a black-box predictor into an explainable decision support system, enabling clinicians to interactively trace predictions back to relevant patient data, bridging AI advancements with clinical utility.
Refuting "Debunking the GAMLSS Myth: Simplicity Reigns in Pulmonary Function Diagnostics"
Robert A. Rigby, Mikis D. Stasinopoulos, Achim Zeileis
et al.
We read with interest the above article by Zavorsky (2025, Respiratory Medicine, doi:10.1016/j.rmed.2024.107836) concerning reference equations for pulmonary function testing. The author compares a Generalized Additive Model for Location, Scale, and Shape (GAMLSS), which is the standard adopted by the Global Lung Function Initiative (GLI), with a segmented linear regression (SLR) model, for pulmonary function variables. The author presents an interesting comparison; however there are some fundamental issues with the approach. We welcome this opportunity for discussion of the issues that it raises. The author's contention is that (1) SLR provides "prediction accuracies on par with GAMLSS"; and (2) the GAMLSS model equations are "complicated and require supplementary spline tables", whereas the SLR is "more straightforward, parsimonious, and accessible to a broader audience". We respectfully disagree with both of these points.
The promise and perils of AI in medicine
Robert Sparrow, Joshua Hatherley
What does Artificial Intelligence (AI) have to contribute to health care? And what should we be looking out for if we are worried about its risks? In this paper we offer a survey, and initial evaluation, of hopes and fears about the applications of artificial intelligence in medicine. AI clearly has enormous potential as a research tool, in genomics and public health especially, as well as a diagnostic aid. It's also highly likely to impact on the organisational and business practices of healthcare systems in ways that are perhaps under-appreciated. Enthusiasts for AI have held out the prospect that it will free physicians up to spend more time attending to what really matters to them and their patients. We will argue that this claim depends upon implausible assumptions about the institutional and economic imperatives operating in contemporary healthcare settings. We will also highlight important concerns about privacy, surveillance, and bias in big data, as well as the risks of over trust in machines, the challenges of transparency, the deskilling of healthcare practitioners, the way AI reframes healthcare, and the implications of AI for the distribution of power in healthcare institutions. We will suggest that two questions, in particular, are deserving of further attention from philosophers and bioethicists. What does care look like when one is dealing with data as much as people? And, what weight should we give to the advice of machines in our own deliberations about medical decisions?
The Utilization of Point of Care Ultrasound (POCUS) for the Confirmation of Gastric and Post-Pyloric Feeding Tube Placement in a Pediatric Intensive Care Unit
Alonso Marron, Michael Wolf, Marla Levine
et al.
The aim of this study was to investigate the role of point of care ultrasound (POCUS) as an alternative imaging modality to confirm the location of gastric and post-pyloric feeding tubes in patients admitted to the pediatric intensive care unit (PICU). This was a prospective descriptive study performed at a tertiary care children’s hospital. Patients from birth to 17 years of age in whom the medical team placed a temporary enteral feeding tube were eligible for enrollment. The study physician, who was blinded to the radiographic findings, performed a POCUS study of the abdomen. An abdominal radiograph was obtained to confirm the placement in all patients. A total of 13 patients were enrolled, and 14 abdominal POCUS exams were completed. POCUS accurately identified the location of the enteral feeding tube in 10 of the 14 cases. POCUS had a sensitivity and specificity of 85.7% and 57.1%, respectively, in identifying gastric tubes. It had a sensitivity and specificity of 66.7% and 87.5%, respectively, in identifying post-pyloric tubes. No adverse events were reported. This study showed that POCUS had moderate sensitivity and specificity and was, overall, safe. Further studies can assess the level of training needed to improve accuracy, and larger studies can help support the present finding that POCUS is a safe and accurate alternative to radiographs for confirming enteral feeding tube placement.
Internal medicine, Medical technology
Phyto-derived interferons: a promising frontier in antiviral therapy development
Baskar Venkidasamy, Ashok Kumar Balaraman, Muthu Thiruvengadan
Neoplasms. Tumors. Oncology. Including cancer and carcinogens, Biology (General)
CHOLANGIOCARCINOMA IN INDIVIDUALS WITH CHRONIC LIVER DISEASE IS DIAGNOSED EARLIER, LEADING TO BETTER PROGNOSIS
Laura Izquierdo Sanchez, Julen Matin Robles, Jone Narbaiza
et al.
Introduction and Objectives: Cholangiocarcinoma (CCA) incidence and mortality are rising globally. Chronic liver diseases (CLD) are recognized risk factors. This study aimed to compare the clinical presentation and outcomes of CCA in patients with and without CLD, using data from the International CCA Registry. Patients and Methods: The International CCA Registry is a multicenter observational study enrolling cases from 54 centers across Latin America, Europe, and Asia (2010–2024). Results: Among 3,693 patients enrolled, 916 had CLD and 2,777 did not. Common CLD conditions were fatty liver disease, cirrhosis, viral hepatitis, and primary sclerosing cholangitis. Compared to non-CLD patients, those with CLD were more often male (69% vs. 53%), younger at diagnosis (63 vs. 66 years), and had higher rates of metabolic risk factors, alcohol use, and smoking. Intrahepatic CCA was more frequent in CLD patients (64% vs. 43%), whereas distal CCA was more common in non-CLD cases (20% vs. 9%). CLD patients had better performance status (ECOG 0: 53% vs. 35%), lower CA19-9 levels (59.0 vs. 134.5 U/mL), and more localized disease (56% vs. 48%). Curative-intent surgery was more frequent in the CLD group (59% vs. 48%), translating into longer median overall survival (12.3 vs. 11.0 months) and higher 5-year survival (OR = 1.67; p < 0.001). The benefit was especially evident in intrahepatic CCA. Treatment responses were comparable between groups. Conclusions: CCA is diagnosed at earlier stages in individuals with CLD, likely due to certain clinical surveillance, leading to better prognosis. Prospective validation and standardized surveillance protocols are warranted.
Specialties of internal medicine
Evaluation of Interleukin-10, Vascular Endothelial Growth Factor Levels, and Bone Marrow Parameters in Multiple Myeloma Patients at Diagnosis and After Treatment
Fulya Memis, Meryem Yalvac Kandefer, Sonay Aydin
et al.
Background: Interleukin-10 (IL-10) and vascular endothelial growth factor (VEGF) are believed to play a role in the pathophysiology of multiple myeloma (MM). We aimed to assess the significance of these parameters in the diagnosis, monitoring, and prognosis of the disease by examining them in patients at diagnosis and post-treatment and comparing the findings with those of healthy individuals. Methods: We conducted blood sampling from 35 patients diagnosed with MM at the time of diagnosis and from 15 of these patients post-treatment. We additionally assessed similar serum markers in a control group of 15 healthy individuals. Furthermore, we documented laboratory results, organ involvement, comorbidities, and CD27-CD81 levels assessed using flow cytometry in the bone marrow, along with treatments and patient responses. We also examined the quantity of cells collected during mobilization in patients who had autologous stem cell transplantation. Results: We found a positive correlation (p = 0.028/p = 0.035) between IL-10 and VEGF with the international staging score. In patients with renal involvement, IL-10 levels were higher and VEGF levels were lower than those without renal involvement (p = 0.011/p = 0.012). We showed that VEGF levels decreased significantly with treatment (p = 0.001). We found no statistically significant correlation between treatment responses and IL-10 and VEGF. The number of CD34 cells collected by mobilization showed a negative correlation with CD27 and a positive correlation with VEGF (p = 0.007/p = 0.032). Conclusions: Serum IL-10 level is associated with ISS and renal involvement in MM patients. There is a positive correlation between serum VEGF levels and the number of stem cells collected during mobilization. As CD27 expression increases, the number of stem cells collected in mobilization decreases.
Construction of the cancer patients’ database based on the US National Health and Nutrition Examination Survey (NHANES) datasets for cancer epidemiology research
Jinyoung Moon, Yongseok Mun
Background: The US National Health and Nutrition Examination Survey (NHANES) dataset does not include a specific question or laboratory test to confirm a history of cancer diagnosis. However, if straightforward variables for cancer history are introduced, US NHANES could be effectively utilized in future cancer epidemiology studies. To address this gap, the authors developed a cancer patient database from the US NHANES datasets by employing multiple R programming codes. Methods: To illustrate the practical application of this methodology to a real-world problem, the authors extracted the R codes applied in an academic paper published in another journal on January 30th, 2024 (https://doi.org/10.1016/j.heliyon.2024.e24337). This paper will focus on the construction of the database and analysis using R codes. Entire. Results: In the first example, the urine concentration of monocarboxynonyl phthalate, monocarboxyoctyl phthalate, mono-2-ethyl-5-carboxypentyl phthalate, and mono-2-hydroxy-iso-butyl phthalate (all ng/mL) were used as the independent variable, instead of the serum concentration of perfluorooctanoic acid (PFOA), perfluorooctane sulfonic acid (PFOS), perfluorohexane sulfonic acid (PFHxS), and perfluorononanoic acid (PFNA), respectively. In the second example, the serum concentration of 2,3,3’,4,4’-Pentachlorobiphenyl (PCB105), 2,3,4,4´,5-Pentachlorobiphenyl (PCB114), 2,3’,4,4’,5-Pentachlorobiphenyl (PCB118), and 2,2’,3,4,4’,5’- and 2,3,3’,4,4’,6-Hexachlorobiphenyl (PCB138) were used as the independent variable, instead of the serum concentration of PFOA, PFOS, PFHxS, and PFNA, respectively. Discussion: This research offers a comprehensive set of R codes aimed at creating a single, user-friendly variable that encapsulates the history of each type of cancer while also considering the age at which the diagnosis was made. The US NHANES provides a wealth of critical data on environmental toxicant exposures.
By employing these R codes, researchers can potentially discover numerous new associations between environmental toxicant exposures and cancer diagnoses. Ultimately, these codes could significantly advance the field of cancer epidemiology in relation to environmental toxicant exposure.
Comparative Analysis of Impacted Mandibular Third Molar Root Proximity to the Mandibular Canal using Orthopantomography and Cone-beam Computed Tomography Imaging Modalities: A Pilot Study
Jigar Joshi, Bhavin Dudhia, Dhaval Mehta
et al.
Introduction:
Fully detect risks of nerve damage, which can lead to temporary or permanent issues. Cone-beam computed tomography (CBCT) offers a three-dimensional (3D) view, providing more detailed visualisation of anatomical structures and their spatial relationships, which improves the accuracy of predicting nerve exposure. The study aims to evaluate and compare these imaging techniques’ effectiveness in categorising the relationship between third molars and the inferior alveolar canal, emphasising the importance of precise imaging for safer surgical outcomes.
Materials and Methods:
A pilot study involving 20 patients, representing 10% of the total sample size of 200, was conducted at Ahmedabad Dental College’s Department of Oral Medicine and Radiology. Investigators, trained to interpret radiological images from orthopantomography (OPG) and CBCT, compared their interpretations with those of two experts. A high inter-rater reliability was confirmed with a kappa statistic of 0.98. Following ethical approval, data were retrospectively collected from 20 cases, with digital OPG and CBCT images analysed and classified according to established criteria.
Results:
The results revealed a significant association between the classifications made with OPG and those made with CBCT, indicating that the two modalities yielded similar diagnoses. No gender bias was observed, and the distribution was similar whether the diagnosis was made through OPG or CBCT.
Conclusion:
CBCT demands an in-depth understanding of anatomy and pathology, coupled with proficiency in operating imaging software and the ability to identify abnormalities in cross-sectional images. When executed and interpreted accurately, CBCT proves to be an exceptionally valuable tool in clinical dental practice. Its detailed 3D imaging capabilities enhance the assessment of complex cases, such as those involving intricate anatomical structures and pathologies. By providing comprehensive views that surpass traditional two-dimensional imaging, CBCT aids in precise diagnosis and treatment planning, making it an indispensable resource for addressing various dental conditions effectively.
Bibliometric Analysis of Neurology Articles Published in General Medicine Journals
Mitch Wilson, M. Sampson, N. Barrowman
et al.
Key Points. Question: What are the publication patterns for neurology publications in general medicine journals, and how do they compare with other specialties? Findings: In this cross-sectional bibliometric analysis of the top 5 most cited general medicine journals, the New England Journal of Medicine (NEJM) published more neurology articles than other journals. In the top 5 general medicine journals, there were more publications in neurology than in immunology, endocrinology, gastroenterology, or pulmonology. Meaning: In this study, neurology articles were published most often in NEJM, and general medicine journals published more articles in neurology than in other medical specialties.
Telehealth Policy, Practice, and Education: a Position Statement of the Society of General Internal Medicine
Anders Chen, Mariam H Ayub, R. Mishuris
et al.
Telehealth services, specifically telemedicine audio-video and audio-only patient encounters, expanded dramatically during the COVID-19 pandemic through temporary waivers and flexibilities tied to the public health emergency. Early studies demonstrate significant potential to advance the quintuple aim (patient experience, health outcomes, cost, clinician well-being, and equity). Supported well, telemedicine can particularly improve patient satisfaction, health outcomes, and equity. Implemented poorly, telemedicine can facilitate unsafe care, worsen disparities, and waste resources. Without further action from lawmakers and agencies, payment will end for many telemedicine services currently used by millions of Americans at the end of 2024. Policymakers, health systems, clinicians, and educators must decide how to support, implement, and sustain telemedicine, and long-term studies and clinical practice guidelines are emerging to provide direction. In this position statement, we use clinical vignettes to review relevant literature and highlight where key actions are needed. These include areas where telemedicine must be expanded (e.g., to support chronic disease management) and where guidelines are needed (e.g., to prevent inequitable offering of telemedicine services and prevent unsafe or low-value care). We provide policy, clinical practice, and education recommendations for telemedicine on behalf of the Society of General Internal Medicine. Policy recommendations include ending geographic and site restrictions, expanding the definition of telemedicine to include audio-only services, establishing appropriate telemedicine service codes, and expanding broadband access to all Americans. 
Clinical practice recommendations include ensuring appropriate telemedicine use (for limited acute care situations or in conjunction with in-person services to extend longitudinal care relationships), that the choice of modality be done through patient-clinician shared decision-making, and that health systems design telemedicine services through community partnerships to ensure equitable implementation. Education recommendations include developing telemedicine-specific educational strategies for trainees that align with accreditation body competencies and providing educators with protected time and faculty development resources.
Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine
Matthias Christenson, Cove Geary, Brian Locke
et al.
The success of precision medicine requires computational models that can effectively process and interpret diverse physiological signals across heterogeneous patient populations. While foundation models have demonstrated remarkable transfer capabilities across various domains, their effectiveness in handling individual-specific physiological signals - crucial for precision medicine - remains largely unexplored. This work introduces a systematic pipeline for rapidly and efficiently evaluating foundation models' transfer capabilities in medical contexts. Our pipeline employs a three-stage approach. First, it leverages physiological simulation software to generate diverse, clinically relevant scenarios, particularly focusing on data-scarce medical conditions. This simulation-based approach enables both targeted capability assessment and subsequent model fine-tuning. Second, the pipeline projects these simulated signals through the foundation model to obtain embeddings, which are then evaluated using linear methods. This evaluation quantifies the model's ability to capture three critical aspects: physiological feature independence, temporal dynamics preservation, and medical scenario differentiation. Finally, the pipeline validates these representations through specific downstream medical tasks. Initial testing of our pipeline on the Moirai time series foundation model revealed significant limitations in physiological signal processing, including feature entanglement, temporal dynamics distortion, and reduced scenario discrimination. These findings suggest that current foundation models may require substantial architectural modifications or targeted fine-tuning before deployment in clinical settings.