The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision-making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this rationale represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against making explainability a requirement for clinically deployed models.
Context: Studies documenting racial/ethnic disparities in health care frequently implicate physicians' unconscious biases. No study to date has measured physicians' unconscious racial bias to test whether this predicts physicians' clinical decisions.
Objective: To test whether physicians show implicit race bias and whether the magnitude of such bias predicts thrombolysis recommendations for black and white patients with acute coronary syndromes.
Design, Setting, and Participants: An internet-based tool comprising a clinical vignette of a patient presenting to the emergency department with an acute coronary syndrome, followed by a questionnaire and three Implicit Association Tests (IATs). Study invitations were e-mailed to all internal medicine and emergency medicine residents at four academic medical centers in Atlanta and Boston; 287 completed the study, met inclusion criteria, and were randomized to either a black or white vignette patient.
Main Outcome Measures: IAT scores (normal continuous variable) measuring physicians' implicit race preference and perceptions of cooperativeness. Physicians' attribution of symptoms to coronary artery disease for vignette patients with randomly assigned race, and their decisions about thrombolysis. Assessment of physicians' explicit racial biases by questionnaire.
Results: Physicians reported no explicit preference for white versus black patients or differences in perceived cooperativeness. In contrast, IATs revealed implicit preference favoring white Americans (mean IAT score = 0.36, P < .001, one-sample t test) and implicit stereotypes of black Americans as less cooperative with medical procedures (mean IAT score = 0.22, P < .001) and less cooperative generally (mean IAT score = 0.30, P < .001). As physicians' pro-white implicit bias increased, so did their likelihood of treating white patients and not treating black patients with thrombolysis (P = .009).
Conclusions: This study represents the first evidence of unconscious (implicit) race bias among physicians, its dissociation from conscious (explicit) bias, and its predictive validity. Results suggest that physicians' unconscious biases may contribute to racial/ethnic disparities in use of medical procedures such as thrombolysis for myocardial infarction.
Key Points
Question: Does the use of a large language model (LLM) improve diagnostic reasoning performance among physicians in family medicine, internal medicine, or emergency medicine compared with conventional resources?
Findings: In a randomized clinical trial including 50 physicians, the use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources.
Meaning: In this study, the use of an LLM did not necessarily enhance diagnostic reasoning of physicians beyond conventional resources; further development is needed to effectively integrate LLMs into clinical practice.