Towards Conversational Diagnostic AI

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for
accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable
of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating
clinicians’ expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical
Intelligence Explorer), an AI system based on a Large Language Model (LLM) and optimized for diagnostic dialogue.
AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms for scaling
learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating
clinically meaningful axes of performance including history-taking, diagnostic accuracy, management
reasoning, communication skills, and empathy. We compared AMIE’s performance to that of primary
care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with
validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study
included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison
with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater
diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of
26 axes according to patient actors. Our research has several limitations and should be interpreted with
appropriate caution. Clinicians were limited to an unfamiliar synchronous text-chat interface, which permits large-scale
LLM-patient interactions but is not representative of usual clinical practice. While further research is
required before AMIE could be translated to real-world settings, the results represent a milestone towards
conversational diagnostic AI.