Emergency Medicine
If headlines are to be believed, AI is outpacing doctors left and right in the most doctorly activity there is: diagnosis. In a narrow sense, it’s true: If large language models are given a sanitized test question, they can outperform human doctors in deducing the “correct” answer.
However, neatly packaged case libraries — often over-weighted toward rare diagnoses that stump human pattern-matching — are no substitute for the messy realities of medicine. And diagnosing “zebras” (clinical shorthand for those rare conditions) isn’t the area where doctors most need AI’s help.
AI has the potential to solve our worsening physician shortage, but existing AI applications in healthcare are mostly limited to back-office tasks like documentation and coding. To truly revolutionize our ability to provide all patients with high-quality, affordable care, AI needs to play a role in the delivery of clinical care — but first we need to train and test it correctly.
Rather than jump straight to advanced skills like diagnosis, I believe clinical AI agents should be trained progressively. That means starting with foundational abilities such as bedside manner and history-taking, and then moving on to more complex tasks like forming clinical assessments. Among other reasons, stepwise progression will help human physicians learn to trust their AI counterparts — and AI-enabled care can’t scale without trust.
Fortunately, we already have a proven framework for giving novice agents progressively more responsibility in patient care: medical education. Healthcare educators have perfected the process of turning newly minted doctors into self-sufficient clinicians, and we should tap into their expertise. As AI agents increasingly enter clinical workflows, they should be viewed less like simple software tools — or even like “co-pilots,” as one recent NYT op-ed argued — and more like medical trainees learning how to do the job of a physician.
At the risk of oversimplification, clinical training can be divided into four stages, summarized by the acronym RIME: Reporter, Interpreter, Manager, and Educator. Each stage evaluates the trainee on a particular set of skills that is core to the eventual self-sufficient practice of medicine.
Reporter: Trainees first learn how to talk to patients — and, more importantly, how to listen to them. Patients don’t arrive with ready-made case reports. As with human trainees, we must evaluate an AI’s ability to uncover pertinent positives (the symptoms that guide us) and pertinent negatives (the symptoms a patient does not have, which can be equally important).
In addition to gathering the right information efficiently, AI should also be assessed on building trust and using an appropriate conversational tone. For example, a widely cited 2023 study found that evaluators rated AI-generated responses to one-off patient questions as more empathetic than physicians’ responses. The questions came from a public Reddit forum; similar comparisons should be run in fuller, back-and-forth conversational settings and across multiple care modalities (text, audio, and video).
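One way to make the Reporter stage measurable is to score an agent’s interview against a per-case checklist of the pertinent positives and negatives it should have elicited. The sketch below is purely illustrative: the ReporterCase structure, the example case, and the naive keyword-matching grader are my assumptions standing in for a real rubric (in practice, a clinician or a grading model would judge coverage).

```python
from dataclasses import dataclass

@dataclass
class ReporterCase:
    """Hypothetical Reporter-stage test case: findings the interview should surface."""
    pertinent_positives: list[str]  # symptoms that should be elicited and confirmed
    pertinent_negatives: list[str]  # symptoms that should be asked about and ruled out

def reporter_coverage(transcript: str, case: ReporterCase) -> float:
    """Fraction of expected findings the agent's interview actually covered.
    Naive substring matching stands in for a human or model-based grader."""
    expected = case.pertinent_positives + case.pertinent_negatives
    covered = sum(1 for finding in expected if finding.lower() in transcript.lower())
    return covered / len(expected) if expected else 1.0

# Toy example: the agent confirmed two positives but never asked about leg swelling.
case = ReporterCase(
    pertinent_positives=["chest pain on exertion", "shortness of breath"],
    pertinent_negatives=["leg swelling"],
)
transcript = "Patient reports chest pain on exertion and shortness of breath on stairs."
print(f"Reporter coverage: {reporter_coverage(transcript, case):.0%}")  # 67%
```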
Interpreter: Next, trainees learn to interpret a mix of subjective (patient-reported symptoms) and objective (exam findings, labs, imaging, etc.) information. They reconcile conflicting data — e.g., a patient who feels feverish despite having a normal temperature — and present it to a senior trainee or attending physician. A successful AI Interpreter must synthesize disparate pieces of data into a coherent patient summary, create a rank-ordered list of problems, and propose a precise differential diagnosis.
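A natural way to test the Interpreter stage is to require structured output that can be compared against an attending’s gold-standard note. The schema below is a hypothetical illustration of what an AI Interpreter could be asked to produce; the field names are mine, not an established standard.

```python
from dataclasses import dataclass, field

@dataclass
class DifferentialItem:
    diagnosis: str
    likelihood: float               # agent's subjective probability, 0 to 1
    supporting_findings: list[str]  # data that argues for the diagnosis
    conflicting_findings: list[str] = field(default_factory=list)  # e.g., "feels feverish" vs. "temp 98.6 F"

@dataclass
class InterpreterNote:
    patient_summary: str                  # one-paragraph synthesis of subjective and objective data
    problem_list: list[str]               # rank-ordered, most urgent problem first
    differential: list[DifferentialItem]  # rank-ordered differential diagnosis

    def leading_diagnosis(self) -> str:
        """The diagnosis the agent currently considers most likely."""
        return max(self.differential, key=lambda d: d.likelihood).diagnosis
```

Grading then reduces to comparing the agent’s summary, problem list, and differential against the attending’s, which is far closer to how human interns are actually assessed than a multiple-choice score is.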
Manager: Trainees then learn how to act on all of that information. They discover that there’s rarely a single “right” way to manage a patient, and that beyond the biological complexities of illness lies an even more intricate social context. After all, a patient can’t take medication they can’t afford or attend appointments if they don’t have transportation. Just like med students, AI “trainees” must be rigorously tested not only on how to reach the correct clinical conclusion, but also on how to persuade patients to follow through with recommendations — whether that means adhering to a new medication, changing a behavior, or seeking additional therapy.
Educator: Finally, the student becomes the teacher. A fully competent physician mentors the next generation, and AI should be evaluated on its capacity to do the same thing. With infinite patience and superhuman pattern-matching, AI could, for example, review 1,000 patient charts written by human trainees, compare them with actual clinical outcomes, and identify the top opportunities for improvement.
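Taken together, the four stages suggest a gating scheme much like residency itself: an agent earns evaluation (and eventually deployment) at the next RIME stage only after clearing a bar at the current one. The stage descriptions, passing thresholds, and evaluate callable below are placeholders, a sketch of the staged idea rather than a validated credentialing protocol.

```python
from typing import Callable

# Each RIME stage: (name, skills being tested, minimum passing score on that stage's test suite).
# The thresholds are arbitrary placeholders.
RIME_STAGES: list[tuple[str, str, float]] = [
    ("Reporter",    "elicit pertinent positives and negatives, build trust",  0.90),
    ("Interpreter", "synthesize data, rank problems, propose a differential", 0.85),
    ("Manager",     "form plans patients can afford and actually follow",     0.80),
    ("Educator",    "review trainee charts and surface improvement themes",   0.75),
]

def highest_stage_cleared(agent: object, evaluate: Callable[[object, str], float]) -> str:
    """Return the most advanced RIME stage the agent has passed.
    `evaluate(agent, stage_name)` is assumed to run that stage's tests and return a 0-1 score."""
    cleared = "none"
    for stage, _skills, passing_bar in RIME_STAGES:
        if evaluate(agent, stage) < passing_bar:
            break  # stop at the first stage the agent fails; no skipping ahead
        cleared = stage
    return cleared

# Toy example: an agent that is a strong Reporter and Interpreter but a weak Manager.
toy_scores = {"Reporter": 0.95, "Interpreter": 0.88, "Manager": 0.60, "Educator": 0.0}
print(highest_stage_cleared(agent=None, evaluate=lambda a, s: toy_scores[s]))  # "Interpreter"
```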
Viewed through this lens, it’s clear why AI agents that ace test questions can stumble in real-life scenarios: Real medicine involves real people, with unscripted histories and nuanced social circumstances. The job of a physician is far more expansive than excelling at rare diagnoses in a medical trivia-esque format. It calls for strong conversational skills, high emotional intelligence, advanced reasoning to prioritize medical issues based on each patient’s unique situation, and ultimately the ability to craft an accurate assessment and convince a patient to follow it.
We have no doubt that AI will eventually exceed human performance in many of these dimensions. After all, AI systems have several advantages over us mortals: They have perfect memories, are always available, and can spend endless hours listening to patients. If we apply the same staged approach we use in medical education — starting with training proficient “Reporters” — we can unlock a future in which AI doctors are fundamentally strong and, most importantly, useful in the real world.