AI Won’t Replace Doctors

On August 15, 2025, OpenEvidence announced a feat that seemed to mark the dawn of a new medical era: its AI became the first in history to score a perfect 100% on the United States Medical Licensing Exam (USMLE). Not only did it answer every question correctly, it explained why its answers were right, citing articles from the New England Journal of Medicine and JAMA. In a profession built on tests, guidelines, and checklists, the message felt almost mythic: machines had matched, and even surpassed, the human physician on medicine’s ultimate examination.

But the triumph landed against a backdrop of more sobering stories. Just months earlier, a team of researchers at Harvard and Stanford put several leading AI models through 2,000 real clinical cases across multiple specialties. Here the exam prodigies stumbled. What looked brilliant on paper became brittle at the bedside. The models often missed subtle contextual clues, failed to integrate patient history with present symptoms, and in some cases recommended unsafe next steps. They were, as one researcher summarized, “excellent test-takers, but poor clinicians.” Anyone who has rounded on patients knows the feeling: the textbook answer collapses once the patient starts talking.

And then came the embarrassment for Google’s Med-Gemini. In a widely circulated preprint, the system diagnosed an infarct in the “basilar ganglia.” The problem? No such structure exists. The basal ganglia are real; the “basilar ganglia” are a hallucination, an invented piece of anatomy. Astonishingly, the error slipped through layers of review, appearing in the preprint, in its figures, and even in Google’s official blog post. It was the sort of blunder no human neurologist, and indeed no medical student, would make. It wasn’t just wrong. It was the kind of wrong that makes every clinician wince.

These stories frame the paradox. AI is becoming uncannily good at explicit knowledge: the kind tested on standardized exams, the kind that can be codified in rules, tables, and facts. But clinical practice also depends on tacit knowledge: the background know-how and embodied sense of plausibility that guides judgment. A neurologist doesn’t just “know” that there are no basilar ganglia; she feels their absence, grounded in years of lived immersion in clinical talk and practice. That tacit backdrop is the foundation of clinical judgment.

So the contrast is striking. OpenEvidence shows what AI can achieve when trained rigorously on explicit medical knowledge. The Harvard/Stanford and Med-Gemini cases remind us of what it still lacks: the tacit wisdom, cultivated in conversation and practice, that turns knowledge into safe, humane medicine.

The Knowledge We Can Test vs. the Knowledge We Live

In the late 19th century, Sir William Osler famously told his students: “Listen to the patient. He is telling you the diagnosis.”

On its face, this advice sounds obvious, even quaint. But it was Osler’s way of marking a distinction that runs deep in medicine: the difference between the knowledge we can codify and test, and the knowledge that emerges only in lived experience.

Explicit knowledge fills textbooks and exams. It is the anatomy of the basal ganglia, the pharmacokinetics of digoxin, the diagnostic criteria for lupus. This is the kind of knowledge an AI like OpenEvidence can master spectacularly, as its perfect USMLE score demonstrated.

But Osler’s point was that medicine is not reducible to that. The real art of diagnosis often depends on tacit knowledge: the ability to hear in a patient’s halting description the pattern that isn’t obvious; to notice a subtle hesitation that signals depression rather than fatigue; to feel that the story “doesn’t fit” even when the labs look normal.

Tacit knowledge is acquired the way Osler taught, at the bedside, in apprenticeship, through immersion in the language and rhythms of practice. It is why two doctors can listen to the same heart and one hears only “a murmur” while the other senses, almost bodily, the cadence that means danger. It is why a seasoned internist, after decades of hearing patients describe chest pain, can distinguish “typical angina” from something stranger before the ECG is even printed.

This is also why the Med-Gemini hallucination was so jarring. A human neurologist would not simply “recall” anatomy but would sense, tacitly, that “basilar ganglia” was implausible, that no such phrase had ever echoed through lectures, rounds, or case conferences. Tacit knowledge acts as a plausibility filter. Machines, brilliant though they are at explicit recall, lack that backdrop.

From Osler to the Types of Knowledge

What Osler intuited in the clinic, philosophers of knowledge have been mapping for decades. To understand why AI dazzles in some settings and stumbles in others, we need to distinguish between explicit knowledge, tacit knowledge, and wisdom.

Explicit knowledge is the kind we can codify, teach, and test. It lives in anatomy diagrams, diagnostic criteria, and treatment guidelines. It is “knowing that” something is true. An AI can be trained on millions of pages of this material and reproduce it flawlessly, which is why it can ace the USMLE.

Tacit knowledge, by contrast, is the “know-how” that resists easy capture. Michael Polanyi described it with the phrase, “We know more than we can tell.” It includes the embodied skills of a surgeon’s hand, the intuitive sense of plausibility a neurologist brings to anatomy, or the seasoned ear that distinguishes benign from malignant murmurs. Tacit knowledge is social and embodied. It lives in communities of practice, not just in books. This is what Osler meant when he told his students to listen: the diagnosis is not simply in the facts, it is in the lived exchange.

And above both lies wisdom. Wisdom is not simply more knowledge; it is knowledge placed in context, weighed against values, and directed toward the good. In medicine, wisdom is deciding not only what can be done, but what should be done. It is balancing treatment against quality of life, or choosing when to reassure rather than to intervene. As systems theorist Russell Ackoff framed it, wisdom is the top of the ladder that runs from data, to information, to knowledge, to wisdom.

Why We’re Still Smarter

Osler’s lesson, and the taxonomy of knowledge that followed, highlight the paradox of AI in medicine. Machines can dominate the explicit domain, but they falter where tacit judgment and wisdom are essential. The failures of Google’s Med-Gemini and the struggles on real-world cases weren’t anomalies; they revealed the ceiling of systems trained only on codified facts and structured outcomes.

What medicine actually runs on is more layered. It is the explicit knowledge of textbooks, yes, but also the tacit sense of plausibility built through experience, and the wisdom to weigh competing goods in the face of uncertainty. These are not neatly reducible to rules or test answers. They live in practice, in culture, in values.

This is why even small differences in knowledge can make enormous functional differences. A chess master sees patterns an amateur cannot, though both know the rules. Humans and chimpanzees share almost all their DNA, yet the tiny margin has meant language, art, and civilization. In medicine, the leap from memorizing criteria to sensing that a patient’s story doesn’t fit is of the same order: a narrow margin in knowledge, but a gulf in judgment and safety.

AI, as it stands, cannot cross that threshold on its own. It can simulate tacit reasoning when trained well, but it does not embody it. It can retrieve knowledge, but it cannot feel implausibility in its bones. Wisdom, the highest tier, remains out of its reach entirely.

If history is a guide, the deepest advances in medicine have come not from replacing the physician, but from augmenting her senses: the stethoscope amplifying the chest, the MRI opening the brain. AI is another such tool. Its destiny is not to stand in for clinical judgment, but to extend our reach, making the explicit more accessible, while leaving the tacit and the wise where they belong: in human hands.
