
Pilot Purgatory Is an Operating-Model Failure, Not a Data Problem

"The data wasn't ready" is the universal cause of death for AI pilots. It is wrong almost every time. Pilots die in the gap between "it works" and "we changed how the work happens."

Argument in three

  1. "The data wasn't ready" is the cause of death written on nearly every AI pilot's death certificate. It is the diagnosis that protects the most people, and it is wrong almost every time.

  2. Pilots run in a controlled reality. Production fails on the operating-model question nobody scoped: who owns the decision now, and what happens to the role that owns it today.

  3. The test that fits on an index card: show the kill list. What gets turned off when this works? Which process is retired, which report dies, who decides differently?

Raj Bhatia · April 15, 2026

Every institution I talk to has the same graveyard. A row of AI pilots that worked (genuinely worked, in the demo) and never made it to production. And every institution has the same explanation ready: the data wasn't clean enough. Data quality is the universal cause of death on the AI pilot's death certificate. I think it is wrong almost every time.

I have stood on both sides of this. I have led transformation programs that reached production and changed how thousands of people worked, and I have advised on programs, at banks and insurers, where the pilot was a triumph and the rollout quietly died. The difference was never the technology and rarely the data. It was whether anyone had built the thing a pilot doesn't need and production can't survive without: an operating model.

Why Data Gets the Blame

"The data wasn't ready" is a comfortable diagnosis because it is nobody's fault and everybody's problem. Fixing enterprise data is a multi-year program that belongs to a different team, with a different budget, on a different floor. Declare data the villain and everyone in the room gets to agree the pilot was sound, the strategy was right, and the failure was structural. It is the diagnosis that protects the most people. That is exactly why I distrust it.

Pilots run in a controlled reality. The inputs are curated. The users volunteered. A senior sponsor is watching, so the edge cases get hand-waved and the exceptions get escalated to a human who happens to be paying attention. Production has to survive the real world — the exception nobody mapped because nobody knew it existed, the handoff between two teams that have never shared an incentive, and the question no pilot ever has to answer: when this output is wrong, who is accountable, and what happens to the person whose job it used to be to get it right?

The Question That Kills Pilots

That last question kills more pilots than anything else, and it has nothing to do with data. A pilot can output a recommendation. Production requires that somebody act on it, and the moment a real decision rides on the model you are no longer in a technology project. You are in an operating-model project, and almost nobody scoped it that way. Who owns the decision now? What happens to the role that owns it today? What is the escalation path when the model is confidently wrong? If those questions get a shrug and "we'll work that out if the pilot succeeds," the pilot has already failed. The invoice just hasn't arrived.

The transformation I led that worked best — years before anyone said AI — did not work because we added something clever. We removed things. We retired more than 25 legacy applications and killed more than 700 reports. The old way of working was not asked to coexist politely with the new one. It was taken away. People worked in the new model because the old model no longer existed to retreat into. And the capacity we freed was real, because the work that consumed it was genuinely gone, not merely accelerated. That is the heart of every productivity gain that ever reached a bottom line, and it lives in the operating model, not the technology stack.

The Index-Card Test

So here is the test I would apply to any AI program before it leaves the lab. It fits on an index card. Show me the kill list. What gets turned off when this works — which process is retired, which report dies, which step stops existing? Then: who owns the decision this model informs, and what happens to the role that owns it today? Honest answers to those two questions tell you, before another dollar goes in, whether you have a transformation or a science experiment. A program that can name what it will remove and who will decide differently has a real chance. A program that can only describe what it will add has already chosen its plot in the graveyard.

This matters beyond any single pilot because the institutions clearing the production gap do not have better technology or cleaner data. They are the ones willing to do the organizational work first — to decide what gets removed and who decides differently before they fall in love with the demo. Everyone else runs pilots in a controlled reality and is surprised when the real one rejects them. The controlled reality was always the easy part.

The demo proves the technology works. That was never in much doubt. What the demo cannot prove, and what almost nobody builds until it is too late, is that an operating model exists for the technology to live inside. That is where pilots die. Not in the data. In the gap between "it works" and "we changed how the work happens."

I advise financial institutions on the problems these essays describe — diagnosing and redesigning how organizations actually run. If this is the conversation you're having internally, it's worth 30 minutes.


About the Author

Raj Bhatia writes on AI and the operating models that decide whether it works — drawn from 25 years building and refining functions inside GE Capital, Moody's, Deloitte, and Code and Theory. Founder of SigmaArc.