How companies get AI wrong: a field map.
There's a new genre of horror story in operator group chats: the AI bill that outran payroll. A team automates a workflow to save a salary, and six months later the model invoice is bigger than the salary was — and the person is still there, because somebody has to check the AI's work.
We run AI in production, on real stores, every day. We pay these bills. So let us say plainly what most AI vendors won't: the horror stories are real and they are common (a widely cited MIT study found that roughly 95% of enterprise GenAI pilots produce no measurable P&L return), and almost all of them follow the same map. Here it is.
The seven failure modes
01 The wrong altitude
A frontier model doing a cron job's work. Companies pay per-token prices for tasks a database query, a spreadsheet formula, or a fifteen-line script does deterministically, instantly, and for free — forever. If the task has one right answer and never changes shape, it isn't an AI problem. The most expensive sentence in automation is "let's just have the model do it."
02 The demo mirage
The demo ran on ten happy-path examples and everyone clapped. Production met real customers — typos, edge cases, anger, ambiguity — with no evaluation suite and no failure plan. So a human cleanup crew quietly forms behind the AI, reviewing everything it touches. Now you're paying the model and the people, which is how an automation project increases headcount.
03 Painting the rubble
AI bolted on top of broken operations. If the catalog data is wrong, the chatbot now tells customers wrong things — confidently, politely, at scale. Automating chaos doesn't fix chaos; it raises its velocity. The intelligence layer inherits whatever the data layer is, which is why the unglamorous work comes first.
04 Token incontinence
This is the anatomy of the bill that beats payroll. The whole conversation history resent on every message. A two-hundred-page PDF stuffed into every prompt instead of referenced once. No prompt caching, so the same instructions are re-processed at full price thousands of times a day. Agent loops with no budget, retrying their way to the moon. None of this is the model's fault — it's plumbing nobody owned. The same workload, engineered with caching, routing, and bounded loops, routinely costs a tenth as much.
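To see why resending the whole conversation gets expensive, here is a rough sketch of the arithmetic in Python. The function names, the message sizes, and the six-message window are illustrative assumptions, not measurements from a real bill, and the sketch ignores output tokens and prompt caching entirely.

```python
def input_tokens_full_history(turns: int, tokens_per_message: int, system_tokens: int) -> int:
    """Input tokens billed when each request resends the instructions plus every prior message."""
    total = 0
    for turn in range(1, turns + 1):
        # Request number `turn` carries the instructions and all `turn` messages so far.
        total += system_tokens + turn * tokens_per_message
    return total


def input_tokens_windowed(turns: int, tokens_per_message: int, system_tokens: int, window: int) -> int:
    """Same conversation, but each request carries only the last `window` messages."""
    total = 0
    for turn in range(1, turns + 1):
        total += system_tokens + min(turn, window) * tokens_per_message
    return total


# Hypothetical conversation: 40 turns, 200 tokens per message, 1,000 tokens of instructions.
full = input_tokens_full_history(40, 200, 1000)
windowed = input_tokens_windowed(40, 200, 1000, window=6)
print(full, windowed)  # 204000 85000
```

The full-history total grows with the square of the conversation length, which is why long chats dominate the bill. Trimming the window trades away context, so whether six messages is enough is a judgment call per workflow.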
05 Pilot purgatory
Thirty proofs-of-concept, zero in production. AI run by an innovation team that doesn't own a P&L, demoed quarterly to executives, touched by no operator. A pilot that never ships isn't cautious — it's a subscription to looking modern.
06 The sacrifice play
Firing the people who held the judgment, then asking the model to have judgment nobody encoded. The fifteen-year ops manager knew which supplier lies about lead times and which customer is worth losing money on. Walk that out the door and the AI doesn't replace it; it confidently improvises in its absence. We wrote a whole essay on this one: "AI is a multiplier, not a human sacrifice."
07 Counting tokens, not outcomes
The dashboard tracks spend per month. Nobody can say spend per outcome — per order recovered, per listing fixed, per lead captured, per hour returned to the founder. Without a denominator, every AI budget is simultaneously too big and too small, and the project dies in the next cost review regardless of whether it worked.
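The missing denominator is a small calculation once someone decides what counts as an outcome. A minimal sketch, assuming a hypothetical `WorkflowMonth` record and invented numbers; it also counts the human review time, since the cleanup crew is part of the real cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowMonth:
    name: str
    model_spend: float   # inference bill for the month, in dollars
    review_hours: float  # human time spent checking the AI's work
    hourly_rate: float   # loaded cost of that human time, in dollars
    outcomes: int        # whatever the workflow exists to produce, e.g. orders recovered


def cost_per_outcome(month: WorkflowMonth) -> Optional[float]:
    """Total cost (model plus human review) divided by outcomes; None if there were no outcomes."""
    total = month.model_spend + month.review_hours * month.hourly_rate
    if month.outcomes == 0:
        return None  # no denominator: the spend bought nothing measurable
    return total / month.outcomes


# Invented numbers for illustration only.
recovery = WorkflowMonth("cart recovery", model_spend=300.0, review_hours=10.0,
                         hourly_rate=30.0, outcomes=120)
print(cost_per_outcome(recovery))  # 5.0
```

Five dollars per recovered order is cheap or expensive depending on the order's margin, and that comparison is one a cost review can actually make.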
The pattern under all seven: treating AI as a thing you buy, instead of an operation you run.
The honest playbook
- Deterministic first, AI second. Exhaust the boring tools — scripts, schedules, rules — and reserve the model for what genuinely needs language and judgment. The cheapest token is the one you never spend.
- Right-size, cache, and bound everything. Match the model to the task, cache what repeats, put hard budgets on every loop. Cost per request is an engineering choice, not a weather event.
- Fix the data before the intelligence. Clean, consistent, machine-legible catalogs and records. This is also, not coincidentally, what makes you visible to AI shopping agents — the same hygiene pays twice.
- Give it one owner who runs operations. Not a committee, not a lab. The machine reports to an operator, the way the warehouse does.
- Price the outcome, not the token. Our brand interview costs us a few dollars of inference and produces a strategy document and a qualified lead. We'd pay ten times that without blinking. That's the math that matters — outcomes per dollar, not dollars per month.
- Start where failure is cheap and reversible. Catalog monitoring before customer-facing chat. Compliance checks before pricing. Earn trust with the low-stakes wins, then advance.
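The first two items of the playbook can be sketched together: try the deterministic rule first, and if the model is needed, cap both attempts and tokens. This is a sketch under stated assumptions, not a production implementation. `handle`, `fake_model`, the rules table, and the limits are all hypothetical names and values, and the model call is a stand-in rather than any real vendor API.

```python
from typing import Callable, Dict, Optional, Tuple


class BudgetExceeded(Exception):
    """Raised when the loop runs out of attempts or tokens without an answer."""


# A model step takes the attempt number and returns (answer or None, tokens used).
ModelStep = Callable[[int], Tuple[Optional[str], int]]


def handle(task: str, rules: Dict[str, str], model_step: ModelStep,
           max_attempts: int = 3, max_tokens: int = 5000) -> Tuple[str, int]:
    """Return (answer, tokens spent). Rules are tried first and cost nothing."""
    if task in rules:
        return rules[task], 0  # deterministic path: no model call at all

    spent = 0
    for attempt in range(1, max_attempts + 1):
        if spent >= max_tokens:
            break  # hard stop: never start another call once the budget is gone
        answer, used = model_step(attempt)
        spent += used
        if answer is not None:
            return answer, spent
    raise BudgetExceeded(f"no answer for {task!r} after {spent} tokens")


# Stand-in for a real model call: fails once, then succeeds, 1,200 tokens each time.
def fake_model(attempt: int) -> Tuple[Optional[str], int]:
    return ("drafted reply" if attempt >= 2 else None), 1200


rules = {"order status": "look it up in the orders table"}
print(handle("order status", rules, fake_model))        # ('look it up in the orders table', 0)
print(handle("angry refund email", rules, fake_model))  # ('drafted reply', 2400)
```

Because the budget is checked before each call, the loop can overshoot `max_tokens` by at most one call. A stricter version would estimate the cost of the next call before making it.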
None of this is exotic. It's the same discipline operators have always applied to a new machine on the floor: know what job it does, measure what it produces, keep a human responsible. The companies whose AI bills beat their payroll didn't fail at artificial intelligence. They failed at operations — and called it AI.