The End of the AI Subsidy Era and the Rise of ROI Per Workflow
May 30, 2026
First, I wanted to thank all of you for embarking on the What’s 🔥 journey with me as we just hit issue #500 🤯!
That’s 500 weeks in a row. So many times I was wiped out, sick, traveling, or just exhausted, but I kept plugging away because writing this newsletter has been invaluable for me personally.
It started simply as notes to myself shared with friends, and over time it evolved into something much more meaningful. I now find it to be a cathartic way to reflect on the week, synthesize my thoughts, connect trends, and think longer term about where the world and technology may be heading.
Thank you all for reading, sharing, and being part of this journey. Here’s to many more issues ahead!
Now back to the regularly scheduled programming…
Ok all, let’s not overreact to every single AI-related news announcement. This week it was all about tokenomics and insane costs. As always, the reality is more nuanced.
Measuring tokens consumed is measuring effort, not output, and it’s the same mistake enterprises made a decade ago measuring lines of code or hours logged. You get what you measure, and if you measure tokens, you’ll get engineers finding creative ways to burn them.
Here’s the real story. We’re entering Phase 2 of enterprise AI.
Phase 1 was the subsidy era, where frontier labs aggressively absorbed costs to drive adoption and enterprises experimented freely because tokens were effectively subsidized.
Phase 2 is the consumption era, where vendors charge real money, CFOs ask harder questions, and procurement re-enters every renewal discussion.
The winners from here will:
intelligently route workloads across models
lean into open source for the 80% of workloads that don’t require frontier intelligence
reserve expensive inference for workflows that actually move revenue, efficiency, or customer outcomes
This reminds me a bit of how overblown the “Claude Mythos destroys cybersecurity” narrative became earlier this year. Immediate market reactions tend to overshoot reality. Palo Alto Networks’ stock price, for example, has already bounced back!
Gary Marcus shares a few more takes from this tokenomics discussion.
The important point many are missing from Gary Marcus’ comments is that he’s not saying agents don’t work. He’s saying enterprises will become more disciplined in how they measure ROI and deploy AI systems. The conversation shifts from token consumption to business outcomes.
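To make the shift from token consumption to business outcomes concrete, here is a minimal Python sketch of ROI per workflow. Everything in it is an assumption for illustration: the `Workflow` class, the workflow names, the blended price per million tokens, and the dollar values are all made up, and estimating the value a workflow delivers is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """One AI-assisted workflow (all fields are hypothetical inputs)."""
    name: str
    tokens_used: int
    usd_per_million_tokens: float  # assumed blended inference price
    value_delivered_usd: float     # assumed estimate of business value

    @property
    def inference_cost_usd(self) -> float:
        return self.tokens_used / 1_000_000 * self.usd_per_million_tokens

    @property
    def roi(self) -> float:
        """(value - cost) / cost; only defined when the cost is positive."""
        cost = self.inference_cost_usd
        if cost <= 0:
            raise ValueError("ROI is undefined for a zero-cost workflow")
        return (self.value_delivered_usd - cost) / cost


def rank_by_tokens(workflows: list[Workflow]) -> list[str]:
    """The effort-based view: whoever burned the most tokens comes first."""
    ordered = sorted(workflows, key=lambda w: w.tokens_used, reverse=True)
    return [w.name for w in ordered]


def rank_by_roi(workflows: list[Workflow]) -> list[str]:
    """The outcome-based view: highest return on inference spend comes first."""
    ordered = sorted(workflows, key=lambda w: w.roi, reverse=True)
    return [w.name for w in ordered]


# Made-up numbers: a token-hungry workflow with little payoff and a
# lean one that actually moves an outcome.
workflows = [
    Workflow("code-churn-agent", 50_000_000, 10.0, 400.0),
    Workflow("support-deflection", 5_000_000, 10.0, 600.0),
]

print(rank_by_tokens(workflows))
print(rank_by_roi(workflows))
```

With these made-up inputs the two rankings come out in opposite order, which is the whole point: the workflow that burns the most tokens is not the one that earns its keep.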
And despite all the noise, token consumption itself is still exploding.
The real question is: where does the value accrue?
Regardless of routing strategy, GPU and compute demand still compounds. Look at this chart from Goldman Sachs projecting insane token growth of 24x over the next few years!
Sierra recognized this shift early by focusing on outcomes-based pricing rather than token-based pricing.
What’s becoming increasingly clear is that the best enterprise AI companies are already building:
routing layers
swappable model infrastructure with an emphasis on open weight
proprietary orchestration
hybrid deployment architectures
on-prem support for sensitive enterprise environments
Some are even deploying open-weight models on-prem for deep customization and control before selectively invoking frontier models when necessary. Harvey, for example, has used its insane growth to build a powerful proprietary data flywheel and is showing how this gets done.
This is exactly where proprietary enterprise context becomes incredibly valuable.
As open-weight models improve, we’ll see more and more software vendors and enterprises train and customize models directly on private internal data, routing across a mixture of open and frontier systems depending on sensitivity, latency, and economics.
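A routing decision along those three axes (sensitivity, latency, economics) can be sketched in a few lines of Python. This is an illustrative toy, not any vendor’s implementation: the tier names, the `Request` fields, and the cost and latency thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_PREM_OPEN = "on-prem open-weight model"
    HOSTED_OPEN = "hosted open-weight model"
    FRONTIER = "frontier model"


@dataclass
class Request:
    """Hypothetical description of one inference request."""
    contains_sensitive_data: bool
    needs_frontier_reasoning: bool
    max_latency_ms: int
    expected_value_usd: float  # assumed estimate of what a good answer is worth


# Assumed numbers for illustration only.
FRONTIER_COST_USD = 0.50      # cost of one frontier call
FRONTIER_LATENCY_MS = 3_000   # typical frontier response time


def route(req: Request) -> Tier:
    # Sensitivity: private data never leaves the building.
    if req.contains_sensitive_data:
        return Tier.ON_PREM_OPEN
    # Latency and economics: pay for frontier inference only when the task
    # needs it, the caller can wait for it, and the value covers the cost.
    if (
        req.needs_frontier_reasoning
        and req.max_latency_ms >= FRONTIER_LATENCY_MS
        and req.expected_value_usd > FRONTIER_COST_USD
    ):
        return Tier.FRONTIER
    # Default: the cheaper open-weight tier handles everything else.
    return Tier.HOSTED_OPEN


print(route(Request(True, True, 10_000, 100.0)).value)
print(route(Request(False, True, 10_000, 100.0)).value)
print(route(Request(False, False, 500, 0.05)).value)
```

The ordering of the checks is the design choice here: sensitivity acts as a hard constraint, while latency and economics only decide whether a request earns an upgrade from the cheap default.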
That’s exactly what Larry Ellison has been talking about.
#as I’ve written before in What’s 🔥 #498, larger companies are using a constellation of models (SOTA, open source) not only to manage token costs but also to create differentiation - why give my employees’ brains to a model when I can have my own? - this will be a bigger and bigger opportunity over time and may also shift billable hours to outcome-based pricing?
#🤯 insane growth for inference providers who don’t own their own GPUs - thinner margins but slick software on top to optimize inference, etc - all of them growing 📈