Tokens Aren't Going to Zero — Here's the Math the Industry Is Skipping

There’s a phrase I keep hearing in boardrooms, on earnings calls, and in the LinkedIn confidence of people whose job is to make AI sound inevitable: “tokens will trend to zero.”

It gets said with the certainty of a thing that’s already happened. It’s the foundation under strategy decks, multi-year vendor commitments, and the case for replacing engineering organizations with autonomous agents (a trade I covered earlier this year in The Tokens-for-Engineers Trade Has Three Holes). It shows up in board presentations as a downward-sloping line that nobody is asked to defend, because everyone in the room has agreed in advance that it’s true.

And it’s a forecast nobody currently making it can actually defend.

I’m not predicting that token prices will rise. The argument here is narrower: the prediction that they’ll fall to zero, or even stay close to today’s levels through the rest of the decade, doesn’t hold up against the structural facts of the system producing those tokens. The people drawing the line down haven’t actually shown their work. The math underneath the line pushes the other direction.

The cheap-AI consensus is skipping three pieces of math. Capacity isn’t built, and nobody can tell you when it will be. The labs propping up today’s prices are burning cash, and they can’t burn forever. The famous price curve from GPT-4 to GPT-4o has already broken at the frontier. Any one of those facts should slow a leader down. Three of them together make “cheap AI as a planning constant” a strange thing to commit to.

A few months back I wrote about what happens when your AI bill becomes your biggest line item. That post was about cost. This one is about the assumption underneath the cost, the one that lets every cheap-AI plan stay cheap on paper.

The Capacity Isn’t Built

Cheap-AI forecasts assume that compute capacity expands fast enough to keep pace with demand, or to outrun it. That’s a claim about the physical world. The physical world isn’t cooperating.

Start with Stargate. It was announced from the White House in January 2025 with a headline of $500B over four years. The actual constructed footprint at the time of the announcement was a single site in Abilene, Texas, roughly 1.2 GW of a stated 10 GW commitment. The other 88% or so was a press release. Meta’s Hyperion campus in Louisiana, the largest single announced AI buildout outside of Stargate, has its first phase scheduled for 2027 and a full buildout extending to 2030 on Meta’s own published timeline. The biggest AI infrastructure announcements of the last 24 months are, by the announcers’ own scheduling, multi-year delivery curves with the bulk of capacity landing past the horizon every cheap-AI forecast is using.

The biggest cloud capex spender in the world has been pulling back. TD Cowen flagged it in early 2025, with Bloomberg and The Information corroborating: Microsoft had let lapse or paused leases on more than 2 GW of datacenter capacity, mostly across the US and Europe. The reason matters. Microsoft wasn’t easing off because demand softened. It eased off because it couldn’t get power to the sites it had already committed to. When the largest buyer in the market can’t solve the upstream bottleneck inside its own balance sheet, the bottleneck is real.

Then there’s the grid. ERCOT’s interconnection queue for new large loads sits at 150 to 180 GW, with the largest connections waiting 3 to 5 years. PJM’s queue looks similar. Large power transformers, the boring physical hardware that has to sit in a substation before a megawatt is useful, run 2 to 4 years from order. The IEA’s Energy and AI report has global datacenter electricity demand roughly doubling by 2030 to around 945 TWh, with AI driving more than half the increase. CBRE put North American datacenter vacancy at roughly 1.9% in H1 2025. That kind of vacancy historically precedes pricing pressure in every real-asset market on earth. Northern Virginia, the largest market in the United States, has been preleasing 80%+ of its 2025-2027 pipeline before construction finishes. The practical reality: no spot capacity for new AI workloads in the biggest US market until at least 2028, and that’s assuming everything currently announced actually delivers.

AI capacity will get built. The question, the one nobody answers, is when. Nobody can credibly tell you when the capacity to serve 945 TWh of annual demand arrives, what the delivered-megawatt schedule looks like quarter by quarter, or which announced gigawatts actually make it from press release to powered-on rack. Not the lab CEOs and not the hyperscaler CFOs, and certainly not the analyst pulling forward a TAM chart on a podcast. The queue is opaque. The upstream bottleneck sits outside any single company’s control. Confident pricing forecasts built on uncertain capacity are guesses with conviction.

The people promising abundance own the burden of proof here. The receipts say constraint.

The Labs Can’t Burn Cash Forever

The second piece of math is the simplest, and the one that should reach the CFO seat fastest.

Today’s token prices are subsidized. The labs producing them are selling below cost, sometimes by meaningful margins, and the gap is filled with investor capital. Investor capital is finite by definition.

OpenAI did about $3.7B in revenue in 2024 against reported losses around $5B. Losses ran wider than revenue. The $40B round the company closed in the first half of 2025 at a roughly $300B valuation, the biggest private fundraise in history, was needed to keep operating against the current cost structure. The Information reported internal projections through 2025 that pushed break-even out to 2029. That projection assumes continued revenue scaling and no major shocks on the cost side over four years.
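
The subsidy is easy to size from those two numbers. Here is a minimal back-of-the-envelope sketch in Python, using only the revenue and loss figures quoted above. The function names are mine, and the break-even multiple rests on assumptions the reporting doesn’t support in detail: that the loss is an operating loss, that volume and cost stay fixed, and that demand doesn’t react to price.

```python
def implied_total_cost(revenue_b: float, loss_b: float) -> float:
    """Total cost implied by revenue and loss, in $B: cost = revenue + loss."""
    return revenue_b + loss_b


def breakeven_price_multiple(revenue_b: float, loss_b: float) -> float:
    """Factor by which average prices would have to rise to cover implied cost.

    Assumes volume and cost stay fixed, which is a strong simplification.
    """
    return implied_total_cost(revenue_b, loss_b) / revenue_b


# Figures quoted in the text (2024, in $B).
REVENUE_2024_B = 3.7
LOSS_2024_B = 5.0

cost_b = implied_total_cost(REVENUE_2024_B, LOSS_2024_B)
multiple = breakeven_price_multiple(REVENUE_2024_B, LOSS_2024_B)

print(f"Implied 2024 cost: ${cost_b:.1f}B")
print(f"Break-even price multiple at constant volume: {multiple:.2f}x")
```

On those figures the implied cost is about $8.7B, and the break-even multiple lands at roughly 2.35x. Treat that as an order-of-magnitude illustration of the gap, not a forecast of where prices go.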

Anthropic’s economics, smaller in absolute terms, follow the same shape: revenue growing fast, valuation stepping up repeatedly over the last 18 months, unit economics dependent on capital inflows continuing through the build phase. Microsoft’s AI segment is reported as running deeply negative against the compute cost it’s deploying. xAI burned over a billion a month at peak training in 2024-2025, funded through repeated private injections from Musk’s network and Middle Eastern sovereign wealth.

The argument is more boring than it sounds. I’m not predicting that capital markets pull the plug on AI. They might, they might not. The timing isn’t something I or anyone else can call with confidence. I’m not saying the subsidy ends in any specific quarter either. The point is simpler. You can’t sell a thing below its cost of production indefinitely. Prices reset toward the actual cost of producing intelligence, eventually. Or the businesses doing the producing stop existing in their current form. Possibly both.

“Cheap AI forever” implicitly assumes the burn never ends. It assumes the labs keep raising indefinitely, or they reach profitability at today’s price points despite the current evidence pointing the other way, or the underlying cost curve falls faster than usage grows (which I’ll come back to in the next section). That’s three separate bets, each one carrying its own evidence burden, and the forecast needs at least one of them to pay off. The leadership question is whether any of those bets is confident enough to sit in a 5-year strategy deck as a constant.
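
The third bet, the cost curve falling faster than usage grows, is the one that reduces to a single line of arithmetic. A rough sketch, assuming total cost is simply unit cost times volume and that both change at constant annual rates. The rates below are hypothetical placeholders, not estimates.

```python
from typing import List


def annual_spend_factor(unit_cost_decline: float, usage_growth: float) -> float:
    """Year-over-year multiplier on total cost.

    total = unit_cost * volume, so next year's total is
    total * (1 - unit_cost_decline) * (1 + usage_growth).
    """
    return (1.0 - unit_cost_decline) * (1.0 + usage_growth)


def required_decline_for_flat_spend(usage_growth: float) -> float:
    """Unit-cost decline d that exactly offsets usage growth g: (1 - d)(1 + g) = 1."""
    return 1.0 - 1.0 / (1.0 + usage_growth)


def project_spend(initial: float, unit_cost_decline: float,
                  usage_growth: float, years: int) -> List[float]:
    """Total cost for year 0 through `years` under constant annual rates."""
    factor = annual_spend_factor(unit_cost_decline, usage_growth)
    return [initial * factor ** year for year in range(years + 1)]


# Hypothetical rates: usage doubles each year, unit cost falls 30% each year.
path = project_spend(initial=100.0, unit_cost_decline=0.30,
                     usage_growth=1.0, years=5)
print([round(x, 1) for x in path])
print(f"Decline needed just to hold total flat: "
      f"{required_decline_for_flat_spend(1.0):.0%}")
```

The total only falls when the factor is below 1. If usage doubles every year, unit cost has to halve every year just to stand still.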

I’d argue it isn’t. Serious operators have never built a five-year plan around the assumption of paying below cost forever, not in cloud, not in software, and not in any market that’s tried subsidized pricing as a permanent strategy. Starting that experiment with intelligence would be strange.

The Frontier Price Curve Has Already Broken

The cheap-AI consensus rests almost entirely on a single visual: the inference price chart. The one showing GPT-4 at $30/$60 per million input/output tokens in March 2023, GPT-4 Turbo at $10/$30 by November of that year, GPT-4o at $5/$15 by mid-2024. A 6x decline on input and a 4x decline on output in 14 months. Extrapolate that line and tokens do, eventually, trend toward zero.
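
The arithmetic behind that chart is worth checking by hand. A small sketch using only the list prices quoted above. The day-of-month values are placeholders, and reading “mid-2024” as May 2024 (when GPT-4o launched) is my assumption.

```python
from datetime import date

# (launch month, input $/M tokens, output $/M tokens) as quoted in the text.
PRICES = {
    "GPT-4": (date(2023, 3, 1), 30.0, 60.0),
    "GPT-4 Turbo": (date(2023, 11, 1), 10.0, 30.0),
    "GPT-4o": (date(2024, 5, 1), 5.0, 15.0),  # "mid-2024" read as May 2024
}


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def decline_multiple(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def annualized_decline(old_price: float, new_price: float, months: int) -> float:
    """Constant annual rate of decline implied by the two price points."""
    return 1.0 - (new_price / old_price) ** (12.0 / months)


start_date, start_in, start_out = PRICES["GPT-4"]
end_date, end_in, end_out = PRICES["GPT-4o"]

months = months_between(start_date, end_date)
input_multiple = decline_multiple(start_in, end_in)
output_multiple = decline_multiple(start_out, end_out)

print(f"Span: {months} months")
print(f"Input: {input_multiple:.0f}x cheaper, "
      f"{annualized_decline(start_in, end_in, months):.0%} per year")
print(f"Output: {output_multiple:.0f}x cheaper, "
      f"{annualized_decline(start_out, end_out, months):.0%} per year")
```

Input and output prices did not fall by the same multiple, which matters for agentic workloads that are heavy on output tokens.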

What the chart leaves out matters more than what’s on it.

Each step on that curve was a previous-generation model getting commoditized. GPT-4 didn’t get cheaper. It got replaced by a new, cheaper model in a new family. The distinction looks pedantic on a chart, but it changes the planning math underneath the chart. The capability your CFO budgeted for in 2023 did in fact become cheaper over time. The model your competitors are using to outpace you in 2026 isn’t the same model. Frontier-to-frontier prices haven’t followed the same downward steps as each generation’s commoditized predecessor.

Look at the top of the curve, not the middle.

Anthropic’s Claude 3 Opus, the frontier reasoning model of early 2024, was priced at $15 input and $75 output per million tokens. The Sonnet line came in much cheaper, but Sonnet is a different and smaller model class. Opus-to-Sonnet is the comparison the cheap-AI chart wants you to anchor on. The frontier-to-frontier comparison, Opus 3 to Opus 4 to whatever sits at the premium tier today, hasn’t shown anything like the GPT-4-to-GPT-4o decline. Premium-tier output pricing has held flat or stepped up as model classes evolved. OpenAI shows the same pattern: GPT-5’s premium output tier reportedly prices at or above GPT-4o’s premium output tier, not below it. Reasoning models, agentic-tuned models, and long-context premium tiers across providers are generally priced above the chat-grade frontier of 12 months earlier.

The honest read is that the cheap end of the market gets cheaper while the expensive end gets more expensive. That’s the standard pattern of every maturing technology market. Last year’s premium becomes this year’s commodity, and this year’s premium is priced at what the market will bear. The chart shows a normal segmentation curve. It doesn’t show tokens trending to zero. Pretending it does is how strategy decks end up wrong.

The CFO question, and this is where the chart breaks the spreadsheet, is which end of that curve your organization is actually buying from. If you’re running production agentic workloads on the current frontier reasoning model, you’re buying from the expensive end, and that end is moving up. The “cheap tokens” everyone is pointing at exist. They’re for last year’s model doing last year’s tasks. The workloads that move competitive advantage forward (long-context reasoning, tool use, multi-step agentic workflows) live at the top of the curve, where the price line is heading the other way.

You can build a real strategy on commodity tokens. It just becomes a strategy at the level of last year’s frontier, which may or may not be the strategy you meant to commit to.

What the Realistic Case Looks Like

None of this is an argument against using AI. Refusing to deploy frontier models in 2026 is its own kind of management malpractice, and I’ve said as much before. The argument here is against betting your strategy on a forecast nobody can defend. Cheap AI sits in the same planning category as “demand always grows” and “rates stay low forever” and “this time is different.” Each of those has ended careers. Cheap AI is shaping up to end the next batch.

The realistic case looks like this:

Capacity is constrained for the rest of this decade. Plan for it. Negotiate multi-year inference commitments where you can. Vendors are now offering reserved-capacity pricing meaningfully better than spot, and the spread between reserved and spot is going to widen as supply tightens. Build relationships with multiple providers across multiple geographies, so you have substitution options when one region’s grid gets tight or one vendor’s allocation gets squeezed.
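
Whether a reserved commitment is worth signing comes down to how much of it you will actually use. A minimal sketch, assuming a take-or-pay commitment (you pay for the committed volume whether or not you use it) and overage billed at spot. The prices and volumes are hypothetical, and real contracts will have terms this ignores.

```python
def breakeven_utilization(reserved_price: float, spot_price: float) -> float:
    """Fraction of committed volume you must use for reserved to match spot.

    Reserved cost = reserved_price * committed (paid regardless of use).
    Spot cost     = spot_price * used.
    They are equal when used / committed = reserved_price / spot_price.
    """
    if spot_price <= 0:
        raise ValueError("spot_price must be positive")
    return reserved_price / spot_price


def cheaper_option(reserved_price: float, spot_price: float,
                   committed_m_tokens: float, expected_m_tokens: float) -> str:
    """Compare total cost of a take-or-pay commitment against pure spot."""
    overage = max(0.0, expected_m_tokens - committed_m_tokens)
    reserved_cost = reserved_price * committed_m_tokens + spot_price * overage
    spot_cost = spot_price * expected_m_tokens
    return "reserved" if reserved_cost < spot_cost else "spot"


# Hypothetical: $8/M reserved vs $10/M spot, 1,000M tokens committed.
print(f"Break-even utilization: {breakeven_utilization(8.0, 10.0):.0%}")
print(cheaper_option(8.0, 10.0, committed_m_tokens=1000, expected_m_tokens=900))
print(cheaper_option(8.0, 10.0, committed_m_tokens=1000, expected_m_tokens=700))
```

A wider spread between reserved and spot lowers the break-even utilization, which is why the commitment gets easier to justify as supply tightens.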

Subsidized pricing is a temporary condition. Budget for token spend as a permanent and growing operating cost, not a transitional one. Stand up FinOps discipline around AI spend now, while the line item is still small enough to instrument cleanly. Track tokens per outcome (an accepted PR, a processed ticket, a resolved query) rather than just total spend. The organizations with that telemetry already in place will know which workloads to defend and which to cut when the repricing event lands. The ones without it will be staring at a single undifferentiated line item that’s tripled.
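
Tokens per outcome is a simple metric to compute once usage is tagged by workload. Here is one way it might look in Python. The `UsageEvent` record, the workload names, and the prices are all hypothetical. In practice the events would come from your provider’s usage logs joined to whatever system records the outcomes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class UsageEvent:
    workload: str        # e.g. "code-review", "ticket-triage"
    input_tokens: int
    output_tokens: int
    outcomes: int        # accepted PRs, resolved tickets, etc. credited to this usage


def summarize(events: Iterable[UsageEvent],
              prices_per_m: Dict[str, Tuple[float, float]]
              ) -> Dict[str, Dict[str, Optional[float]]]:
    """Aggregate tokens and cost per workload, then divide by outcomes.

    prices_per_m maps workload -> (input $/M tokens, output $/M tokens).
    """
    totals = defaultdict(lambda: {"tokens": 0, "cost": 0.0, "outcomes": 0})
    for e in events:
        in_price, out_price = prices_per_m[e.workload]
        t = totals[e.workload]
        t["tokens"] += e.input_tokens + e.output_tokens
        t["cost"] += (e.input_tokens * in_price
                      + e.output_tokens * out_price) / 1_000_000
        t["outcomes"] += e.outcomes

    report = {}
    for workload, t in totals.items():
        n = t["outcomes"]
        report[workload] = {
            "total_cost": t["cost"],
            # None flags a workload that spent tokens but produced no outcomes.
            "tokens_per_outcome": t["tokens"] / n if n else None,
            "cost_per_outcome": t["cost"] / n if n else None,
        }
    return report


# Hypothetical usage and prices, for illustration only.
events = [
    UsageEvent("code-review", 1_200_000, 300_000, outcomes=6),
    UsageEvent("code-review", 800_000, 200_000, outcomes=4),
    UsageEvent("ticket-triage", 500_000, 100_000, outcomes=0),
]
prices = {"code-review": (5.0, 15.0), "ticket-triage": (5.0, 15.0)}

for workload, row in summarize(events, prices).items():
    print(workload, row)
```

When prices move, the same report rerun with new prices shows which workloads still clear their cost per outcome and which do not.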

The frontier costs what the frontier costs. Architect for substitutability. The model you’re committed to today shouldn’t be the only model that can run your workload. Build prompt and orchestration layers that can swap between providers and tiers, including dropping down to open-weights models for workloads where commodity-grade is good enough. The leverage isn’t in the cheapest model, it’s in being able to move when the price changes.
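
Substitutability is mostly an interface decision. A minimal sketch of a routing layer, under the assumption that each provider can be wrapped behind the same prompt-in, text-out callable. Every name here is hypothetical, the prices are placeholders, and the stub functions stand in for real provider clients.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

# Higher rank can serve requests that ask for a lower tier.
TIER_RANK = {"commodity": 0, "frontier": 1}


@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: str                         # "commodity" or "frontier"
    output_price_per_m: float         # $/M output tokens
    complete: Callable[[str], str]    # provider call wrapped as prompt -> text


class Router:
    """Picks the cheapest registered model that meets a workload's minimum tier."""

    def __init__(self, options: List[ModelOption]) -> None:
        self._options = {o.name: o for o in options}

    def reprice(self, name: str, output_price_per_m: float) -> None:
        """Record a vendor price change without touching calling code."""
        self._options[name] = replace(
            self._options[name], output_price_per_m=output_price_per_m)

    def pick(self, min_tier: str,
             price_ceiling: Optional[float] = None) -> ModelOption:
        candidates = [
            o for o in self._options.values()
            if TIER_RANK[o.tier] >= TIER_RANK[min_tier]
            and (price_ceiling is None or o.output_price_per_m <= price_ceiling)
        ]
        if not candidates:
            raise LookupError(f"no model meets tier={min_tier!r} "
                              f"under ceiling={price_ceiling!r}")
        return min(candidates, key=lambda o: o.output_price_per_m)


def _stub(name: str) -> Callable[[str], str]:
    """Placeholder for a real provider client."""
    return lambda prompt: f"[{name}] {prompt}"


router = Router([
    ModelOption("frontier-a", "frontier", 75.0, _stub("frontier-a")),
    ModelOption("frontier-b", "frontier", 60.0, _stub("frontier-b")),
    ModelOption("open-weights-c", "commodity", 2.0, _stub("open-weights-c")),
])

reply = router.pick("commodity").complete("Summarize this ticket")
print(reply)
```

Calling code asks for a tier and never names a vendor, so a repricing becomes a one-line `reprice` call and the next `pick` moves the workload. Real routing would also need evaluation to confirm the substitute is actually good enough for the task.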

The leadership move. Stop letting “tokens will trend to zero” do heavy lifting in your strategy deck. Strike it. Replace it with the assumption that intelligence has economics, and operate as if those economics are real. That one change to the underlying assumption rewires every downstream decision: vendor contracts, headcount choices, architectural commitments, runway math. It also separates you from the leaders who are about to discover that the curve they planned on doesn’t bend the way they were told.

The Bottom Line

The industry has made one confident, repeated, load-bearing claim: tokens trend to zero. That claim doesn’t survive contact with the structural facts of the system producing tokens. Capacity to support cheap, abundant AI isn’t built, and nobody can credibly promise the timeline the forecast requires. The capital underwriting today’s prices is finite. The part of the price curve everyone points at to prove the forecast (the falling middle of the inference chart) has already broken at the frontier, where the workloads that actually matter live.

You don’t need to predict when prices rise. The point is just that the case for them falling to zero is being made on credit, and letting that case do strategic work in your planning is a choice you can stop making.

The leaders who get through the next two years are the ones already pricing intelligence honestly. They’ve built leverage and substitutability into their plans. They’ve refused to commit the organization to a future they couldn’t verify. Forecasts get revised. The leaders who never planned on the unrevised version don’t have to walk anything back.

That’s the math being skipped. The confident people drawing the line to zero aren’t doing it defensibly, and the earnest people planning on top of that line aren’t either. The realistic case is harder to put on a slide and less fun to present. It’s the one I’d bet a 5-year plan on, given the chance.

Most strategy decks I’ve seen this year have the cheap-AI line in them, drawn with confidence, projected out to 2030. Ask the person who drew it where the gigawatts come from. Watch what happens.