LAYER 09 AI LAB → MODEL tokens / s · gross margin
How tokens inherit the cost of everything underneath them.
The lab is where the hardware supply chain turns into a user-facing service, with compute cost flowing into every token served.
tokens / s: native unit
~70%: inference gross margin
3: compute regimes
$ → ¢: cost flow
At the end of the supply chain, token economics are shaped by inference efficiency and compute cost.
FIG. L09 · SIGNATURE AI LAB WHERE IT BECOMES A SENTENCE
What this layer does
The AI lab is where the physical chain becomes visible to users. A prompt turns into work on hardware, then into tokens returned over an API or product surface. This layer ties the economic story back to the machine story.
The frontier labs trade through their landlords
Public investors cannot buy OpenAI or Anthropic directly. The two largest commercial compute buyers reach the tape only through Microsoft and Amazon, whose balance sheets carry the GPUs the labs rent.
The ratios are lopsided. Anthropic sits at roughly $25 B ARR (per Reuters, late 2025) against the $8 B Amazon has invested and a custom Trainium fleet built around the partnership. Microsoft’s FY26 Q3 prepared remarks guide calendar-2026 capex toward $190 B, much of it Azure capacity dedicated to OpenAI.
The labs book the revenue line. The hyperscalers book the depreciation. The token at the top of this chapter is a public-market security, indirectly.
A token is a fraction of a watt-hour, a bandwidth-second, and a depreciation slice
The chapter’s native unit is tokens per second. The cost-recovery unit underneath it is inference gross margin, roughly 70%: the spread that compounds into the next training run. Any margin below that level is borrowed time on someone else’s balance sheet.
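The link between serving cost, price, and that ~70% figure can be sketched in a few lines. This is an illustrative sketch, not any lab's accounting: the function names and the $3.00 cost figure are assumptions chosen for the example, and only the 70% target comes from the text.

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of revenue: (price - cost) / price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost) / price


def price_for_margin(cost: float, target_margin: float) -> float:
    """Price needed to hit a target gross margin at a given unit cost."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return cost / (1.0 - target_margin)


# Hypothetical: serving one million tokens costs $3.00 in compute.
cost_per_mtok = 3.00
price_per_mtok = price_for_margin(cost_per_mtok, 0.70)  # about $10.00
print(f"price per million tokens: ${price_per_mtok:.2f}")
print(f"margin check: {gross_margin(price_per_mtok, cost_per_mtok):.0%}")
```

At a 70% margin, price is a little over three times cost, so under this simple model each extra dollar of compute cost per million tokens needs about $3.33 of price to hold the margin.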
Trace a single token downward. It draws a fraction of a watt-hour at the substation and the rack, a fraction of an HBM-bandwidth-second across the package’s TSVs, and a fraction of a Hopper-hour off the cloud’s depreciation schedule. Every layer of this site shows up as a sliver in the cost of one sentence.
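One way to make that trace concrete is to divide each hourly resource by hourly token throughput. The sketch below does this for energy and depreciation only. Every input (700 W per accelerator, 1,000 tokens per second, $2.00 per accelerator-hour, $0.08 per kWh) is a hypothetical placeholder rather than a figure from this chapter, and the `Accelerator` class is an illustrative name. Cooling, networking, and staffing are ignored.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass
class Accelerator:
    power_watts: float         # assumed average draw while serving
    tokens_per_second: float   # assumed aggregate throughput across the batch
    dollars_per_hour: float    # assumed depreciation or rental charge
    dollars_per_kwh: float     # assumed electricity price

    def tokens_per_hour(self) -> float:
        return self.tokens_per_second * SECONDS_PER_HOUR

    def watt_hours_per_token(self) -> float:
        # One hour at power_watts consumes power_watts watt-hours.
        return self.power_watts / self.tokens_per_hour()

    def accelerator_seconds_per_token(self) -> float:
        # Share of one device-second (and of its HBM bandwidth) per token.
        return 1.0 / self.tokens_per_second

    def cost_per_million_tokens(self) -> dict:
        energy = self.watt_hours_per_token() / 1000 * self.dollars_per_kwh * 1e6
        depreciation = self.dollars_per_hour / self.tokens_per_hour() * 1e6
        return {"energy": energy, "depreciation": depreciation,
                "total": energy + depreciation}


gpu = Accelerator(power_watts=700, tokens_per_second=1000,
                  dollars_per_hour=2.00, dollars_per_kwh=0.08)
breakdown = gpu.cost_per_million_tokens()
print(f"{gpu.watt_hours_per_token():.6f} Wh per token")
for name, dollars in breakdown.items():
    print(f"{name}: ${dollars:.4f} per million tokens")
```

Under these assumed inputs the depreciation sliver is roughly 35 times the energy sliver. Different throughput, power, or pricing assumptions would shift that ratio.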
Three compute regimes share that stack: frontier training, fine-tuning, and online inference. The hardware is the same; only the duty cycle and the margin differ. Gross margin on inference is what tells you whether the training run that produced the model was worth doing.
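The duty-cycle point can be illustrated by spreading a fixed hourly hardware charge over the fraction of hours that do useful work. The duty cycles and the $2.00 hourly charge below are hypothetical placeholders, not measurements of any regime, and `cost_per_busy_hour` is an illustrative helper.

```python
def cost_per_busy_hour(dollars_per_hour: float, duty_cycle: float) -> float:
    """Effective cost of one productive hour when hardware is busy
    only a fraction `duty_cycle` of the time it is paid for."""
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return dollars_per_hour / duty_cycle


# Hypothetical duty cycles for the three regimes on identical hardware.
regimes = {
    "frontier training": 0.90,
    "fine-tuning": 0.60,
    "online inference": 0.40,
}

HOURLY_CHARGE = 2.00  # assumed depreciation or rental, dollars per accelerator-hour

effective = {name: cost_per_busy_hour(HOURLY_CHARGE, duty)
             for name, duty in regimes.items()}
for name, dollars in effective.items():
    print(f"{name}: ${dollars:.2f} per busy accelerator-hour")
```

The same accelerator-hour costs more per unit of useful work the less of the clock it spends busy, which is one reason identical hardware can carry different margins in different regimes.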
Every frontier model has a public-market landlord
Microsoft and Amazon are not the whole landlord list. Alphabet captive-funds DeepMind off $180–$190 B of FY2026 capex (Q1 2026 earnings call). Meta funds FAIR and the Llama family off $125–$145 B of FY2026 capex (Q1 2026 outlook), most of it the Hyperion campus build.
Oracle is the third leg. Its remaining performance obligation (RPO) reached $553 B at FY26 Q3, up 325% year over year (March 2026 release), almost entirely from the Stargate consortium with OpenAI, SoftBank, and MGX. Microsoft’s commercial RPO now stands at $627 B (FY26 Q3).
Outside the US, SoftBank funds Arm and Stargate from the same balance sheet, and Alibaba pushes RMB 126 B of FY2026 capex into Cloud Intelligence to host Qwen (May 2026 results). Combined, the four US hyperscalers’ ~$600 B in 2026 capex is roughly the entire public-market AI-lab proxy. The labs run the inference; the tape owns the iron.
Tokens are the final output, but their cost was shaped all the way down the chain.