LAYER 05 ACCELERATOR DIE transistors · tFLOPS
Why modern accelerators are becoming multi-die systems.
The accelerator is no longer just one rectangle of silicon. It is a tightly integrated package of logic, memory interfaces, and die-to-die links.
2+ · logic dies
HBM · co-packaged memory
D2D · local links
reticle · scaling limit
Large accelerators are constrained by reticle area, package layout, and the yield of multi-die assembly.
FIG. L05 · SIGNATURE ACCELERATOR WHERE COMPUTE LIVES
What this layer does
Modern accelerators combine compute, cache, memory interfaces, and package constraints in one design problem. Multi-die assembly expands what the chip can do, but it also makes layout and yield harder to manage.
That shift matters because the word “chip” now hides a lot. Some leading accelerators are really multiple reticle-scale logic pieces, dense package interconnect, and large memory systems presented to software as one product.
Read the architecture
Modern accelerators scale by dividing the silicon, then rebuilding the illusion of one chip.
The signature figure compares finished products. This guide explains the architectural move underneath them: bigger AI accelerators are becoming coordinated assemblies of dies, memory, and local interconnect.
One die hits the reticle wall.
Once the useful accelerator outgrows what fits cleanly in one lithography field, scaling by “just make the die bigger” breaks down.
Chiplets recover manufacturability.
Splitting logic across multiple dies can improve yield and preserve architecture growth, but only if the package reunites them well.
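The yield argument can be made concrete with a toy model. The sketch below assumes a simple Poisson defect model, known-good-die testing before assembly, and a single package assembly yield. The defect density, logic area, and assembly yield are hypothetical values chosen for illustration, not figures from any vendor.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Probability that a die of the given area has zero killer defects."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_unit(total_area_mm2: float, n_dies: int,
                          defects_per_cm2: float,
                          assembly_yield: float = 1.0) -> float:
    """Expected wafer area (mm^2) consumed per working accelerator.

    Assumes dies are tested before assembly (known-good-die), and that a
    failed assembly scraps every die in the package.
    """
    die_yield = poisson_yield(total_area_mm2 / n_dies, defects_per_cm2)
    return total_area_mm2 / (die_yield * assembly_yield)


# Hypothetical inputs, for illustration only.
D0 = 0.1             # killer defects per cm^2
LOGIC_MM2 = 1600.0   # total logic area the architecture wants
ASSEMBLY = 0.95      # package assembly yield for the two-die option

one_die = silicon_per_good_unit(LOGIC_MM2, 1, D0)
two_dies = silicon_per_good_unit(LOGIC_MM2, 2, D0, ASSEMBLY)

print(f"one 1600 mm^2 die : {one_die:,.0f} mm^2 of wafer per good unit")
print(f"two 800 mm^2 dies : {two_dies:,.0f} mm^2 of wafer per good unit")
```

With these made-up numbers the two-die option uses less wafer area per working unit, but the advantage disappears once assembly yield drops below roughly 45%, which is one way to read "only if the package reunites them well."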
Die-to-die links must feel local.
NVIDIA’s Blackwell Ultra uses two reticle-sized dies connected by a 10 TB/s die-to-die interface so software still sees one accelerator.
Memory and network keep score.
Google’s Ironwood exposes dual chiplets with dedicated HBM, while HBM bandwidth and scale-up links determine whether all that compute stays busy.
Blackwell Ultra: 10 TB/s D2D · up to 8 TB/s HBM
Ironwood: 7.38 TB/s HBM · D2D link 6× 1D ICI
Peak FLOPs alone hide memory, packaging, and software-exposed topology.
Dies per accelerator, HBM bandwidth, and die-to-die links tell a truer story.
The “chip” is now a small compute system packaged as one product.
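One way to see why peak FLOPs alone mislead is a roofline-style estimate, sketched here under stated assumptions. The 8 TB/s figure is the "up to" HBM bandwidth quoted above for Blackwell Ultra. The peak compute number and the arithmetic intensities are hypothetical placeholders, not vendor specifications.

```python
def attainable_tflops(peak_tflops: float, hbm_tb_per_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute and memory ceilings.

    TB/s multiplied by FLOP/byte gives TFLOP/s, so units line up directly.
    """
    memory_ceiling = hbm_tb_per_s * flops_per_byte
    return min(peak_tflops, memory_ceiling)


HBM_TB_PER_S = 8.0     # "up to 8 TB/s HBM" from the text
PEAK_TFLOPS = 1000.0   # hypothetical peak, for illustration only

# Arithmetic intensity at which the kernel stops being memory-bound.
ridge_flops_per_byte = PEAK_TFLOPS / HBM_TB_PER_S

for intensity in (10.0, 100.0, 1000.0):   # hypothetical kernels
    bound = attainable_tflops(PEAK_TFLOPS, HBM_TB_PER_S, intensity)
    print(f"{intensity:>6.0f} FLOP/byte -> at most {bound:,.0f} TFLOP/s")
```

In this sketch, any kernel below the ridge point is limited by HBM bandwidth rather than by the compute the dies can deliver.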
The package now carries the whole market
NVIDIA’s Q1 FY2027 data-center revenue was $75.2 billion in a single quarter. That is roughly thirteen times AMD’s $5.775 billion data-center quarter (Q1 FY2026, up 57% year over year) and nearly nine times Broadcom’s $8.4 billion AI semiconductor quarter (Q1 FY2026, up 106% year over year).
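The multiples in this paragraph can be checked with a few lines of arithmetic. The snippet uses only the revenue figures quoted above; the variable names are illustrative.

```python
# Quarterly revenue in billions of USD, as quoted in the text.
nvidia_dc = 75.2      # NVIDIA data center, Q1 FY2027
amd_dc = 5.775        # AMD data center, Q1 FY2026
broadcom_ai = 8.4     # Broadcom AI semiconductor, Q1 FY2026

vs_amd = nvidia_dc / amd_dc
vs_broadcom = nvidia_dc / broadcom_ai

print(f"NVIDIA / AMD      = {vs_amd:.2f}x")
print(f"NVIDIA / Broadcom = {vs_broadcom:.2f}x")
```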
The competitive set is also narrower than the technical menu suggests. Marvell’s FY2026 data-center revenue was $6.10 billion, or 74% of total company revenue, almost entirely custom silicon for a few hyperscalers.
Reticle limits forced multi-die assembly. Multi-die assembly then concentrated the work in the few firms that can finance interposer supply, HBM allocation, and substrate capacity in lockstep.
One wafer can also be the whole chip
The chapter’s argument is that reticle limits forced multi-die packaging. Cerebras is the live counterexample. The WSE-3 is a single 46,225 mm² die — one chip per wafer, with no interposer and no die-to-die links.
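To put that area in context, the sketch below compares it with a single lithography field. The 26 mm × 33 mm field size is an assumption (the commonly cited maximum exposure field), not a figure from this chapter. The WSE-3 area is the one quoted above.

```python
import math

WSE3_AREA_MM2 = 46_225.0          # from the text
RETICLE_FIELD_MM = (26.0, 33.0)   # assumed maximum exposure field

reticle_area_mm2 = RETICLE_FIELD_MM[0] * RETICLE_FIELD_MM[1]
fields_equivalent = WSE3_AREA_MM2 / reticle_area_mm2
side_mm = math.sqrt(WSE3_AREA_MM2)  # edge length if the die is square

print(f"reticle field : {reticle_area_mm2:.0f} mm^2")
print(f"WSE-3         : {fields_equivalent:.1f} reticle fields of area")
print(f"square edge   : {side_mm:.0f} mm")
```

Under that assumption, the wafer-scale die covers the area of more than fifty reticle fields, which is the scale a multi-die package would otherwise have to reach by stitching dies together on an interposer.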
The public market just priced that bet. On May 15, 2026 Cerebras closed its Nasdaq IPO of 34,500,000 Class A shares at $185.00, for $6.4 billion of gross proceeds, and now trades as CBRS (Cerebras closing 8-K). FY2025 revenue was $510 million, up 76% year over year (Cerebras 424B4).
Multi-die assembly is still the dominant architecture. Wafer-scale is the listed exception.
The “not-NVIDIA” accelerator stack runs through two suppliers
TPU, Trainium, Maia, and MTIA are designed inside hyperscalers but fabricated outside them. Broadcom and Marvell are the public-market proxies for that custom-ASIC layer.
Broadcom’s Q1 FY2026 AI revenue was $8.4 billion, up 106% year over year, and the company guided Q2 AI semiconductor revenue to $10.7 billion (Broadcom Q1 FY2026 release). Marvell’s FY2026 data-center revenue was $6.10 billion, 74% of total company revenue, almost entirely custom silicon for a few hyperscalers (Marvell FY2026 10-K).
Each firm serves only a handful of hyperscaler customers. Two semiconductor companies have each quietly built a multi-billion-dollar AI silicon business that scales with hyperscaler capex rather than with merchant GPU demand.
The accelerator is where many earlier constraints become one physical product.