LAYER 04 HBM MEMORY GB/s · TSVs · 12-high
How HBM gets to 2.5 TB/s.
Stacked DRAM, a base die, and thousands of IO lines add up to the bandwidth modern accelerators depend on.
2.5 TB/s per stack
12-high stack of DRAM dies
2 048 IO per stack
30% of hyperscaler capex
HBM bandwidth depends on how much memory interface fits around the package.
FIG. L04 · SIGNATURE HBM 2.5 TB/s DERIVATION
What this layer does
HBM matters because accelerators need more than raw arithmetic. They need weights and activations to arrive quickly enough to keep the compute busy. This layer explains where that bandwidth comes from and why it is difficult to keep increasing.
The stack looks dense because the design problem is dense: memory needs to be close, wide, fast, and packageable at the same time.
Read the bandwidth math
HBM reaches terabytes per second by multiplying width, not by chasing one magic number.
The signature figure gives the equation. This guide breaks the same answer into four physical ingredients, then shows why that result matters to an accelerator trying to stay busy.
≈ 2.5 TB/s per stack, built from four ingredients:
12 dies · Capacity gets stacked vertically. HBM grows upward for capacity, but stack height does not multiply the external IO width.
8 channels · The stack is split into parallel lanes. More channels let the accelerator move many words at the same time.
256 IO per channel · Each lane is unusually wide. HBM wins by combining a short distance with an extremely wide interface.
10 GT/s per IO · The wide bus also runs quickly. Speed per line matters, but width is what makes the final bandwidth explode.
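The four ingredients multiply into the headline number. Below is a minimal Python sketch of that arithmetic. It assumes the guide's figures (8 channels of 256 IO each, 10 GT/s, one bit per IO per transfer). The function name `stack_bandwidth_gbs` is a hypothetical helper for illustration, not a vendor API. Stack height is deliberately absent from the formula.

```python
def stack_bandwidth_gbs(channels: int, io_per_channel: int, gt_per_s: float) -> float:
    """Peak per-stack bandwidth in GB/s (hypothetical helper for illustration).

    Each IO line carries one bit per transfer, so bits per second is
    interface width * transfer rate; dividing by 8 converts bits to bytes.
    """
    io_width_bits = channels * io_per_channel  # external interface width
    return io_width_bits * gt_per_s / 8        # (bits * GT/s) / 8 -> GB/s


# Figures from the guide. The 12-die stack height is not an argument:
# stacking dies adds capacity, not external IO width.
CHANNELS = 8
IO_PER_CHANNEL = 256
RATE_GT_S = 10.0

bw = stack_bandwidth_gbs(CHANNELS, IO_PER_CHANNEL, RATE_GT_S)
print(f"IO width:  {CHANNELS * IO_PER_CHANNEL} bits")
print(f"Bandwidth: {bw:.0f} GB/s ({bw / 1000:.2f} TB/s)")
```

The exact product is 2 560 GB/s, which the guide rounds to ≈ 2.5 TB/s. With the same width, `stack_bandwidth_gbs(8, 256, 13.0)` returns 3 328 GB/s. This is the arithmetic behind an HBM4 part running at 13 GT/s.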
HBM is stacked DRAM placed right beside the accelerator so thousands of short TSV-linked paths can feed it at once. The result is not just more memory. It is more memory reaching the chip in time.
More arithmetic units raise the penalty when memory arrives late.
Longer prompts and KV caches increase repeated memory reads per token.
HBM still has to fit around the accelerator with enough interface shoreline.
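To put the KV-cache point in numbers, here is a rough lower bound on decode time per token. The sketch assumes a hypothetical dense model: 8 B parameters in 16-bit precision, 32 layers, 8 KV heads, and head dimension 128. None of these values come from the source. It also assumes a batch of one, and that every generated token streams all weights plus the full KV cache from a single stack at the guide's ≈ 2.5 TB/s. The function names are illustrative.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes of K and V cached for one sequence (hypothetical sizing helper)."""
    # Factor 2: one K vector and one V vector per head, per layer, per token.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def decode_ms_per_token(weight_bytes: float, kv_bytes: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Memory-bound lower bound: time to read weights + KV cache once."""
    return (weight_bytes + kv_bytes) / bandwidth_bytes_per_s * 1e3


# Hypothetical model and hardware, chosen only for illustration.
WEIGHT_BYTES = 8e9 * 2   # 8 B parameters at 2 bytes each
BANDWIDTH = 2.5e12       # one HBM stack at ~2.5 TB/s, assumed fully usable

for ctx in (4_096, 32_768, 131_072):
    kv = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_tokens=ctx)
    ms = decode_ms_per_token(WEIGHT_BYTES, kv, BANDWIDTH)
    print(f"{ctx:>7} tokens: KV cache {kv / 1e9:5.1f} GB, >= {ms:4.1f} ms/token")
```

The weights cost the same on every token, but the KV term grows linearly with context. By about 128 K tokens, the cache in this hypothetical model outweighs the weights themselves. Real accelerators spread this traffic over several stacks and never sustain peak bandwidth, so treat the output as an order-of-magnitude illustration.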
Scarcity has made memory the highest-margin link
The 30% of hyperscaler capex now flowing into HBM does not just buy chips. It buys margins that look more like a frontier accelerator vendor's than a commodity memory maker's. SK Hynix’s Q1 FY2026 release booked KRW 37.6 trillion of operating income on KRW 52.6 trillion of revenue. That is an operating margin of roughly 71%, and a record Q1 for the company.
Micron’s FQ2 FY2026 release shows the same shape from the other side of the Pacific: a 74.4% GAAP gross margin on $23.9 billion of revenue, with the next quarter guided to roughly 81%. Samsung’s 1Q26 Device Solutions segment pulled KRW 53.7 trillion of operating profit out of KRW 81.7 trillion of memory-led revenue.
Three suppliers, three filings, the same read: HBM scarcity is being priced like the bandwidth it unlocks, not like the DRAM it is built from.
HBM4 already pushes past 2.5 TB/s — the chapter’s anchor number is the floor, not the ceiling
On March 18, 2026, AMD and Samsung announced HBM4 for the AMD Instinct MI455X. The memory is built on 1c DRAM with a 4 nm logic base die, runs at up to 13 GT/s, and delivers 3.3 TB/s per stack. The math is the same as the signature figure, only faster: 2 048 IO × 13 GT/s ÷ 8 = 3.328 TB/s.
The 2.5 TB/s in the signature was computed at 10 GT/s. The disclosed HBM4 ceiling on a real product is now meaningfully higher, and the base die is a logic chip — collapsing the memory/foundry boundary that the rest of the stack still treats as separate.
The scarcity is mechanical: HBM consumes DRAM wafers that smartphones also want
The bottleneck tracker ranks HBM Memory as the leading constraint through 2025 and 2026. The cause is not that suppliers can’t make memory; it is that smartphones bid for the same DRAM wafers. SK Hynix tripled HBM prices in 2025 as long-context inference collided with sparse MoE models, and KV caches grew with every token.
New DRAM fabs are the only real fix. Capacity from 2025 fab decisions does not arrive in volume until late 2027. The scarcity isn’t artificial — it’s a wafer-allocation problem with a two-year minimum fix.
More compute is only useful when memory can feed it fast enough.