LAYER 04 HBM MEMORY GB/s · TSVs · 12-high

Last revised · MAY 13, 2026 binding · today

How HBM gets to 2.5 TB/s.

Stacked DRAM, a base die, and thousands of IO lines add up to the bandwidth modern accelerators depend on.

2.5 TB/s

per stack

12-high

DRAM dies stacked

2 048

IO per stack

30%

of hyperscaler capex

Native unit

GB/s · TSVs · 12-high

What constrains it

HBM bandwidth depends on how much memory interface fits around the package.

FIG. L04 · SIGNATURE HBM 2.5 TB/s DERIVATION

PHYSICAL STACK · 13 DIES PER HBM4
D12 to D01 (top to bottom): 12 DRAM dies · 2 GB each
HBM4 BASE DIE: logic · I/O controller · 2 048 IO
8 channels × 256 IO lines per channel = 2 048 IO lines total
BANDWIDTH FORMULA · WIDTH × RATE ÷ ENCODING
WIDTH: 2 048 IO lines (from the figure)
× RATE: 10 GT/s per pin (HBM4 spec)
÷ ENCODING: 8 bits per byte (unit conversion)
= 2 560 GB/s ≈ 2.5 TB/s per HBM4 stack
WHAT THIS BUYS YOU
A Rubin GPU pulls roughly 20 TB/s from its 8 HBM4 stacks (8 × 2.5 TB/s). That is enough memory bandwidth to read a 100-billion-parameter model, about 200 GB if each parameter takes 2 bytes, from HBM about a hundred times every second. Each full read is the operation that produces a single token. Memory bandwidth, not compute, is what limits how fast a model can think.
HBM4 stack: the 12 DRAM dies provide capacity, but bandwidth comes from the external interface: 8 channels × 256 IO/channel = 2 048 IO. At 10 GT/s per line, 2 048 IO × 10 GT/s ÷ 8 bits/byte = 2 560 GB/s ≈ 2.5 TB/s. Orange marks the IO bus.
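
The "what this buys you" claim is a single division: aggregate bandwidth over the bytes that must be read per token. The sketch below reproduces it under two assumptions that the figure does not state: each parameter occupies 2 bytes, and every weight is read exactly once per generated token. The function name `tokens_per_second_bound` is hypothetical.

```python
def tokens_per_second_bound(bandwidth_tb_s: float,
                            params_billion: float,
                            bytes_per_param: float = 2.0) -> float:
    """Upper bound on decode rate when each token needs one full read of the weights.

    bytes_per_param=2.0 is an assumption (16-bit weights), not a figure from the text.
    """
    bandwidth_bytes_s = bandwidth_tb_s * 1e12            # TB/s -> bytes/s
    model_bytes = params_billion * 1e9 * bytes_per_param  # total weight footprint
    return bandwidth_bytes_s / model_bytes


stacks = 8              # HBM4 stacks per GPU, from the text
per_stack_tb_s = 2.5    # rounded per-stack bandwidth, from the text
gpu_tb_s = stacks * per_stack_tb_s

rate = tokens_per_second_bound(gpu_tb_s, params_billion=100)
print(f"{gpu_tb_s:.0f} TB/s -> at most {rate:.0f} full weight reads per second")
```

The result is a ceiling, not a prediction: it ignores compute time, batching, and any traffic other than the weights.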

What this layer does

HBM matters because accelerators need more than raw arithmetic. They need weights and activations to arrive quickly enough to keep the compute busy. This layer explains where that bandwidth comes from and why it is difficult to keep increasing.

The stack looks dense because the design problem is dense: memory needs to be close, wide, fast, and packageable at the same time.

Read the bandwidth math

Read bandwidth as stacked multiplication

HBM reaches terabytes per second by multiplying width, not by chasing one magic number.

The signature figure gives the equation. This guide breaks the same answer into four physical ingredients, then shows why that result matters to an accelerator trying to stay busy.

8 × 256 × 10 ÷ 8

Per stack

≈ 2.5 TB/s

12

DRAM dies

Capacity gets stacked vertically.

HBM grows upward for capacity. Stack height does not multiply the external IO width.

8

channels

The stack is split into parallel lanes.

More channels let the accelerator move many words at the same time.

256

IO / channel

Each lane is unusually wide.

HBM wins by combining a short distance with an extremely wide interface.

10 GT/s

line rate

The wide bus also runs quickly.

Speed per line matters, but width is what makes the final bandwidth explode.
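
The four ingredients above reduce to a few lines of arithmetic. This is a minimal sketch rather than vendor code: the function names `stack_bandwidth_gb_s` and `stack_capacity_gb` are hypothetical, and the inputs are the figures quoted in this guide (8 channels, 256 IO per channel, 10 GT/s, 12 dies of 2 GB each). Die count appears only in the capacity function, which is the point of the first card.

```python
def stack_bandwidth_gb_s(channels: int, io_per_channel: int, rate_gt_s: float) -> float:
    """Peak bandwidth of one stack in GB/s: width x rate / 8 bits per byte."""
    io_lines = channels * io_per_channel         # total external IO width
    gigabits_per_second = io_lines * rate_gt_s   # one bit per transfer per line
    return gigabits_per_second / 8               # bits -> bytes


def stack_capacity_gb(dram_dies: int, gb_per_die: float) -> float:
    """Capacity of one stack in GB; stack height changes this, not bandwidth."""
    return dram_dies * gb_per_die


bandwidth = stack_bandwidth_gb_s(channels=8, io_per_channel=256, rate_gt_s=10)
capacity = stack_capacity_gb(dram_dies=12, gb_per_die=2)
print(f"{bandwidth:.0f} GB/s per stack, {capacity:.0f} GB per stack")
# prints: 2560 GB/s per stack, 24 GB per stack
```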

Plain English

HBM is stacked DRAM placed right beside the accelerator so thousands of short TSV-linked paths can feed it at once. The result is not just more memory. It is more memory reaching the chip in time.

Compute pressure

More arithmetic units raise the penalty when memory arrives late.

Context pressure

Longer prompts and KV caches increase repeated memory reads per token.

Package pressure

HBM still has to fit around the accelerator with enough interface shoreline.
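
Context pressure can be made concrete with a rough estimate of how KV-cache reads add to the bytes moved per token. Everything about the model shape below is hypothetical (80 layers, 8 KV heads, head dimension 128, 2-byte values) and chosen only for illustration. The 20 TB/s figure comes from this guide, and the 200 GB weight footprint assumes 100 billion parameters at 2 bytes each.

```python
def kv_cache_bytes(context_tokens: int,
                   layers: int = 80,         # hypothetical model depth
                   kv_heads: int = 8,        # hypothetical number of KV heads
                   head_dim: int = 128,      # hypothetical head dimension
                   bytes_per_value: int = 2  # assumes 16-bit cache entries
                   ) -> int:
    """Bytes of keys and values held for a given context length."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = keys + values
    return per_token * context_tokens


def decode_rate_bound(bandwidth_bytes_s: float, weight_bytes: float,
                      context_tokens: int) -> float:
    """Bandwidth-limited tokens/s when each token reads the weights plus the whole cache."""
    bytes_per_token = weight_bytes + kv_cache_bytes(context_tokens)
    return bandwidth_bytes_s / bytes_per_token


BANDWIDTH_BYTES_S = 20e12  # roughly 20 TB/s, from the text
WEIGHT_BYTES = 200e9       # 100B parameters x 2 bytes (assumption)

for context in (1_000, 100_000, 1_000_000):
    bound = decode_rate_bound(BANDWIDTH_BYTES_S, WEIGHT_BYTES, context)
    print(f"context {context:>9,} tokens: at most {bound:.1f} tokens/s")
```

With fixed bandwidth, the bound can only fall as the context grows, because the cache term is added to every token's read traffic.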

The 30% of hyperscaler capex now flowing into HBM does not just buy chips. It funds supplier margins that look more like a frontier accelerator’s than a commodity memory’s. SK Hynix’s Q1 FY2026 release booked KRW 37.6 trillion of operating income on KRW 52.6 trillion of revenue: a 72% operating margin, and a record Q1 for the company.

Micron’s FQ2 FY2026 release shows the same shape from the other side of the Pacific: a 74.4% GAAP gross margin on $23.9 billion of revenue, with the next quarter guided to roughly 81%. Samsung’s 1Q26 Device Solutions segment pulled KRW 53.7 trillion of operating profit out of KRW 81.7 trillion of memory-led revenue.

Three suppliers, three filings, the same read: HBM scarcity is being priced like the bandwidth it unlocks, not like the DRAM it is built from.

HBM4 already pushes past 2.5 TB/s — the chapter’s anchor number is the floor, not the ceiling

On March 18, 2026, AMD and Samsung announced HBM4 for the AMD Instinct MI455X built on 1c DRAM and a 4 nm logic base die, with speeds up to 13 GT/s and per-stack bandwidth of 3.3 TB/s. The math is the same as the signature figure, only faster: 2 048 IO × 13 GT/s ÷ 8 = 3.328 TB/s.

The 2.5 TB/s in the signature was computed at 10 GT/s. At 13 GT/s, the disclosed HBM4 ceiling on a real product is about 30% higher. The base die is also a logic chip, which collapses the memory/foundry boundary that the rest of the stack still treats as separate.
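
Because the interface width stays at 2 048 IO, the uplift from a faster line rate is simple to check. The sketch below compares the 10 GT/s anchor with the 13 GT/s figure from the announcement. The helper name `bandwidth_tb_s` is hypothetical, and decimal units (1 TB = 1 000 GB) are assumed.

```python
IO_LINES = 2048  # external interface width per stack, from the signature figure


def bandwidth_tb_s(rate_gt_s: float, io_lines: int = IO_LINES) -> float:
    """Per-stack bandwidth in TB/s for a given per-pin transfer rate."""
    gb_per_second = io_lines * rate_gt_s / 8  # bits -> bytes
    return gb_per_second / 1000               # GB/s -> TB/s (decimal units assumed)


anchor = bandwidth_tb_s(10)      # the signature figure's rate
disclosed = bandwidth_tb_s(13)   # the rate quoted in the announcement
uplift_percent = (disclosed / anchor - 1) * 100

print(f"{anchor:.3f} TB/s -> {disclosed:.3f} TB/s (+{uplift_percent:.0f}%)")
# prints: 2.560 TB/s -> 3.328 TB/s (+30%)
```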

The scarcity is mechanical: HBM consumes DRAM wafers that smartphones also want

The bottleneck tracker has HBM Memory in the lead through 2025 and 2026 — not because suppliers can’t make memory, but because smartphones bid for the same DRAM wafers. SK Hynix tripled HBM prices in 2025 as long-context inference met sparse MoE and KV caches grew with every token.

New DRAM fabs are the only real fix. Capacity from 2025 fab decisions does not arrive in volume until late 2027. The scarcity isn’t artificial — it’s a wafer-allocation problem with a two-year minimum fix.

More compute is only useful when memory can feed it fast enough.