Latency model: HBM PC striping + chunk-loop drain (ADR-0033)

Previous model double-counted slow-upstream paths (e.g., 64KB via UCIe
128 GB/s was ~2x pessimistic). HBM CTRL now distributes bursts across
8 pseudo-channels via global round-robin, with per-chunk commit timing
that pipelines correctly against the bottleneck link's data arrival.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-14 21:59:07 -07:00
parent f6d262e359
commit 5fdb6f8797
11 changed files with 1192 additions and 52 deletions
@@ -33,12 +33,17 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
- This guarantee is modeled by:
- a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
- while still incurring non-zero latency along explicitly modeled components.
- HBM CTRL internal modeling (PC striping, cut-through, scheduling fidelity)
is consolidated in ADR-0033 (Latency Model: Assumptions and Known
Simplifications). The aggregate BW guarantee here remains the contract;
ADR-0033 documents how the per-PC model realizes it and which scheduler
effects are intentionally simplified.
### D3. Remote PE HBM semantics (intra-cube)
- A PE that accesses another PE's local HBM traverses the router mesh:
- PE_DMA → local router → (mesh hops) → target PE's router → HBM_CTRL
- Router mesh bandwidth and hop count may limit remote HBM access relative to local access.
- A PE that accesses another PE's local HBM traverses the NOC:
- PE_DMA → NOC → (fabric hops) → target PE's NOC port → HBM_CTRL
- NOC bandwidth and hop count may limit remote HBM access relative to local access.
### D4. Non-local HBM semantics (inter-cube / inter-SIP)