# ADR-0004: Memory Semantics & Local-HBM Bandwidth Guarantee

## Status

Accepted

## Context

Accurately modeling PE↔HBM behavior is essential for kernel latency estimation.
Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth, independent of intervening on-die fabric bandwidth.

## Decision

### D1. Local HBM definition

- Each PE is assigned a logically defined “local HBM” region.
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE’s DMA path via the XBAR (top or bottom, depending on PE corner placement).
- The path is: PE_DMA → XBAR.top/bottom → HBM_CTRL.
- The mapping (HBM pseudo-channels → PE local regions) is derived from topology configuration.

### D2. Local HBM bandwidth guarantee contract

- Accesses from a PE to its local HBM MUST guarantee full HBM read/write bandwidth independent of intervening fabric bandwidth limits.
- This guarantee is modeled by:
  - a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
  - while still incurring non-zero latency along explicitly modeled components.

### D3. Cross-half HBM semantics

- A PE connected to XBAR.bottom that accesses HBM pseudo-channels on the XBAR.top half (or vice versa) traverses a bridge:
  - PE_DMA → XBAR.bottom → bridge → XBAR.top → HBM_CTRL
- Bridge bandwidth may limit cross-half HBM access relative to local-half access.

### D4. Non-local HBM semantics (inter-cube / inter-SIP)

- Accesses from a PE to HBM in a different cube or SIP MAY be limited by:
  - NOC bandwidth within the cube,
  - inter-cube UCIe links,
  - inter-SIP fabric (PCIe/UAL).
- These paths MUST be explicit and traceable.

### D5. Shared SRAM semantics

- Each CUBE contains a shared SRAM accessible by all PEs in that CUBE.
- Access path: PE_DMA → NOC → shared SRAM.
- Shared SRAM bandwidth is limited by the NOC↔SRAM link bandwidth.
- Shared SRAM is not part of the HBM address space; it is a separate memory domain.

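The path semantics in D1 through D5 can be illustrated with a minimal Python sketch. It is not the simulator's implementation: the `Hop` type, the `effective_bw` and `path_latency` helpers, and all numeric values are hypothetical, and the "slowest hop caps the path" rule is an assumed reading of "MAY be limited by" in D3 and D4. The one property taken directly from D2 is that a local-HBM access reports full HBM bandwidth regardless of fabric bandwidth while still accumulating latency on each modeled hop.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hop:
    """One explicitly modeled component on a path (hypothetical type)."""
    name: str
    bw_gbps: float     # sustained bandwidth of this component
    latency_ns: float  # traversal latency of this component


def effective_bw(path: Sequence[Hop], endpoint_bw_gbps: float,
                 local_hbm: bool = False) -> float:
    """Bandwidth seen by the PE for one access path.

    Local-HBM accesses get the full endpoint bandwidth (D2). Every other
    path is assumed to be capped by its slowest hop (D3 to D5).
    """
    if local_hbm:
        return endpoint_bw_gbps
    return min([endpoint_bw_gbps, *(h.bw_gbps for h in path)])


def path_latency(path: Sequence[Hop]) -> float:
    """Latency accumulates on every hop, including local-HBM paths."""
    return sum(h.latency_ns for h in path)


# Illustrative numbers only; none of them come from the spec.
HBM_BW = 400.0

# Local HBM: PE_DMA -> XBAR.bottom -> HBM_CTRL, with a deliberately slow XBAR.
local_path = [Hop("XBAR.bottom", bw_gbps=50.0, latency_ns=5.0)]

# Cross-half HBM: PE_DMA -> XBAR.bottom -> bridge -> XBAR.top -> HBM_CTRL.
cross_path = [
    Hop("XBAR.bottom", bw_gbps=1000.0, latency_ns=5.0),
    Hop("bridge", bw_gbps=200.0, latency_ns=10.0),
    Hop("XBAR.top", bw_gbps=1000.0, latency_ns=5.0),
]

local_bw = effective_bw(local_path, HBM_BW, local_hbm=True)
cross_bw = effective_bw(cross_path, HBM_BW)
cross_latency = path_latency(cross_path)
```

With these assumed values, the local path keeps the full HBM bandwidth even though its XBAR hop is slower than HBM, whereas the cross-half path is capped by the bridge and pays the latency of all three hops. Passing the SRAM bandwidth as `endpoint_bw_gbps` with a NOC hop in `path` would model the D5 case in the same way.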
## Verification Notes

Tests should cover:

- local-HBM case: BW matches HBM BW regardless of fabric BW parameter
- cross-half HBM case: latency includes bridge traversal
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
- shared SRAM case: access via NOC with correct BW

## Links

- SPEC R2/R5
- ADR-0002 (distance/order & explicit bypass)