# ADR-0004: Memory Semantics & Local-HBM Bandwidth Guarantee

## Status

Accepted

## Context

Accurately modeling PE↔HBM behavior is essential for kernel latency estimation. Each PE has a designated “local HBM” region. Accesses to that region must achieve full HBM bandwidth, independent of intervening on-die fabric bandwidth.

## Decision

### D1. Local HBM definition

- Each PE is assigned a logically defined “local HBM” region.
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE's router in the NOC mesh (ADR-0017 D4).
- The path is: PE_DMA → local router → HBM_CTRL (switching overhead only, 0 mesh hops).
- The mapping (HBM pseudo-channels → PE local regions) is derived from the topology configuration.

### D2. Local HBM bandwidth guarantee contract

- Accesses from a PE to its local HBM MUST achieve the full effective HBM read/write bandwidth, independent of intervening fabric bandwidth limits.
- Effective HBM bandwidth = spec bandwidth × efficiency factor. The efficiency factor (configured via `hbm_ctrl.attrs.efficiency`, default 0.8) models real-world DRAM inefficiencies such as refresh cycles, bank conflicts, and page misses. For example, 256 GB/s spec × 0.8 = 204.8 GB/s effective.
- The topology builder applies the efficiency factor to the router-to-HBM edge bandwidth at graph construction time. As a result, all downstream routing and latency computations use the effective value.
- This guarantee is modeled by:
  - a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
  - while still incurring non-zero latency along explicitly modeled components.
- HBM_CTRL internal modeling (PC striping, cut-through, scheduling fidelity) is consolidated in ADR-0033 (Latency Model: Assumptions and Known Simplifications). The aggregate BW guarantee here remains the contract. ADR-0033 documents how the per-PC model realizes it and which scheduler effects are intentionally simplified.

### D3. Remote PE HBM semantics (intra-cube)

- A PE that accesses another PE's local HBM traverses the NOC:
  - PE_DMA → NOC → (fabric hops) → target PE's NOC port → HBM_CTRL
- Compared with local access, NOC bandwidth may cap remote HBM throughput, and each mesh hop adds latency.

### D4. Non-local HBM semantics (inter-cube / inter-SIP)

- Accesses from a PE to HBM in a different cube or SIP MAY be limited by:
  - NOC bandwidth within the cube,
  - inter-cube UCIe links,
  - inter-SIP fabric (PCIe/UAL).
- These paths MUST be explicit and traceable.

### D5. Shared SRAM semantics

- Each CUBE contains a shared SRAM accessible by all PEs in that CUBE.
- Access path: PE_DMA → NOC → shared SRAM.
- Shared SRAM bandwidth is limited by the NOC↔SRAM link bandwidth.
- Shared SRAM is not part of the HBM address space. It is a separate memory domain.

## Verification Notes

Tests should cover:

- local-HBM case: BW matches the effective HBM BW (D2) regardless of the fabric BW parameter
- remote PE HBM case: latency includes mesh hop traversal
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
- shared SRAM case: access goes through the NOC, and BW is bounded by the NOC↔SRAM link bandwidth (D5)

## Links

- SPEC R2/R5
- ADR-0002 (distance/order & explicit bypass)
- ADR-0017 D7 (PE DMA data paths through NOC to HBM)
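The D1 through D3 path rules and the local/remote cases in the Verification Notes can be illustrated with a small executable model. The Python sketch below is hypothetical. `Link`, `hbm_edge`, `local_hbm_path`, `remote_hbm_path`, `trace`, and the `switch_ns`/`hop_ns` latency parameters are illustrative assumptions, not the simulator's actual topology builder or API. The "slowest link bounds bandwidth, latencies add up" path model is likewise an assumed simplification. Only the efficiency default (0.8) and the 256 GB/s example come from D2.

```python
from dataclasses import dataclass
from typing import List

# Default of hbm_ctrl.attrs.efficiency (D2).
DEFAULT_EFFICIENCY = 0.8


@dataclass(frozen=True)
class Link:
    """One explicitly modeled component on a PE -> memory path (hypothetical type)."""
    name: str
    bw_gbps: float
    latency_ns: float


def effective_hbm_bw(spec_gbps: float, efficiency: float = DEFAULT_EFFICIENCY) -> float:
    """D2: effective HBM bandwidth = spec bandwidth x efficiency factor."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return spec_gbps * efficiency


def hbm_edge(spec_gbps: float, efficiency: float, switch_ns: float) -> Link:
    # The efficiency factor is applied once, when the edge is built, so every
    # path that ends at HBM_CTRL sees the effective bandwidth.
    return Link("router->HBM_CTRL", effective_hbm_bw(spec_gbps, efficiency), switch_ns)


def local_hbm_path(spec_gbps: float, efficiency: float, switch_ns: float) -> List[Link]:
    # D1: PE_DMA -> local router -> HBM_CTRL with 0 mesh hops. The router
    # switching overhead is folded into switch_ns. Per the D2 contract, no
    # fabric link appears on this path, so fabric BW cannot limit it.
    return [hbm_edge(spec_gbps, efficiency, switch_ns)]


def remote_hbm_path(spec_gbps: float, efficiency: float, switch_ns: float,
                    fabric_bw_gbps: float, hops: int, hop_ns: float) -> List[Link]:
    # D3: each mesh hop is an explicit NOC link, followed by the target PE's HBM edge.
    mesh = [Link(f"noc_hop{i}", fabric_bw_gbps, hop_ns) for i in range(hops)]
    return mesh + [hbm_edge(spec_gbps, efficiency, switch_ns)]


def path_bw(path: List[Link]) -> float:
    # The steady-state throughput of a serial path is bounded by its slowest link.
    return min(link.bw_gbps for link in path)


def path_latency(path: List[Link]) -> float:
    # Latency accumulates across every explicitly modeled component.
    return sum(link.latency_ns for link in path)


def trace(path: List[Link]) -> str:
    # Naming every link keeps paths explicit and traceable (the D4 requirement).
    return " -> ".join(link.name for link in path)


if __name__ == "__main__":
    local = local_hbm_path(256.0, DEFAULT_EFFICIENCY, switch_ns=2.0)
    for fabric in (64.0, 128.0, 1024.0):
        remote = remote_hbm_path(256.0, DEFAULT_EFFICIENCY, 2.0,
                                 fabric_bw_gbps=fabric, hops=3, hop_ns=5.0)
        print(f"fabric={fabric}: local={path_bw(local)} GB/s, "
              f"remote={path_bw(remote)} GB/s, remote latency={path_latency(remote)} ns")
        print("  remote path:", trace(remote))
```

In this sketch, the local bandwidth stays at 204.8 GB/s for every fabric setting, which is the D2 guarantee. The remote bandwidth follows min(fabric BW, 204.8 GB/s), and remote latency grows with the hop count. Inter-cube paths (D4) could be modeled the same way by appending named UCIe or PCIe/UAL links, although those link types are not sketched here.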