ADR-0023 D9.7+: charge PE↔bank fabric hop for SRAM/HBM IPCQ slots
Cube SRAM and HBM live on the cube NoC behind router-attached links (sram_to_router_bw_gbs=128, hbm_to_router_bw_gbs=256). Previously the slot-IO model treated them as if they were per-PE local, so the buffer_kind sweep showed TCM ≈ SRAM at 64 KB / PE.

pe_ipcq._handle_recv and pe_dma._handle_ipcq_inbound now charge a PE→bank compute_drain_ns on top of the intrinsic slot-IO for SRAM/HBM. TCM stays free of this hop.

Adds an internal IpcqRecvCmd.consume field that gates the recv-side hop+slot-IO charges (used by a follow-up diagnostic API; default True keeps current behavior).

Post-fix at 64 KB / PE: TCM 12.0 µs < HBM 21.4 µs < SRAM 24.3 µs. SRAM is slowest because its 128 GB/s bank link is the narrowest in the system, narrower than HBM's 256 GB/s.

The existing ordering test is rewritten from tcm<sram<hbm to tcm<hbm<sram, and a new test_ipcq_buffer_kind_locations adds 3 invariants on the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
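The PE→bank hop described in the message above can be illustrated with a minimal sketch. It assumes the drain time is simply payload bytes divided by link bandwidth (1 GB/s = 1 byte/ns); the commit does not confirm that compute_drain_ns works this way. The names `LINK_BW_GBS`, `fabric_hop_ns`, and `recv_charge_ns` are hypothetical and not part of the simulator. The sketch only shows why the SRAM hop costs more than the HBM hop; it does not reproduce the reported 12.0 / 21.4 / 24.3 µs totals, which include costs not modeled here.

```python
# Hypothetical sketch; none of these names exist in the simulator.
# Link bandwidths are the values quoted in the commit message.
LINK_BW_GBS = {"sram": 128.0, "hbm": 256.0}


def fabric_hop_ns(buffer_kind: str, nbytes: int) -> float:
    """Extra PE->bank drain time in ns for one slot payload.

    Assumption: drain time = bytes / bandwidth, with 1 GB/s == 1 byte/ns.
    """
    if buffer_kind == "tcm":
        return 0.0  # TCM is PE-local, so no fabric hop is charged
    return nbytes / LINK_BW_GBS[buffer_kind]  # KeyError on unknown kinds


def recv_charge_ns(buffer_kind: str, nbytes: int, slot_io_ns: float) -> float:
    """Intrinsic slot-IO (supplied by the caller) plus the fabric hop."""
    return slot_io_ns + fabric_hop_ns(buffer_kind, nbytes)


if __name__ == "__main__":
    for kind in ("tcm", "hbm", "sram"):
        print(kind, fabric_hop_ns(kind, 64 * 1024))
```

Under this assumption the hop term alone already yields the tcm < hbm < sram ordering, because the narrower 128 GB/s SRAM link takes twice as long to drain as the 256 GB/s HBM link.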
@@ -135,6 +135,13 @@ class IpcqRecvCmd:
     "return_slot" — return slot address as-is (default, zero-copy).
     Kernel uses the slot memory directly.
     "copy_to_dst" — copy slot data to dst_addr, then return.
+
+    ``consume`` (DIAGNOSTIC ONLY): when False, recv still blocks until the
+    payload lands in the slot, but skips the slot-read latency charge
+    (slot-IO + PE↔bank fabric drain for SRAM/HBM tiers). This exists
+    solely so the pe2pe overview plot can compare apples-to-apples
+    against tl.store (a one-sided write that pays no read on DST). Real
+    kernels always need the data they receive — leave this True.
     """

     direction: str | None # None → round-robin (weak fairness, D4)
@@ -146,6 +153,7 @@ class IpcqRecvCmd:
     dst_space: str = "" # used only when recv_mode == "copy_to_dst"
     blocking: bool = True
     data_op: bool = True
+    consume: bool = True # DIAGNOSTIC: see docstring


 # ── D12: IpcqDmaToken (PE_IPCQ → PE_DMA, vc_comm) ───────────────────
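The gating behaviour that the `consume` docstring describes could look roughly like the sketch below. This is an illustration under assumptions, not the code in pe_ipcq._handle_recv: `RecvCmdSketch`, `recv_latency_ns`, and the `slot_io_ns` / `hop_ns` arguments are hypothetical stand-ins, and only the `consume` field and its default of True come from the diff.

```python
from dataclasses import dataclass


@dataclass
class RecvCmdSketch:
    """Hypothetical stand-in for IpcqRecvCmd; only `consume` is from the diff."""
    buffer_kind: str = "tcm"  # assumed field: "tcm", "sram" or "hbm"
    consume: bool = True      # DIAGNOSTIC: False skips the slot-read charge


def recv_latency_ns(cmd: RecvCmdSketch, slot_io_ns: float, hop_ns: float) -> float:
    """Latency charged on the recv side once the payload is in the slot."""
    if not cmd.consume:
        return 0.0            # diagnostic path: no slot-IO, no fabric hop
    if cmd.buffer_kind == "tcm":
        return slot_io_ns     # PE-local tier pays slot-IO only
    return slot_io_ns + hop_ns  # SRAM/HBM also pay the PE<->bank drain


if __name__ == "__main__":
    print(recv_latency_ns(RecvCmdSketch("sram"), 100.0, 512.0))
    print(recv_latency_ns(RecvCmdSketch("sram", consume=False), 100.0, 512.0))
```

Keeping the default at True means existing callers see no change; only a diagnostic caller that sets `consume=False` explicitly skips the charge.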