ADR-0023 D9.7+: charge PE↔bank fabric hop for SRAM/HBM IPCQ slots

Cube SRAM and HBM live on the cube NoC behind router-attached links
(sram_to_router_bw_gbs=128, hbm_to_router_bw_gbs=256). Previously the
slot-IO model treated them as if they were per-PE local, so the
buffer_kind sweep showed TCM ≈ SRAM at 64 KB / PE.

pe_ipcq._handle_recv and pe_dma._handle_ipcq_inbound now charge a
PE→bank compute_drain_ns on top of the intrinsic slot-IO for SRAM/HBM.
TCM stays free of this hop. Adds an internal IpcqRecvCmd.consume field
that gates the recv-side hop+slot-IO charges (used by a follow-up
diagnostic API; default True keeps current behavior).
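
As an illustration only, the hop charge described above can be sketched as a
pure serialization term over the router-attached link. This is not the
simulator's actual compute_drain_ns: the function name hop_drain_ns, the
LINK_BW_GBS table, and the way consume is applied are assumptions made for the
example. Only the two bandwidth values come from this commit message.

```python
# Hypothetical sketch, NOT the simulator's real compute_drain_ns.
# Bandwidths are the values quoted in the commit message.
# 1 GB/s equals 1 byte/ns, so bytes / (GB/s) yields nanoseconds.
LINK_BW_GBS = {"sram": 128.0, "hbm": 256.0}


def hop_drain_ns(buffer_kind: str, nbytes: int, consume: bool = True) -> float:
    """Extra PE<->bank fabric time charged for one slot transfer (assumed model)."""
    if not consume:
        # Assumption: consume=False skips the recv-side charge entirely.
        return 0.0
    bw_gbs = LINK_BW_GBS.get(buffer_kind)
    if bw_gbs is None:
        # TCM is PE-local, so no fabric hop is charged.
        return 0.0
    return nbytes / bw_gbs


for kind in ("tcm", "hbm", "sram"):
    print(kind, hop_drain_ns(kind, 64 * 1024))
```

Under this assumed model a single 64 KB transfer costs twice as long over the
SRAM link as over the HBM link. It will not reproduce the sweep latencies,
which also include slot-IO and whatever else the simulator charges.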

Post-fix latency at 64 KB / PE: TCM 12.0 µs < HBM 21.4 µs < SRAM 24.3 µs.
SRAM is slowest because its 128 GB/s bank link is the narrowest in
the system, narrower than HBM's 256 GB/s link. The existing ordering test
is rewritten from tcm<sram<hbm to tcm<hbm<sram, and a new
test_ipcq_buffer_kind_locations adds 3 invariants on the gap.
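
A minimal sketch of how such an ordering check could look, run against the
post-fix rows of the sweep CSV in this commit. The helper names load_latency
and check_ordering are hypothetical and are not the repository's test code.
The 64 KB TCM row falls outside the shown diff hunk, so it is left out here.

```python
import csv
import io

# Post-fix rows copied from the CSV diff in this commit.
POST_FIX_CSV = """\
buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
hbm,torus_2d,6,128,256,1858.0399999999827
hbm,torus_2d,6,1024,2048,2389.0399999999827
hbm,torus_2d,6,8192,16384,6673.039999999986
hbm,torus_2d,6,32768,65536,21361.03999999992
sram,torus_2d,6,128,256,1774.0399999999827
sram,torus_2d,6,1024,2048,2389.0399999999827
sram,torus_2d,6,8192,16384,7345.039999999986
sram,torus_2d,6,32768,65536,24337.039999999935
tcm,torus_2d,6,128,256,1678.0399999999827
tcm,torus_2d,6,1024,2048,1957.0399999999827
tcm,torus_2d,6,8192,16384,4225.039999999986
"""


def load_latency(text):
    """Map (buffer_kind, bytes_per_pe) -> latency_ns."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["buffer_kind"], int(row["bytes_per_pe"]))
        table[key] = float(row["latency_ns"])
    return table


def check_ordering(lat, bytes_per_pe, kinds):
    """True if latency strictly increases across kinds at the given size."""
    values = [lat[(kind, bytes_per_pe)] for kind in kinds]
    return all(a < b for a, b in zip(values, values[1:]))


lat = load_latency(POST_FIX_CSV)
print(check_ordering(lat, 65536, ["hbm", "sram"]))        # 64 KB / PE
print(check_ordering(lat, 16384, ["tcm", "hbm", "sram"]))  # 16 KB / PE
```

In these rows the strict hbm<sram ordering holds at 16 KB and 64 KB per PE but
not at 256 B or 2 KB, so a check of this kind has to be pinned to a specific
size.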

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:20:28 -07:00
parent 533e699299
commit 9c129d6131
7 changed files with 317 additions and 44 deletions
@@ -1,12 +1,12 @@
 buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
-hbm,torus_2d,6,128,256,2002.0399999999827
-hbm,torus_2d,6,1024,2048,3541.0399999999827
-hbm,torus_2d,6,8192,16384,15889.03999999999
-hbm,torus_2d,6,32768,65536,58225.03999999998
-sram,torus_2d,6,128,256,1762.0399999999827
-sram,torus_2d,6,1024,2048,2293.0399999999827
-sram,torus_2d,6,8192,16384,6577.039999999986
-sram,torus_2d,6,32768,65536,21265.03999999992
+hbm,torus_2d,6,128,256,1858.0399999999827
+hbm,torus_2d,6,1024,2048,2389.0399999999827
+hbm,torus_2d,6,8192,16384,6673.039999999986
+hbm,torus_2d,6,32768,65536,21361.03999999992
+sram,torus_2d,6,128,256,1774.0399999999827
+sram,torus_2d,6,1024,2048,2389.0399999999827
+sram,torus_2d,6,8192,16384,7345.039999999986
+sram,torus_2d,6,32768,65536,24337.039999999935
 tcm,torus_2d,6,128,256,1678.0399999999827
 tcm,torus_2d,6,1024,2048,1957.0399999999827
 tcm,torus_2d,6,8192,16384,4225.039999999986
buffer_kind  sip_topology  n_sips  n_elem  bytes_per_pe  latency_ns (before)   latency_ns (after)
hbm          torus_2d      6       128     256           2002.0399999999827    1858.0399999999827
hbm          torus_2d      6       1024    2048          3541.0399999999827    2389.0399999999827
hbm          torus_2d      6       8192    16384         15889.03999999999     6673.039999999986
hbm          torus_2d      6       32768   65536         58225.03999999998     21361.03999999992
sram         torus_2d      6       128     256           1762.0399999999827    1774.0399999999827
sram         torus_2d      6       1024    2048          2293.0399999999827    2389.0399999999827
sram         torus_2d      6       8192    16384         6577.039999999986     7345.039999999986
sram         torus_2d      6       32768   65536         21265.03999999992     24337.039999999935
tcm          torus_2d      6       128     256           1678.0399999999827    (unchanged)
tcm          torus_2d      6       1024    2048          1957.0399999999827    (unchanged)
tcm          torus_2d      6       8192    16384         4225.039999999986     (unchanged)
Binary file not shown (image; before: 68 KiB, after: 74 KiB).