kernbench2/docs/diagrams/pe2pe_latency_plots/summary.csv
ywkang b8213d43a9 ADR-0019 D1/D4: per-PE HBM CTRL partitioning
Restores per-PE HBM controller partitioning that was lost in
commit 5917b34 ("Replace xbar/bridge/single-NOC with explicit
router mesh"), which had over-consolidated the per-slice HBM CTRL
into a single cube-wide ``hbm_ctrl`` connected to every router —
the opposite of what ADR-0019 D1/D4 specifies.

Builder splits ``hbm_ctrl`` into 8 ``hbm_ctrl.pe{X}`` instances per
cube, each reachable ONLY through PE_X's attaching router via the
existing ``peX.hbm`` attach metadata from cube_mesh.yaml. Cube
aggregate BW now matches the spec (8 PEs × 8 PCs × 32 GB/s =
2048 GB/s) instead of collapsing to 256 GB/s.
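
The bandwidth figures above can be checked with a few lines of
arithmetic. This is a minimal sketch, not code from the repository:
the constant and function names are illustrative, reading "PC" as
pseudo-channel is an assumption, and the 256 GB/s figure is modelled
here as a single controller serving 8 PCs.

```python
# Figures come from the commit message; all names are illustrative.
PES_PER_CUBE = 8   # per-PE HBM controllers after the split
PCS_PER_CTRL = 8   # "PCs" per controller (assumed: pseudo-channels)
GBPS_PER_PC = 32   # GB/s per PC


def cube_aggregate_bw(num_ctrl: int) -> int:
    """Aggregate cube HBM bandwidth in GB/s for num_ctrl independent controllers."""
    return num_ctrl * PCS_PER_CTRL * GBPS_PER_PC


partitioned = cube_aggregate_bw(PES_PER_CUBE)  # 8 * 8 * 32
consolidated = cube_aggregate_bw(1)            # 1 * 8 * 32
print(partitioned, consolidated)
```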

AddressResolver decodes the target PE from the HBM PA's hbm_offset
(``offset // slice_size``) and returns ``hbm_ctrl.pe{X}``. PathRouter
uses the existing ``_adj_local`` adjacency for same-cube PE_DMA so
the cube's own UCIe port can no longer appear as a zero-distance
shortcut between routers — local PE_DMA now traverses the mesh,
restoring the ADR-0019 D4 worked example
``PE0.pe_dma → r0c0 → … → r1c4 → hbm_ctrl``.
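
The decode step described above (``offset // slice_size``) can be
sketched as follows. This is an illustration only: the function name,
the bounds checks, and the 1 GiB slice size are assumptions, and the
real AddressResolver may extract ``hbm_offset`` from the PA and name
nodes differently (for example with a cube prefix).

```python
def resolve_hbm_ctrl(hbm_offset: int, slice_size: int, num_pes: int = 8) -> str:
    """Return the per-PE HBM controller node id that owns hbm_offset.

    Hypothetical helper: mirrors the `offset // slice_size` rule from the
    commit message; the error handling is an assumption.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    if hbm_offset < 0:
        raise ValueError("hbm_offset must be non-negative")
    pe = hbm_offset // slice_size  # index of the slice that contains the offset
    if pe >= num_pes:
        raise ValueError(f"offset {hbm_offset:#x} is beyond the last PE slice")
    return f"hbm_ctrl.pe{pe}"


SLICE = 1 << 30  # assumed 1 GiB per PE slice, for illustration only
first = resolve_hbm_ctrl(0, SLICE)
sixth = resolve_hbm_ctrl(5 * SLICE + 4096, SLICE)
print(first, sixth)
```

With the assumed slice size, offset 0 lands in PE 0's slice and an
offset 4 KiB past the start of the sixth slice lands in PE 5's.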

Tests:
- New tests/test_per_pe_hbm_partition.py: 14 tests covering
  topology shape, per-PE router exclusivity, PA resolution,
  single-hop local path, cross-PE mesh traversal, and end-to-end
  latency monotonicity. The probe CLI now reports
  pe-local < pe-same-half < pe-cross-half (was a uniform 141 ns).
- Existing tests updated for the new node ids. Two assertions that
  locked in the wrong consolidation,
  test_noc_mesh.test_hbm_connects_to_all_routers and
  test_topology_compile.test_hbm_ctrl_connects_all_routers, are
  now per-PE exclusivity assertions, and
  test_routing.test_all_pe_hbm_equidistant becomes
  test_cross_pe_hbm_distance_increases_with_mesh_hops.
- test_ipcq_buffer_kind_locations.test_hbm_pe_hop_charged_at_large_payload
  threshold recalibrated 4000→1500 ns: the prior figure reflected
  serialization on the over-consolidated single hbm_ctrl; per-PE
  partitioning removes that artificial contention so the gap
  shrinks to the genuine PE↔HBM-hop cost.
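
What a per-PE exclusivity assertion checks can be sketched on a toy
edge list. Nothing below is taken from the test suite: the helper
names are hypothetical, and the 2x4 router grid and PE-to-router
placement are made up for illustration rather than read from
cube_mesh.yaml.

```python
import re
from collections import defaultdict

ROUTER_RE = re.compile(r"^r\d+c\d+$")  # router ids shaped like r0c0, r1c4


def hbm_ctrl_routers(edges):
    """Map each hbm_ctrl* node to the set of routers it is wired to."""
    attached = defaultdict(set)
    for a, b in edges:
        for node, other in ((a, b), (b, a)):
            if node.startswith("hbm_ctrl") and ROUTER_RE.match(other):
                attached[node].add(other)
    return dict(attached)


def is_per_pe_exclusive(edges, num_pes=8):
    """True when exactly hbm_ctrl.pe0..pe{N-1} exist and each has one router."""
    attached = hbm_ctrl_routers(edges)
    expected = {f"hbm_ctrl.pe{pe}" for pe in range(num_pes)}
    if set(attached) != expected:
        return False
    return all(len(routers) == 1 for routers in attached.values())


# Toy topologies (hypothetical placement, not the real mesh).
routers = [f"r{row}c{col}" for row in range(2) for col in range(4)]
partitioned_edges = [(f"hbm_ctrl.pe{pe}", routers[pe]) for pe in range(8)]
consolidated_edges = [("hbm_ctrl", r) for r in routers]

print(is_per_pe_exclusive(partitioned_edges))
print(is_per_pe_exclusive(consolidated_edges))
```

The consolidated toy graph fails the check because it has a single
``hbm_ctrl`` node wired to every router, which is the shape the old
assertions locked in.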

Full suite: 645 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 01:04:30 -07:00


hop,label,size_bytes,path,total_ns
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),128,ipcq,24.88749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),128,raw,33.57999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),256,ipcq,28.13749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),256,raw,36.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),384,ipcq,29.88749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),384,raw,37.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),512,ipcq,31.63749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),512,raw,38.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),768,ipcq,35.13749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),768,raw,40.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,ipcq,38.63749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,raw,42.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,ipcq,52.63749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,raw,50.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),4096,ipcq,80.63750000000073
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),4096,raw,66.08000000000175
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),8192,ipcq,136.63750000000073
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),8192,raw,98.08000000000175
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),10240,ipcq,164.63750000000073
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),10240,raw,114.08000000000175
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),128,ipcq,38.49749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),128,raw,47.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),256,ipcq,43.24749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),256,raw,51.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),384,ipcq,44.99749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),384,raw,52.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),512,ipcq,46.74749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),512,raw,53.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),768,ipcq,50.24749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),768,raw,55.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),1024,ipcq,53.74749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),1024,raw,57.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),2048,ipcq,67.74749999999585
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),2048,raw,65.18999999999505
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),4096,ipcq,95.74750000000131
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),4096,raw,81.19000000000233
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),8192,ipcq,151.7475000000013
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),8192,raw,113.19000000000233
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),10240,ipcq,179.7475000000013
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),10240,raw,129.19000000000233
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),128,ipcq,81.15999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),128,raw,89.28999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),256,ipcq,88.65999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),256,raw,95.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),384,ipcq,90.90999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),384,raw,96.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),512,ipcq,93.15999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),512,raw,97.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,ipcq,97.65999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,raw,99.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,ipcq,103.15999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,raw,102.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),2048,ipcq,125.15999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),2048,raw,114.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),4096,ipcq,169.15999999999985
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),4096,raw,138.54000000000087
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),8192,ipcq,257.15999999999985
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),8192,raw,186.54000000000087
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),10240,ipcq,301.15999999999985
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),10240,raw,210.54000000000087
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),128,ipcq,103.15999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),128,raw,111.28999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),256,ipcq,112.65999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),256,raw,119.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),384,ipcq,114.90999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),384,raw,120.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),512,ipcq,117.15999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),512,raw,121.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),768,ipcq,121.65999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),768,raw,123.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),1024,ipcq,127.15999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),1024,raw,126.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),2048,ipcq,149.15999999999804
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),2048,raw,138.53999999999724
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),4096,ipcq,193.15999999999985
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),4096,raw,162.54000000000087
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),8192,ipcq,281.15999999999985
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),8192,raw,210.54000000000087
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),10240,ipcq,325.15999999999985
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),10240,raw,234.54000000000087
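
One way to read this table is to find, for each hop, the smallest
payload at which the ``raw`` path becomes faster than ``ipcq``. The
sketch below does that on a four-row-pair excerpt copied from the data
above; the function name is hypothetical, and to process the whole file
the ``io.StringIO`` source would be swapped for an opened summary.csv.

```python
import csv
import io

# Excerpt copied verbatim from the rows above.
EXCERPT = """hop,label,size_bytes,path,total_ns
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,ipcq,38.63749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,raw,42.07999999999811
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,ipcq,52.63749999999891
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,raw,50.07999999999811
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,ipcq,97.65999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,raw,99.53999999999724
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,ipcq,103.15999999999804
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,raw,102.53999999999724
"""


def raw_crossover(rows):
    """Per hop, return the smallest size_bytes where raw total_ns < ipcq total_ns."""
    latency = {}
    for row in rows:
        key = (row["hop"], int(row["size_bytes"]), row["path"])
        latency[key] = float(row["total_ns"])

    result = {}
    # Sorting by (hop, size) visits each hop's sizes in ascending order.
    for hop, size in sorted({(h, s) for h, s, _ in latency}):
        ipcq = latency.get((hop, size, "ipcq"))
        raw = latency.get((hop, size, "raw"))
        if ipcq is None or raw is None:
            continue  # skip sizes that lack one of the two paths
        if raw < ipcq and hop not in result:
            result[hop] = size
    return result


crossover = raw_crossover(csv.DictReader(io.StringIO(EXCERPT)))
print(crossover)
# {'h1_intra_horizontal': 2048, 'h3_inter_cube_horizontal': 1024}
```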