b8213d43a9
Restores per-PE HBM controller partitioning that was lost in
commit 5917b34 ("Replace xbar/bridge/single-NOC with explicit
router mesh"), which had over-consolidated the per-slice HBM CTRL
into a single cube-wide ``hbm_ctrl`` connected to every router —
the opposite of what ADR-0019 D1/D4 specifies.

Builder splits ``hbm_ctrl`` into 8 ``hbm_ctrl.pe{X}`` instances per
cube, each reachable ONLY through PE_X's attaching router via the
existing ``peX.hbm`` attach metadata from cube_mesh.yaml. Cube
aggregate BW now matches the spec (8 PEs × 8 PCs × 32 GB/s =
2048 GB/s) instead of collapsing to 256 GB/s.
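
The bandwidth arithmetic above can be checked with a short sketch. The figures come from this message; the constant names are illustrative, "PCs" is read as HBM pseudo-channels, and the 256 GB/s consolidated figure is taken to be one controller's worth of PCs (both are assumptions).

```python
# Figures from the commit message; names are illustrative, not simulator identifiers.
PES_PER_CUBE = 8
PCS_PER_PE = 8        # assumed to mean pseudo-channels per PE slice
GBPS_PER_PC = 32      # GB/s per PC

per_pe_bw = PCS_PER_PE * GBPS_PER_PC        # one hbm_ctrl.pe{X} instance
partitioned_bw = PES_PER_CUBE * per_pe_bw   # eight independent controllers
consolidated_bw = per_pe_bw                 # a single shared hbm_ctrl (assumed)

print(per_pe_bw, partitioned_bw, consolidated_bw)  # 256 2048 256
```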

AddressResolver decodes the target PE from the HBM PA's hbm_offset
(``offset // slice_size``) and returns ``hbm_ctrl.pe{X}``. PathRouter
uses the existing ``_adj_local`` adjacency for same-cube PE_DMA so
the cube's own UCIe port can no longer appear as a zero-distance
shortcut between routers — local PE_DMA now traverses the mesh,
restoring the ADR-0019 D4 worked example
``PE0.pe_dma → r0c0 → … → r1c4 → hbm_ctrl``.
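
A minimal sketch of the ``offset // slice_size`` decode described above. The function name, the 1 GiB slice size, and the range check are hypothetical; only the integer division and the ``hbm_ctrl.pe{X}`` node id format come from this message.

```python
def resolve_hbm_ctrl(hbm_offset: int, slice_size: int, num_pes: int = 8) -> str:
    """Map an HBM offset to a per-PE controller node id (illustrative helper)."""
    if hbm_offset < 0 or slice_size <= 0:
        raise ValueError("offset must be >= 0 and slice_size must be > 0")
    pe = hbm_offset // slice_size  # which PE slice the offset falls in
    if pe >= num_pes:
        raise ValueError(f"offset {hbm_offset:#x} is outside the {num_pes} PE slices")
    return f"hbm_ctrl.pe{pe}"


SLICE = 1 << 30  # hypothetical 1 GiB per-PE slice

print(resolve_hbm_ctrl(0, SLICE))                 # hbm_ctrl.pe0
print(resolve_hbm_ctrl(5 * SLICE + 4096, SLICE))  # hbm_ctrl.pe5
```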

Tests:
- New tests/test_per_pe_hbm_partition.py: 14 tests covering
topology shape, per-PE router exclusivity, PA resolution,
single-hop local path, cross-PE mesh traversal, and end-to-end
latency monotonicity. Probe CLI now reports
pe-local < pe-same-half < pe-cross-half (was a uniform 141 ns).
- Existing tests updated for the new node ids. Two assertions that
locked in the wrong consolidation are replaced:
test_noc_mesh.test_hbm_connects_to_all_routers and
test_topology_compile.test_hbm_ctrl_connects_all_routers are
now per-PE exclusivity assertions; test_routing.test_all_pe_hbm_equidistant
becomes test_cross_pe_hbm_distance_increases_with_mesh_hops.
- test_ipcq_buffer_kind_locations.test_hbm_pe_hop_charged_at_large_payload
threshold recalibrated 4000→1500 ns: the prior figure reflected
serialization on the over-consolidated single hbm_ctrl; per-PE
partitioning removes that artificial contention so the gap
shrinks to the genuine PE↔HBM-hop cost.
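
The exclusivity and hop-distance properties these tests assert can be illustrated on a toy graph. Everything below is an assumption for illustration: a 2x2 router mesh with one PE per router, hypothetical helper names, and a plain BFS hop count. It is not the real cube_mesh.yaml layout or the repository's test code.

```python
from collections import deque

# Toy 2x2 router mesh (assumed); the real cube has 8 PEs and a larger mesh.
ROUTERS = ["r0c0", "r0c1", "r1c0", "r1c1"]
MESH_LINKS = [("r0c0", "r0c1"), ("r1c0", "r1c1"),
              ("r0c0", "r1c0"), ("r0c1", "r1c1")]


def build_adjacency():
    """Build an undirected graph: mesh links plus per-PE DMA and HBM attachments."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for a, b in MESH_LINKS:
        link(a, b)
    for pe, router in enumerate(ROUTERS):
        link(f"pe{pe}.pe_dma", router)    # PE attaches to its own router
        link(f"hbm_ctrl.pe{pe}", router)  # controller hangs off that router only
    return adj


def hops(adj, src, dst):
    """Breadth-first hop count from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path from {src} to {dst}")


adj = build_adjacency()
# Exclusivity: every per-PE controller has exactly one neighbour.
assert all(len(adj[f"hbm_ctrl.pe{i}"]) == 1 for i in range(4))
# Distance grows with mesh hops: local, adjacent router, diagonal router.
print([hops(adj, "pe0.pe_dma", f"hbm_ctrl.pe{i}") for i in (0, 1, 3)])  # [2, 3, 4]
```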

Full suite: 645 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| line | hop | label | size_bytes | path | total_ns |
|---|---|---|---|---|---|
| 2 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | ipcq | 24.88749999999891 |
| 3 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | raw | 33.57999999999811 |
| 4 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | ipcq | 28.13749999999891 |
| 5 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | raw | 36.07999999999811 |
| 6 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | ipcq | 29.88749999999891 |
| 7 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | raw | 37.07999999999811 |
| 8 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | ipcq | 31.63749999999891 |
| 9 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | raw | 38.07999999999811 |
| 10 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | ipcq | 35.13749999999891 |
| 11 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | raw | 40.07999999999811 |
| 12 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | ipcq | 38.63749999999891 |
| 13 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | raw | 42.07999999999811 |
| 14 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | ipcq | 52.63749999999891 |
| 15 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | raw | 50.07999999999811 |
| 16 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | ipcq | 80.63750000000073 |
| 17 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | raw | 66.08000000000175 |
| 18 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | ipcq | 136.63750000000073 |
| 19 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | raw | 98.08000000000175 |
| 20 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | ipcq | 164.63750000000073 |
| 21 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | raw | 114.08000000000175 |
| 22 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | ipcq | 38.49749999999585 |
| 23 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | raw | 47.18999999999505 |
| 24 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | ipcq | 43.24749999999585 |
| 25 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | raw | 51.18999999999505 |
| 26 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | ipcq | 44.99749999999585 |
| 27 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | raw | 52.18999999999505 |
| 28 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | ipcq | 46.74749999999585 |
| 29 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | raw | 53.18999999999505 |
| 30 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | ipcq | 50.24749999999585 |
| 31 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | raw | 55.18999999999505 |
| 32 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | ipcq | 53.74749999999585 |
| 33 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | raw | 57.18999999999505 |
| 34 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | ipcq | 67.74749999999585 |
| 35 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | raw | 65.18999999999505 |
| 36 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | ipcq | 95.74750000000131 |
| 37 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | raw | 81.19000000000233 |
| 38 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | ipcq | 151.7475000000013 |
| 39 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | raw | 113.19000000000233 |
| 40 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | ipcq | 179.7475000000013 |
| 41 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | raw | 129.19000000000233 |
| 42 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | ipcq | 81.15999999999804 |
| 43 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | raw | 89.28999999999724 |
| 44 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | ipcq | 88.65999999999804 |
| 45 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | raw | 95.53999999999724 |
| 46 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | ipcq | 90.90999999999804 |
| 47 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | raw | 96.53999999999724 |
| 48 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | ipcq | 93.15999999999804 |
| 49 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | raw | 97.53999999999724 |
| 50 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | ipcq | 97.65999999999804 |
| 51 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | raw | 99.53999999999724 |
| 52 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | ipcq | 103.15999999999804 |
| 53 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | raw | 102.53999999999724 |
| 54 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | ipcq | 125.15999999999804 |
| 55 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | raw | 114.53999999999724 |
| 56 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | ipcq | 169.15999999999985 |
| 57 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | raw | 138.54000000000087 |
| 58 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | ipcq | 257.15999999999985 |
| 59 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | raw | 186.54000000000087 |
| 60 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | ipcq | 301.15999999999985 |
| 61 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | raw | 210.54000000000087 |
| 62 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | ipcq | 103.15999999999804 |
| 63 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | raw | 111.28999999999724 |
| 64 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | ipcq | 112.65999999999804 |
| 65 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | raw | 119.53999999999724 |
| 66 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | ipcq | 114.90999999999804 |
| 67 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | raw | 120.53999999999724 |
| 68 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | ipcq | 117.15999999999804 |
| 69 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | raw | 121.53999999999724 |
| 70 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | ipcq | 121.65999999999804 |
| 71 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | raw | 123.53999999999724 |
| 72 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | ipcq | 127.15999999999804 |
| 73 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | raw | 126.53999999999724 |
| 74 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | ipcq | 149.15999999999804 |
| 75 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | raw | 138.53999999999724 |
| 76 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | ipcq | 193.15999999999985 |
| 77 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | raw | 162.54000000000087 |
| 78 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | ipcq | 281.15999999999985 |
| 79 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | raw | 210.54000000000087 |
| 80 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | ipcq | 325.15999999999985 |
| 81 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | raw | 234.54000000000087 |
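
One way to read the table is to find, per hop, the smallest payload at which the ``raw`` path becomes faster than ``ipcq``. The sketch below does that for two of the four hops, using total_ns values copied from the table and rounded to two decimals; the dictionary layout and the ``first_raw_win`` helper are hypothetical.

```python
SIZES = [128, 256, 384, 512, 768, 1024, 2048, 4096, 8192, 10240]

# total_ns from the table above, rounded to two decimals.
TOTAL_NS = {
    "h1_intra_horizontal": {
        "ipcq": [24.89, 28.14, 29.89, 31.64, 35.14, 38.64, 52.64, 80.64, 136.64, 164.64],
        "raw": [33.58, 36.08, 37.08, 38.08, 40.08, 42.08, 50.08, 66.08, 98.08, 114.08],
    },
    "h3_inter_cube_horizontal": {
        "ipcq": [81.16, 88.66, 90.91, 93.16, 97.66, 103.16, 125.16, 169.16, 257.16, 301.16],
        "raw": [89.29, 95.54, 96.54, 97.54, 99.54, 102.54, 114.54, 138.54, 186.54, 210.54],
    },
}


def first_raw_win(ipcq, raw):
    """Return the smallest size at which raw is faster than ipcq, or None."""
    for size, ipcq_ns, raw_ns in zip(SIZES, ipcq, raw):
        if raw_ns < ipcq_ns:
            return size
    return None


for hop, series in TOTAL_NS.items():
    print(hop, first_raw_win(series["ipcq"], series["raw"]))
# h1_intra_horizontal 2048
# h3_inter_cube_horizontal 1024
```

In these rows ``ipcq`` is faster for small payloads and ``raw`` takes over at 2048 bytes within a cube and at 1024 bytes across cubes.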