Restores per-PE HBM controller partitioning that was lost in
commit 5917b34 ("Replace xbar/bridge/single-NOC with explicit
router mesh"), which had over-consolidated the per-slice HBM CTRL
into a single cube-wide ``hbm_ctrl`` connected to every router —
the opposite of what ADR-0019 D1/D4 specifies.
Builder splits ``hbm_ctrl`` into 8 ``hbm_ctrl.pe{X}`` instances per
cube, each reachable ONLY through PE_X's attaching router via the
existing ``peX.hbm`` attach metadata from cube_mesh.yaml. Cube
aggregate BW now matches the spec (8 PEs × 8 PCs × 32 GB/s =
2048 GB/s) instead of collapsing to 256 GB/s.
AddressResolver decodes the target PE from the HBM PA's hbm_offset
(``offset // slice_size``) and returns ``hbm_ctrl.pe{X}``. PathRouter
uses the existing ``_adj_local`` adjacency for same-cube PE_DMA so
the cube's own UCIe port can no longer appear as a zero-distance
shortcut between routers — local PE_DMA now traverses the mesh,
restoring the ADR-0019 D4 worked example
``PE0.pe_dma → r0c0 → … → r1c4 → hbm_ctrl``.
Tests:
- New tests/test_per_pe_hbm_partition.py: 14 tests covering
topology shape, per-PE router exclusivity, PA resolution,
single-hop local path, cross-PE mesh traversal, and end-to-end
latency monotonicity. Probe CLI now reports
pe-local < pe-same-half < pe-cross-half (was uniform 141ns).
- Existing tests updated for new node ids and replaced two
assertions that locked in the wrong consolidation:
test_noc_mesh.test_hbm_connects_to_all_routers and
test_topology_compile.test_hbm_ctrl_connects_all_routers are
now per-PE exclusivity assertions; test_routing
.test_all_pe_hbm_equidistant becomes
test_cross_pe_hbm_distance_increases_with_mesh_hops.
- test_ipcq_buffer_kind_locations.test_hbm_pe_hop_charged_at_large_payload
threshold recalibrated 4000→1500 ns: the prior figure reflected
serialization on the over-consolidated single hbm_ctrl; per-PE
partitioning removes that artificial contention so the gap
shrinks to the genuine PE↔HBM-hop cost.
Full suite: 645 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous model double-counted slow-upstream paths (e.g., 64KB via UCIe
128 GB/s was ~2x pessimistic). HBM CTRL now distributes bursts across
8 pseudo-channels via global round-robin, with per-chunk commit timing
that pipelines correctly against the bottleneck link's data arrival.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PE nodes are shifted 1.2mm above (top half) or below (bottom half)
their assigned router position. PE size reduced to 1.4x0.7mm.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cube_view now renders all 32 router nodes from cube_mesh.yaml
instead of collapsed "router_mesh" placeholder
- Fix mesh_gen row Y position overlap (r1/r2 and r3/r4 had same Y)
by adding hbm_gap spacing between PE rows and HBM zone
- Add noc_router to visualizer KIND_SIZE for proper sizing
- Update cube view tests for individual router nodes
339 passed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add router → PE_MMU edge so MmuMapMsg can reach PE_MMU via
the router mesh. Unskip all PE_MMU fabric tests.
339 passed, 0 skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)
326 passed, 13 skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Model fabric response hop latency for PE-internal operations:
- HBM_CTRL sends PeDmaMsg response on reverse path instead of direct done signal
- PE_CPU sends ResponseMsg via NOC→M_CPU on kernel completion
- Add NOC→PE_DMA and PE_CPU→NOC edges in topology builder
- Make HBM BW test assertions dynamic based on topology efficiency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>