Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019)

- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)

326 passed, 13 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-04 17:51:28 -07:00
parent 31c7110da7
commit 5917b3497c
35 changed files with 953 additions and 1326 deletions
+18 -18
View File
@@ -16,9 +16,10 @@ architecture.
### D1. NOC node and router grid
Each cube contains a single NOC topology node (`sip{S}.cube{C}.noc`)
implemented as `noc_2d_mesh_v1`. Internally, the NOC models a 2D router
grid generated by `mesh_gen.py`.
Each cube contains a 2D router mesh generated by `mesh_gen.py`.
Each router is a separate topology node (`sip{S}.cube{C}.r{row}c{col}`)
implemented as `forwarding_v1`. (Supersedes the original single-node
`noc_2d_mesh_v1` design — see ADR-0019.)
Grid properties:
@@ -82,8 +83,8 @@ PE4.cpu <--+ | | +--< PE6.cpu
|
UCIe-S (conn x4)
xbar_top attached to: r0c0, r0c1, r1c4, r1c5 (top-half PE routers)
xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
HBM attach: PE가 있는 라우터에 hbm_ctrl도 연결 (ADR-0019 D1)
(xbar_top/xbar_bot은 ADR-0019에 의해 제거됨)
```
### D5. NOC edge bandwidths and distances
@@ -92,8 +93,7 @@ xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
| --- | --- | --- | --- |
| PE_DMA -> NOC | 256.0 | Physical (PE pos) | Matches HBM slice BW |
| NOC -> PE_CPU | - | 0.0 mm | Command path only |
| NOC <-> xbar_top | 256.0 | 0.0 mm | Per xbar half |
| NOC <-> xbar_bot | 256.0 | 0.0 mm | Per xbar half |
| Router <-> HBM_CTRL | 256.0 | 0.0 mm | Per PE router (ADR-0019) |
| NOC <-> M_CPU | - | 0.0 mm | Command path |
| NOC <-> SRAM | 128.0 x4 | 0.0 mm | 512 GB/s aggregate |
| NOC <-> UCIe conn | 128.0 | 0.0 mm | Per connection, 4 per port |
@@ -117,7 +117,7 @@ Inter-cube traffic path:
```text
Source: PE_DMA -> NOC -> conn{i} -> ucie-{PORT}
[UCIe link: 512 GB/s, 1.0mm seam distance]
Target: ucie-{PORT} -> conn{i} -> NOC -> xbar -> HBM
Target: ucie-{PORT} -> conn{i} -> r{x}c{y} -> (mesh hops) -> hbm_ctrl
```
UCIe overhead (8.0 ns) is applied at each ucie-{PORT} node, so a
@@ -128,31 +128,31 @@ full crossing incurs 16 ns (TX port + RX port).
**PE DMA to local HBM (same half):**
```text
PE_DMA -> NOC -> xbar_top -> HBM_CTRL.slice{0-3}
PE_DMA -> r{x}c{y} -> hbm_ctrl (local: 0 mesh hops, switching overhead only)
```
**PE DMA to cross-half HBM:**
**PE DMA to remote PE's HBM:**
```text
PE_DMA -> NOC -> xbar_top -> bridge -> xbar_bot -> HBM_CTRL.slice{4-7}
PE_DMA -> r{x}c{y} -> (mesh hops) -> r{x'}c{y'} -> hbm_ctrl
```
**PE DMA to remote cube HBM:**
```text
PE_DMA -> NOC -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> NOC -> xbar -> HBM
PE_DMA -> r{x}c{y} -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> r{x'}c{y'} -> hbm_ctrl
```
**Kernel Launch command to PE:**
```text
[from io_noc] -> ucie -> conn -> NOC -> M_CPU -> NOC -> PE_CPU
[from io_noc] -> ucie -> conn -> r{x}c{y} -> (mesh hops) -> M_CPU -> (mesh hops) -> PE_CPU
```
**Shared SRAM access:**
```text
PE_DMA -> NOC -> SRAM
PE_DMA -> r{x}c{y} -> (mesh hops) -> SRAM
```
### D8. Mesh generation
@@ -169,7 +169,7 @@ The generator produces a `mesh_data` dictionary containing:
- PE-to-router attachments (pe_dma, pe_cpu per PE)
- UCIe-to-router attachments (N/S/E/W, distributed across edge routers)
- M_CPU and SRAM router attachments
- xbar_top/bot router assignments (top-half vs bottom-half PE routers)
- HBM attachment per PE router (ADR-0019)
## Consequences
@@ -182,8 +182,8 @@ The generator produces a `mesh_data` dictionary containing:
## Links
- ADR-0003 D3 (cube-level NOC definition — extended by this ADR)
- ADR-0004 D1 (PE DMA to local HBM path via xbar)
- ADR-0004 D3 (cross-half HBM via bridge)
- ADR-0014 D1 (PE_DMA dual egress: xbar for HBM, NOC for non-HBM)
- ADR-0004 D1 (PE DMA to local HBM path via router mesh)
- ADR-0014 D1 (PE_DMA egress via router mesh)
- ADR-0019 (NOC-Local HBM — xbar/bridge 제거, 명시적 라우터 mesh)
- ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
- ADR-0016 D1 (IOChiplet io_noc — analogous pattern at IO chiplet level)