Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019)

- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)

326 passed, 13 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -16,9 +16,10 @@ architecture.

 ### D1. NOC node and router grid

-Each cube contains a single NOC topology node (`sip{S}.cube{C}.noc`)
-implemented as `noc_2d_mesh_v1`. Internally, the NOC models a 2D router
-grid generated by `mesh_gen.py`.
+Each cube contains a 2D router mesh generated by `mesh_gen.py`.
+Each router is a separate topology node (`sip{S}.cube{C}.r{row}c{col}`)
+implemented as `forwarding_v1`. (Supersedes the original single-node
+`noc_2d_mesh_v1` design; see ADR-0019.)

 Grid properties:

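The per-router node naming introduced in D1 can be sketched as follows. This is a minimal illustration, not the project's `mesh_gen.py`: the helper `router_ids` is hypothetical, and the 6x6 grid size is an assumption inferred only from the `r5c5` router named in the removed xbar attachment list.

```python
def router_ids(sip: int, cube: int, rows: int, cols: int) -> list[str]:
    """Return one topology node ID per mesh router, in row-major order.

    Hypothetical helper: the ID pattern sip{S}.cube{C}.r{row}c{col} comes
    from the ADR text; the function itself is illustrative only.
    """
    return [
        f"sip{sip}.cube{cube}.r{r}c{c}"
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed 6x6 grid; the real dimensions come from cube_mesh.yaml.
ids = router_ids(sip=0, cube=0, rows=6, cols=6)
print(len(ids), ids[0], ids[-1])
```

Each returned ID would correspond to one separate topology node, in contrast to the single `sip{S}.cube{C}.noc` node of the superseded design.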
@@ -82,8 +83,8 @@ PE4.cpu <--+ | | +--< PE6.cpu
 |
 UCIe-S (conn x4)

-xbar_top attached to: r0c0, r0c1, r1c4, r1c5 (top-half PE routers)
-xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
+HBM attach: hbm_ctrl is also connected to every router that has a PE (ADR-0019 D1)
+(xbar_top/xbar_bot were removed by ADR-0019)
 ```

 ### D5. NOC edge bandwidths and distances
@@ -92,8 +93,7 @@ xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
 | --- | --- | --- | --- |
 | PE_DMA -> NOC | 256.0 | Physical (PE pos) | Matches HBM slice BW |
 | NOC -> PE_CPU | - | 0.0 mm | Command path only |
-| NOC <-> xbar_top | 256.0 | 0.0 mm | Per xbar half |
-| NOC <-> xbar_bot | 256.0 | 0.0 mm | Per xbar half |
+| Router <-> HBM_CTRL | 256.0 | 0.0 mm | Per PE router (ADR-0019) |
 | NOC <-> M_CPU | - | 0.0 mm | Command path |
 | NOC <-> SRAM | 128.0 x4 | 0.0 mm | 512 GB/s aggregate |
 | NOC <-> UCIe conn | 128.0 | 0.0 mm | Per connection, 4 per port |
@@ -117,7 +117,7 @@ Inter-cube traffic path:
 ```text
 Source: PE_DMA -> NOC -> conn{i} -> ucie-{PORT}
 [UCIe link: 512 GB/s, 1.0mm seam distance]
-Target: ucie-{PORT} -> conn{i} -> NOC -> xbar -> HBM
+Target: ucie-{PORT} -> conn{i} -> r{x}c{y} -> (mesh hops) -> hbm_ctrl
 ```

 UCIe overhead (8.0 ns) is applied at each ucie-{PORT} node, so a
@@ -128,31 +128,31 @@ full crossing incurs 16 ns (TX port + RX port).
 **PE DMA to local HBM (same half):**

 ```text
-PE_DMA -> NOC -> xbar_top -> HBM_CTRL.slice{0-3}
+PE_DMA -> r{x}c{y} -> hbm_ctrl (local: 0 mesh hops, switching overhead only)
 ```

-**PE DMA to cross-half HBM:**
+**PE DMA to remote PE's HBM:**

 ```text
-PE_DMA -> NOC -> xbar_top -> bridge -> xbar_bot -> HBM_CTRL.slice{4-7}
+PE_DMA -> r{x}c{y} -> (mesh hops) -> r{x'}c{y'} -> hbm_ctrl
 ```

 **PE DMA to remote cube HBM:**

 ```text
-PE_DMA -> NOC -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> NOC -> xbar -> HBM
+PE_DMA -> r{x}c{y} -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> r{x'}c{y'} -> hbm_ctrl
 ```

 **Kernel Launch command to PE:**

 ```text
-[from io_noc] -> ucie -> conn -> NOC -> M_CPU -> NOC -> PE_CPU
+[from io_noc] -> ucie -> conn -> r{x}c{y} -> (mesh hops) -> M_CPU -> (mesh hops) -> PE_CPU
 ```

 **Shared SRAM access:**

 ```text
-PE_DMA -> NOC -> SRAM
+PE_DMA -> r{x}c{y} -> (mesh hops) -> SRAM
 ```

 ### D8. Mesh generation
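The `(mesh hops)` segments in the paths above can be illustrated with a shortest-path search over a 4-neighbour router grid. This is a sketch under stated assumptions: the diff does not say how PathRouter selects hops, so the use of breadth-first search, the helper names `mesh_neighbors` and `mesh_path`, and the 6x6 grid size are all hypothetical.

```python
from collections import deque


def mesh_neighbors(r, c, rows, cols):
    """Yield the in-grid north/south/west/east neighbours of router (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def mesh_path(src, dst, rows, cols):
    """Return router names on a shortest path from src to dst, inclusive.

    Assumption: plain BFS on an unweighted grid; the real router may use
    a different policy (for example, weighted or dimension-ordered).
    """
    for r, c in (src, dst):
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"router r{r}c{c} is outside the grid")
    prev = {src: None}          # visited set and back-pointers in one dict
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt in mesh_neighbors(*cur, rows, cols):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:     # walk back-pointers from dst to src
        path.append(f"r{node[0]}c{node[1]}")
        node = prev[node]
    return path[::-1]


local = mesh_path((0, 0), (0, 0), rows=6, cols=6)   # PE and HBM on same router
remote = mesh_path((0, 0), (5, 5), rows=6, cols=6)  # opposite grid corners
print(len(local) - 1, len(remote) - 1)              # mesh hop counts
```

With the source and target on the same router the path has zero mesh hops, which matches the "local: 0 mesh hops" note on the local HBM path.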
@@ -169,7 +169,7 @@ The generator produces a `mesh_data` dictionary containing:
 - PE-to-router attachments (pe_dma, pe_cpu per PE)
 - UCIe-to-router attachments (N/S/E/W, distributed across edge routers)
 - M_CPU and SRAM router attachments
-- xbar_top/bot router assignments (top-half vs bottom-half PE routers)
+- HBM attachment per PE router (ADR-0019)

 ## Consequences

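The shape of the `mesh_data` dictionary listed above might look roughly like the sketch below. Every key name, router position, and the checker `pe_routers_missing_hbm` are hypothetical, chosen only to mirror the bullet list; the actual layout produced by `mesh_gen.py` is not shown in this diff.

```python
# Hypothetical layout: key names and router positions are illustrative only.
mesh_data = {
    # PE-to-router attachments (pe_dma, pe_cpu per PE)
    "pe_attach": {
        "pe0": {"pe_dma": "r0c0", "pe_cpu": "r0c0"},
        "pe1": {"pe_dma": "r0c1", "pe_cpu": "r0c1"},
    },
    # UCIe-to-router attachments, distributed across edge routers
    "ucie_attach": {"N": ["r0c2"], "S": ["r5c2"], "E": ["r2c5"], "W": ["r2c0"]},
    # M_CPU and SRAM router attachments
    "m_cpu_router": "r2c2",
    "sram_router": "r3c3",
    # HBM attachment per PE router (ADR-0019)
    "hbm_attach": ["r0c0", "r0c1"],
}


def pe_routers_missing_hbm(data):
    """Return PE routers that lack an HBM attachment, sorted by name."""
    pe_routers = {
        router
        for endpoints in data["pe_attach"].values()
        for router in endpoints.values()
    }
    return sorted(pe_routers - set(data["hbm_attach"]))


print(pe_routers_missing_hbm(mesh_data))
```

A check of this kind could guard the rule that every router hosting a PE also gets an `hbm_ctrl` attachment.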
@@ -182,8 +182,8 @@ The generator produces a `mesh_data` dictionary containing:
 ## Links

 - ADR-0003 D3 (cube-level NOC definition, extended by this ADR)
-- ADR-0004 D1 (PE DMA to local HBM path via xbar)
-- ADR-0004 D3 (cross-half HBM via bridge)
-- ADR-0014 D1 (PE_DMA dual egress: xbar for HBM, NOC for non-HBM)
+- ADR-0004 D1 (PE DMA to local HBM path via router mesh)
+- ADR-0014 D1 (PE_DMA egress via router mesh)
+- ADR-0019 (NOC-Local HBM: xbar/bridge removed, explicit router mesh)
 - ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
 - ADR-0016 D1 (IOChiplet io_noc, analogous pattern at IO chiplet level)