Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019)

- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)

326 passed, 13 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:51:28 -07:00
parent 31c7110da7
commit 5917b3497c
35 changed files with 953 additions and 1326 deletions
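
Note: the topology described in the commit message (one node per router named `r{row}c{col}`, a single `hbm_ctrl` attached to every router, endpoints reaching destinations over mesh hops) can be sketched roughly as below. This is an illustrative sketch only, not the repository's `PathRouter`: the function names, the 2x2 mesh size, the endpoint names `pe0_dma` and `shared_sram`, and the use of breadth-first search are all assumptions made for the example.

```python
from collections import deque


def router_name(row: int, col: int) -> str:
    """Router node id in the r{row}c{col} form used in the commit message."""
    return f"r{row}c{col}"


def build_cube(rows: int, cols: int, attach: dict[str, str]):
    """Return (adjacency, routers) for one cube.

    Hypothetical layout: a rows x cols router mesh, one hbm_ctrl node linked
    to every router, and each endpoint in `attach` linked to a single router.
    """
    adj: dict[str, set[str]] = {}
    routers: set[str] = set()

    def link(a: str, b: str) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for r in range(rows):
        for c in range(cols):
            here = router_name(r, c)
            routers.add(here)
            link(here, "hbm_ctrl")  # single HBM controller, attached to all routers
            if c + 1 < cols:
                link(here, router_name(r, c + 1))  # east neighbour
            if r + 1 < rows:
                link(here, router_name(r + 1, c))  # south neighbour
    for endpoint, router in attach.items():
        link(endpoint, router)
    return adj, routers


def find_path(adj, routers, src: str, dst: str) -> list[str]:
    """Breadth-first search where only routers may act as intermediate hops."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        if node != src and node not in routers:
            continue  # endpoints such as hbm_ctrl never forward traffic
        for nxt in sorted(adj[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError(f"no path from {src} to {dst}")


if __name__ == "__main__":
    adj, routers = build_cube(2, 2, {"pe0_dma": "r0c0", "shared_sram": "r1c1"})
    print(find_path(adj, routers, "pe0_dma", "hbm_ctrl"))     # local HBM access
    print(find_path(adj, routers, "pe0_dma", "shared_sram"))  # remote via mesh hops
```

In this sketch, local HBM access crosses exactly one router, while a remote destination adds one mesh hop per router traversed, which mirrors the "switching overhead only" versus "(mesh hops)" distinction in the updated spec text.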
@@ -44,14 +44,15 @@ Each PE contains the following logical components.
 **PE_DMA**

 - Handles memory transfers between PE_TCM and external memory domains.
-- PE_DMA has **dual egress** at the CUBE level:
-  - **→ XBAR**: dedicated path to HBM (local and cross-half via bridge)
-  - **→ NOC**: path to non-HBM destinations (shared SRAM, inter-cube UCIe, etc.)
+- PE_DMA connects to the NOC router mesh at the CUBE level (ADR-0019):
+  - All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the router mesh
+  - Local HBM access: PE_DMA → local router → hbm_ctrl (switching overhead only)
+  - Remote/shared: PE_DMA → local router → (mesh hops) → destination
 - Supported directions include:
-  - HBM → PE_TCM (via XBAR)
-  - PE_TCM → HBM (via XBAR)
-  - PE_TCM → shared SRAM (via NOC)
-  - PE_TCM → other memory domains (via NOC, if supported by topology)
+  - HBM → PE_TCM (via router mesh)
+  - PE_TCM → HBM (via router mesh)
+  - PE_TCM → shared SRAM (via router mesh)
+  - PE_TCM → other memory domains (via router mesh, if supported by topology)

 **PE_GEMM**

@@ -251,7 +252,7 @@ Compute operations use a TCM-centric dataflow model.
 **Input path (HBM)**

 ```text
-HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
+HBM → router mesh → PE_DMA (DMA_READ) → PE_TCM
 ```

 **Input path (shared SRAM)**
@@ -268,14 +269,14 @@ Compute engines read input tensors from PE_TCM.
 PE_TCM → GEMM / MATH
 ```

-Weights for GEMM may optionally stream directly from HBM (via XBAR).
+Weights for GEMM may optionally stream directly from HBM (via router mesh).

 **Output path (HBM)**

 Compute results are written to PE_TCM, then DMA writes to HBM.

 ```text
-PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
+PE_TCM → PE_DMA (DMA_WRITE) → router mesh → HBM
 ```

 **Output path (shared SRAM)**
@@ -347,9 +348,9 @@ PE instances are derived from `cube.pe_layout`.

 External connectivity such as:

-- PE_DMA → XBAR (HBM data path)
-- PE_DMA → NOC (non-HBM data path: shared SRAM, inter-cube UCIe)
-- NOC → PE_CPU (command path from M_CPU)
+- PE_DMA → router mesh → HBM (data path, ADR-0019)
+- PE_DMA → router mesh → shared SRAM, inter-cube UCIe (non-HBM data path)
+- router mesh → PE_CPU (command path from M_CPU)

 is modeled at the CUBE level (see ADR-0003 D3).