ADR-0003/0014: generalize "router mesh" to "NOC"

NOC topology is an implementation choice (mesh, ring, crossbar, etc.).
ADR-0017 covers the current 2D mesh choice; system-level ADRs
shouldn't bind to that specific implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:23:46 -07:00
parent c9bd5387ac
commit 32b29a1e5c
2 changed files with 21 additions and 19 deletions
+7 -5
@@ -35,11 +35,13 @@ We model the system hierarchy explicitly:
  - A CUBE contains:
  - HBM + memory controller (HBM_CTRL)
- - NOC router mesh: 2D grid of explicit routers (from cube_mesh.yaml) with XY routing;
- carries all intra-cube traffic including HBM data, inter-cube (UCIe),
- command (M_CPU↔PE_CPU), and shared SRAM access.
- HBM_CTRL is attached to PE routers (local HBM = 0 hop).
- See ADR-0017 and ADR-0019 for full architecture.
+ - NOC (on-die fabric): carries all intra-cube traffic including HBM data,
+ inter-cube (UCIe), command (M_CPU↔PE_CPU), and shared SRAM access.
+ Must provide: full-BW PE↔local HBM path, PE↔SRAM connectivity,
+ PE↔UCIe connectivity, M_CPU↔PE command path.
+ NOC topology is an implementation choice (e.g., 2D mesh, ring, crossbar);
+ current implementation uses a 2D mesh with XY routing (see ADR-0017).
+ HBM_CTRL is attached to each PE's local NOC port (local HBM = minimal hop).
  - Shared SRAM: cube-level shared memory accessible by all PEs via NOC
  - management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
  - multiple PEs
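
The point of this hunk, that system-level text should depend on NOC connectivity rather than on a particular topology, can be illustrated with a small sketch. All names below (`Noc`, `Mesh2D`, `Crossbar`, `route`, `hops`) and the coordinate scheme are hypothetical and do not come from the ADRs or from `cube_mesh.yaml`. The sketch only shows a topology-agnostic interface with an assumed 2D-mesh XY-routing implementation and an assumed crossbar behind it.

```python
from abc import ABC, abstractmethod


class Noc(ABC):
    """Hypothetical topology-agnostic view of the cube-level NOC."""

    @abstractmethod
    def route(self, src, dst):
        """Return the list of nodes visited from src to dst, inclusive."""

    def hops(self, src, dst):
        # Hop count is the number of links traversed on the route.
        return len(self.route(src, dst)) - 1


class Mesh2D(Noc):
    """Assumed 2D mesh with dimension-ordered (XY) routing."""

    def __init__(self, cols, rows):
        self.cols, self.rows = cols, rows

    def route(self, src, dst):
        for x, y in (src, dst):
            if not (0 <= x < self.cols and 0 <= y < self.rows):
                raise ValueError(f"node {(x, y)} is outside the mesh")
        x, y = src
        path = [(x, y)]
        # XY routing: travel along X until the column matches, then along Y.
        while x != dst[0]:
            x += 1 if dst[0] > x else -1
            path.append((x, y))
        while y != dst[1]:
            y += 1 if dst[1] > y else -1
            path.append((x, y))
        return path


class Crossbar(Noc):
    """Assumed single-stage crossbar: any two distinct nodes are one hop apart."""

    def route(self, src, dst):
        return [src] if src == dst else [src, dst]
```

Code written against `Noc` alone (for example, a latency model calling `hops`) would keep working if `Mesh2D` were swapped for another topology, which is the decoupling the commit message asks for.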
@@ -44,15 +44,15 @@ Each PE contains the following logical components.
  **PE_DMA**
  - Handles memory transfers between PE_TCM and external memory domains.
- - PE_DMA connects to the NOC router mesh at the CUBE level (ADR-0019):
- - All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the router mesh
- - Local HBM access: PE_DMA → local router → hbm_ctrl (switching overhead only)
- - Remote/shared: PE_DMA → local router → (mesh hops) → destination
+ - PE_DMA connects to the cube-level NOC (on-die fabric):
+ - All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the NOC
+ - Local HBM access: PE_DMA → NOC → hbm_ctrl (minimal hop)
+ - Remote/shared: PE_DMA → NOC → (fabric hops) → destination
  - Supported directions include:
- - HBM → PE_TCM (via router mesh)
- - PE_TCM → HBM (via router mesh)
- - PE_TCM → shared SRAM (via router mesh)
- - PE_TCM → other memory domains (via router mesh, if supported by topology)
+ - HBM → PE_TCM (via NOC)
+ - PE_TCM → HBM (via NOC)
+ - PE_TCM → shared SRAM (via NOC)
+ - PE_TCM → other memory domains (via NOC, if supported by topology)
  **PE_GEMM**
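
The PE_DMA bullets in the hunk above can be read as a small path rule: every transfer crosses the NOC, with PE_DMA sitting between PE_TCM and the fabric. The following is a minimal sketch of that rule. The `Domain` enum, `SUPPORTED` set, and `dma_path` function are hypothetical names. Only the three directions listed explicitly are encoded, and the topology-dependent "other memory domains" case is deliberately left out.

```python
from enum import Enum


class Domain(Enum):
    """Memory domains named in the PE_DMA description (hypothetical enum)."""
    PE_TCM = "PE_TCM"
    HBM = "HBM"
    SHARED_SRAM = "shared SRAM"


# Assumption: only the directions listed explicitly in the ADR text.
SUPPORTED = {
    (Domain.HBM, Domain.PE_TCM),
    (Domain.PE_TCM, Domain.HBM),
    (Domain.PE_TCM, Domain.SHARED_SRAM),
}


def dma_path(src, dst):
    """Return the logical components a transfer passes through, in order."""
    if (src, dst) not in SUPPORTED:
        raise ValueError(f"unsupported direction: {src.value} -> {dst.value}")
    if src is Domain.PE_TCM:
        # Write side: PE_TCM -> PE_DMA -> NOC -> external domain.
        return [src.value, "PE_DMA", "NOC", dst.value]
    # Read side: external domain -> NOC -> PE_DMA -> PE_TCM.
    return [src.value, "NOC", "PE_DMA", dst.value]
```

Note that the path names only "NOC", never a router or mesh hop, so the rule stays valid whichever topology implements the fabric.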
@@ -252,7 +252,7 @@ Compute operations use a TCM-centric dataflow model.
  **Input path (HBM)**
  ```text
- HBM → router mesh → PE_DMA (DMA_READ) → PE_TCM
+ HBM → NOC → PE_DMA (DMA_READ) → PE_TCM
  ```
  **Input path (shared SRAM)**
@@ -269,14 +269,14 @@ Compute engines read input tensors from PE_TCM.
  PE_TCM → GEMM / MATH
  ```
- Weights for GEMM may optionally stream directly from HBM (via router mesh).
+ Weights for GEMM may optionally stream directly from HBM (via NOC).
  **Output path (HBM)**
  Compute results are written to PE_TCM, then DMA writes to HBM.
  ```text
- PE_TCM → PE_DMA (DMA_WRITE) → router mesh → HBM
+ PE_TCM → PE_DMA (DMA_WRITE) → NOC → HBM
  ```
  **Output path (shared SRAM)**
@@ -348,9 +348,9 @@ PE instances are derived from `cube.pe_layout`.
  External connectivity such as:
- - PE_DMA → router mesh → HBM (data path, ADR-0019)
- - PE_DMA → router mesh → shared SRAM, inter-cube UCIe (non-HBM data path)
- - router mesh → PE_CPU (command path from M_CPU)
+ - PE_DMA → NOC → HBM (data path)
+ - PE_DMA → NOC → shared SRAM, inter-cube UCIe (non-HBM data path)
+ - NOC → PE_CPU (command path from M_CPU)
  is modeled at the CUBE level (see ADR-0003 D3).