diff --git a/docs/adr/ADR-0003-target-system-hierarchy.md b/docs/adr/ADR-0003-target-system-hierarchy.md
index 30b948d..e5acc7d 100644
--- a/docs/adr/ADR-0003-target-system-hierarchy.md
+++ b/docs/adr/ADR-0003-target-system-hierarchy.md
@@ -35,11 +35,13 @@ We model the system hierarchy explicitly:
 
 - A CUBE contains:
   - HBM + memory controller (HBM_CTRL)
-  - NOC router mesh: 2D grid of explicit routers (from cube_mesh.yaml) with XY routing;
-    carries all intra-cube traffic including HBM data, inter-cube (UCIe),
-    command (M_CPU↔PE_CPU), and shared SRAM access.
-    HBM_CTRL is attached to PE routers (local HBM = 0 hop).
-    See ADR-0017 and ADR-0019 for full architecture.
+  - NOC (on-die fabric): carries all intra-cube traffic including HBM data,
+    inter-cube (UCIe), command (M_CPU↔PE_CPU), and shared SRAM access.
+    Must provide: full-BW PE↔local HBM path, PE↔SRAM connectivity,
+    PE↔UCIe connectivity, M_CPU↔PE command path.
+    NOC topology is an implementation choice (e.g., 2D mesh, ring, crossbar);
+    current implementation uses a 2D mesh with XY routing (see ADR-0017).
+    HBM_CTRL is attached to each PE's local NOC port (local HBM = minimal hop).
   - Shared SRAM: cube-level shared memory accessible by all PEs via NOC
   - management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
   - multiple PEs
diff --git a/docs/adr/ADR-0014-pe-internal-execution-model.md b/docs/adr/ADR-0014-pe-internal-execution-model.md
index ae17b69..7153b2a 100644
--- a/docs/adr/ADR-0014-pe-internal-execution-model.md
+++ b/docs/adr/ADR-0014-pe-internal-execution-model.md
@@ -44,15 +44,15 @@ Each PE contains the following logical components.
 **PE_DMA**
 
 - Handles memory transfers between PE_TCM and external memory domains.
-- PE_DMA connects to the NOC router mesh at the CUBE level (ADR-0019):
-  - All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the router mesh
-  - Local HBM access: PE_DMA → local router → hbm_ctrl (switching overhead only)
-  - Remote/shared: PE_DMA → local router → (mesh hops) → destination
+- PE_DMA connects to the cube-level NOC (on-die fabric):
+  - All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the NOC
+  - Local HBM access: PE_DMA → NOC → hbm_ctrl (minimal hop)
+  - Remote/shared: PE_DMA → NOC → (fabric hops) → destination
 - Supported directions include:
-  - HBM → PE_TCM (via router mesh)
-  - PE_TCM → HBM (via router mesh)
-  - PE_TCM → shared SRAM (via router mesh)
-  - PE_TCM → other memory domains (via router mesh, if supported by topology)
+  - HBM → PE_TCM (via NOC)
+  - PE_TCM → HBM (via NOC)
+  - PE_TCM → shared SRAM (via NOC)
+  - PE_TCM → other memory domains (via NOC, if supported by topology)
 
 **PE_GEMM**
 
@@ -252,7 +252,7 @@ Compute operations use a TCM-centric dataflow model.
 **Input path (HBM)**
 
 ```text
-HBM → router mesh → PE_DMA (DMA_READ) → PE_TCM
+HBM → NOC → PE_DMA (DMA_READ) → PE_TCM
 ```
 
 **Input path (shared SRAM)**
@@ -269,14 +269,14 @@ Compute engines read input tensors from PE_TCM.
 PE_TCM → GEMM / MATH
 ```
 
-Weights for GEMM may optionally stream directly from HBM (via router mesh).
+Weights for GEMM may optionally stream directly from HBM (via NOC).
 
 **Output path (HBM)**
 
 Compute results are written to PE_TCM, then DMA writes to HBM.
 
 ```text
-PE_TCM → PE_DMA (DMA_WRITE) → router mesh → HBM
+PE_TCM → PE_DMA (DMA_WRITE) → NOC → HBM
 ```
 
 **Output path (shared SRAM)**
@@ -348,9 +348,9 @@ PE instances are derived from `cube.pe_layout`.
 
 External connectivity such as:
 
-- PE_DMA → router mesh → HBM (data path, ADR-0019)
-- PE_DMA → router mesh → shared SRAM, inter-cube UCIe (non-HBM data path)
-- router mesh → PE_CPU (command path from M_CPU)
+- PE_DMA → NOC → HBM (data path)
+- PE_DMA → NOC → shared SRAM, inter-cube UCIe (non-HBM data path)
+- NOC → PE_CPU (command path from M_CPU)
 
 is modeled at the CUBE level (see ADR-0003 D3).
 
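The revised ADR-0003 text separates what the NOC must provide (PE↔local HBM, PE↔SRAM, PE↔UCIe and M_CPU↔PE paths) from how it is built (currently a 2D mesh with XY routing). The following is a minimal Python sketch of that separation. It assumes a hypothetical `Noc` interface with a single `route()` method, a `MeshXY` class that implements dimension-ordered XY routing, and a hypothetical `check_required_paths()` helper. The endpoint names (`pe0`, `hbm_ctrl_pe0`, `m_cpu`, `sram`, `ucie`) and the 3x2 router placement are illustrative assumptions. They are not values from `cube_mesh.yaml` or the simulator. The sketch checks connectivity and hop counts only. It reads "local HBM = minimal hop" as "no other PE's HBM_CTRL is fewer hops away" and does not model the full-bandwidth requirement.

```python
# Hypothetical sketch: Noc, MeshXY, check_required_paths, endpoint names and
# router placement are illustrative assumptions, not taken from the ADRs.
from typing import Dict, List, Protocol, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router


class Noc(Protocol):
    """Topology-agnostic interface: a NOC only has to produce a route."""

    def route(self, src: Coord, dst: Coord) -> List[Coord]: ...


class MeshXY:
    """2D mesh with dimension-ordered XY routing (all X hops, then all Y hops)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def route(self, src: Coord, dst: Coord) -> List[Coord]:
        for x, y in (src, dst):
            if not (0 <= x < self.width and 0 <= y < self.height):
                raise ValueError(f"router {(x, y)} is outside the mesh")
        x, y = src
        path = [src]
        while x != dst[0]:  # X dimension first
            x += 1 if dst[0] > x else -1
            path.append((x, y))
        while y != dst[1]:  # then Y dimension
            y += 1 if dst[1] > y else -1
            path.append((x, y))
        return path


def hops(noc: Noc, attach: Dict[str, Coord], a: str, b: str) -> int:
    """Router-to-router hops between two attached endpoints (0 = same router)."""
    return len(noc.route(attach[a], attach[b])) - 1


def check_required_paths(
    noc: Noc, attach: Dict[str, Coord], pes: List[str]
) -> Dict[Tuple[str, str], int]:
    """Verify the required NOC paths exist and return their hop counts."""
    required = [(pe, t) for pe in pes for t in (f"hbm_ctrl_{pe}", "sram", "ucie")]
    required += [("m_cpu", pe) for pe in pes]  # command path
    missing = [p for p in required if p[0] not in attach or p[1] not in attach]
    if missing:
        raise AssertionError(f"no NOC attachment for paths: {missing}")
    report = {(a, b): hops(noc, attach, a, b) for a, b in required}
    # Assumed reading of "local HBM = minimal hop": no other PE's HBM_CTRL
    # may be fewer hops away than the PE's own.
    for pe in pes:
        local = report[(pe, f"hbm_ctrl_{pe}")]
        for other in pes:
            if hops(noc, attach, pe, f"hbm_ctrl_{other}") < local:
                raise AssertionError(f"{pe} is closer to hbm_ctrl_{other} than its own")
    return report


mesh = MeshXY(width=3, height=2)
attach = {
    "pe0": (0, 0), "hbm_ctrl_pe0": (0, 0),  # HBM_CTRL on the PE's own router
    "pe1": (2, 0), "hbm_ctrl_pe1": (2, 0),
    "m_cpu": (1, 0), "sram": (1, 1), "ucie": (0, 1),
}
report = check_required_paths(mesh, attach, ["pe0", "pe1"])
print(report[("pe0", "hbm_ctrl_pe0")])  # 0
print(report[("pe1", "ucie")])  # 3
```

`check_required_paths()` only calls `route()`. A ring or crossbar class that exposes the same method could therefore replace `MeshXY` without changing the check. This mirrors the ADR's point that the NOC topology is an implementation choice, constrained only by the required paths.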