Compare commits
21 Commits
31c7110da7
...
eb792e6212
| Author | SHA1 | Date | |
|---|---|---|---|
| eb792e6212 | |||
| 7640635f90 | |||
| 3ea4fa90f8 | |||
| 5125d92c17 | |||
| 72acc5c8bb | |||
| bde76ec959 | |||
| d3de982ea4 | |||
| df81835d84 | |||
| 66ec6cd40c | |||
| e766163a25 | |||
| 24faf2e1d4 | |||
| 7cd30e106e | |||
| 109c9b4483 | |||
| e94f1de078 | |||
| 5c6abe6d12 | |||
| f298e3c7cc | |||
| 91085733ba | |||
| d2c92b8a18 | |||
| 08256c1326 | |||
| 624161f52f | |||
| 5917b3497c |
@@ -104,7 +104,7 @@ The simulator MUST accept multiple topologies (YAML / JSON / dict), varying:
|
|||||||
- SIP count,
|
- SIP count,
|
||||||
- CUBE count per SIP,
|
- CUBE count per SIP,
|
||||||
- PE count per CUBE,
|
- PE count per CUBE,
|
||||||
- on-chip fabric structure (e.g., mesh / NoC / XBAR),
|
- on-chip fabric structure (e.g., mesh / NoC router grid),
|
||||||
- IO chiplets and interconnects,
|
- IO chiplets and interconnects,
|
||||||
- link bandwidth, latency, and capacity parameters.
|
- link bandwidth, latency, and capacity parameters.
|
||||||
|
|
||||||
@@ -119,8 +119,7 @@ Given a topology:
|
|||||||
|
|
||||||
All components MUST be replaceable behind stable interfaces, including:
|
All components MUST be replaceable behind stable interfaces, including:
|
||||||
|
|
||||||
- routers and fabrics (NoC, bridges, switches),
|
- routers and fabrics (NoC router mesh, switches),
|
||||||
- XBAR-like selectors,
|
|
||||||
- DMA engines and queues,
|
- DMA engines and queues,
|
||||||
- memory controllers and services (HBM, TCM, queues),
|
- memory controllers and services (HBM, TCM, queues),
|
||||||
- management and control processors (modeled components).
|
- management and control processors (modeled components).
|
||||||
@@ -226,7 +225,7 @@ No implicit translation or hidden latency is allowed.
|
|||||||
|
|
||||||
### 2.1 Graph Execution Model
|
### 2.1 Graph Execution Model
|
||||||
|
|
||||||
- Nodes represent modeled components (PE blocks, XBAR, NoC, bridges,
|
- Nodes represent modeled components (PE blocks, NoC routers,
|
||||||
HBM controllers, IO components, etc.).
|
HBM controllers, IO components, etc.).
|
||||||
- Directed edges represent interconnect links with latency and bandwidth attributes.
|
- Directed edges represent interconnect links with latency and bandwidth attributes.
|
||||||
- Execution model:
|
- Execution model:
|
||||||
|
|||||||
@@ -28,9 +28,6 @@ components:
|
|||||||
switch_v1: kernbench.components.builtin.forwarding:TransitComponent
|
switch_v1: kernbench.components.builtin.forwarding:TransitComponent
|
||||||
noc_v1: kernbench.components.builtin.forwarding:TransitComponent
|
noc_v1: kernbench.components.builtin.forwarding:TransitComponent
|
||||||
ucie_v1: kernbench.components.builtin.forwarding:TransitComponent
|
ucie_v1: kernbench.components.builtin.forwarding:TransitComponent
|
||||||
noc_2d_mesh_v1: kernbench.components.builtin.noc:TwoDMeshNocComponent
|
|
||||||
xbar_v1: kernbench.components.builtin.xbar:PositionAwareXbarComponent
|
|
||||||
|
|
||||||
# IO / Host interface
|
# IO / Host interface
|
||||||
pcie_ep_v1: kernbench.components.builtin.pcie_ep:PcieEpComponent
|
pcie_ep_v1: kernbench.components.builtin.pcie_ep:PcieEpComponent
|
||||||
io_cpu_v1: kernbench.components.builtin.io_cpu:IoCpuComponent
|
io_cpu_v1: kernbench.components.builtin.io_cpu:IoCpuComponent
|
||||||
|
|||||||
@@ -34,12 +34,11 @@ shortcuts that obscure control paths.
|
|||||||
(topology + policy + request).
|
(topology + policy + request).
|
||||||
|
|
||||||
### D3. Bypass is explicit and graph-represented
|
### D3. Bypass is explicit and graph-represented
|
||||||
- Any bypass (e.g., local cube HBM access via XBAR instead of NOC) must be:
|
- All paths must be explicitly represented in the graph and subject to latency accumulation.
|
||||||
- explicitly represented as a graph path, and
|
- Example: PE_DMA connects to the NOC router mesh (ADR-0019). All destinations
|
||||||
- subject to latency accumulation like any other path.
|
(HBM, shared SRAM, inter-cube UCIe) are reached via explicit mesh hops.
|
||||||
- Example: PE_DMA has dual egress — one to XBAR (HBM path) and one to NOC (non-HBM path).
|
Local HBM access has minimal hops (switching overhead only); remote access
|
||||||
Both are explicit graph edges; neither is a “bypass” — they are distinct data paths
|
traverses additional routers.
|
||||||
serving different memory domains.
|
|
||||||
- Implicit or “magic” bypass paths are disallowed.
|
- Implicit or “magic” bypass paths are disallowed.
|
||||||
|
|
||||||
### D4. No zero-latency end-to-end paths
|
### D4. No zero-latency end-to-end paths
|
||||||
|
|||||||
@@ -35,12 +35,11 @@ We model the system hierarchy explicitly:
|
|||||||
|
|
||||||
- A CUBE contains:
|
- A CUBE contains:
|
||||||
- HBM + memory controller (HBM_CTRL)
|
- HBM + memory controller (HBM_CTRL)
|
||||||
- XBAR (top/bottom): HBM pseudo-channel crossbar, PE's dedicated path to HBM
|
- NOC router mesh: 2D grid of explicit routers (from cube_mesh.yaml) with XY routing;
|
||||||
- Bridge (left/right): connects XBAR.top ↔ XBAR.bottom for cross-half HBM access
|
carries all intra-cube traffic including HBM data, inter-cube (UCIe),
|
||||||
- NOC: 2D mesh router grid spanning the entire cube with XY routing and
|
command (M_CPU↔PE_CPU), and shared SRAM access.
|
||||||
per-segment contention modeling; carries all intra-cube traffic including
|
HBM_CTRL is attached to PE routers (local HBM = 0 hop).
|
||||||
PE DMA to xbar (HBM), inter-cube (UCIe), command (M_CPU↔PE_CPU), and
|
See ADR-0017 and ADR-0019 for full architecture.
|
||||||
shared SRAM access. See ADR-0017 for full NOC architecture.
|
|
||||||
- Shared SRAM: cube-level shared memory accessible by all PEs via NOC
|
- Shared SRAM: cube-level shared memory accessible by all PEs via NOC
|
||||||
- management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
|
- management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
|
||||||
- multiple PEs
|
- multiple PEs
|
||||||
|
|||||||
@@ -14,9 +14,9 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
|
|||||||
### D1. Local HBM definition
|
### D1. Local HBM definition
|
||||||
|
|
||||||
- Each PE is assigned a logically defined “local HBM” region.
|
- Each PE is assigned a logically defined “local HBM” region.
|
||||||
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE’s DMA path
|
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE’s
|
||||||
via the XBAR (top or bottom, depending on PE corner placement).
|
router in the NOC mesh (ADR-0019).
|
||||||
- The path is: PE_DMA → XBAR.top/bottom → HBM_CTRL.
|
- The path is: PE_DMA → local router → HBM_CTRL (switching overhead only, 0 mesh hops).
|
||||||
- The mapping (HBM pseudo-channels → PE local regions) is derived from topology configuration.
|
- The mapping (HBM pseudo-channels → PE local regions) is derived from topology configuration.
|
||||||
|
|
||||||
### D2. Local HBM bandwidth guarantee contract
|
### D2. Local HBM bandwidth guarantee contract
|
||||||
@@ -27,19 +27,18 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
|
|||||||
The efficiency factor (configured via `hbm_ctrl.attrs.efficiency`, default 0.8)
|
The efficiency factor (configured via `hbm_ctrl.attrs.efficiency`, default 0.8)
|
||||||
models real-world DRAM inefficiencies (refresh cycles, bank conflicts, page
|
models real-world DRAM inefficiencies (refresh cycles, bank conflicts, page
|
||||||
misses). For example: 256 GB/s spec x 0.8 = 204.8 GB/s effective.
|
misses). For example: 256 GB/s spec x 0.8 = 204.8 GB/s effective.
|
||||||
- The topology builder applies the efficiency factor to xbar-to-hbm edge
|
- The topology builder applies the efficiency factor to router-to-hbm edge
|
||||||
bandwidth at graph construction time, so all downstream routing and latency
|
bandwidth at graph construction time, so all downstream routing and latency
|
||||||
computation uses the effective value.
|
computation uses the effective value.
|
||||||
- This guarantee is modeled by:
|
- This guarantee is modeled by:
|
||||||
- a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
|
- a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
|
||||||
- while still incurring non-zero latency along explicitly modeled components.
|
- while still incurring non-zero latency along explicitly modeled components.
|
||||||
|
|
||||||
### D3. Cross-half HBM semantics
|
### D3. Remote PE HBM semantics (intra-cube)
|
||||||
|
|
||||||
- A PE connected to XBAR.bottom that accesses HBM pseudo-channels on the XBAR.top half
|
- A PE that accesses another PE's local HBM traverses the router mesh:
|
||||||
(or vice versa) traverses a bridge:
|
- PE_DMA → local router → (mesh hops) → target PE's router → HBM_CTRL
|
||||||
- PE_DMA → XBAR.bottom → bridge → XBAR.top → HBM_CTRL
|
- Router mesh bandwidth and hop count may limit remote HBM access relative to local access.
|
||||||
- Bridge bandwidth may limit cross-half HBM access relative to local-half access.
|
|
||||||
|
|
||||||
### D4. Non-local HBM semantics (inter-cube / inter-SIP)
|
### D4. Non-local HBM semantics (inter-cube / inter-SIP)
|
||||||
|
|
||||||
@@ -61,7 +60,7 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
|
|||||||
Tests should cover:
|
Tests should cover:
|
||||||
|
|
||||||
- local-HBM case: BW matches HBM BW regardless of fabric BW parameter
|
- local-HBM case: BW matches HBM BW regardless of fabric BW parameter
|
||||||
- cross-half HBM case: latency includes bridge traversal
|
- remote PE HBM case: latency includes mesh hop traversal
|
||||||
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
|
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
|
||||||
- shared SRAM case: access via NOC with correct BW
|
- shared SRAM case: access via NOC with correct BW
|
||||||
|
|
||||||
|
|||||||
@@ -82,9 +82,8 @@ Explain cube-internal structure and data/control flow.
|
|||||||
|
|
||||||
**Visible elements**
|
**Visible elements**
|
||||||
|
|
||||||
- XBAR (top/bottom): HBM pseudo-channel crossbar
|
- Router mesh: 2D grid of NOC routers (from cube_mesh.yaml), all traffic routes through mesh
|
||||||
- Bridge (left/right): cross-half HBM connectors between XBAR.top and XBAR.bottom
|
- HBM_CTRL attached to PE routers (local HBM = 0 hop)
|
||||||
- NOC: distributed on-die fabric for non-HBM traffic
|
|
||||||
- HBM subsystem (HBM_CTRL)
|
- HBM subsystem (HBM_CTRL)
|
||||||
- Shared SRAM: cube-level shared memory
|
- Shared SRAM: cube-level shared memory
|
||||||
- Management CPU (M_CPU)
|
- Management CPU (M_CPU)
|
||||||
@@ -97,14 +96,13 @@ Explain cube-internal structure and data/control flow.
|
|||||||
|
|
||||||
**Visible links**
|
**Visible links**
|
||||||
|
|
||||||
- PE → XBAR (HBM data path, top or bottom by corner placement)
|
- PE → router (HBM + non-HBM data path via mesh)
|
||||||
- PE → NOC (non-HBM data path)
|
- Router ↔ HBM_CTRL (local HBM access)
|
||||||
- XBAR ↔ bridge ↔ XBAR (cross-half HBM access)
|
- Router ↔ Router (mesh hops for remote access)
|
||||||
- XBAR → HBM_CTRL
|
- Router ↔ UCIe endpoints
|
||||||
- NOC ↔ UCIe endpoints
|
- Router ↔ shared SRAM
|
||||||
- NOC ↔ shared SRAM
|
- M_CPU ↔ router (command path)
|
||||||
- M_CPU ↔ NOC (command path)
|
- Router → PE_CPU (command delivery, collapsed into PE block)
|
||||||
- NOC → PE_CPU (command delivery, collapsed into PE block)
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -61,9 +61,9 @@ For each view (SIP / CUBE / PE):
|
|||||||
- preserve connectivity semantics relevant to that view,
|
- preserve connectivity semantics relevant to that view,
|
||||||
- compute distance buckets and assign layout layers deterministically.
|
- compute distance buckets and assign layout layers deterministically.
|
||||||
- CUBE-level projection MUST include:
|
- CUBE-level projection MUST include:
|
||||||
- XBAR (top/bottom), bridge (left/right), NOC, HBM_CTRL, shared SRAM, M_CPU, UCIe ports,
|
- Router mesh (from cube_mesh.yaml), HBM_CTRL, shared SRAM, M_CPU, UCIe ports,
|
||||||
and PEs as opaque blocks.
|
and PEs as opaque blocks.
|
||||||
- Distinct edge kinds for HBM path (PE→XBAR) vs non-HBM path (PE→NOC).
|
- All paths (HBM, non-HBM, command) route through the same router mesh (ADR-0019).
|
||||||
- Default anchors are implicit (ADR-0005) and MUST NOT require instance indices.
|
- Default anchors are implicit (ADR-0005) and MUST NOT require instance indices.
|
||||||
|
|
||||||
### D6. Output formats and determinism
|
### D6. Output formats and determinism
|
||||||
|
|||||||
@@ -44,14 +44,15 @@ Each PE contains the following logical components.
|
|||||||
**PE_DMA**
|
**PE_DMA**
|
||||||
|
|
||||||
- Handles memory transfers between PE_TCM and external memory domains.
|
- Handles memory transfers between PE_TCM and external memory domains.
|
||||||
- PE_DMA has **dual egress** at the CUBE level:
|
- PE_DMA connects to the NOC router mesh at the CUBE level (ADR-0019):
|
||||||
- **→ XBAR**: dedicated path to HBM (local and cross-half via bridge)
|
- All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the router mesh
|
||||||
- **→ NOC**: path to non-HBM destinations (shared SRAM, inter-cube UCIe, etc.)
|
- Local HBM access: PE_DMA → local router → hbm_ctrl (switching overhead only)
|
||||||
|
- Remote/shared: PE_DMA → local router → (mesh hops) → destination
|
||||||
- Supported directions include:
|
- Supported directions include:
|
||||||
- HBM → PE_TCM (via XBAR)
|
- HBM → PE_TCM (via router mesh)
|
||||||
- PE_TCM → HBM (via XBAR)
|
- PE_TCM → HBM (via router mesh)
|
||||||
- PE_TCM → shared SRAM (via NOC)
|
- PE_TCM → shared SRAM (via router mesh)
|
||||||
- PE_TCM → other memory domains (via NOC, if supported by topology)
|
- PE_TCM → other memory domains (via router mesh, if supported by topology)
|
||||||
|
|
||||||
**PE_GEMM**
|
**PE_GEMM**
|
||||||
|
|
||||||
@@ -251,7 +252,7 @@ Compute operations use a TCM-centric dataflow model.
|
|||||||
**Input path (HBM)**
|
**Input path (HBM)**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
|
HBM → router mesh → PE_DMA (DMA_READ) → PE_TCM
|
||||||
```
|
```
|
||||||
|
|
||||||
**Input path (shared SRAM)**
|
**Input path (shared SRAM)**
|
||||||
@@ -268,14 +269,14 @@ Compute engines read input tensors from PE_TCM.
|
|||||||
PE_TCM → GEMM / MATH
|
PE_TCM → GEMM / MATH
|
||||||
```
|
```
|
||||||
|
|
||||||
Weights for GEMM may optionally stream directly from HBM (via XBAR).
|
Weights for GEMM may optionally stream directly from HBM (via router mesh).
|
||||||
|
|
||||||
**Output path (HBM)**
|
**Output path (HBM)**
|
||||||
|
|
||||||
Compute results are written to PE_TCM, then DMA writes to HBM.
|
Compute results are written to PE_TCM, then DMA writes to HBM.
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
|
PE_TCM → PE_DMA (DMA_WRITE) → router mesh → HBM
|
||||||
```
|
```
|
||||||
|
|
||||||
**Output path (shared SRAM)**
|
**Output path (shared SRAM)**
|
||||||
@@ -347,9 +348,9 @@ PE instances are derived from `cube.pe_layout`.
|
|||||||
|
|
||||||
External connectivity such as:
|
External connectivity such as:
|
||||||
|
|
||||||
- PE_DMA → XBAR (HBM data path)
|
- PE_DMA → router mesh → HBM (data path, ADR-0019)
|
||||||
- PE_DMA → NOC (non-HBM data path: shared SRAM, inter-cube UCIe)
|
- PE_DMA → router mesh → shared SRAM, inter-cube UCIe (non-HBM data path)
|
||||||
- NOC → PE_CPU (command path from M_CPU)
|
- router mesh → PE_CPU (command path from M_CPU)
|
||||||
|
|
||||||
is modeled at the CUBE level (see ADR-0003 D3).
|
is modeled at the CUBE level (see ADR-0003 D3).
|
||||||
|
|
||||||
|
|||||||
@@ -104,13 +104,13 @@ Kernel Launch routes through M_CPU for PE fan-out.
|
|||||||
```text
|
```text
|
||||||
pcie_ep → io_noc → io_ucie
|
pcie_ep → io_noc → io_ucie
|
||||||
→ [transit cubes: ucie_in → noc → ucie_out] (zero or more)
|
→ [transit cubes: ucie_in → noc → ucie_out] (zero or more)
|
||||||
→ target cube: ucie_in → noc → xbar → hbm_ctrl
|
→ target cube: ucie_in → router mesh → hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
**Memory R/W completion path:**
|
**Memory R/W completion path:**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
hbm_ctrl → xbar → noc → [transit cubes: ucie → noc → ucie]
|
hbm_ctrl → router mesh → [transit cubes: ucie → router mesh → ucie]
|
||||||
→ io_ucie → io_noc → pcie_ep
|
→ io_ucie → io_noc → pcie_ep
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|||||||
@@ -49,7 +49,7 @@ Memory operations (MemoryWrite, MemoryRead) are routed directly from pcie_ep
|
|||||||
through io_noc to the target cube, bypassing io_cpu entirely:
|
through io_noc to the target cube, bypassing io_cpu entirely:
|
||||||
|
|
||||||
```text
|
```text
|
||||||
pcie_ep → io_noc → conn → io_ucie → [cube UCIe] → noc → xbar → hbm_ctrl
|
pcie_ep → io_noc → conn → io_ucie → [cube UCIe] → router mesh → hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
This avoids the 10ns io_cpu overhead for pure data transfers. The simulation
|
This avoids the 10ns io_cpu overhead for pure data transfers. The simulation
|
||||||
|
|||||||
@@ -16,9 +16,10 @@ architecture.
|
|||||||
|
|
||||||
### D1. NOC node and router grid
|
### D1. NOC node and router grid
|
||||||
|
|
||||||
Each cube contains a single NOC topology node (`sip{S}.cube{C}.noc`)
|
Each cube contains a 2D router mesh generated by `mesh_gen.py`.
|
||||||
implemented as `noc_2d_mesh_v1`. Internally, the NOC models a 2D router
|
Each router is a separate topology node (`sip{S}.cube{C}.r{row}c{col}`)
|
||||||
grid generated by `mesh_gen.py`.
|
implemented as `forwarding_v1`. (Supersedes the original single-node
|
||||||
|
`noc_2d_mesh_v1` design — see ADR-0019.)
|
||||||
|
|
||||||
Grid properties:
|
Grid properties:
|
||||||
|
|
||||||
@@ -82,8 +83,8 @@ PE4.cpu <--+ | | +--< PE6.cpu
|
|||||||
|
|
|
|
||||||
UCIe-S (conn x4)
|
UCIe-S (conn x4)
|
||||||
|
|
||||||
xbar_top attached to: r0c0, r0c1, r1c4, r1c5 (top-half PE routers)
|
HBM attach: PE가 있는 라우터에 hbm_ctrl도 연결 (ADR-0019 D1)
|
||||||
xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
|
(xbar_top/xbar_bot은 ADR-0019에 의해 제거됨)
|
||||||
```
|
```
|
||||||
|
|
||||||
### D5. NOC edge bandwidths and distances
|
### D5. NOC edge bandwidths and distances
|
||||||
@@ -92,8 +93,7 @@ xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
|
|||||||
| --- | --- | --- | --- |
|
| --- | --- | --- | --- |
|
||||||
| PE_DMA -> NOC | 256.0 | Physical (PE pos) | Matches HBM slice BW |
|
| PE_DMA -> NOC | 256.0 | Physical (PE pos) | Matches HBM slice BW |
|
||||||
| NOC -> PE_CPU | - | 0.0 mm | Command path only |
|
| NOC -> PE_CPU | - | 0.0 mm | Command path only |
|
||||||
| NOC <-> xbar_top | 256.0 | 0.0 mm | Per xbar half |
|
| Router <-> HBM_CTRL | 256.0 | 0.0 mm | Per PE router (ADR-0019) |
|
||||||
| NOC <-> xbar_bot | 256.0 | 0.0 mm | Per xbar half |
|
|
||||||
| NOC <-> M_CPU | - | 0.0 mm | Command path |
|
| NOC <-> M_CPU | - | 0.0 mm | Command path |
|
||||||
| NOC <-> SRAM | 128.0 x4 | 0.0 mm | 512 GB/s aggregate |
|
| NOC <-> SRAM | 128.0 x4 | 0.0 mm | 512 GB/s aggregate |
|
||||||
| NOC <-> UCIe conn | 128.0 | 0.0 mm | Per connection, 4 per port |
|
| NOC <-> UCIe conn | 128.0 | 0.0 mm | Per connection, 4 per port |
|
||||||
@@ -117,7 +117,7 @@ Inter-cube traffic path:
|
|||||||
```text
|
```text
|
||||||
Source: PE_DMA -> NOC -> conn{i} -> ucie-{PORT}
|
Source: PE_DMA -> NOC -> conn{i} -> ucie-{PORT}
|
||||||
[UCIe link: 512 GB/s, 1.0mm seam distance]
|
[UCIe link: 512 GB/s, 1.0mm seam distance]
|
||||||
Target: ucie-{PORT} -> conn{i} -> NOC -> xbar -> HBM
|
Target: ucie-{PORT} -> conn{i} -> r{x}c{y} -> (mesh hops) -> hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
UCIe overhead (8.0 ns) is applied at each ucie-{PORT} node, so a
|
UCIe overhead (8.0 ns) is applied at each ucie-{PORT} node, so a
|
||||||
@@ -128,31 +128,31 @@ full crossing incurs 16 ns (TX port + RX port).
|
|||||||
**PE DMA to local HBM (same half):**
|
**PE DMA to local HBM (same half):**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE_DMA -> NOC -> xbar_top -> HBM_CTRL.slice{0-3}
|
PE_DMA -> r{x}c{y} -> hbm_ctrl (local: 0 mesh hops, switching overhead only)
|
||||||
```
|
```
|
||||||
|
|
||||||
**PE DMA to cross-half HBM:**
|
**PE DMA to remote PE's HBM:**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE_DMA -> NOC -> xbar_top -> bridge -> xbar_bot -> HBM_CTRL.slice{4-7}
|
PE_DMA -> r{x}c{y} -> (mesh hops) -> r{x'}c{y'} -> hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
**PE DMA to remote cube HBM:**
|
**PE DMA to remote cube HBM:**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE_DMA -> NOC -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> NOC -> xbar -> HBM
|
PE_DMA -> r{x}c{y} -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> r{x'}c{y'} -> hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
**Kernel Launch command to PE:**
|
**Kernel Launch command to PE:**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
[from io_noc] -> ucie -> conn -> NOC -> M_CPU -> NOC -> PE_CPU
|
[from io_noc] -> ucie -> conn -> r{x}c{y} -> (mesh hops) -> M_CPU -> (mesh hops) -> PE_CPU
|
||||||
```
|
```
|
||||||
|
|
||||||
**Shared SRAM access:**
|
**Shared SRAM access:**
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE_DMA -> NOC -> SRAM
|
PE_DMA -> r{x}c{y} -> (mesh hops) -> SRAM
|
||||||
```
|
```
|
||||||
|
|
||||||
### D8. Mesh generation
|
### D8. Mesh generation
|
||||||
@@ -169,7 +169,7 @@ The generator produces a `mesh_data` dictionary containing:
|
|||||||
- PE-to-router attachments (pe_dma, pe_cpu per PE)
|
- PE-to-router attachments (pe_dma, pe_cpu per PE)
|
||||||
- UCIe-to-router attachments (N/S/E/W, distributed across edge routers)
|
- UCIe-to-router attachments (N/S/E/W, distributed across edge routers)
|
||||||
- M_CPU and SRAM router attachments
|
- M_CPU and SRAM router attachments
|
||||||
- xbar_top/bot router assignments (top-half vs bottom-half PE routers)
|
- HBM attachment per PE router (ADR-0019)
|
||||||
|
|
||||||
## Consequences
|
## Consequences
|
||||||
|
|
||||||
@@ -182,8 +182,8 @@ The generator produces a `mesh_data` dictionary containing:
|
|||||||
## Links
|
## Links
|
||||||
|
|
||||||
- ADR-0003 D3 (cube-level NOC definition — extended by this ADR)
|
- ADR-0003 D3 (cube-level NOC definition — extended by this ADR)
|
||||||
- ADR-0004 D1 (PE DMA to local HBM path via xbar)
|
- ADR-0004 D1 (PE DMA to local HBM path via router mesh)
|
||||||
- ADR-0004 D3 (cross-half HBM via bridge)
|
- ADR-0014 D1 (PE_DMA egress via router mesh)
|
||||||
- ADR-0014 D1 (PE_DMA dual egress: xbar for HBM, NOC for non-HBM)
|
- ADR-0019 (NOC-Local HBM — xbar/bridge 제거, 명시적 라우터 mesh)
|
||||||
- ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
|
- ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
|
||||||
- ADR-0016 D1 (IOChiplet io_noc — analogous pattern at IO chiplet level)
|
- ADR-0016 D1 (IOChiplet io_noc — analogous pattern at IO chiplet level)
|
||||||
|
|||||||
@@ -247,7 +247,7 @@ simulator의 routing 및 resource 모델에서 직접 사용 가능한 request
|
|||||||
DmaReadCmd.src_addr (VA)
|
DmaReadCmd.src_addr (VA)
|
||||||
→ MMU.translate(VA) → PA
|
→ MMU.translate(VA) → PA
|
||||||
→ PhysAddr.decode(PA) → PhysAddr object
|
→ PhysAddr.decode(PA) → PhysAddr object
|
||||||
→ resolver.resolve(PhysAddr) → dst_node_id (e.g., "sip0.cube0.hbm_ctrl.slice3")
|
→ resolver.resolve(PhysAddr) → dst_node_id (e.g., "sip0.cube0.hbm_ctrl")
|
||||||
→ router.find_path(pe_prefix, dst_node_id) → path
|
→ router.find_path(pe_prefix, dst_node_id) → path
|
||||||
→ 1개 sub-Transaction 생성 → fabric inject
|
→ 1개 sub-Transaction 생성 → fabric inject
|
||||||
```
|
```
|
||||||
|
|||||||
@@ -36,16 +36,14 @@ topology 파라미터로 결정된다.
|
|||||||
|
|
||||||
## Decision
|
## Decision
|
||||||
|
|
||||||
### D1. HBM controller는 CUBE당 단일 endpoint로 정의한다
|
### D1. HBM은 PE 라우터에 attach된다
|
||||||
|
|
||||||
현재의 `hbm_ctrl.slice{0-7}` (8개 노드)를 **`hbm_ctrl` 단일 노드**로 통합한다.
|
현재의 `hbm_ctrl.slice{0-7}` (8개 노드)를 **`hbm_ctrl` 단일 노드**로 통합하고,
|
||||||
|
PE가 attach된 라우터에 HBM access point도 함께 attach한다.
|
||||||
|
|
||||||
- pseudo channel은 HBM controller 노드 자체가 아니라,
|
- n:1 mode: PE의 local HBM 접근은 자기 라우터에서 바로 (switching overhead만, 0 hop)
|
||||||
controller에 연결되는 **link의 단위**로 표현한다
|
- remote PE의 HBM 접근: mesh hop을 거쳐 대상 PE의 라우터에 도달
|
||||||
- HBM controller 내부의 read/write resource 모델은 유지하되,
|
- HBM controller 내부의 read/write resource 모델은 유지
|
||||||
mode에 따라 contention 단위가 달라진다:
|
|
||||||
- 1:1 mode: per-channel link가 BW contention point (controller는 terminal)
|
|
||||||
- n:1 mode: aggregated link가 BW contention point (controller는 terminal)
|
|
||||||
|
|
||||||
노드 네이밍 변경:
|
노드 네이밍 변경:
|
||||||
|
|
||||||
@@ -53,198 +51,127 @@ topology 파라미터로 결정된다.
|
|||||||
| ---- | ------- |
|
| ---- | ------- |
|
||||||
| `sip0.cube0.hbm_ctrl.slice0` ~ `slice7` | `sip0.cube0.hbm_ctrl` (단일) |
|
| `sip0.cube0.hbm_ctrl.slice0` ~ `slice7` | `sip0.cube0.hbm_ctrl` (단일) |
|
||||||
|
|
||||||
|
`mesh_gen.py`에서 PE attachment에 `pe{idx}.hbm`을 추가하여,
|
||||||
|
builder가 해당 라우터와 hbm_ctrl 간 edge를 생성한다.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### D2. xbar, bridge 완전 제거
|
### D2. xbar, bridge, 단일 NOC 노드 완전 제거
|
||||||
|
|
||||||
기존 다음 노드 및 관련 edge를 모두 제거한다:
|
기존 다음 노드 및 관련 edge를 모두 제거한다:
|
||||||
|
|
||||||
- `{cube}.xbar_top`, `{cube}.xbar_bot`
|
- `{cube}.xbar_top`, `{cube}.xbar_bot`
|
||||||
- `{cube}.bridge.left`, `{cube}.bridge.right`
|
- `{cube}.bridge.left`, `{cube}.bridge.right`
|
||||||
|
- `{cube}.noc` (단일 TwoDMeshNocComponent 노드)
|
||||||
- `noc_to_xbar`, `xbar_to_noc`, `xbar_to_hbm`, `hbm_to_xbar` 종류의 edge
|
- `noc_to_xbar`, `xbar_to_noc`, `xbar_to_hbm`, `hbm_to_xbar` 종류의 edge
|
||||||
- `xbar_to_bridge`, `bridge_to_xbar` 종류의 edge
|
- `xbar_to_bridge`, `bridge_to_xbar` 종류의 edge
|
||||||
|
- `pe_to_noc`, `noc_to_pe`, `noc_to_pe_cpu` 등 단일 noc 노드 참조 edge
|
||||||
|
|
||||||
이들의 역할(PE→HBM 라우팅, cross-half 연결)은
|
이들의 역할은 **cube_mesh.yaml 기반의 명시적 라우터 mesh**가 대체한다.
|
||||||
channel router 및 horizontal line 연결이 대체한다 (D3, D4 참조).
|
기존 `mesh_gen.py`가 생성하는 6×6 라우터 grid의 각 라우터(r0c0, r0c1, ...)를
|
||||||
|
별도의 SimPy 노드로 topology graph에 생성하고,
|
||||||
|
인접 라우터 간 XY mesh edge로 연결한다.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### D3. 1:1 mode: per-channel router 기반 연결
|
### D3. 명시적 라우터 mesh (n:1 / 1:1 공통 기반)
|
||||||
|
|
||||||
#### channel router 정의
|
#### cube_mesh.yaml 기반 라우터 노드
|
||||||
|
|
||||||
1:1 mode에서 graph compiler는 pseudo-channel 수만큼의 **channel router** 노드를
|
`mesh_gen.py`가 생성한 cube_mesh.yaml의 각 non-null 라우터를
|
||||||
생성한다. channel router는 NOC의 일부이다.
|
topology graph의 **별도 SimPy 노드**로 생성한다.
|
||||||
|
|
||||||
```text
|
- 노드 ID: `{cube}.r{row}c{col}` (e.g., `sip0.cube0.r0c0`)
|
||||||
파라미터 예: hbm_pseudo_channels=64, pes_per_cube=8
|
- kind: `noc_router`, impl: `forwarding_v1`
|
||||||
→ channels_per_pe = 8, 총 64개 channel router 생성
|
- pos_mm: cube_mesh.yaml에서 가져옴
|
||||||
```
|
|
||||||
|
|
||||||
노드 네이밍: `{cube}.ch_r{global_channel_id}`
|
기존 cube_mesh.yaml의 attach 정보에 따라 각 라우터에 component를 연결:
|
||||||
|
- `pe{p}.dma` → PE_DMA ↔ 라우터 edge
|
||||||
|
- `pe{p}.cpu` → PE_CPU ↔ 라우터 edge
|
||||||
|
- `pe{p}.hbm` → HBM_CTRL ↔ 라우터 edge (n:1에서 추가)
|
||||||
|
- `m_cpu` → M_CPU ↔ 라우터 edge
|
||||||
|
- `sram` → SRAM ↔ 라우터 edge
|
||||||
|
- `ucie_{dir}.c{i}` → UCIe conn ↔ 라우터 edge
|
||||||
|
|
||||||
| PE | 소유 channel routers |
|
라우터 간 XY mesh edge: 인접 라우터 간 bidirectional edge.
|
||||||
| -- | -------------------- |
|
null 라우터(HBM exclusion zone)는 skip.
|
||||||
| PE0 | ch_r0, ch_r1, ..., ch_r7 |
|
|
||||||
| PE1 | ch_r8, ch_r9, ..., ch_r15 |
|
|
||||||
| ... | ... |
|
|
||||||
| PE7 | ch_r56, ch_r57, ..., ch_r63 |
|
|
||||||
|
|
||||||
일반화: PE `p`는 channel `p * channels_per_pe` ~ `(p+1) * channels_per_pe - 1`을 소유.
|
#### 1:1 mode 확장 (나중에 구현)
|
||||||
|
|
||||||
#### PE_DMA ↔ channel router 연결
|
1:1 mode에서는 각 라우터가 N개 channel mini-router로 분화된다.
|
||||||
|
per-channel routing과 ChannelSplitter (LA → per-channel PA) 도입이 필요.
|
||||||
각 PE_DMA는 자신의 local channel router N개와 양방향 link로 연결된다:
|
PE당 N개 GEMM engine도 이 시점에 추가.
|
||||||
|
|
||||||
```text
|
|
||||||
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r0 (bw: channel_bw_gbs)
|
|
||||||
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r1 (bw: channel_bw_gbs)
|
|
||||||
...
|
|
||||||
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r7 (bw: channel_bw_gbs)
|
|
||||||
```
|
|
||||||
|
|
||||||
- edge kind: `pe_to_ch_router` / `ch_router_to_pe`
|
|
||||||
- BW: `hbm_channel_bw_gbs` (e.g., 32 GB/s)
|
|
||||||
- distance: PE에서 channel router까지의 물리적 거리 (layout 기반)
|
|
||||||
|
|
||||||
#### channel router ↔ HBM controller 연결
|
|
||||||
|
|
||||||
각 channel router는 cube의 hbm_ctrl과 양방향 link로 연결된다:
|
|
||||||
|
|
||||||
```text
|
|
||||||
sip0.cube0.ch_r0 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
|
|
||||||
sip0.cube0.ch_r1 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
|
|
||||||
...
|
|
||||||
sip0.cube0.ch_r63 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
|
|
||||||
```
|
|
||||||
|
|
||||||
- edge kind: `ch_router_to_hbm` / `hbm_to_ch_router`
|
|
||||||
- BW: `hbm_channel_bw_gbs` (e.g., 32 GB/s)
|
|
||||||
|
|
||||||
#### 1:1 mode 전체 데이터 경로
|
|
||||||
|
|
||||||
```text
|
|
||||||
PE0.pe_dma
|
|
||||||
├→ ch_r0 → hbm_ctrl (32 GB/s)
|
|
||||||
├→ ch_r1 → hbm_ctrl (32 GB/s)
|
|
||||||
├→ ...
|
|
||||||
└→ ch_r7 → hbm_ctrl (32 GB/s)
|
|
||||||
총 PE0 local BW = N × channel_bw_gbs
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### D4. 1:1 mode: horizontal line 연결 (cross-PE channel 접근)
|
### D4. cross-PE HBM 접근 (n:1 mode)
|
||||||
|
|
||||||
#### 배치 규칙
|
n:1 mode에서 PE가 다른 PE의 local HBM에 접근하는 경우,
|
||||||
|
cube_mesh.yaml의 XY mesh를 통해 대상 PE의 라우터까지 hop한다.
|
||||||
|
|
||||||
같은 **logical index**를 가지는 channel router들을 동일한 horizontal row에 배치한다.
|
예: PE0(r0c0)이 PE2(r1c4)의 HBM에 접근:
|
||||||
|
|
||||||
logical index 정의: `logical_idx = global_channel_id % channels_per_pe`
|
|
||||||
|
|
||||||
```text
|
```text
|
||||||
파라미터 예: channels_per_pe=8, pes_per_cube=8
|
PE0.pe_dma → r0c0 → r0c1 → r0c2 → r0c3 → r0c4 → r1c4 → hbm_ctrl
|
||||||
|
|
||||||
Row 0: ch_r0 (PE0) ↔ ch_r8 (PE1) ↔ ch_r16 (PE2) ↔ ... ↔ ch_r56 (PE7)
|
|
||||||
Row 1: ch_r1 (PE0) ↔ ch_r9 (PE1) ↔ ch_r17 (PE2) ↔ ... ↔ ch_r57 (PE7)
|
|
||||||
Row 2: ch_r2 (PE0) ↔ ch_r10 (PE1) ↔ ch_r18 (PE2) ↔ ... ↔ ch_r58 (PE7)
|
|
||||||
...
|
|
||||||
Row 7: ch_r7 (PE0) ↔ ch_r15 (PE1) ↔ ch_r23 (PE2) ↔ ... ↔ ch_r63 (PE7)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
일반화: Row `r`에는 `{ch_r(p * N + r) | p ∈ 0..pes_per_cube-1}`이 위치.
|
Dijkstra router가 mesh에서 최단 경로를 탐색한다.
|
||||||
여기서 `N = channels_per_pe`.
|
|
||||||
|
|
||||||
#### horizontal line edge
|
1:1 mode에서의 cross-PE channel 접근은 D3의 1:1 확장 시 정의한다.
|
||||||
|
|
||||||
같은 row에서 인접한 channel router끼리 양방향 edge로 연결:
|
|
||||||
|
|
||||||
```text
|
|
||||||
ch_r0 ↔ ch_r8 ↔ ch_r16 ↔ ... ↔ ch_r56
|
|
||||||
```
|
|
||||||
|
|
||||||
- edge kind: `ch_horizontal`
|
|
||||||
- BW: `hbm_channel_bw_gbs` (or configurable inter-PE channel BW)
|
|
||||||
- distance: PE 간 물리적 거리
|
|
||||||
|
|
||||||
#### cross-PE HBM 접근 경로 (1:1 mode)
|
|
||||||
|
|
||||||
PE0이 PE1의 local channel (ch_r8)에 접근하는 경우:
|
|
||||||
|
|
||||||
```text
|
|
||||||
PE0.pe_dma → ch_r0 → ch_r8 (horizontal hop) → hbm_ctrl
|
|
||||||
```
|
|
||||||
|
|
||||||
Dijkstra router가 horizontal line을 통해 최단 경로를 탐색한다.
|
|
||||||
|
|
||||||
#### 설계 의도
|
|
||||||
|
|
||||||
이 배치 규칙은:
|
|
||||||
|
|
||||||
- routing 규칙 단순화: horizontal = cross-PE, vertical = PE-local
|
|
||||||
- 거리 계산 단순화: row 내 hop 수 = |src_pe - dst_pe|
|
|
||||||
- 구조적 반복성 확보: 모든 row가 동일한 구조
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### D5. n:1 mode: aggregated router 기반 연결
|
### D5. n:1 mode: cube_mesh.yaml 라우터 mesh 사용
|
||||||
|
|
||||||
#### aggregated router 정의
|
n:1 mode에서는 별도의 "aggregated router"를 생성하지 않는다.
|
||||||
|
기존 cube_mesh.yaml의 라우터 grid가 그 역할을 한다.
|
||||||
n:1 mode에서 graph compiler는 PE당 1개의 **aggregated router** 노드를 생성한다.
|
|
||||||
aggregated router는 NOC의 일부이다.
|
|
||||||
|
|
||||||
노드 네이밍: `{cube}.pe{p}.agg_router`
|
|
||||||
|
|
||||||
#### 연결 구조
|
#### 연결 구조
|
||||||
|
|
||||||
```text
|
각 PE가 attach된 라우터에 PE_DMA, PE_CPU, HBM이 함께 연결된다:
|
||||||
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.pe0.agg_router (bw: N × channel_bw_gbs)
|
|
||||||
sip0.cube0.pe0.agg_router ←→ sip0.cube0.hbm_ctrl (bw: N × channel_bw_gbs)
|
|
||||||
```
|
|
||||||
|
|
||||||
- edge kind: `pe_to_agg_router` / `agg_router_to_pe`, `agg_to_hbm` / `hbm_to_agg`
|
|
||||||
- BW: `channels_per_pe × hbm_channel_bw_gbs` (e.g., 8 × 32 = 256 GB/s)
|
|
||||||
|
|
||||||
#### cross-PE 접근 (n:1 mode)
|
|
||||||
|
|
||||||
PE0이 PE1의 local HBM에 접근하는 경우:
|
|
||||||
|
|
||||||
```text
|
```text
|
||||||
PE0.pe_dma → PE0.agg_router → PE1.agg_router → hbm_ctrl
|
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.r0c0 (bw: N × channel_bw_gbs)
|
||||||
|
sip0.cube0.hbm_ctrl ←→ sip0.cube0.r0c0 (bw: N × channel_bw_gbs)
|
||||||
```
|
```
|
||||||
|
|
||||||
aggregated router 간 연결:
|
라우터 간 XY mesh edge로 연결. PE의 local HBM 접근은
|
||||||
|
자기 라우터에서 바로 (switching overhead만).
|
||||||
```text
|
|
||||||
pe0.agg_router ↔ pe1.agg_router ↔ pe2.agg_router ↔ ... ↔ pe7.agg_router
|
|
||||||
```
|
|
||||||
|
|
||||||
- edge kind: `agg_horizontal`
|
|
||||||
- BW: configurable (inter-PE aggregated BW)
|
|
||||||
|
|
||||||
#### n:1 mode 전체 데이터 경로
|
#### n:1 mode 전체 데이터 경로
|
||||||
|
|
||||||
|
**local HBM (0 hop):**
|
||||||
```text
|
```text
|
||||||
PE0.pe_dma → PE0.agg_router → hbm_ctrl
|
PE0.pe_dma → r0c0 → hbm_ctrl (switching overhead only)
|
||||||
(BW = N × channel_bw_gbs = 256 GB/s)
|
```
|
||||||
|
|
||||||
|
**remote HBM (mesh hops):**
|
||||||
|
```text
|
||||||
|
PE0.pe_dma → r0c0 → r0c1 → ... → r1c4 → hbm_ctrl
|
||||||
|
```
|
||||||
|
|
||||||
|
**M_CPU DMA:**
|
||||||
|
```text
|
||||||
|
M_CPU → r2c0 → (mesh hops) → r{x}c{y} → hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### D6. local / remote access를 NOC로 통일한다
|
### D6. 모든 트래픽을 동일 router mesh로 통일한다
|
||||||
|
|
||||||
- 모든 memory access는 NOC(channel router 또는 aggregated router)를 통해 전달된다
|
- 모든 memory access (DMA data)와 command (PE_CPU)가 동일 router mesh를 사용한다
|
||||||
- local access도 별도의 fast path(xbar)를 사용하지 않는다
|
- local access도 별도의 fast path(xbar)를 사용하지 않는다
|
||||||
- cross-cube (remote) access 경로:
|
- cross-cube (remote) access 경로:
|
||||||
|
|
||||||
```text
|
```text
|
||||||
1:1 mode: PE_DMA → ch_r{local} → ch_r{...} → UCIe → remote_ch_r → remote_hbm_ctrl
|
PE_DMA → r{x}c{y} → (mesh hops) → ucie_conn → ucie-{PORT}
|
||||||
n:1 mode: PE_DMA → agg_router → UCIe → remote_agg_router → remote_hbm_ctrl
|
→ [UCIe link] → remote ucie → remote conn → remote r{x}c{y} → hbm_ctrl
|
||||||
```
|
```
|
||||||
|
|
||||||
UCIe 연결은 기존 구조를 유지하되,
|
UCIe 연결은 기존 구조를 유지하되,
|
||||||
양쪽 endpoint가 xbar 대신 channel router 또는 aggregated router가 된다.
|
양쪽 endpoint가 xbar 대신 mesh 라우터가 된다.
|
||||||
|
|
||||||
|
UCIe line 수는 BW 비율로 결정: `ucie_lines_per_side = ceil(ucie_bw / noc_line_bw)`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -266,9 +193,7 @@ return f"sip{s}.cube{c}.hbm_ctrl"
|
|||||||
```
|
```
|
||||||
|
|
||||||
pe_slice 계산이 제거된다.
|
pe_slice 계산이 제거된다.
|
||||||
BAAW가 이미 dst_node를 결정하므로, PE_DMA의 1:1 mode에서는
|
n:1 mode에서 PE_DMA는 자기 라우터에 attach된 hbm_ctrl에 직접 접근한다.
|
||||||
resolver를 거치지 않고 BAAW가 직접 channel router node_id를 반환한다.
|
|
||||||
n:1 mode에서도 BAAW가 aggregated router node_id를 반환한다.
|
|
||||||
|
|
||||||
resolver.resolve()는 외부 접근(M_CPU DMA 등) 및 backward compatibility용으로 유지한다.
|
resolver.resolve()는 외부 접근(M_CPU DMA 등) 및 backward compatibility용으로 유지한다.
|
||||||
|
|
||||||
@@ -305,16 +230,10 @@ links:
|
|||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
links:
|
links:
|
||||||
pe_to_ch_router_bw_gbs: 32.0 # PE_DMA ↔ channel router
|
router_link_bw_gbs: 256.0 # 라우터 간 XY mesh link BW
|
||||||
pe_to_ch_router_mm: 1.0 # 물리적 거리
|
router_overhead_ns: 2.0 # 라우터 switching overhead
|
||||||
ch_router_to_hbm_bw_gbs: 32.0 # channel router ↔ hbm_ctrl
|
pe_to_router_bw_gbs: 256.0 # PE_DMA ↔ 라우터
|
||||||
ch_router_to_hbm_mm: 2.0 # 물리적 거리
|
hbm_to_router_bw_gbs: 256.0 # HBM ↔ 라우터 (= N × channel_bw)
|
||||||
ch_horizontal_bw_gbs: 32.0 # channel router 간 horizontal link
|
|
||||||
ch_horizontal_mm: 1.5 # PE 간 horizontal 거리
|
|
||||||
# n:1 mode용
|
|
||||||
pe_to_agg_router_bw_gbs: 256.0 # PE_DMA ↔ aggregated router
|
|
||||||
agg_to_hbm_bw_gbs: 256.0 # aggregated router ↔ hbm_ctrl
|
|
||||||
agg_horizontal_bw_gbs: 256.0 # aggregated router 간 link
|
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -341,19 +260,18 @@ links:
|
|||||||
|
|
||||||
### Positive
|
### Positive
|
||||||
|
|
||||||
- 1:1 mode에서 pseudo-channel 단위 BW contention 모델링이 자연스럽다
|
- cube_mesh.yaml 기반 라우터 mesh로 물리적 배치를 정확히 반영한다
|
||||||
- n:1 mode에서 aggregated bandwidth 모델이 단순하다
|
- n:1 mode에서 기존 VA 체계를 유지하여 전환 비용이 낮다
|
||||||
- local / remote access 경로가 NOC로 통일된다
|
- local / remote / command 트래픽이 동일 mesh로 통일되어 단순하다
|
||||||
- graph compiler 기반 topology 생성과 잘 맞는다
|
- graph compiler 기반 topology 생성과 잘 맞는다
|
||||||
- channel 수, PE 수가 모두 파라미터이므로 다양한 구성을 테스트할 수 있다
|
- channel 수, PE 수가 모두 파라미터이므로 다양한 구성을 테스트할 수 있다
|
||||||
|
- 1:1 mode 확장이 라우터 분화로 자연스럽게 가능하다
|
||||||
|
|
||||||
### Negative
|
### Negative
|
||||||
|
|
||||||
- 1:1 mode에서 router 및 link 수가 크게 증가한다
|
- 명시적 라우터 노드로 인해 SimPy 노드 수가 증가한다 (6×6 = 최대 32개 라우터/cube)
|
||||||
(64 channel routers + 64 edges to HBM + 56 horizontal edges per cube)
|
- 기존 xbar/bridge/단일 NOC 기반 테스트 전면 재작성 필요
|
||||||
- local access도 NOC 경로를 사용하므로 모델이 더 일반화된다
|
- TwoDMeshNocComponent의 내부 contention 모델을 라우터별 모델로 교체 필요
|
||||||
- 기존 xbar 기반 테스트 전면 재작성 필요
|
|
||||||
- SimPy 노드 수 증가에 따른 시뮬레이션 성능 영향 가능
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -1,156 +1,312 @@
|
|||||||
<svg xmlns="http://www.w3.org/2000/svg" width="556" height="472" viewBox="0 0 556 472">
|
<svg xmlns="http://www.w3.org/2000/svg" width="970" height="900" viewBox="0 0 970 900">
|
||||||
<title>cube</title>
|
<title>cube</title>
|
||||||
<rect width="556" height="472" fill="#f8fafc"/>
|
<rect width="970" height="900" fill="#0f172a"/>
|
||||||
<text x="278" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">CUBE VIEW</text>
|
<text x="485" y="22" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#94a3b8">CUBE TOPOLOGY — 17.0×14.0mm | 6×6 Router Mesh | n_to_one mode | 64 pseudo-ch</text>
|
||||||
<rect x="40.0" y="40.0" width="476.0" height="392.0" rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>
|
<text x="485" y="40" text-anchor="middle" font-family="monospace" font-size="10" fill="#64748b">Per-PE: 8 ch × 32.0 GB/s = 256.0 GB/s | Cube total: 64 × 32.0 = 2048.0 GB/s</text>
|
||||||
<rect x="152.0" y="166.0" width="252.0" height="140.0" rx="4" fill="#d1fae5" stroke="#10b981" stroke-width="1.5" stroke-dasharray="6,3" opacity="0.5"/>
|
<rect x="60" y="60" width="850.0" height="700.0" rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>
|
||||||
<text x="278.0" y="278.0" text-anchor="middle" font-family="monospace" font-size="11" fill="#047857" opacity="0.7">HBM</text>
|
<rect x="260" y="285" width="450" height="250" rx="6" fill="#052e16" stroke="#047857" stroke-width="2" opacity="0.6"/>
|
||||||
<polyline points="82.0,82.0 82.0,95.0 82.0,95.0 82.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<text x="485" y="395" text-anchor="middle" font-family="monospace" font-size="11" font-weight="bold" fill="#047857">HBM_CTRL | 64 pseudo channels</text>
|
||||||
<text x="82.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<text x="485" y="412" text-anchor="middle" font-family="monospace" font-size="9" fill="#05966988">Total BW: 2048 GB/s</text>
|
||||||
<polyline points="82.0,82.0 82.0,144.0 334.0,144.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="270.0" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,144.0 82.0,144.0 82.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="283.4" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="166.0,82.0 166.0,95.0 166.0,95.0 166.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="296.9" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<text x="166.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="310.3" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="166.0,82.0 166.0,154.0 334.0,154.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="323.8" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,144.0 166.0,144.0 166.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="337.2" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="390.0,82.0 390.0,95.0 390.0,95.0 390.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="350.6" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<text x="390.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="364.1" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
|
||||||
<polyline points="390.0,82.0 390.0,164.0 334.0,164.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="377.5" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,144.0 390.0,144.0 390.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="390.9" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="474.0,82.0 474.0,95.0 474.0,95.0 474.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="404.4" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<text x="474.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="417.8" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="474.0,82.0 474.0,174.0 334.0,174.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="431.2" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,144.0 474.0,144.0 474.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="444.7" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="82.0,390.0 82.0,347.0 82.0,347.0 82.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="458.1" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<text x="82.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="471.6" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
|
||||||
<polyline points="82.0,390.0 82.0,338.0 334.0,338.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="485.0" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,298.0 82.0,298.0 82.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="498.4" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="166.0,390.0 166.0,347.0 166.0,347.0 166.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="511.9" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<text x="166.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="525.3" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="166.0,390.0 166.0,348.0 334.0,348.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="538.8" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,298.0 166.0,298.0 166.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="552.2" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="390.0,390.0 390.0,347.0 390.0,347.0 390.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="565.6" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<text x="390.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="579.1" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
|
||||||
<polyline points="390.0,390.0 390.0,358.0 334.0,358.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="592.5" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,298.0 390.0,298.0 390.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="605.9" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="474.0,390.0 474.0,347.0 474.0,347.0 474.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
|
<rect x="619.4" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<text x="474.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
|
<rect x="632.8" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="474.0,390.0 474.0,368.0 334.0,368.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<rect x="646.2" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="334.0,236.0 334.0,298.0 474.0,298.0 474.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<rect x="659.7" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="82.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="673.1" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<text x="152.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="686.6" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
|
||||||
<polyline points="166.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<text x="324" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#3b82f6">PE0×8ch</text>
|
||||||
<text x="194.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<text x="431" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#60a5fa">PE1×8ch</text>
|
||||||
<polyline points="390.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<text x="539" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#8b5cf6">PE2×8ch</text>
|
||||||
<text x="306.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<text x="646" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#a78bfa">PE3×8ch</text>
|
||||||
<polyline points="474.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="270.0" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<text x="348.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="283.4" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<polyline points="82.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="296.9" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<text x="152.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="310.3" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<polyline points="166.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="323.8" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<text x="194.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="337.2" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<polyline points="390.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="350.6" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<text x="306.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="364.1" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
|
||||||
<polyline points="474.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
|
<rect x="377.5" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<text x="348.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
|
<rect x="390.9" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<line x1="82.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="404.4" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="417.8" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<line x1="166.0" y1="138.0" x2="82.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="431.2" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="444.7" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<line x1="166.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="458.1" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
|
<rect x="471.6" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
|
||||||
<line x1="390.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="485.0" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
|
<rect x="498.4" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<line x1="390.0" y1="138.0" x2="474.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="511.9" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="525.3" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<line x1="474.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="538.8" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="552.2" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<line x1="82.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="565.6" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="579.1" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
|
||||||
<line x1="166.0" y1="334.0" x2="82.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="592.5" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="605.9" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<line x1="166.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="619.4" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
|
<rect x="632.8" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<line x1="390.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="646.2" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
|
<rect x="659.7" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<line x1="390.0" y1="334.0" x2="474.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<rect x="673.1" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<rect x="686.6" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
|
||||||
<line x1="474.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
|
<text x="324" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#f59e0b">PE4×8ch</text>
|
||||||
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
|
<text x="431" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#fbbf24">PE5×8ch</text>
|
||||||
<polyline points="82.0,138.0 110.0,138.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<text x="539" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#ef4444">PE6×8ch</text>
|
||||||
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<text x="646" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#f87171">PE7×8ch</text>
|
||||||
<polyline points="110.0,292.0 82.0,292.0 82.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="135" y1="135" x2="285" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="135" y1="135" x2="135" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="82.0,334.0 110.0,334.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="285" y1="135" x2="435" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="285" y1="135" x2="285" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="110.0,292.0 82.0,292.0 82.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="435" y1="135" x2="585" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="435" y1="135" x2="435" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="474.0,138.0 446.0,138.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="585" y1="135" x2="685" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="585" y1="135" x2="585" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="446.0,292.0 474.0,292.0 474.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="685" y1="135" x2="835" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="685" y1="135" x2="685" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="474.0,334.0 446.0,334.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="835" y1="135" x2="835" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="135" y1="260" x2="285" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="446.0,292.0 474.0,292.0 474.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
|
<line x1="135" y1="260" x2="135" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
|
<line x1="285" y1="260" x2="435" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 334.0,131.4 278.0,131.4 278.0,56.8" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
|
<line x1="285" y1="260" x2="285" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 334.0,310.6 278.0,310.6 278.0,415.2" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
|
<line x1="435" y1="260" x2="585" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 334.0,221.0 488.0,221.0 488.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
|
<line x1="435" y1="260" x2="435" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 334.0,221.0 68.0,221.0 68.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
|
<line x1="585" y1="260" x2="685" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="446.0,194.0 446.0,200.0 334.0,200.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<line x1="585" y1="260" x2="585" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 334.0,200.0 446.0,200.0 446.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
<line x1="685" y1="260" x2="835" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="334.0,236.0 110.0,236.0 110.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
|
<line x1="685" y1="260" x2="685" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<polyline points="110.0,194.0 334.0,194.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
|
<line x1="835" y1="260" x2="835" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="250.0" y="40.0" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
|
<line x1="135" y1="335" x2="285" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="278.0" y="60.8" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-N</text>
|
<line x1="135" y1="335" x2="135" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="250.0" y="398.4" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
|
<line x1="285" y1="335" x2="685" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="278.0" y="419.2" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-S</text>
|
<line x1="285" y1="335" x2="285" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="460.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
|
<line x1="685" y1="335" x2="835" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="488.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-E</text>
|
<line x1="685" y1="335" x2="685" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="40.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
|
<line x1="835" y1="335" x2="835" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="68.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-W</text>
|
<line x1="135" y1="485" x2="285" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="306.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#a78bfa" stroke="#475569" stroke-width="1"/>
|
<line x1="135" y1="485" x2="135" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="334.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">NOC</text>
|
<line x1="285" y1="485" x2="685" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="418.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
|
<line x1="285" y1="485" x2="285" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="446.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">M CPU</text>
|
<line x1="685" y1="485" x2="835" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="194.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
|
<line x1="685" y1="485" x2="685" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="222.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#ffffff">HBM CTRL</text>
|
<line x1="835" y1="485" x2="835" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="82.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
|
<line x1="135" y1="560" x2="285" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="110.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">SRAM</text>
|
<line x1="135" y1="560" x2="135" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="82.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<line x1="285" y1="560" x2="435" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="110.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge LEFT</text>
|
<line x1="285" y1="560" x2="285" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="418.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<line x1="435" y1="560" x2="585" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="446.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge RIGHT</text>
|
<line x1="435" y1="560" x2="435" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="56.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<line x1="585" y1="560" x2="685" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="82.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE0</text>
|
<line x1="585" y1="560" x2="585" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="54.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<line x1="685" y1="560" x2="835" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="82.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE0</text>
|
<line x1="685" y1="560" x2="685" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="140.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<line x1="835" y1="560" x2="835" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="166.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE1</text>
|
<line x1="135" y1="685" x2="285" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="138.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<line x1="285" y1="685" x2="435" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="166.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE1</text>
|
<line x1="435" y1="685" x2="585" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="364.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<line x1="585" y1="685" x2="685" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<text x="390.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE2</text>
|
<line x1="685" y1="685" x2="835" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
|
||||||
<rect x="362.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<circle cx="135" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
<text x="390.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE2</text>
|
<text x="135" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c0</text>
|
||||||
<rect x="448.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<rect x="119" y="81" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
<text x="474.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE3</text>
|
<text x="135" y="92" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE0</text>
|
||||||
<rect x="446.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<line x1="135" y1="127" x2="149" y2="97" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
<text x="474.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE3</text>
|
<circle cx="285" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
<rect x="56.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<text x="285" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c1</text>
|
||||||
<text x="82.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE4</text>
|
<rect x="269" y="81" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
<rect x="54.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<text x="285" y="92" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE1</text>
|
||||||
<text x="82.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE4</text>
|
<line x1="285" y1="127" x2="299" y2="97" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
<rect x="140.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<circle cx="435" cy="135" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="166.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE5</text>
|
<text x="435" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c2</text>
|
||||||
<rect x="138.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<circle cx="585" cy="135" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="166.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE5</text>
|
<text x="585" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c3</text>
|
||||||
<rect x="364.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<circle cx="685" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
<text x="390.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE6</text>
|
<text x="685" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c4</text>
|
||||||
<rect x="362.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<circle cx="835" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
<text x="390.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE6</text>
|
<text x="835" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c5</text>
|
||||||
<rect x="448.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
|
<circle cx="135" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
<text x="474.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE7</text>
|
<text x="135" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c0</text>
|
||||||
<rect x="446.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
|
<circle cx="285" cy="260" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="474.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE7</text>
|
<text x="285" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c1</text>
|
||||||
|
<circle cx="435" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="435" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c2</text>
|
||||||
|
<rect x="419" y="206" width="32" height="16" rx="3" fill="#451a03" stroke="#f59e0b" stroke-width="1"/>
|
||||||
|
<text x="435" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#f59e0b">M_CPU</text>
|
||||||
|
<line x1="435" y1="252" x2="449" y2="222" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="585" cy="260" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="585" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c3</text>
|
||||||
|
<circle cx="685" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="685" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c4</text>
|
||||||
|
<rect x="669" y="206" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="685" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE2</text>
|
||||||
|
<line x1="685" y1="252" x2="699" y2="222" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="835" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="835" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c5</text>
|
||||||
|
<rect x="819" y="206" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="835" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE3</text>
|
||||||
|
<line x1="835" y1="252" x2="849" y2="222" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="135" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="135" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c0</text>
|
||||||
|
<circle cx="285" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="285" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c1</text>
|
||||||
|
<circle cx="685" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="685" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c4</text>
|
||||||
|
<circle cx="835" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="835" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c5</text>
|
||||||
|
<circle cx="135" cy="485" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="135" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c0</text>
|
||||||
|
<rect x="119" y="523" width="32" height="16" rx="3" fill="#1c1917" stroke="#d97706" stroke-width="1"/>
|
||||||
|
<text x="135" y="534" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#d97706">SRAM</text>
|
||||||
|
<line x1="135" y1="493" x2="149" y2="523" stroke="#d97706" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="285" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="285" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c1</text>
|
||||||
|
<circle cx="685" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="685" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c4</text>
|
||||||
|
<circle cx="835" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="835" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c5</text>
|
||||||
|
<circle cx="135" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="135" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c0</text>
|
||||||
|
<rect x="119" y="598" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="135" y="609" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE4</text>
|
||||||
|
<line x1="135" y1="568" x2="149" y2="598" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="285" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="285" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c1</text>
|
||||||
|
<rect x="269" y="598" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="285" y="609" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE5</text>
|
||||||
|
<line x1="285" y1="568" x2="299" y2="598" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="435" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="435" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c2</text>
|
||||||
|
<circle cx="585" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="585" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c3</text>
|
||||||
|
<circle cx="685" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="685" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c4</text>
|
||||||
|
<circle cx="835" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="835" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c5</text>
|
||||||
|
<circle cx="135" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="135" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c0</text>
|
||||||
|
<circle cx="285" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="285" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c1</text>
|
||||||
|
<circle cx="435" cy="685" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="435" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c2</text>
|
||||||
|
<circle cx="585" cy="685" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="585" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c3</text>
|
||||||
|
<circle cx="685" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="685" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c4</text>
|
||||||
|
<rect x="669" y="723" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="685" y="734" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE6</text>
|
||||||
|
<line x1="685" y1="693" x2="699" y2="723" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<circle cx="835" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
|
||||||
|
<text x="835" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c5</text>
|
||||||
|
<rect x="819" y="723" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
|
||||||
|
<text x="835" y="734" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE7</text>
|
||||||
|
<line x1="835" y1="693" x2="849" y2="723" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
|
||||||
|
<polyline points="135,143 208,216 251,216 324,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="239" y="216" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="285,143 358,216 358,216 431,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="368" y="216" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="685,268 674,278 549,278 539,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="622" y="278" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="835,268 824,278 657,278 646,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="751" y="278" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="135,552 146,542 313,542 324,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="239" y="542" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="285,552 296,542 421,542 431,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="368" y="542" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="685,677 612,604 612,604 539,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="622" y="604" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<polyline points="835,677 762,604 719,604 646,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
|
||||||
|
<text x="751" y="604" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
|
||||||
|
<rect x="65" y="360" width="50" height="100" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
|
||||||
|
<text x="90" y="357" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-W</text>
|
||||||
|
<rect x="67" y="362" width="46" height="23" rx="2" fill="#818cf8" opacity="0.7"/>
|
||||||
|
<text x="90" y="376" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
|
||||||
|
<polyline points="127,135 120,142 120,366 113,374" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="67" y="386" width="46" height="23" rx="2" fill="#a78bfa" opacity="0.7"/>
|
||||||
|
<text x="90" y="400" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
|
||||||
|
<polyline points="127,260 120,267 120,390 113,398" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="67" y="410" width="46" height="23" rx="2" fill="#c084fc" opacity="0.7"/>
|
||||||
|
<text x="90" y="424" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
|
||||||
|
<polyline points="127,560 120,553 120,428 113,422" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="67" y="434" width="46" height="23" rx="2" fill="#e879f9" opacity="0.7"/>
|
||||||
|
<text x="90" y="448" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
|
||||||
|
<polyline points="127,685 120,678 120,452 113,446" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="435" y="65" width="100" height="50" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
|
||||||
|
<text x="485" y="62" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-N</text>
|
||||||
|
<rect x="437" y="67" width="23" height="46" rx="2" fill="#818cf8" opacity="0.7"/>
|
||||||
|
<text x="448" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
|
||||||
|
<polyline points="135,127 142,120 442,120 448,113" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="461" y="67" width="23" height="46" rx="2" fill="#a78bfa" opacity="0.7"/>
|
||||||
|
<text x="472" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
|
||||||
|
<polyline points="285,127 292,120 466,120 472,113" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="485" y="67" width="23" height="46" rx="2" fill="#c084fc" opacity="0.7"/>
|
||||||
|
<text x="496" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
|
||||||
|
<polyline points="685,127 678,120 504,120 496,113" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="509" y="67" width="23" height="46" rx="2" fill="#e879f9" opacity="0.7"/>
|
||||||
|
<text x="520" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
|
||||||
|
<polyline points="835,127 828,120 528,120 520,113" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="855" y="360" width="50" height="100" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
|
||||||
|
<text x="880" y="357" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-E</text>
|
||||||
|
<rect x="857" y="362" width="46" height="23" rx="2" fill="#818cf8" opacity="0.7"/>
|
||||||
|
<text x="880" y="376" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
|
||||||
|
<polyline points="843,135 850,142 850,367 857,374" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="857" y="386" width="46" height="23" rx="2" fill="#a78bfa" opacity="0.7"/>
|
||||||
|
<text x="880" y="400" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
|
||||||
|
<polyline points="843,260 850,267 850,391 857,398" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="857" y="410" width="46" height="23" rx="2" fill="#c084fc" opacity="0.7"/>
|
||||||
|
<text x="880" y="424" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
|
||||||
|
<polyline points="843,560 850,553 850,428 857,422" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="857" y="434" width="46" height="23" rx="2" fill="#e879f9" opacity="0.7"/>
|
||||||
|
<text x="880" y="448" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
|
||||||
|
<polyline points="843,685 850,678 850,452 857,446" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="435" y="705" width="100" height="50" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
|
||||||
|
<text x="485" y="702" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-S</text>
|
||||||
|
<rect x="437" y="707" width="23" height="46" rx="2" fill="#818cf8" opacity="0.7"/>
|
||||||
|
<text x="448" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
|
||||||
|
<polyline points="135,693 142,700 442,700 448,707" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="461" y="707" width="23" height="46" rx="2" fill="#a78bfa" opacity="0.7"/>
|
||||||
|
<text x="472" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
|
||||||
|
<polyline points="285,693 292,700 466,700 472,707" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="485" y="707" width="23" height="46" rx="2" fill="#c084fc" opacity="0.7"/>
|
||||||
|
<text x="496" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
|
||||||
|
<polyline points="685,693 678,700 504,700 496,707" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="509" y="707" width="23" height="46" rx="2" fill="#e879f9" opacity="0.7"/>
|
||||||
|
<text x="520" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
|
||||||
|
<polyline points="835,693 828,700 528,700 520,707" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
|
||||||
|
<rect x="60" y="865" width="10" height="10" rx="2" fill="#3b82f6" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="74" y="874" font-family="monospace" font-size="8" fill="#94a3b8">PE Router</text>
|
||||||
|
<rect x="147" y="865" width="10" height="10" rx="2" fill="#f59e0b" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="161" y="874" font-family="monospace" font-size="8" fill="#94a3b8">M_CPU / SRAM</text>
|
||||||
|
<rect x="255" y="865" width="10" height="10" rx="2" fill="#8b5cf6" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="269" y="874" font-family="monospace" font-size="8" fill="#94a3b8">UCIe</text>
|
||||||
|
<rect x="307" y="865" width="10" height="10" rx="2" fill="#334155" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="321" y="874" font-family="monospace" font-size="8" fill="#94a3b8">Relay</text>
|
||||||
|
<rect x="366" y="865" width="10" height="10" rx="2" fill="#10b981" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="380" y="874" font-family="monospace" font-size="8" fill="#94a3b8">HBM Link</text>
|
||||||
|
<rect x="446" y="865" width="10" height="10" rx="2" fill="#475569" stroke="#475569" stroke-width="0.5"/>
|
||||||
|
<text x="460" y="874" font-family="monospace" font-size="8" fill="#94a3b8">Mesh Link</text>
|
||||||
</svg>
|
</svg>
|
||||||
|
Before Width: | Height: | Size: 18 KiB After Width: | Height: | Size: 30 KiB |
@@ -26,6 +26,8 @@
|
|||||||
<text x="285.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE GEMM</text>
|
<text x="285.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE GEMM</text>
|
||||||
<rect x="241.2" y="243.0" width="87.5" height="49.0" rx="4" fill="#ec4899" stroke="#475569" stroke-width="1"/>
|
<rect x="241.2" y="243.0" width="87.5" height="49.0" rx="4" fill="#ec4899" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="285.0" y="271.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE MATH</text>
|
<text x="285.0" y="271.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE MATH</text>
|
||||||
|
<rect x="136.2" y="68.0" width="87.5" height="49.0" rx="4" fill="#e2e8f0" stroke="#475569" stroke-width="1"/>
|
||||||
|
<text x="180.0" y="96.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE MMU</text>
|
||||||
<rect x="346.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
|
<rect x="346.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="390.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE TCM</text>
|
<text x="390.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE TCM</text>
|
||||||
</svg>
|
</svg>
|
||||||
|
Before Width: | Height: | Size: 3.2 KiB After Width: | Height: | Size: 3.4 KiB |
@@ -51,13 +51,13 @@
|
|||||||
<line x1="396.0" y1="504.0" x2="540.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
|
<line x1="396.0" y1="504.0" x2="540.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="468.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
|
<text x="468.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
|
||||||
<polyline points="324.0,56.0 108.0,56.0 108.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
<polyline points="324.0,56.0 108.0,56.0 108.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="216.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
<text x="216.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
|
||||||
<polyline points="324.0,56.0 252.0,56.0 252.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
<polyline points="324.0,56.0 252.0,56.0 252.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="288.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
<text x="288.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
|
||||||
<polyline points="324.0,56.0 396.0,56.0 396.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
<polyline points="324.0,56.0 396.0,56.0 396.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="360.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
<text x="360.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
|
||||||
<polyline points="324.0,56.0 540.0,56.0 540.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
<polyline points="324.0,56.0 540.0,56.0 540.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="432.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
<text x="432.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
|
||||||
<rect x="84.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
<rect x="84.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="108.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,0)</text>
|
<text x="108.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,0)</text>
|
||||||
<rect x="228.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
<rect x="228.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||||
|
|||||||
|
Before Width: | Height: | Size: 10 KiB After Width: | Height: | Size: 10 KiB |
@@ -3,9 +3,9 @@
|
|||||||
<rect width="768" height="396" fill="#f8fafc"/>
|
<rect width="768" height="396" fill="#f8fafc"/>
|
||||||
<text x="384" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">SYSTEM VIEW</text>
|
<text x="384" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">SYSTEM VIEW</text>
|
||||||
<polyline points="384.0,60.0 182.0,60.0 182.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
<polyline points="384.0,60.0 182.0,60.0 182.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="283.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
|
<text x="283.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 768GB/s</text>
|
||||||
<polyline points="384.0,60.0 586.0,60.0 586.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
<polyline points="384.0,60.0 586.0,60.0 586.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
||||||
<text x="485.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
|
<text x="485.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 768GB/s</text>
|
||||||
<rect x="374.0" y="57.0" width="20.0" height="6.0" rx="4" fill="#6366f1" stroke="#475569" stroke-width="1"/>
|
<rect x="374.0" y="57.0" width="20.0" height="6.0" rx="4" fill="#6366f1" stroke="#475569" stroke-width="1"/>
|
||||||
<text x="384.0" y="64.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">Fabric Switch</text>
|
<text x="384.0" y="64.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">Fabric Switch</text>
|
||||||
<rect x="62.0" y="138.0" width="240.0" height="200.0" rx="4" fill="#e0e7ff" stroke="#475569" stroke-width="1"/>
|
<rect x="62.0" y="138.0" width="240.0" height="200.0" rx="4" fill="#e0e7ff" stroke="#475569" stroke-width="1"/>
|
||||||
|
|||||||
|
Before Width: | Height: | Size: 1.9 KiB After Width: | Height: | Size: 1.9 KiB |
@@ -116,7 +116,7 @@ def _fmt_util(eff: float, bn: float | None) -> str:
|
|||||||
|
|
||||||
|
|
||||||
def _short_name(node_id: str) -> str:
|
def _short_name(node_id: str) -> str:
|
||||||
"""Shorten node id: keep last 2 segments to avoid ambiguity (xbar.pe0 vs pe0)."""
|
"""Shorten node id: keep last 2 segments to avoid ambiguity (router.pe0 vs pe0)."""
|
||||||
parts = node_id.split(".")
|
parts = node_id.split(".")
|
||||||
return ".".join(parts[-2:]) if len(parts) >= 2 else node_id
|
return ".".join(parts[-2:]) if len(parts) >= 2 else node_id
|
||||||
|
|
||||||
@@ -366,7 +366,7 @@ def run_probe(topology_path: str, case_filter: str | None = None) -> int:
|
|||||||
|
|
||||||
# --- PE DMA Summary Table ---
|
# --- PE DMA Summary Table ---
|
||||||
print()
|
print()
|
||||||
print(f"=== PE DMA Latency (pe_dma -> xbar -> HBM, data={nbytes}B) ===")
|
print(f"=== PE DMA Latency (pe_dma -> router -> HBM, data={nbytes}B) ===")
|
||||||
print(f" {'Case':<26} {'Target':<28} {'Actual':>8}"
|
print(f" {'Case':<26} {'Target':<28} {'Actual':>8}"
|
||||||
f" {'Ovhd':>6} {'Drain':>6} {'Wire':>5} {'Ovhd%':>6} {'Drain%':>7}"
|
f" {'Ovhd':>6} {'Drain':>6} {'Wire':>5} {'Ovhd%':>6} {'Drain%':>7}"
|
||||||
f" {'Eff.BW':>8} {'BN.BW':>8} {'Util%':>6}")
|
f" {'Eff.BW':>8} {'BN.BW':>8} {'Util%':>6}")
|
||||||
|
|||||||
@@ -137,7 +137,7 @@ def _extract_peaks(spec: dict | None) -> tuple[float, float]:
|
|||||||
gemm_attrs = comps.get("pe_gemm", {}).get("attrs", {})
|
gemm_attrs = comps.get("pe_gemm", {}).get("attrs", {})
|
||||||
peak_tflops = float(gemm_attrs.get("peak_tflops_f16", 0.0))
|
peak_tflops = float(gemm_attrs.get("peak_tflops_f16", 0.0))
|
||||||
cube_links = cube.get("links", {})
|
cube_links = cube.get("links", {})
|
||||||
hbm_bw = float(cube_links.get("xbar_to_hbm_bw_gbs", 0.0))
|
hbm_bw = float(cube_links.get("hbm_to_router_bw_gbs", 0.0))
|
||||||
return peak_tflops, hbm_bw
|
return peak_tflops, hbm_bw
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -114,7 +114,7 @@ class HbmCtrlComponent(ComponentBase):
|
|||||||
|
|
||||||
parts = self.node.id.split(".")
|
parts = self.node.id.split(".")
|
||||||
cube_id = int(parts[1].replace("cube", ""))
|
cube_id = int(parts[1].replace("cube", ""))
|
||||||
pe_id = int(parts[3].replace("slice", ""))
|
pe_id = 0 # single hbm_ctrl, PE info from request
|
||||||
resp_msg = ResponseMsg(
|
resp_msg = ResponseMsg(
|
||||||
correlation_id=txn.request.correlation_id,
|
correlation_id=txn.request.correlation_id,
|
||||||
request_id=txn.request.request_id,
|
request_id=txn.request.request_id,
|
||||||
|
|||||||
@@ -238,14 +238,11 @@ class MCpuComponent(ComponentBase):
|
|||||||
def _resolve_dma_destinations(self, request: Any, target_pe: int | str) -> list[str]:
|
def _resolve_dma_destinations(self, request: Any, target_pe: int | str) -> list[str]:
|
||||||
"""Return list of HBM destination node_ids for DMA fan-out.
|
"""Return list of HBM destination node_ids for DMA fan-out.
|
||||||
|
|
||||||
Uses PA-based resolution to determine the actual target cube and slice,
|
With single hbm_ctrl per cube (ADR-0019), always returns one node.
|
||||||
enabling cross-cube DMA routing when the PA points to a remote cube.
|
PA-based resolution still used for cross-cube routing.
|
||||||
"""
|
"""
|
||||||
cube_prefix = self.node.id.rsplit(".", 1)[0] # e.g. "sip0.cube0"
|
cube_prefix = self.node.id.rsplit(".", 1)[0] # e.g. "sip0.cube0"
|
||||||
|
|
||||||
if isinstance(target_pe, int):
|
|
||||||
return [f"{cube_prefix}.hbm_ctrl.slice{target_pe}"]
|
|
||||||
|
|
||||||
# PA-based resolution: extract actual target from physical address
|
# PA-based resolution: extract actual target from physical address
|
||||||
pa_val = getattr(request, "dst_pa", None) or getattr(request, "src_pa", None)
|
pa_val = getattr(request, "dst_pa", None) or getattr(request, "src_pa", None)
|
||||||
if pa_val is not None:
|
if pa_val is not None:
|
||||||
@@ -256,12 +253,8 @@ class MCpuComponent(ComponentBase):
|
|||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
# "all" without PA (KernelLaunch): all slices in local cube
|
# Default: single hbm_ctrl in local cube
|
||||||
n_slices = 8
|
return [f"{cube_prefix}.hbm_ctrl"]
|
||||||
if self.ctx and self.ctx.spec:
|
|
||||||
mm = self.ctx.spec.get("cube", {}).get("memory_map", {})
|
|
||||||
n_slices = mm.get("hbm_slices_per_cube", 8)
|
|
||||||
return [f"{cube_prefix}.hbm_ctrl.slice{i}" for i in range(n_slices)]
|
|
||||||
|
|
||||||
def _mmu_msg_fanout(self, env: simpy.Environment, txn: Any) -> Generator:
|
def _mmu_msg_fanout(self, env: simpy.Environment, txn: Any) -> Generator:
|
||||||
"""Fan out MmuMapMsg/MmuUnmapMsg to target PE_MMU(s) via NOC.
|
"""Fan out MmuMapMsg/MmuUnmapMsg to target PE_MMU(s) via NOC.
|
||||||
|
|||||||
@@ -1,224 +0,0 @@
|
|||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from collections.abc import Generator
|
|
||||||
from typing import TYPE_CHECKING, Any
|
|
||||||
|
|
||||||
import simpy
|
|
||||||
|
|
||||||
from kernbench.components.base import ComponentBase
|
|
||||||
|
|
||||||
if TYPE_CHECKING:
|
|
||||||
from kernbench.components.context import ComponentContext
|
|
||||||
from kernbench.topology.types import Node
|
|
||||||
|
|
||||||
|
|
||||||
class TwoDMeshNocComponent(ComponentBase):
|
|
||||||
"""2D mesh NOC modeled as a single smart node.
|
|
||||||
|
|
||||||
Latency model:
|
|
||||||
- Traversal latency = Manhattan distance between prev_hop and next_hop
|
|
||||||
node positions, split into XY segments, traversed with pipeline.
|
|
||||||
- overhead_ns (from node.attrs) is added once per traversal.
|
|
||||||
|
|
||||||
Contention model:
|
|
||||||
- Each directed XY segment is a simpy.Resource(capacity=1).
|
|
||||||
- Pipeline: next segment's resource is requested before the current
|
|
||||||
segment's timeout completes, so a free downstream segment is acquired
|
|
||||||
immediately (wormhole-style cut-through).
|
|
||||||
- Two transactions sharing a segment (same row or column band) contend.
|
|
||||||
|
|
||||||
Concurrency:
|
|
||||||
- _worker spawns an independent SimPy process per transaction, so the
|
|
||||||
NOC is never serialized at the node level — only at segment resources.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, node: Node, ctx: ComponentContext | None = None) -> None:
|
|
||||||
super().__init__(node, ctx)
|
|
||||||
self._env: simpy.Environment | None = None
|
|
||||||
self._links: dict[tuple, simpy.Resource] = {}
|
|
||||||
self._x_grid: list[float] = []
|
|
||||||
self._y_grid: list[float] = []
|
|
||||||
|
|
||||||
def start(self, env: simpy.Environment) -> None:
|
|
||||||
self._env = env
|
|
||||||
self._build_grid()
|
|
||||||
super().start(env)
|
|
||||||
|
|
||||||
def run(self, env: simpy.Environment, nbytes: int) -> Generator:
|
|
||||||
yield env.timeout(0)
|
|
||||||
|
|
||||||
# ── Grid construction ────────────────────────────────────────────
|
|
||||||
|
|
||||||
def _build_grid(self) -> None:
|
|
||||||
if not self.ctx:
|
|
||||||
return
|
|
||||||
mesh = self.ctx.spec.get("_mesh") if self.ctx.spec else None
|
|
||||||
if mesh:
|
|
||||||
self._build_grid_from_mesh(mesh)
|
|
||||||
else:
|
|
||||||
self._build_grid_from_positions()
|
|
||||||
|
|
||||||
def _build_grid_from_mesh(self, mesh: dict) -> None:
|
|
||||||
"""Build XY grid from cube_mesh.yaml router positions (authoritative)."""
|
|
||||||
origin_x, origin_y = self._cube_origin()
|
|
||||||
xs: set[float] = set()
|
|
||||||
ys: set[float] = set()
|
|
||||||
for key, router in mesh.get("routers", {}).items():
|
|
||||||
if router is not None:
|
|
||||||
xs.add(round(origin_x + router["pos_mm"][0], 2))
|
|
||||||
ys.add(round(origin_y + router["pos_mm"][1], 2))
|
|
||||||
self._x_grid = sorted(xs)
|
|
||||||
self._y_grid = sorted(ys)
|
|
||||||
|
|
||||||
def _build_grid_from_positions(self) -> None:
|
|
||||||
"""Fallback: infer grid from all node positions in the cube."""
|
|
||||||
cube_prefix = self.node.id.rsplit(".", 1)[0]
|
|
||||||
xs: set[float] = set()
|
|
||||||
ys: set[float] = set()
|
|
||||||
for node_id, pos in self.ctx.positions.items():
|
|
||||||
if node_id.startswith(cube_prefix + ".") and pos is not None:
|
|
||||||
xs.add(round(pos[0], 2))
|
|
||||||
ys.add(round(pos[1], 2))
|
|
||||||
self._x_grid = sorted(xs)
|
|
||||||
self._y_grid = sorted(ys)
|
|
||||||
|
|
||||||
def _cube_origin(self) -> tuple[float, float]:
|
|
||||||
"""Compute absolute origin (top-left) of this cube from cube_id."""
|
|
||||||
parts = self.node.id.split(".")
|
|
||||||
cube_str = [p for p in parts if p.startswith("cube")][0]
|
|
||||||
cube_id = int(cube_str[4:])
|
|
||||||
spec = self.ctx.spec
|
|
||||||
sip_spec = spec.get("sip", {})
|
|
||||||
cube_spec = spec.get("cube", {})
|
|
||||||
mesh_w = sip_spec.get("cube_mesh", {}).get("w", 4)
|
|
||||||
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
|
|
||||||
cube_h = cube_spec.get("geometry", {}).get("cube_mm", {}).get("h", 14.0)
|
|
||||||
seam = sip_spec.get("links", {}).get("inter_cube_mesh", {}).get(
|
|
||||||
"distance_mm_across_seam", 1.0)
|
|
||||||
col = cube_id % mesh_w
|
|
||||||
row = cube_id // mesh_w
|
|
||||||
return (col * (cube_w + seam), row * (cube_h + seam))
|
|
||||||
|
|
||||||
def _get_link(self, key: tuple) -> simpy.Resource:
|
|
||||||
if key not in self._links:
|
|
||||||
assert self._env is not None
|
|
||||||
self._links[key] = simpy.Resource(self._env, capacity=1)
|
|
||||||
return self._links[key]
|
|
||||||
|
|
||||||
# ── Worker ───────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def _worker(self, env: simpy.Environment) -> Generator:
|
|
||||||
while True:
|
|
||||||
txn: Any = yield self._inbox.get()
|
|
||||||
env.process(self._route(env, txn))
|
|
||||||
|
|
||||||
def _route(self, env: simpy.Environment, txn: Any) -> Generator:
|
|
||||||
prev_hop = txn.path[txn.step - 1] if txn.step > 0 else None
|
|
||||||
next_hop = txn.next_hop
|
|
||||||
overhead_ns = float(self.node.attrs.get("overhead_ns", 0.0))
|
|
||||||
|
|
||||||
links: list[tuple[tuple, float]] = []
|
|
||||||
if prev_hop and next_hop and self.ctx:
|
|
||||||
src_pos = self.ctx.positions.get(prev_hop)
|
|
||||||
dst_pos = self.ctx.positions.get(next_hop)
|
|
||||||
if src_pos and dst_pos:
|
|
||||||
links = self._xy_links(src_pos, dst_pos)
|
|
||||||
|
|
||||||
if links:
|
|
||||||
yield from self._traverse(env, links, overhead_ns)
|
|
||||||
else:
|
|
||||||
yield env.timeout(overhead_ns)
|
|
||||||
|
|
||||||
if next_hop:
|
|
||||||
yield self.out_ports[next_hop].put(txn.advance())
|
|
||||||
else:
|
|
||||||
drain = getattr(txn, "drain_ns", 0.0)
|
|
||||||
if drain > 0:
|
|
||||||
yield env.timeout(drain)
|
|
||||||
txn.done.succeed()
|
|
||||||
|
|
||||||
# ── XY routing and pipelined link traversal ──────────────────────
|
|
||||||
|
|
||||||
def _traverse(
|
|
||||||
self,
|
|
||||||
env: simpy.Environment,
|
|
||||||
links: list[tuple[tuple, float]],
|
|
||||||
overhead_ns: float,
|
|
||||||
) -> Generator:
|
|
||||||
"""Pipeline: request next segment before current timeout finishes."""
|
|
||||||
ns_per_mm = self.ctx.ns_per_mm # type: ignore[union-attr]
|
|
||||||
|
|
||||||
# Acquire first link
|
|
||||||
first_key, _ = links[0]
|
|
||||||
current_resource = self._get_link(first_key)
|
|
||||||
current_req = current_resource.request()
|
|
||||||
yield current_req
|
|
||||||
|
|
||||||
for i, (_, dist_mm) in enumerate(links):
|
|
||||||
# Request next link before current timeout (pipeline)
|
|
||||||
if i + 1 < len(links):
|
|
||||||
next_key, _ = links[i + 1]
|
|
||||||
next_resource = self._get_link(next_key)
|
|
||||||
next_req = next_resource.request()
|
|
||||||
|
|
||||||
yield env.timeout(dist_mm * ns_per_mm + (overhead_ns if i == 0 else 0.0))
|
|
||||||
current_resource.release(current_req)
|
|
||||||
|
|
||||||
if i + 1 < len(links):
|
|
||||||
yield next_req # usually already fulfilled (pipeline)
|
|
||||||
current_resource = next_resource
|
|
||||||
current_req = next_req
|
|
||||||
|
|
||||||
def _xy_links(
|
|
||||||
self,
|
|
||||||
src: tuple[float, float],
|
|
||||||
dst: tuple[float, float],
|
|
||||||
) -> list[tuple[tuple, float]]:
|
|
||||||
"""XY routing: horizontal segment first, then vertical.
|
|
||||||
|
|
||||||
Returns list of (link_key, dist_mm) pairs, where link_key uniquely
|
|
||||||
identifies a directed segment shared across concurrent transactions.
|
|
||||||
"""
|
|
||||||
x0, y0 = src
|
|
||||||
x1, y1 = dst
|
|
||||||
links: list[tuple[tuple, float]] = []
|
|
||||||
|
|
||||||
# Horizontal segment at y≈y0
|
|
||||||
if abs(x0 - x1) > 1e-9:
|
|
||||||
y_band = self._snap(y0, self._y_grid)
|
|
||||||
for xa, xb in self._segments(x0, x1, self._x_grid):
|
|
||||||
d = abs(xb - xa)
|
|
||||||
if d > 1e-9:
|
|
||||||
lo, hi = (xa, xb) if xa < xb else (xb, xa)
|
|
||||||
dir_h = "E" if xb > xa else "W"
|
|
||||||
links.append((("H", round(y_band, 2), round(lo, 2), round(hi, 2), dir_h), d))
|
|
||||||
|
|
||||||
# Vertical segment at x≈x1
|
|
||||||
if abs(y0 - y1) > 1e-9:
|
|
||||||
x_band = self._snap(x1, self._x_grid)
|
|
||||||
for ya, yb in self._segments(y0, y1, self._y_grid):
|
|
||||||
d = abs(yb - ya)
|
|
||||||
if d > 1e-9:
|
|
||||||
lo, hi = (ya, yb) if ya < yb else (yb, ya)
|
|
||||||
dir_v = "S" if yb > ya else "N"
|
|
||||||
links.append((("V", round(x_band, 2), round(lo, 2), round(hi, 2), dir_v), d))
|
|
||||||
|
|
||||||
return links
|
|
||||||
|
|
||||||
@staticmethod
|
|
||||||
def _snap(val: float, grid: list[float]) -> float:
|
|
||||||
if not grid:
|
|
||||||
return val
|
|
||||||
return min(grid, key=lambda g: abs(g - val))
|
|
||||||
|
|
||||||
@staticmethod
|
|
||||||
def _segments(a: float, b: float, grid: list[float]) -> list[tuple[float, float]]:
|
|
||||||
"""Consecutive (p_i, p_{i+1}) pairs covering range [a, b] using grid waypoints."""
|
|
||||||
if abs(a - b) < 1e-9:
|
|
||||||
return []
|
|
||||||
lo, hi = (a, b) if a < b else (b, a)
|
|
||||||
pts = [lo] + [g for g in grid if lo + 1e-9 < g < hi - 1e-9] + [hi]
|
|
||||||
pairs = [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
|
|
||||||
if a > b:
|
|
||||||
pairs = [(p2, p1) for p1, p2 in reversed(pairs)]
|
|
||||||
return pairs
|
|
||||||
@@ -96,7 +96,7 @@ class PeDmaComponent(PeEngineBase):
|
|||||||
request=sub_request, path=path, step=0,
|
request=sub_request, path=path, step=0,
|
||||||
nbytes=cmd.nbytes, done=sub_done, drain_ns=drain_ns,
|
nbytes=cmd.nbytes, done=sub_done, drain_ns=drain_ns,
|
||||||
)
|
)
|
||||||
# Send to next hop (path[0] is pe_dma itself, path[1] is xbar)
|
# Send to next hop (path[0] is pe_dma itself, path[1] is router)
|
||||||
if len(path) > 1:
|
if len(path) > 1:
|
||||||
yield self.out_ports[path[1]].put(sub_txn.advance())
|
yield self.out_ports[path[1]].put(sub_txn.advance())
|
||||||
# DMA channel released after issue
|
# DMA channel released after issue
|
||||||
|
|||||||
@@ -1,168 +0,0 @@
|
|||||||
"""Position-aware XBAR component.
|
|
||||||
|
|
||||||
Models crossbar latency as base_overhead_ns + internal_distance * ns_per_mm,
|
|
||||||
where internal_distance is the Manhattan distance between the entry port
|
|
||||||
(PE router attachment) and exit port (HBM slice logical position) within
|
|
||||||
the crossbar matrix.
|
|
||||||
|
|
||||||
PE router positions come from cube_mesh.yaml (via ctx.spec["_mesh"]).
|
|
||||||
HBM slice positions are uniformly distributed across the HBM physical width.
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from collections.abc import Generator
|
|
||||||
from typing import TYPE_CHECKING, Any
|
|
||||||
|
|
||||||
import simpy
|
|
||||||
|
|
||||||
from kernbench.components.base import ComponentBase
|
|
||||||
|
|
||||||
if TYPE_CHECKING:
|
|
||||||
from kernbench.components.context import ComponentContext
|
|
||||||
from kernbench.topology.types import Node
|
|
||||||
|
|
||||||
|
|
||||||
class PositionAwareXbarComponent(ComponentBase):
|
|
||||||
"""XBAR with position-dependent latency based on PE-to-slice distance.
|
|
||||||
|
|
||||||
Latency = base_overhead_ns + |entry_port_x - exit_port_x| * ns_per_mm
|
|
||||||
|
|
||||||
Entry/exit port X positions are determined from the transaction path:
|
|
||||||
- PE_DMA nodes: router X from cube_mesh.yaml
|
|
||||||
- HBM slices: uniformly distributed across HBM physical width
|
|
||||||
- Bridge nodes: physical X from topology positions
|
|
||||||
- NOC: resolved by scanning path for PE_DMA node
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, node: Node, ctx: ComponentContext | None = None) -> None:
|
|
||||||
super().__init__(node, ctx)
|
|
||||||
self._base_overhead_ns = float(node.attrs.get("overhead_ns", 0.0))
|
|
||||||
self._pe_router_xs: dict[str, float] = {}
|
|
||||||
self._slice_xs: dict[str, float] = {}
|
|
||||||
self._bridge_xs: dict[str, float] = {}
|
|
||||||
self._ns_per_mm: float = 0.0
|
|
||||||
|
|
||||||
def start(self, env: simpy.Environment) -> None:
|
|
||||||
self._build_position_map()
|
|
||||||
super().start(env)
|
|
||||||
|
|
||||||
def run(self, env: simpy.Environment, nbytes: int) -> Generator:
|
|
||||||
yield env.timeout(self._base_overhead_ns)
|
|
||||||
|
|
||||||
# ── Position map construction ─────────────────────────────────
|
|
||||||
|
|
||||||
def _build_position_map(self) -> None:
|
|
||||||
if not self.ctx or not self.ctx.spec:
|
|
||||||
return
|
|
||||||
mesh = self.ctx.spec.get("_mesh")
|
|
||||||
if not mesh:
|
|
||||||
return
|
|
||||||
|
|
||||||
self._ns_per_mm = self.ctx.ns_per_mm
|
|
||||||
cube_prefix = self.node.id.rsplit(".", 1)[0]
|
|
||||||
xbar_name = self.node.id.rsplit(".", 1)[1]
|
|
||||||
is_top = xbar_name == "xbar_top"
|
|
||||||
xbar_key = "top" if is_top else "bottom"
|
|
||||||
|
|
||||||
# PE router X positions from mesh attachments
|
|
||||||
routers_list = mesh.get("xbar", {}).get(xbar_key, {}).get("routers", [])
|
|
||||||
for router_id in routers_list:
|
|
||||||
router_data = mesh["routers"].get(router_id)
|
|
||||||
if router_data is None:
|
|
||||||
continue
|
|
||||||
router_x = router_data["pos_mm"][0]
|
|
||||||
for attach in router_data.get("attach", []):
|
|
||||||
if attach.endswith(".dma"):
|
|
||||||
pe_name = attach.split(".")[0]
|
|
||||||
pe_dma_id = f"{cube_prefix}.{pe_name}.pe_dma"
|
|
||||||
self._pe_router_xs[pe_dma_id] = router_x
|
|
||||||
|
|
||||||
# HBM slice X positions: uniformly distributed across HBM width
|
|
||||||
cube_spec = self.ctx.spec.get("cube", {})
|
|
||||||
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
|
|
||||||
hbm_w = cube_spec.get("geometry", {}).get("hbm_mm", {}).get("w", 9.0)
|
|
||||||
n_slices = cube_spec.get("memory_map", {}).get("hbm_slices_per_cube", 8)
|
|
||||||
half = n_slices // 2
|
|
||||||
hbm_left = (cube_w - hbm_w) / 2
|
|
||||||
|
|
||||||
if is_top:
|
|
||||||
slice_range = range(half)
|
|
||||||
else:
|
|
||||||
slice_range = range(half, n_slices)
|
|
||||||
|
|
||||||
n = len(list(slice_range))
|
|
||||||
for i, sl in enumerate(slice_range):
|
|
||||||
if n > 1:
|
|
||||||
x = hbm_left + i * hbm_w / (n - 1)
|
|
||||||
else:
|
|
||||||
x = cube_w / 2
|
|
||||||
self._slice_xs[f"{cube_prefix}.hbm_ctrl.slice{sl}"] = x
|
|
||||||
|
|
||||||
# Bridge X positions from topology positions
|
|
||||||
for node_id, pos in self.ctx.positions.items():
|
|
||||||
if node_id.startswith(cube_prefix + ".bridge.") and pos is not None:
|
|
||||||
origin_x = self._cube_origin_x()
|
|
||||||
self._bridge_xs[node_id] = pos[0] - origin_x
|
|
||||||
|
|
||||||
def _cube_origin_x(self) -> float:
|
|
||||||
"""Compute absolute X origin of this cube."""
|
|
||||||
parts = self.node.id.split(".")
|
|
||||||
cube_str = [p for p in parts if p.startswith("cube")][0]
|
|
||||||
cube_id = int(cube_str[4:])
|
|
||||||
spec = self.ctx.spec
|
|
||||||
sip_spec = spec.get("sip", {})
|
|
||||||
cube_spec = spec.get("cube", {})
|
|
||||||
mesh_w = sip_spec.get("cube_mesh", {}).get("w", 4)
|
|
||||||
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
|
|
||||||
seam = sip_spec.get("links", {}).get("inter_cube_mesh", {}).get(
|
|
||||||
"distance_mm_across_seam", 1.0)
|
|
||||||
col = cube_id % mesh_w
|
|
||||||
return col * (cube_w + seam)
|
|
||||||
|
|
||||||
# ── Worker override ───────────────────────────────────────────
|
|
||||||
|
|
||||||
def _worker(self, env: simpy.Environment) -> Generator:
|
|
||||||
while True:
|
|
||||||
txn: Any = yield self._inbox.get()
|
|
||||||
env.process(self._position_aware_forward(env, txn))
|
|
||||||
|
|
||||||
def _position_aware_forward(
|
|
||||||
self, env: simpy.Environment, txn: Any,
|
|
||||||
) -> Generator:
|
|
||||||
prev_hop = txn.path[txn.step - 1] if txn.step > 0 else None
|
|
||||||
next_hop = txn.next_hop
|
|
||||||
|
|
||||||
overhead = self._base_overhead_ns
|
|
||||||
if prev_hop and next_hop and self._ns_per_mm > 0:
|
|
||||||
entry_x = self._get_port_x(prev_hop, txn.path)
|
|
||||||
exit_x = self._get_port_x(next_hop, txn.path)
|
|
||||||
if entry_x is not None and exit_x is not None:
|
|
||||||
overhead = self._base_overhead_ns + abs(entry_x - exit_x) * self._ns_per_mm
|
|
||||||
|
|
||||||
yield env.timeout(overhead)
|
|
||||||
|
|
||||||
if next_hop:
|
|
||||||
yield self.out_ports[next_hop].put(txn.advance())
|
|
||||||
else:
|
|
||||||
drain = getattr(txn, "drain_ns", 0.0)
|
|
||||||
if drain > 0:
|
|
||||||
yield env.timeout(drain)
|
|
||||||
txn.done.succeed()
|
|
||||||
|
|
||||||
def _get_port_x(self, node_id: str, path: list[str]) -> float | None:
|
|
||||||
"""Resolve the X position of an XBAR port from node context."""
|
|
||||||
# Direct lookup: PE DMA
|
|
||||||
if node_id in self._pe_router_xs:
|
|
||||||
return self._pe_router_xs[node_id]
|
|
||||||
# Direct lookup: HBM slice
|
|
||||||
if node_id in self._slice_xs:
|
|
||||||
return self._slice_xs[node_id]
|
|
||||||
# Direct lookup: bridge
|
|
||||||
if node_id in self._bridge_xs:
|
|
||||||
return self._bridge_xs[node_id]
|
|
||||||
# NOC: scan path for PE DMA node
|
|
||||||
if "noc" in node_id:
|
|
||||||
for p in path:
|
|
||||||
if p in self._pe_router_xs:
|
|
||||||
return self._pe_router_xs[p]
|
|
||||||
return None
|
|
||||||
@@ -22,8 +22,6 @@ class AddressResolver:
|
|||||||
|
|
||||||
def __init__(self, graph: TopologyGraph) -> None:
|
def __init__(self, graph: TopologyGraph) -> None:
|
||||||
self._node_ids = set(graph.nodes)
|
self._node_ids = set(graph.nodes)
|
||||||
mm = graph.spec["cube"]["memory_map"]
|
|
||||||
self._slice_size_bytes = mm["hbm_total_gb_per_cube"] * (1 << 30) // mm["hbm_slices_per_cube"]
|
|
||||||
|
|
||||||
# ── Physical-address resolution ──────────────────────────────────
|
# ── Physical-address resolution ──────────────────────────────────
|
||||||
|
|
||||||
@@ -31,8 +29,7 @@ class AddressResolver:
|
|||||||
s = addr.sip_id
|
s = addr.sip_id
|
||||||
c = addr.cube_id
|
c = addr.cube_id
|
||||||
if addr.kind == "hbm":
|
if addr.kind == "hbm":
|
||||||
pe_slice = PhysAddr.hbm_pe_id(addr.hbm_offset, self._slice_size_bytes)
|
node_id = f"sip{s}.cube{c}.hbm_ctrl"
|
||||||
node_id = f"sip{s}.cube{c}.hbm_ctrl.slice{pe_slice}"
|
|
||||||
elif addr.kind == "pe_resource":
|
elif addr.kind == "pe_resource":
|
||||||
if addr.unit_type == UnitType.PE:
|
if addr.unit_type == UnitType.PE:
|
||||||
node_id = f"sip{s}.cube{c}.pe{addr.pe_id}.pe_tcm"
|
node_id = f"sip{s}.cube{c}.pe{addr.pe_id}.pe_tcm"
|
||||||
@@ -84,12 +81,17 @@ class PathRouter:
|
|||||||
|
|
||||||
# Edge kinds excluded from M_CPU DMA adjacency: prevents routing through
|
# Edge kinds excluded from M_CPU DMA adjacency: prevents routing through
|
||||||
# PE-internal pipeline nodes when computing DMA paths.
|
# PE-internal pipeline nodes when computing DMA paths.
|
||||||
_MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_xbar"}
|
_MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"}
|
||||||
|
|
||||||
|
_UCIE_KINDS = {"ucie_internal", "ucie_conn_to_router", "router_to_ucie_conn",
|
||||||
|
"ucie_conn_to_noc", "noc_to_ucie_conn", "ucie_mesh",
|
||||||
|
"io_to_cube", "cube_to_io"}
|
||||||
|
|
||||||
def __init__(self, graph: TopologyGraph) -> None:
|
def __init__(self, graph: TopologyGraph) -> None:
|
||||||
self._adj: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
self._adj: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
||||||
self._adj_all: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
self._adj_all: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
||||||
self._adj_mcpu_dma: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
self._adj_mcpu_dma: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
||||||
|
self._adj_local: dict[str, list[tuple[str, float]]] = defaultdict(list)
|
||||||
for e in graph.edges:
|
for e in graph.edges:
|
||||||
w = e.routing_weight_mm if e.routing_weight_mm is not None else e.distance_mm
|
w = e.routing_weight_mm if e.routing_weight_mm is not None else e.distance_mm
|
||||||
self._adj_all[e.src].append((e.dst, w))
|
self._adj_all[e.src].append((e.dst, w))
|
||||||
@@ -97,6 +99,8 @@ class PathRouter:
|
|||||||
self._adj[e.src].append((e.dst, w))
|
self._adj[e.src].append((e.dst, w))
|
||||||
if e.kind not in self._MCPU_DMA_EXCLUDE:
|
if e.kind not in self._MCPU_DMA_EXCLUDE:
|
||||||
self._adj_mcpu_dma[e.src].append((e.dst, w))
|
self._adj_mcpu_dma[e.src].append((e.dst, w))
|
||||||
|
if e.kind not in self._UCIE_KINDS:
|
||||||
|
self._adj_local[e.src].append((e.dst, w))
|
||||||
|
|
||||||
def find_path(self, src_pe: str, dst_node: str) -> list[str]:
|
def find_path(self, src_pe: str, dst_node: str) -> list[str]:
|
||||||
"""PE DMA routing: prepends .pe_dma, excludes command edges."""
|
"""PE DMA routing: prepends .pe_dma, excludes command edges."""
|
||||||
@@ -107,30 +111,22 @@ class PathRouter:
|
|||||||
start = f"{src_pe}.pe_dma"
|
start = f"{src_pe}.pe_dma"
|
||||||
return self._run_dijkstra_with_dist(self._adj, start, dst_node)
|
return self._run_dijkstra_with_dist(self._adj, start, dst_node)
|
||||||
|
|
||||||
def find_mcpu_dma_path(self, m_cpu_id: str, dst_hbm_slice_id: str) -> list[str]:
|
def find_mcpu_dma_path(self, m_cpu_id: str, dst_hbm_id: str) -> list[str]:
|
||||||
"""M_CPU DMA path: never routes through PE-internal nodes (ADR-0015 D5).
|
"""M_CPU DMA path: routes through router mesh (ADR-0019).
|
||||||
|
|
||||||
Same-cube: deterministic [m_cpu, noc, xbar_top/bot, hbm_ctrl.slice_i].
|
Same-cube: uses _adj_local (no UCIe) to stay within mesh.
|
||||||
Cross-cube: Dijkstra via _adj_mcpu_dma (pe_internal/pe_to_xbar excluded)
|
Cross-cube: uses _adj_all to route via UCIe.
|
||||||
→ routes through NOC → UCIe → target cube NOC → xbar → HBM.
|
|
||||||
"""
|
"""
|
||||||
m_cube = ".".join(m_cpu_id.split(".")[:2])
|
m_cube = ".".join(m_cpu_id.split(".")[:2])
|
||||||
d_cube = ".".join(dst_hbm_slice_id.split(".")[:2])
|
d_cube = ".".join(dst_hbm_id.split(".")[:2])
|
||||||
if m_cube == d_cube:
|
if m_cube == d_cube:
|
||||||
slice_idx = int(dst_hbm_slice_id.rsplit("slice", 1)[1])
|
return self._run_dijkstra(self._adj_local, m_cpu_id, dst_hbm_id)
|
||||||
xbar = "xbar_top" if slice_idx < 4 else "xbar_bot"
|
return self._run_dijkstra(self._adj_all, m_cpu_id, dst_hbm_id)
|
||||||
return [
|
|
||||||
m_cpu_id,
|
|
||||||
f"{m_cube}.noc",
|
|
||||||
f"{m_cube}.{xbar}",
|
|
||||||
dst_hbm_slice_id,
|
|
||||||
]
|
|
||||||
return self._run_dijkstra(self._adj_mcpu_dma, m_cpu_id, dst_hbm_slice_id)
|
|
||||||
|
|
||||||
def find_memory_path(self, src: str, dst: str) -> list[str]:
|
def find_memory_path(self, src: str, dst: str) -> list[str]:
|
||||||
"""Direct memory path: pcie_ep → io_noc → cube → xbar → hbm_ctrl.
|
"""Direct memory path: pcie_ep → io_noc → cube → router mesh → hbm_ctrl.
|
||||||
|
|
||||||
Uses _adj_mcpu_dma which excludes pe_internal and pe_to_xbar edges,
|
Uses _adj_mcpu_dma which excludes pe_internal and pe_to_router edges,
|
||||||
preventing routing through PE pipeline nodes.
|
preventing routing through PE pipeline nodes.
|
||||||
"""
|
"""
|
||||||
return self._run_dijkstra(self._adj_mcpu_dma, src, dst)
|
return self._run_dijkstra(self._adj_mcpu_dma, src, dst)
|
||||||
|
|||||||
@@ -173,7 +173,7 @@ class RuntimeContext:
|
|||||||
pe_comps = pe_template.get("components", {})
|
pe_comps = pe_template.get("components", {})
|
||||||
tcm_cfg = pe_comps.get("pe_tcm", {}).get("attrs", {})
|
tcm_cfg = pe_comps.get("pe_tcm", {}).get("attrs", {})
|
||||||
|
|
||||||
sip_count = system.get("sips", {}).get("count", 1)
|
total_sip_count = system.get("sips", {}).get("count", 1)
|
||||||
cubes_per_sip = system.get("sips", {}).get("cubes_per_sip", 16)
|
cubes_per_sip = system.get("sips", {}).get("cubes_per_sip", 16)
|
||||||
pes_per_cube = (
|
pes_per_cube = (
|
||||||
cube.get("pe_layout", {}).get("pe_per_corner", 2)
|
cube.get("pe_layout", {}).get("pe_per_corner", 2)
|
||||||
@@ -183,6 +183,17 @@ class RuntimeContext:
|
|||||||
hbm_slices = mm.get("hbm_slices_per_cube", 8)
|
hbm_slices = mm.get("hbm_slices_per_cube", 8)
|
||||||
tcm_mb = tcm_cfg.get("size_mb", 16)
|
tcm_mb = tcm_cfg.get("size_mb", 16)
|
||||||
|
|
||||||
|
# Scope to target_device: single SIP or all SIPs
|
||||||
|
from kernbench.runtime_api.types import DeviceSelector, resolve_device
|
||||||
|
td = self.target_device if isinstance(self.target_device, DeviceSelector) else resolve_device(str(self.target_device))
|
||||||
|
if td.is_all:
|
||||||
|
sip_range = range(total_sip_count)
|
||||||
|
sip_count = total_sip_count
|
||||||
|
else:
|
||||||
|
sip_idx = td.sip_index
|
||||||
|
sip_range = range(sip_idx, sip_idx + 1)
|
||||||
|
sip_count = 1
|
||||||
|
|
||||||
cfg = AddressConfig(
|
cfg = AddressConfig(
|
||||||
sip_count=sip_count,
|
sip_count=sip_count,
|
||||||
cubes_per_sip=cubes_per_sip,
|
cubes_per_sip=cubes_per_sip,
|
||||||
@@ -193,13 +204,13 @@ class RuntimeContext:
|
|||||||
tcm_scheduler_reserved_bytes=4 * (1 << 20),
|
tcm_scheduler_reserved_bytes=4 * (1 << 20),
|
||||||
sram_bytes_per_cube=32 * (1 << 20),
|
sram_bytes_per_cube=32 * (1 << 20),
|
||||||
)
|
)
|
||||||
# Create allocators for all SIPs × cubes × PEs
|
# Create allocators scoped to target SIP(s) only
|
||||||
# Flat index: sip_id * cubes_per_sip * pes_per_cube + cube_id * pes_per_cube + pe_id
|
# Flat index: sip_id * cubes_per_sip * pes_per_cube + cube_id * pes_per_cube + pe_id
|
||||||
self._pes_per_cube = pes_per_cube
|
self._pes_per_cube = pes_per_cube
|
||||||
self._num_cubes = cubes_per_sip
|
self._num_cubes = cubes_per_sip
|
||||||
self._num_sips = sip_count
|
self._num_sips = sip_count
|
||||||
cubes_x_pes = cubes_per_sip * pes_per_cube
|
cubes_x_pes = cubes_per_sip * pes_per_cube
|
||||||
for sip_id in range(sip_count):
|
for sip_id in sip_range:
|
||||||
for cube_id in range(cubes_per_sip):
|
for cube_id in range(cubes_per_sip):
|
||||||
for pe_id in range(pes_per_cube):
|
for pe_id in range(pes_per_cube):
|
||||||
flat_idx = sip_id * cubes_x_pes + cube_id * pes_per_cube + pe_id
|
flat_idx = sip_id * cubes_x_pes + cube_id * pes_per_cube + pe_id
|
||||||
|
|||||||
@@ -41,7 +41,7 @@ class DeviceSelector:
|
|||||||
def sip_index(self) -> int:
|
def sip_index(self) -> int:
|
||||||
if self.is_all:
|
if self.is_all:
|
||||||
raise ValueError("DeviceSelector is 'all'; no single sip_index.")
|
raise ValueError("DeviceSelector is 'all'; no single sip_index.")
|
||||||
m = re.fullmatch(r"sip:(\d+)", self.raw)
|
m = re.fullmatch(r"sip:?(\d+)", self.raw)
|
||||||
if not m:
|
if not m:
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
f"Invalid device '{self.raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0)."
|
f"Invalid device '{self.raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0)."
|
||||||
@@ -64,8 +64,9 @@ def resolve_device(raw: str | None) -> DeviceSelector:
|
|||||||
if raw == "all":
|
if raw == "all":
|
||||||
return DeviceSelector(raw="all")
|
return DeviceSelector(raw="all")
|
||||||
|
|
||||||
m = re.fullmatch(r"sip:(\d+)", raw)
|
m = re.fullmatch(r"sip:?(\d+)", raw)
|
||||||
if not m:
|
if not m:
|
||||||
raise ValueError(f"Invalid device '{raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0).")
|
raise ValueError(f"Invalid device '{raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0).")
|
||||||
|
raw = f"sip:{m.group(1)}" # normalize to sip:N format
|
||||||
|
|
||||||
return DeviceSelector(raw=raw)
|
return DeviceSelector(raw=raw)
|
||||||
|
|||||||
@@ -19,9 +19,9 @@ class GraphEngine:
|
|||||||
"""simpy-based discrete-event simulation engine.
|
"""simpy-based discrete-event simulation engine.
|
||||||
|
|
||||||
Request routing:
|
Request routing:
|
||||||
MemoryWrite/Read: pcie_ep → io_noc → cube → xbar → hbm_ctrl (m_cpu bypass)
|
MemoryWrite/Read: pcie_ep → io_noc → cube → router mesh → hbm_ctrl (m_cpu bypass)
|
||||||
KernelLaunch: pcie_ep → io_noc → io_cpu → io_noc → cube → m_cpu → PE
|
KernelLaunch: pcie_ep → io_noc → io_cpu → io_noc → cube → m_cpu → PE
|
||||||
PeDmaMsg: pe_dma → xbar → hbm_ctrl (direct probe)
|
PeDmaMsg: pe_dma → router mesh → hbm_ctrl (direct probe)
|
||||||
|
|
||||||
Component implementations are DI-injectable via component_overrides (ADR-0007 D3).
|
Component implementations are DI-injectable via component_overrides (ADR-0007 D3).
|
||||||
"""
|
"""
|
||||||
@@ -261,7 +261,7 @@ class GraphEngine:
|
|||||||
done.succeed()
|
done.succeed()
|
||||||
|
|
||||||
def _process_memory_direct(self, key: str, request: Any, done: simpy.Event):
|
def _process_memory_direct(self, key: str, request: Any, done: simpy.Event):
|
||||||
"""Direct memory path: pcie_ep → io_noc → cube → xbar → hbm_ctrl.
|
"""Direct memory path: pcie_ep → io_noc → cube → router mesh → hbm_ctrl.
|
||||||
|
|
||||||
MemoryWrite: data flows forward (nbytes on wires), drain at hbm_ctrl terminal.
|
MemoryWrite: data flows forward (nbytes on wires), drain at hbm_ctrl terminal.
|
||||||
MemoryRead: command flows forward (nbytes=0), hbm_ctrl sends data back on
|
MemoryRead: command flows forward (nbytes=0), hbm_ctrl sends data back on
|
||||||
|
|||||||
@@ -287,7 +287,7 @@ def _generate_probe_d2h(graph, edge_map) -> list[dict]:
|
|||||||
|
|
||||||
|
|
||||||
def _generate_probe_pe_dma(graph, edge_map) -> list[dict]:
|
def _generate_probe_pe_dma(graph, edge_map) -> list[dict]:
|
||||||
"""PE DMA probes: pe_dma → xbar → HBM."""
|
"""PE DMA probes: pe_dma → router mesh → HBM."""
|
||||||
from kernbench.policy.address.phyaddr import PhysAddr
|
from kernbench.policy.address.phyaddr import PhysAddr
|
||||||
from kernbench.policy.routing.router import AddressResolver, PathRouter
|
from kernbench.policy.routing.router import AddressResolver, PathRouter
|
||||||
|
|
||||||
@@ -399,7 +399,7 @@ def _generate_bench_qkv_gemm(graph, edge_map) -> list[dict]:
|
|||||||
# Find pe0 → HBM path
|
# Find pe0 → HBM path
|
||||||
pe_ref = "sip0.cube0.pe0"
|
pe_ref = "sip0.cube0.pe0"
|
||||||
try:
|
try:
|
||||||
dma_path = router.find_path(pe_ref, f"sip0.cube0.hbm_ctrl.slice0")
|
dma_path = router.find_path(pe_ref, f"sip0.cube0.hbm_ctrl")
|
||||||
except Exception:
|
except Exception:
|
||||||
dma_path = [pe_ref]
|
dma_path = [pe_ref]
|
||||||
|
|
||||||
@@ -433,7 +433,7 @@ def _generate_bench_qkv_gemm(graph, edge_map) -> list[dict]:
|
|||||||
# DMA write result back
|
# DMA write result back
|
||||||
t += bw_ns
|
t += bw_ns
|
||||||
ev(t, type="process", request_id=rid,
|
ev(t, type="process", request_id=rid,
|
||||||
component="sip0.cube0.hbm_ctrl.slice0",
|
component="sip0.cube0.hbm_ctrl",
|
||||||
latency_ns=round(bw_ns, 3), metadata={"op": "write", "cmd": "dma_write_out"})
|
latency_ns=round(bw_ns, 3), metadata={"op": "write", "cmd": "dma_write_out"})
|
||||||
|
|
||||||
ev(t, type="complete", request_id=rid,
|
ev(t, type="complete", request_id=rid,
|
||||||
|
|||||||
@@ -155,12 +155,7 @@ def _cube_local_positions(cube_w: float, cube_h: float) -> dict[str, tuple[float
|
|||||||
"ucie-W": (uw, cy),
|
"ucie-W": (uw, cy),
|
||||||
"ucie-E": (cube_w - uw, cy),
|
"ucie-E": (cube_w - uw, cy),
|
||||||
"m_cpu": (cube_w - 2.5, cy - 1.5),
|
"m_cpu": (cube_w - 2.5, cy - 1.5),
|
||||||
"xbar_top": (cx, 3.5),
|
|
||||||
"hbm_ctrl": (cx - 2.0, cy),
|
"hbm_ctrl": (cx - 2.0, cy),
|
||||||
"xbar_bot": (cx, cube_h - 3.5),
|
|
||||||
"bridge.left": (2.5, cy + 2.0),
|
|
||||||
"bridge.right": (cube_w - 2.5, cy + 2.0),
|
|
||||||
"noc": (cx + 2.0, cy),
|
|
||||||
"sram": (2.5, cy - 1.5),
|
"sram": (2.5, cy - 1.5),
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -359,16 +354,21 @@ def _instantiate_cube(
|
|||||||
) -> None:
|
) -> None:
|
||||||
"""Add all cube-internal nodes and edges, including PE instances.
|
"""Add all cube-internal nodes and edges, including PE instances.
|
||||||
|
|
||||||
Topology: PE_DMA → NOC → xbar_top/bot → HBM_CTRL.
|
Topology: explicit router mesh from cube_mesh.yaml (ADR-0019).
|
||||||
No per-PE xbar nodes; position-aware XBAR top/bottom replaces chaining.
|
Each router is a separate SimPy node. Components attach to routers
|
||||||
|
based on cube_mesh.yaml attachment lists.
|
||||||
"""
|
"""
|
||||||
cube_w = cube["geometry"]["cube_mm"]["w"]
|
cube_w = cube["geometry"]["cube_mm"]["w"]
|
||||||
cube_h = cube["geometry"]["cube_mm"]["h"]
|
cube_h = cube["geometry"]["cube_mm"]["h"]
|
||||||
ox, oy = origin
|
ox, oy = origin
|
||||||
local_pos = _cube_local_positions(cube_w, cube_h)
|
local_pos = _cube_local_positions(cube_w, cube_h)
|
||||||
clinks = cube["links"]
|
clinks = cube["links"]
|
||||||
n_slices = cube["memory_map"]["hbm_slices_per_cube"]
|
mm = cube["memory_map"]
|
||||||
half = n_slices // 2
|
|
||||||
|
# ── Mode branch (ADR-0019) ──
|
||||||
|
mode = mm.get("hbm_mapping_mode", "n_to_one")
|
||||||
|
if mode == "one_to_one":
|
||||||
|
raise NotImplementedError("1:1 mode: ADR-0019 D3")
|
||||||
|
|
||||||
# ── UCIe ports + connection nodes ──
|
# ── UCIe ports + connection nodes ──
|
||||||
ucie_cfg = cube["ucie"]
|
ucie_cfg = cube["ucie"]
|
||||||
@@ -391,8 +391,8 @@ def _instantiate_cube(
|
|||||||
label=f"UCIe-{port} C{ci}",
|
label=f"UCIe-{port} C{ci}",
|
||||||
)
|
)
|
||||||
|
|
||||||
# ── Named components: noc, m_cpu, sram ──
|
# ── Named components: m_cpu, sram (noc is now explicit routers) ──
|
||||||
for name in ("noc", "m_cpu", "sram"):
|
for name in ("m_cpu", "sram"):
|
||||||
c = cube["components"][name]
|
c = cube["components"][name]
|
||||||
nid = f"{cp}.{name}"
|
nid = f"{cp}.{name}"
|
||||||
lx, ly = local_pos[name]
|
lx, ly = local_pos[name]
|
||||||
@@ -402,49 +402,96 @@ def _instantiate_cube(
|
|||||||
label=name.upper().replace("_", " "),
|
label=name.upper().replace("_", " "),
|
||||||
)
|
)
|
||||||
|
|
||||||
# ── xbar_top and xbar_bot (position-aware XBAR) ──
|
# ── HBM controller (single node, ADR-0019 D1) ──
|
||||||
xbar_spec = cube["components"]["xbar"]
|
|
||||||
for xbar_name, xbar_cfg in [("xbar_top", xbar_spec["top"]),
|
|
||||||
("xbar_bot", xbar_spec["bottom"])]:
|
|
||||||
nid = f"{cp}.{xbar_name}"
|
|
||||||
lx, ly = local_pos[xbar_name]
|
|
||||||
nodes[nid] = Node(
|
|
||||||
id=nid, kind=xbar_cfg["kind"], impl=xbar_cfg["impl"],
|
|
||||||
attrs=xbar_cfg["attrs"], pos_mm=(ox + lx, oy + ly),
|
|
||||||
label=xbar_name.upper().replace("_", " "),
|
|
||||||
)
|
|
||||||
|
|
||||||
# ── HBM controller slices ──
|
|
||||||
hbm_spec = cube["components"]["hbm_ctrl"]
|
hbm_spec = cube["components"]["hbm_ctrl"]
|
||||||
hbm_lx, hbm_ly = local_pos["hbm_ctrl"]
|
hbm_lx, hbm_ly = local_pos["hbm_ctrl"]
|
||||||
for sl in range(n_slices):
|
hbm_id = f"{cp}.hbm_ctrl"
|
||||||
sid = f"{cp}.hbm_ctrl.slice{sl}"
|
nodes[hbm_id] = Node(
|
||||||
nodes[sid] = Node(
|
id=hbm_id, kind=hbm_spec["kind"], impl=hbm_spec["impl"],
|
||||||
id=sid, kind=hbm_spec["kind"], impl=hbm_spec["impl"],
|
|
||||||
attrs=hbm_spec["attrs"], pos_mm=(ox + hbm_lx, oy + hbm_ly),
|
attrs=hbm_spec["attrs"], pos_mm=(ox + hbm_lx, oy + hbm_ly),
|
||||||
label=f"HBM SLICE{sl}",
|
label="HBM CTRL",
|
||||||
)
|
)
|
||||||
|
|
||||||
# ── Bridges ──
|
# ── Router mesh from cube_mesh.yaml (ADR-0019 D3) ──
|
||||||
for br in xbar_spec["bridges"]:
|
routers = mesh_data["routers"]
|
||||||
bname = br["id"]
|
router_spec = cube["components"]["noc_router"]
|
||||||
nid = f"{cp}.bridge.{bname}"
|
router_bw = clinks.get("router_link_bw_gbs", 256.0)
|
||||||
lx, ly = local_pos[f"bridge.{bname}"]
|
pe_to_router_bw = clinks.get("pe_to_router_bw_gbs", 256.0)
|
||||||
nodes[nid] = Node(
|
hbm_eff = float(hbm_spec.get("attrs", {}).get("efficiency", 1.0))
|
||||||
id=nid, kind=br["kind"], impl=br["impl"],
|
hbm_to_router_bw = clinks.get("hbm_to_router_bw_gbs", 256.0) * hbm_eff
|
||||||
attrs=br["attrs"], pos_mm=(ox + lx, oy + ly),
|
sram_to_router_bw = clinks.get("sram_to_router_bw_gbs", 128.0)
|
||||||
label=f"Bridge {bname.upper()}",
|
ucie_conn_bw = ucie_cfg.get("per_connection_bw_gbs", 128.0)
|
||||||
|
|
||||||
|
n_rows = mesh_data["mesh"]["rows"]
|
||||||
|
n_cols = mesh_data["mesh"]["cols"]
|
||||||
|
|
||||||
|
# Create router nodes
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
rid = f"{cp}.{rkey}"
|
||||||
|
rx, ry = rval["pos_mm"]
|
||||||
|
nodes[rid] = Node(
|
||||||
|
id=rid, kind=router_spec["kind"], impl=router_spec["impl"],
|
||||||
|
attrs=router_spec["attrs"], pos_mm=(ox + rx, oy + ry),
|
||||||
|
label=rkey.upper(),
|
||||||
)
|
)
|
||||||
|
|
||||||
# ── PE instances (no per-PE xbar nodes) ──
|
# Router ↔ router XY mesh edges (adjacent non-null routers)
|
||||||
|
for r in range(n_rows):
|
||||||
|
for c in range(n_cols):
|
||||||
|
rkey = f"r{r}c{c}"
|
||||||
|
if routers.get(rkey) is None:
|
||||||
|
continue
|
||||||
|
src_id = f"{cp}.{rkey}"
|
||||||
|
src_pos = routers[rkey]["pos_mm"]
|
||||||
|
|
||||||
|
# Horizontal neighbor (same row, next col)
|
||||||
|
for nc in range(c + 1, n_cols):
|
||||||
|
nkey = f"r{r}c{nc}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
dst_id = f"{cp}.{nkey}"
|
||||||
|
dst_pos = routers[nkey]["pos_mm"]
|
||||||
|
dist = abs(dst_pos[0] - src_pos[0])
|
||||||
|
edges.append(Edge(
|
||||||
|
src=src_id, dst=dst_id,
|
||||||
|
distance_mm=round(dist, 2), bw_gbs=router_bw,
|
||||||
|
kind="router_mesh",
|
||||||
|
))
|
||||||
|
edges.append(Edge(
|
||||||
|
src=dst_id, dst=src_id,
|
||||||
|
distance_mm=round(dist, 2), bw_gbs=router_bw,
|
||||||
|
kind="router_mesh",
|
||||||
|
))
|
||||||
|
break # only immediate neighbor
|
||||||
|
|
||||||
|
# Vertical neighbor (same col, next row)
|
||||||
|
for nr in range(r + 1, n_rows):
|
||||||
|
nkey = f"r{nr}c{c}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
dst_id = f"{cp}.{nkey}"
|
||||||
|
dst_pos = routers[nkey]["pos_mm"]
|
||||||
|
dist = abs(dst_pos[1] - src_pos[1])
|
||||||
|
edges.append(Edge(
|
||||||
|
src=src_id, dst=dst_id,
|
||||||
|
distance_mm=round(dist, 2), bw_gbs=router_bw,
|
||||||
|
kind="router_mesh",
|
||||||
|
))
|
||||||
|
edges.append(Edge(
|
||||||
|
src=dst_id, dst=src_id,
|
||||||
|
distance_mm=round(dist, 2), bw_gbs=router_bw,
|
||||||
|
kind="router_mesh",
|
||||||
|
))
|
||||||
|
break # only immediate neighbor
|
||||||
|
|
||||||
|
# ── PE instances ──
|
||||||
corners = cube["pe_layout"]["corners"]
|
corners = cube["pe_layout"]["corners"]
|
||||||
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
|
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
|
||||||
corner_pos = _corner_pe_positions(cube_w, cube_h)
|
corner_pos = _corner_pe_positions(cube_w, cube_h)
|
||||||
pe_tmpl = cube["pe_template"]
|
pe_tmpl = cube["pe_template"]
|
||||||
pe_links = pe_tmpl["links"]
|
pe_links = pe_tmpl["links"]
|
||||||
pe_noc_distances = _compute_pe_noc_distances(
|
|
||||||
mesh_data, corner_pos, corners, pe_per_corner,
|
|
||||||
)
|
|
||||||
|
|
||||||
pe_idx = 0
|
pe_idx = 0
|
||||||
for corner in corners:
|
for corner in corners:
|
||||||
@@ -465,118 +512,90 @@ def _instantiate_cube(
|
|||||||
|
|
||||||
# PE-internal edges
|
# PE-internal edges
|
||||||
_add_pe_internal_edges(edges, pp, pe_links)
|
_add_pe_internal_edges(edges, pp, pe_links)
|
||||||
|
|
||||||
# PE_DMA → noc (distance auto-computed from PE physical position)
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{pp}.pe_dma", dst=f"{cp}.noc",
|
|
||||||
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
|
|
||||||
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
|
|
||||||
kind="pe_to_noc",
|
|
||||||
))
|
|
||||||
|
|
||||||
# noc → PE_DMA (response delivery, reverse of pe_to_noc)
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{cp}.noc", dst=f"{pp}.pe_dma",
|
|
||||||
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
|
|
||||||
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
|
|
||||||
kind="noc_to_pe",
|
|
||||||
))
|
|
||||||
|
|
||||||
# noc → PE_CPU (command delivery)
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{cp}.noc", dst=f"{pp}.pe_cpu",
|
|
||||||
distance_mm=clinks["noc_to_pe_cpu_mm"],
|
|
||||||
kind="command",
|
|
||||||
))
|
|
||||||
|
|
||||||
# PE_CPU → noc (response delivery, reverse of command)
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{pp}.pe_cpu", dst=f"{cp}.noc",
|
|
||||||
distance_mm=clinks["noc_to_pe_cpu_mm"],
|
|
||||||
kind="pe_response",
|
|
||||||
))
|
|
||||||
|
|
||||||
# noc → PE_MMU (MMU mapping install)
|
|
||||||
pe_mmu_id = f"{pp}.pe_mmu"
|
|
||||||
if pe_mmu_id in nodes:
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{cp}.noc", dst=pe_mmu_id,
|
|
||||||
distance_mm=clinks.get("noc_to_pe_mmu_mm", 0.0),
|
|
||||||
kind="command",
|
|
||||||
))
|
|
||||||
|
|
||||||
pe_idx += 1
|
pe_idx += 1
|
||||||
|
|
||||||
# ── xbar_top/bot → HBM slices ──
|
# ── Component ↔ router edges (based on cube_mesh.yaml attach) ──
|
||||||
hbm_eff = float(hbm_spec.get("attrs", {}).get("efficiency", 1.0))
|
for rkey, rval in routers.items():
|
||||||
hbm_bw = clinks["xbar_to_hbm_bw_gbs"] * hbm_eff
|
if rval is None:
|
||||||
for i in range(half):
|
continue
|
||||||
|
rid = f"{cp}.{rkey}"
|
||||||
|
for item in rval.get("attach", []):
|
||||||
|
if item.endswith(".dma"):
|
||||||
|
# PE_DMA ↔ router
|
||||||
|
pe_prefix = item.rsplit(".", 1)[0]
|
||||||
|
dma_id = f"{cp}.{pe_prefix}.pe_dma"
|
||||||
|
if dma_id in nodes:
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.xbar_top", dst=f"{cp}.hbm_ctrl.slice{i}",
|
src=dma_id, dst=rid,
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
distance_mm=0.0, bw_gbs=pe_to_router_bw,
|
||||||
bw_gbs=hbm_bw,
|
kind="pe_to_router",
|
||||||
kind="xbar_to_hbm",
|
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.hbm_ctrl.slice{i}", dst=f"{cp}.xbar_top",
|
src=rid, dst=dma_id,
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
distance_mm=0.0, bw_gbs=pe_to_router_bw,
|
||||||
bw_gbs=hbm_bw,
|
kind="router_to_pe",
|
||||||
kind="hbm_to_xbar",
|
|
||||||
))
|
))
|
||||||
for i in range(half, n_slices):
|
elif item.endswith(".cpu"):
|
||||||
|
# PE_CPU ↔ router (command path)
|
||||||
|
pe_prefix = item.rsplit(".", 1)[0]
|
||||||
|
cpu_id = f"{cp}.{pe_prefix}.pe_cpu"
|
||||||
|
if cpu_id in nodes:
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.xbar_bot", dst=f"{cp}.hbm_ctrl.slice{i}",
|
src=rid, dst=cpu_id,
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
distance_mm=clinks.get("noc_to_pe_cpu_mm", 0.0),
|
||||||
bw_gbs=hbm_bw,
|
kind="command",
|
||||||
kind="xbar_to_hbm",
|
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.hbm_ctrl.slice{i}", dst=f"{cp}.xbar_bot",
|
src=cpu_id, dst=rid,
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
distance_mm=clinks.get("noc_to_pe_cpu_mm", 0.0),
|
||||||
bw_gbs=hbm_bw,
|
kind="pe_response",
|
||||||
kind="hbm_to_xbar",
|
|
||||||
))
|
))
|
||||||
|
# PE_MMU ↔ router (mapping install path)
|
||||||
# ── NOC ↔ xbar_top/bot ──
|
mmu_id = f"{cp}.{pe_prefix}.pe_mmu"
|
||||||
# xbar_top: primary (low routing weight), xbar_bot: secondary (high routing weight
|
if mmu_id in nodes:
|
||||||
# steers Dijkstra through xbar_top→bridge→xbar_bot for cross-half access)
|
|
||||||
noc_xbar_bw = clinks.get("noc_to_xbar_bw_gbs", 256.0)
|
|
||||||
noc_xbar_mm = clinks.get("noc_to_xbar_mm", 0.0)
|
|
||||||
for xbar_name, rw in [("xbar_top", None), ("xbar_bot", 100.0)]:
|
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.noc", dst=f"{cp}.{xbar_name}",
|
src=rid, dst=mmu_id,
|
||||||
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
|
distance_mm=0.0,
|
||||||
routing_weight_mm=rw, kind="noc_to_xbar",
|
kind="command",
|
||||||
|
))
|
||||||
|
elif item.endswith(".hbm"):
|
||||||
|
pass # HBM edges handled below (all routers)
|
||||||
|
elif item == "m_cpu":
|
||||||
|
# M_CPU ↔ router
|
||||||
|
mcpu_id = f"{cp}.m_cpu"
|
||||||
|
edges.append(Edge(
|
||||||
|
src=mcpu_id, dst=rid,
|
||||||
|
distance_mm=clinks.get("m_cpu_to_router_mm", 0.0),
|
||||||
|
kind="command",
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.{xbar_name}", dst=f"{cp}.noc",
|
src=rid, dst=mcpu_id,
|
||||||
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
|
distance_mm=clinks.get("m_cpu_to_router_mm", 0.0),
|
||||||
routing_weight_mm=rw, kind="xbar_to_noc",
|
kind="command",
|
||||||
))
|
))
|
||||||
|
elif item == "sram":
|
||||||
# ── Bridge connections: xbar_top ↔ bridge ↔ xbar_bot ──
|
# SRAM ↔ router
|
||||||
bridge_mm = clinks.get("xbar_to_bridge_mm", 3.0)
|
sram_id = f"{cp}.sram"
|
||||||
bridge_bw = clinks.get("xbar_to_bridge_bw_gbs", 128.0)
|
|
||||||
for bname in ("left", "right"):
|
|
||||||
br_node = f"{cp}.bridge.{bname}"
|
|
||||||
for xbar_name in ("xbar_top", "xbar_bot"):
|
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.{xbar_name}", dst=br_node,
|
src=sram_id, dst=rid,
|
||||||
distance_mm=bridge_mm, bw_gbs=bridge_bw,
|
distance_mm=0.0, bw_gbs=sram_to_router_bw,
|
||||||
kind="xbar_to_bridge",
|
kind="sram_to_router",
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=br_node, dst=f"{cp}.{xbar_name}",
|
src=rid, dst=sram_id,
|
||||||
distance_mm=bridge_mm, bw_gbs=bridge_bw,
|
distance_mm=0.0, bw_gbs=sram_to_router_bw,
|
||||||
kind="bridge_to_xbar",
|
kind="router_to_sram",
|
||||||
))
|
))
|
||||||
|
elif item.startswith("ucie_"):
|
||||||
# ── UCIe ↔ conn ↔ NOC ──
|
# UCIe conn ↔ router
|
||||||
ucie_conn_bw = ucie_cfg.get("per_connection_bw_gbs", 128.0)
|
# item format: "ucie_{dir}.c{i}" e.g. "ucie_n.c0"
|
||||||
for port in ucie_cfg["ports"]:
|
parts = item.split(".")
|
||||||
ucie_id = f"{cp}.ucie-{port}"
|
direction = parts[0].replace("ucie_", "").upper()
|
||||||
for ci in range(ucie_n_conn):
|
conn_num = parts[1].replace("c", "") # "0", "1", etc.
|
||||||
conn_id = f"{cp}.ucie-{port}.conn{ci}"
|
conn_id = f"{cp}.ucie-{direction}.conn{conn_num}"
|
||||||
|
ucie_id = f"{cp}.ucie-{direction}"
|
||||||
|
# conn ↔ ucie port
|
||||||
|
if conn_id in nodes:
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=ucie_id, dst=conn_id,
|
src=ucie_id, dst=conn_id,
|
||||||
distance_mm=0.0, kind="ucie_internal",
|
distance_mm=0.0, kind="ucie_internal",
|
||||||
@@ -585,44 +604,35 @@ def _instantiate_cube(
|
|||||||
src=conn_id, dst=ucie_id,
|
src=conn_id, dst=ucie_id,
|
||||||
distance_mm=0.0, kind="ucie_internal",
|
distance_mm=0.0, kind="ucie_internal",
|
||||||
))
|
))
|
||||||
|
# conn ↔ router
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=conn_id, dst=f"{cp}.noc",
|
src=conn_id, dst=rid,
|
||||||
distance_mm=0.0, bw_gbs=ucie_conn_bw,
|
distance_mm=0.0, bw_gbs=ucie_conn_bw,
|
||||||
kind="ucie_conn_to_noc",
|
kind="ucie_conn_to_router",
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.noc", dst=conn_id,
|
src=rid, dst=conn_id,
|
||||||
distance_mm=0.0, bw_gbs=ucie_conn_bw,
|
distance_mm=0.0, bw_gbs=ucie_conn_bw,
|
||||||
kind="noc_to_ucie_conn",
|
kind="router_to_ucie_conn",
|
||||||
))
|
))
|
||||||
|
|
||||||
# ── m_cpu ↔ noc (command dispatch) ──
|
# ── HBM_CTRL ↔ all routers (ADR-0019 D1) ──
|
||||||
|
# High routing weight prevents Dijkstra from using HBM as transit shortcut
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
rid = f"{cp}.{rkey}"
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.m_cpu", dst=f"{cp}.noc",
|
src=rid, dst=hbm_id,
|
||||||
distance_mm=clinks["m_cpu_to_noc_mm"],
|
distance_mm=0.0, bw_gbs=hbm_to_router_bw,
|
||||||
kind="command",
|
routing_weight_mm=1000.0,
|
||||||
|
kind="router_to_hbm",
|
||||||
))
|
))
|
||||||
edges.append(Edge(
|
edges.append(Edge(
|
||||||
src=f"{cp}.noc", dst=f"{cp}.m_cpu",
|
src=hbm_id, dst=rid,
|
||||||
distance_mm=clinks["m_cpu_to_noc_mm"],
|
distance_mm=0.0, bw_gbs=hbm_to_router_bw,
|
||||||
kind="command",
|
routing_weight_mm=1000.0,
|
||||||
))
|
kind="hbm_to_router",
|
||||||
|
|
||||||
# ── noc ↔ sram ──
|
|
||||||
_noc_sram = clinks["noc_to_sram"]
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{cp}.noc", dst=f"{cp}.sram",
|
|
||||||
distance_mm=clinks["noc_to_sram_mm"],
|
|
||||||
bw_gbs=_noc_sram["per_connection_bw_gbs"],
|
|
||||||
n_connections=_noc_sram["n_connections"],
|
|
||||||
kind="noc_to_sram",
|
|
||||||
))
|
|
||||||
edges.append(Edge(
|
|
||||||
src=f"{cp}.sram", dst=f"{cp}.noc",
|
|
||||||
distance_mm=clinks["noc_to_sram_mm"],
|
|
||||||
bw_gbs=_noc_sram["per_connection_bw_gbs"],
|
|
||||||
n_connections=_noc_sram["n_connections"],
|
|
||||||
kind="noc_to_sram",
|
|
||||||
))
|
))
|
||||||
|
|
||||||
|
|
||||||
@@ -901,8 +911,8 @@ def _build_cube_view(spec: dict) -> ViewGraph:
|
|||||||
label=f"UCIe-{port} C{ci}",
|
label=f"UCIe-{port} C{ci}",
|
||||||
)
|
)
|
||||||
|
|
||||||
# Named components (hbm_ctrl as single representative node in view)
|
# Named components (hbm_ctrl as single node in view)
|
||||||
for name in ("noc", "m_cpu", "hbm_ctrl", "sram"):
|
for name in ("m_cpu", "hbm_ctrl", "sram"):
|
||||||
c = cube["components"][name]
|
c = cube["components"][name]
|
||||||
lx, ly = local_pos.get(name, local_pos.get("hbm_ctrl"))
|
lx, ly = local_pos.get(name, local_pos.get("hbm_ctrl"))
|
||||||
nodes[name] = Node(
|
nodes[name] = Node(
|
||||||
@@ -911,159 +921,139 @@ def _build_cube_view(spec: dict) -> ViewGraph:
|
|||||||
label=name.upper().replace("_", " "),
|
label=name.upper().replace("_", " "),
|
||||||
)
|
)
|
||||||
|
|
||||||
# xbar_top, xbar_bot
|
# Load mesh data early (needed for router nodes + PE distances)
|
||||||
xbar_spec = cube["components"]["xbar"]
|
mesh_data = spec.get("_mesh", {})
|
||||||
for xbar_name, xbar_cfg in [("xbar_top", xbar_spec["top"]),
|
|
||||||
("xbar_bot", xbar_spec["bottom"])]:
|
# Router nodes from cube_mesh.yaml (explicit in view)
|
||||||
lx, ly = local_pos[xbar_name]
|
router_spec = cube["components"]["noc_router"]
|
||||||
nodes[xbar_name] = Node(
|
routers = mesh_data.get("routers", {})
|
||||||
id=xbar_name, kind=xbar_cfg["kind"], impl=xbar_cfg["impl"],
|
for rkey, rval in routers.items():
|
||||||
attrs=xbar_cfg["attrs"], pos_mm=(lx, ly),
|
if rval is None:
|
||||||
label=xbar_name.upper().replace("_", " "),
|
continue
|
||||||
|
rx, ry = rval["pos_mm"]
|
||||||
|
nodes[rkey] = Node(
|
||||||
|
id=rkey, kind=router_spec["kind"], impl=router_spec["impl"],
|
||||||
|
attrs=router_spec["attrs"], pos_mm=(rx, ry),
|
||||||
|
label=rkey.upper(),
|
||||||
)
|
)
|
||||||
|
|
||||||
# Bridges
|
# PEs as opaque blocks
|
||||||
for br in xbar_spec["bridges"]:
|
|
||||||
bname = br["id"]
|
|
||||||
bid = f"bridge.{bname}"
|
|
||||||
lx, ly = local_pos[bid]
|
|
||||||
nodes[bid] = Node(
|
|
||||||
id=bid, kind=br["kind"], impl=br["impl"],
|
|
||||||
attrs=br["attrs"], pos_mm=(lx, ly),
|
|
||||||
label=f"Bridge {bname.upper()}",
|
|
||||||
)
|
|
||||||
|
|
||||||
# PEs as opaque blocks (no per-PE xbar nodes)
|
|
||||||
corners = cube["pe_layout"]["corners"]
|
corners = cube["pe_layout"]["corners"]
|
||||||
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
|
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
|
||||||
corner_pos = _corner_pe_positions(cube_w, cube_h)
|
corner_pos = _corner_pe_positions(cube_w, cube_h)
|
||||||
mesh_data = spec.get("_mesh", {})
|
|
||||||
pe_noc_distances = _compute_pe_noc_distances(
|
pe_noc_distances = _compute_pe_noc_distances(
|
||||||
mesh_data, corner_pos, corners, pe_per_corner,
|
mesh_data, corner_pos, corners, pe_per_corner,
|
||||||
) if mesh_data else {}
|
) if mesh_data else {}
|
||||||
|
|
||||||
pe_idx = 0
|
pe_idx = 0
|
||||||
|
pe_offset_y = 1.2 # mm offset to avoid overlapping router node
|
||||||
for corner in corners:
|
for corner in corners:
|
||||||
|
is_top = corner in ("NW", "NE")
|
||||||
for ci in range(pe_per_corner):
|
for ci in range(pe_per_corner):
|
||||||
pid = f"pe{pe_idx}"
|
pid = f"pe{pe_idx}"
|
||||||
px, py = corner_pos[corner][ci]
|
px, py = corner_pos[corner][ci]
|
||||||
|
# Offset PE above (top) or below (bottom) its router
|
||||||
|
py_view = py - pe_offset_y if is_top else py + pe_offset_y
|
||||||
nodes[pid] = Node(
|
nodes[pid] = Node(
|
||||||
id=pid, kind="pe", impl="",
|
id=pid, kind="pe", impl="",
|
||||||
attrs={"corner": corner}, pos_mm=(px, py),
|
attrs={"corner": corner}, pos_mm=(px, py_view),
|
||||||
label=f"PE{pe_idx}",
|
label=f"PE{pe_idx}",
|
||||||
)
|
)
|
||||||
# PE → noc (distance auto-computed from PE physical position)
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src=pid, dst="noc",
|
|
||||||
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
|
|
||||||
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
|
|
||||||
kind="pe_to_noc",
|
|
||||||
))
|
|
||||||
# noc → PE (command delivery)
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="noc", dst=pid,
|
|
||||||
distance_mm=clinks["noc_to_pe_cpu_mm"],
|
|
||||||
kind="command",
|
|
||||||
))
|
|
||||||
pe_idx += 1
|
pe_idx += 1
|
||||||
|
|
||||||
# xbar_top/bot → hbm_ctrl
|
# View edges based on cube_mesh.yaml attach (mirrors _instantiate_cube logic)
|
||||||
view_edges.append(Edge(
|
pe_to_router_bw = clinks.get("pe_to_router_bw_gbs", 256.0)
|
||||||
src="xbar_top", dst="hbm_ctrl",
|
hbm_to_router_bw = clinks.get("hbm_to_router_bw_gbs", 256.0)
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
sram_bw = clinks.get("sram_to_router_bw_gbs", 128.0)
|
||||||
bw_gbs=clinks["xbar_to_hbm_bw_gbs"],
|
|
||||||
kind="xbar_to_hbm",
|
|
||||||
))
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="xbar_bot", dst="hbm_ctrl",
|
|
||||||
distance_mm=clinks["xbar_to_hbm_mm"],
|
|
||||||
bw_gbs=clinks["xbar_to_hbm_bw_gbs"],
|
|
||||||
kind="xbar_to_hbm",
|
|
||||||
))
|
|
||||||
|
|
||||||
# noc ↔ xbar_top/bot
|
|
||||||
noc_xbar_bw = clinks.get("noc_to_xbar_bw_gbs", 256.0)
|
|
||||||
noc_xbar_mm = clinks.get("noc_to_xbar_mm", 0.0)
|
|
||||||
for xbar_name in ("xbar_top", "xbar_bot"):
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="noc", dst=xbar_name,
|
|
||||||
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
|
|
||||||
kind="noc_to_xbar",
|
|
||||||
))
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src=xbar_name, dst="noc",
|
|
||||||
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
|
|
||||||
kind="xbar_to_noc",
|
|
||||||
))
|
|
||||||
|
|
||||||
# bridge connections: xbar_top ↔ bridge ↔ xbar_bot
|
|
||||||
bridge_mm = clinks.get("xbar_to_bridge_mm", 3.0)
|
|
||||||
bridge_bw = clinks.get("xbar_to_bridge_bw_gbs", 128.0)
|
|
||||||
for bname in ("left", "right"):
|
|
||||||
br_id = f"bridge.{bname}"
|
|
||||||
for xbar_name in ("xbar_top", "xbar_bot"):
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src=xbar_name, dst=br_id,
|
|
||||||
distance_mm=bridge_mm, bw_gbs=bridge_bw,
|
|
||||||
kind="xbar_to_bridge",
|
|
||||||
))
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src=br_id, dst=xbar_name,
|
|
||||||
distance_mm=bridge_mm, bw_gbs=bridge_bw,
|
|
||||||
kind="bridge_to_xbar",
|
|
||||||
))
|
|
||||||
|
|
||||||
ucie_conn_bw_v = ucie_cfg.get("per_connection_bw_gbs", 128.0)
|
ucie_conn_bw_v = ucie_cfg.get("per_connection_bw_gbs", 128.0)
|
||||||
for port in ucie_cfg["ports"]:
|
n_rows = mesh_data.get("mesh", {}).get("rows", 6)
|
||||||
for ci in range(ucie_n_conn):
|
n_cols = mesh_data.get("mesh", {}).get("cols", 6)
|
||||||
conn_id = f"ucie-{port}.conn{ci}"
|
|
||||||
|
# Router ↔ router mesh edges
|
||||||
|
for r in range(n_rows):
|
||||||
|
for c in range(n_cols):
|
||||||
|
rkey = f"r{r}c{c}"
|
||||||
|
if routers.get(rkey) is None:
|
||||||
|
continue
|
||||||
|
src_pos = routers[rkey]["pos_mm"]
|
||||||
|
# Horizontal neighbor
|
||||||
|
for nc in range(c + 1, n_cols):
|
||||||
|
nkey = f"r{r}c{nc}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
dist = abs(routers[nkey]["pos_mm"][0] - src_pos[0])
|
||||||
view_edges.append(Edge(
|
view_edges.append(Edge(
|
||||||
src="noc", dst=conn_id,
|
src=rkey, dst=nkey, distance_mm=round(dist, 2),
|
||||||
distance_mm=0.0, bw_gbs=ucie_conn_bw_v,
|
kind="router_mesh",
|
||||||
kind="noc_to_ucie_conn",
|
))
|
||||||
|
break
|
||||||
|
# Vertical neighbor
|
||||||
|
for nr in range(r + 1, n_rows):
|
||||||
|
nkey = f"r{nr}c{c}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
dist = abs(routers[nkey]["pos_mm"][1] - src_pos[1])
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=rkey, dst=nkey, distance_mm=round(dist, 2),
|
||||||
|
kind="router_mesh",
|
||||||
|
))
|
||||||
|
break
|
||||||
|
|
||||||
|
# Component ↔ router edges from attach lists
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
for item in rval.get("attach", []):
|
||||||
|
if item.endswith(".dma"):
|
||||||
|
pe_prefix = item.rsplit(".", 1)[0]
|
||||||
|
pid = pe_prefix.replace("pe", "pe") # "pe0" → "pe0"
|
||||||
|
if pid in nodes:
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=pid, dst=rkey, distance_mm=0.0,
|
||||||
|
bw_gbs=pe_to_router_bw, kind="pe_to_router",
|
||||||
))
|
))
|
||||||
view_edges.append(Edge(
|
view_edges.append(Edge(
|
||||||
src=conn_id, dst=f"ucie-{port}",
|
src=rkey, dst=pid, distance_mm=0.0,
|
||||||
|
kind="command",
|
||||||
|
))
|
||||||
|
elif item.endswith(".hbm"):
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=rkey, dst="hbm_ctrl", distance_mm=0.0,
|
||||||
|
bw_gbs=hbm_to_router_bw, kind="router_to_hbm",
|
||||||
|
))
|
||||||
|
elif item == "m_cpu":
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src="m_cpu", dst=rkey, distance_mm=0.0, kind="command",
|
||||||
|
))
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=rkey, dst="m_cpu", distance_mm=0.0, kind="command",
|
||||||
|
))
|
||||||
|
elif item == "sram":
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src="sram", dst=rkey, distance_mm=0.0,
|
||||||
|
bw_gbs=sram_bw, kind="router_to_sram",
|
||||||
|
))
|
||||||
|
elif item.startswith("ucie_"):
|
||||||
|
parts = item.split(".")
|
||||||
|
direction = parts[0].replace("ucie_", "").upper()
|
||||||
|
conn_num = parts[1].replace("c", "")
|
||||||
|
conn_id = f"ucie-{direction}.conn{conn_num}"
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=rkey, dst=conn_id, distance_mm=0.0,
|
||||||
|
bw_gbs=ucie_conn_bw_v, kind="router_to_ucie_conn",
|
||||||
|
))
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=conn_id, dst=rkey, distance_mm=0.0,
|
||||||
|
bw_gbs=ucie_conn_bw_v, kind="ucie_conn_to_router",
|
||||||
|
))
|
||||||
|
view_edges.append(Edge(
|
||||||
|
src=conn_id, dst=f"ucie-{direction}",
|
||||||
distance_mm=0.0, kind="ucie_internal",
|
distance_mm=0.0, kind="ucie_internal",
|
||||||
))
|
))
|
||||||
view_edges.append(Edge(
|
view_edges.append(Edge(
|
||||||
src=f"ucie-{port}", dst=conn_id,
|
src=f"ucie-{direction}", dst=conn_id,
|
||||||
distance_mm=0.0, kind="ucie_internal",
|
distance_mm=0.0, kind="ucie_internal",
|
||||||
))
|
))
|
||||||
view_edges.append(Edge(
|
|
||||||
src=conn_id, dst="noc",
|
|
||||||
distance_mm=0.0, bw_gbs=ucie_conn_bw_v,
|
|
||||||
kind="ucie_conn_to_noc",
|
|
||||||
))
|
|
||||||
|
|
||||||
# m_cpu ↔ noc
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="m_cpu", dst="noc",
|
|
||||||
distance_mm=clinks["m_cpu_to_noc_mm"],
|
|
||||||
kind="command",
|
|
||||||
))
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="noc", dst="m_cpu",
|
|
||||||
distance_mm=clinks["m_cpu_to_noc_mm"],
|
|
||||||
kind="command",
|
|
||||||
))
|
|
||||||
|
|
||||||
# noc ↔ sram
|
|
||||||
_noc_sram_v = clinks["noc_to_sram"]
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="noc", dst="sram",
|
|
||||||
distance_mm=clinks["noc_to_sram_mm"],
|
|
||||||
bw_gbs=_noc_sram_v["per_connection_bw_gbs"],
|
|
||||||
n_connections=_noc_sram_v["n_connections"],
|
|
||||||
kind="noc_to_sram",
|
|
||||||
))
|
|
||||||
view_edges.append(Edge(
|
|
||||||
src="sram", dst="noc",
|
|
||||||
distance_mm=clinks["noc_to_sram_mm"],
|
|
||||||
bw_gbs=_noc_sram_v["per_connection_bw_gbs"],
|
|
||||||
n_connections=_noc_sram_v["n_connections"],
|
|
||||||
kind="noc_to_sram",
|
|
||||||
))
|
|
||||||
|
|
||||||
return ViewGraph(
|
return ViewGraph(
|
||||||
name="cube", nodes=nodes, edges=view_edges,
|
name="cube", nodes=nodes, edges=view_edges,
|
||||||
|
|||||||
@@ -50,6 +50,10 @@ def _compute_source_hash(cube_spec: dict) -> str:
|
|||||||
"geometry": cube_spec["geometry"],
|
"geometry": cube_spec["geometry"],
|
||||||
"pe_layout": cube_spec["pe_layout"],
|
"pe_layout": cube_spec["pe_layout"],
|
||||||
"ucie_n_connections": cube_spec["ucie"]["n_connections"],
|
"ucie_n_connections": cube_spec["ucie"]["n_connections"],
|
||||||
|
"hbm_mapping_mode": cube_spec.get("memory_map", {}).get(
|
||||||
|
"hbm_mapping_mode", "n_to_one"
|
||||||
|
),
|
||||||
|
"placement": cube_spec.get("placement", {}),
|
||||||
}
|
}
|
||||||
raw = yaml.dump(relevant, sort_keys=True)
|
raw = yaml.dump(relevant, sort_keys=True)
|
||||||
return hashlib.sha256(raw.encode()).hexdigest()[:16]
|
return hashlib.sha256(raw.encode()).hexdigest()[:16]
|
||||||
@@ -108,6 +112,7 @@ def _compute_row_positions(
|
|||||||
|
|
||||||
# Top half: evenly spaced from top PE y to just above HBM zone
|
# Top half: evenly spaced from top PE y to just above HBM zone
|
||||||
top_pe_y = 1.5
|
top_pe_y = 1.5
|
||||||
|
hbm_gap = 1.5 # minimum gap between PE rows and HBM rows
|
||||||
hbm_top_y = cube_h / 2 - 1.5 # ~5.5 for h=14
|
hbm_top_y = cube_h / 2 - 1.5 # ~5.5 for h=14
|
||||||
hbm_bot_y = cube_h / 2 + 1.5 # ~8.5 for h=14
|
hbm_bot_y = cube_h / 2 + 1.5 # ~8.5 for h=14
|
||||||
bot_pe_y = cube_h - 1.5
|
bot_pe_y = cube_h - 1.5
|
||||||
@@ -116,21 +121,24 @@ def _compute_row_positions(
|
|||||||
if rows_per_half == 1:
|
if rows_per_half == 1:
|
||||||
top_rows = [top_pe_y]
|
top_rows = [top_pe_y]
|
||||||
else:
|
else:
|
||||||
step = (hbm_top_y - top_pe_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
|
# End before HBM zone with gap
|
||||||
|
top_end = hbm_top_y - hbm_gap
|
||||||
|
step = (top_end - top_pe_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
|
||||||
for i in range(rows_per_half):
|
for i in range(rows_per_half):
|
||||||
top_rows.append(round(top_pe_y + i * step, 1))
|
top_rows.append(round(top_pe_y + i * step, 1))
|
||||||
|
|
||||||
# HBM rows
|
# HBM rows
|
||||||
hbm_rows = [round(hbm_top_y, 1), round(hbm_bot_y, 1)]
|
hbm_rows = [round(hbm_top_y, 1), round(hbm_bot_y, 1)]
|
||||||
|
|
||||||
# Bottom half: mirror of top
|
# Bottom half: mirror of top, start after HBM zone with gap
|
||||||
bot_rows: list[float] = []
|
bot_rows: list[float] = []
|
||||||
if rows_per_half == 1:
|
if rows_per_half == 1:
|
||||||
bot_rows = [bot_pe_y]
|
bot_rows = [bot_pe_y]
|
||||||
else:
|
else:
|
||||||
step = (bot_pe_y - hbm_bot_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
|
bot_start = hbm_bot_y + hbm_gap
|
||||||
|
step = (bot_pe_y - bot_start) / (rows_per_half - 1) if rows_per_half > 1 else 0
|
||||||
for i in range(rows_per_half):
|
for i in range(rows_per_half):
|
||||||
bot_rows.append(round(hbm_bot_y + i * step, 1))
|
bot_rows.append(round(bot_start + i * step, 1))
|
||||||
|
|
||||||
return top_rows + hbm_rows + bot_rows, rows_per_half
|
return top_rows + hbm_rows + bot_rows, rows_per_half
|
||||||
|
|
||||||
@@ -206,6 +214,7 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
|
|||||||
if router is not None:
|
if router is not None:
|
||||||
router["attach"].append(f"pe{pe_idx}.dma")
|
router["attach"].append(f"pe{pe_idx}.dma")
|
||||||
router["attach"].append(f"pe{pe_idx}.cpu")
|
router["attach"].append(f"pe{pe_idx}.cpu")
|
||||||
|
router["attach"].append(f"pe{pe_idx}.hbm")
|
||||||
if is_top:
|
if is_top:
|
||||||
top_pe_routers.append(key)
|
top_pe_routers.append(key)
|
||||||
else:
|
else:
|
||||||
@@ -213,13 +222,29 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
|
|||||||
|
|
||||||
pe_idx += 1
|
pe_idx += 1
|
||||||
|
|
||||||
# M_CPU and SRAM attachments (HBM row, leftmost available)
|
# M_CPU and SRAM attachments: find nearest router to configured position
|
||||||
mcpu_key = f"r{hbm_row_start}c0"
|
placement = cube_spec.get("placement", {})
|
||||||
if routers.get(mcpu_key) is not None:
|
|
||||||
|
def _nearest_router(target_mm: list[float]) -> str | None:
|
||||||
|
best_key, best_dist = None, float("inf")
|
||||||
|
for rk, rv in routers.items():
|
||||||
|
if rv is None:
|
||||||
|
continue
|
||||||
|
rx, ry = rv["pos_mm"]
|
||||||
|
dist = math.sqrt((rx - target_mm[0]) ** 2 + (ry - target_mm[1]) ** 2)
|
||||||
|
if dist < best_dist:
|
||||||
|
best_dist = dist
|
||||||
|
best_key = rk
|
||||||
|
return best_key
|
||||||
|
|
||||||
|
mcpu_pos = placement.get("m_cpu", {}).get("pos_mm", [1.5, 5.5])
|
||||||
|
mcpu_key = _nearest_router(mcpu_pos)
|
||||||
|
if mcpu_key and routers.get(mcpu_key) is not None:
|
||||||
routers[mcpu_key]["attach"].append("m_cpu")
|
routers[mcpu_key]["attach"].append("m_cpu")
|
||||||
|
|
||||||
sram_key = f"r{hbm_row_end}c0"
|
sram_pos = placement.get("sram", {}).get("pos_mm", [1.5, 8.5])
|
||||||
if routers.get(sram_key) is not None:
|
sram_key = _nearest_router(sram_pos)
|
||||||
|
if sram_key and routers.get(sram_key) is not None:
|
||||||
routers[sram_key]["attach"].append("sram")
|
routers[sram_key]["attach"].append("sram")
|
||||||
|
|
||||||
# UCIe PE rows: top-half rows + bottom-half rows (1 per PE row)
|
# UCIe PE rows: top-half rows + bottom-half rows (1 per PE row)
|
||||||
@@ -277,8 +302,4 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
|
|||||||
"cols": n_cols,
|
"cols": n_cols,
|
||||||
},
|
},
|
||||||
"routers": routers,
|
"routers": routers,
|
||||||
"xbar": {
|
|
||||||
"top": {"routers": sorted(set(top_pe_routers))},
|
|
||||||
"bottom": {"routers": sorted(set(bot_pe_routers))},
|
|
||||||
},
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ _KIND_COLORS: dict[str, str] = {
|
|||||||
"ucie_port": "#3b82f6", # blue
|
"ucie_port": "#3b82f6", # blue
|
||||||
"noc": "#a78bfa", # purple
|
"noc": "#a78bfa", # purple
|
||||||
"m_cpu": "#f59e0b", # amber
|
"m_cpu": "#f59e0b", # amber
|
||||||
"xbar": "#f97316", # orange
|
"noc_router": "#f97316", # orange
|
||||||
"hbm_ctrl": "#10b981", # emerald
|
"hbm_ctrl": "#10b981", # emerald
|
||||||
"pe": "#94a3b8", # slate
|
"pe": "#94a3b8", # slate
|
||||||
"pe_cpu": "#ef4444", # red
|
"pe_cpu": "#ef4444", # red
|
||||||
@@ -40,10 +40,11 @@ _EDGE_COLORS: dict[str, str] = {
|
|||||||
"io_internal": "#0ea5e9",
|
"io_internal": "#0ea5e9",
|
||||||
"io_to_cube": "#0ea5e9",
|
"io_to_cube": "#0ea5e9",
|
||||||
"ucie_mesh": "#3b82f6",
|
"ucie_mesh": "#3b82f6",
|
||||||
"pe_to_xbar": "#f97316",
|
"pe_to_router": "#f97316",
|
||||||
"xbar_to_hbm": "#10b981",
|
"router_to_hbm": "#10b981",
|
||||||
"xbar_to_bridge": "#a78bfa",
|
"hbm_to_router": "#10b981",
|
||||||
"bridge_to_xbar": "#a78bfa",
|
"router_mesh": "#a78bfa",
|
||||||
|
"router_to_sram": "#a78bfa",
|
||||||
"noc_to_ucie": "#a78bfa",
|
"noc_to_ucie": "#a78bfa",
|
||||||
"pe_to_noc": "#a78bfa",
|
"pe_to_noc": "#a78bfa",
|
||||||
"noc_to_sram": "#f59e0b",
|
"noc_to_sram": "#f59e0b",
|
||||||
@@ -61,6 +62,12 @@ _KIND_SIZE: dict[str, tuple[float, float]] = {
|
|||||||
"cube": (6.0, 4.0),
|
"cube": (6.0, 4.0),
|
||||||
"iochiplet": (4.0, 1.5),
|
"iochiplet": (4.0, 1.5),
|
||||||
"switch": (5.0, 1.5),
|
"switch": (5.0, 1.5),
|
||||||
|
"noc_router": (1.0, 0.7),
|
||||||
|
"ucie_port": (1.2, 0.7),
|
||||||
|
"ucie_conn": (0.8, 0.5),
|
||||||
|
"sram": (1.4, 0.7),
|
||||||
|
"m_cpu": (1.4, 0.7),
|
||||||
|
"hbm_ctrl": (1.8, 0.8),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -82,6 +89,9 @@ def emit_diagrams(graph: TopologyGraph, out_dir: Path) -> list[Path]:
|
|||||||
for name, view in views:
|
for name, view in views:
|
||||||
if view is None:
|
if view is None:
|
||||||
continue
|
continue
|
||||||
|
if name == "cube_view":
|
||||||
|
svg = _render_cube_view_svg(view, graph.spec)
|
||||||
|
else:
|
||||||
svg = _render_view_svg(view)
|
svg = _render_view_svg(view)
|
||||||
path = out_dir / f"{name}.svg"
|
path = out_dir / f"{name}.svg"
|
||||||
path.write_text(svg, encoding="utf-8")
|
path.write_text(svg, encoding="utf-8")
|
||||||
@@ -155,7 +165,7 @@ def _compute_node_sizes(
|
|||||||
w_mm, h_mm = _KIND_SIZE.get(node.kind, (_DEFAULT_NODE_W, _DEFAULT_NODE_H))
|
w_mm, h_mm = _KIND_SIZE.get(node.kind, (_DEFAULT_NODE_W, _DEFAULT_NODE_H))
|
||||||
# For cube view, use smaller PE nodes
|
# For cube view, use smaller PE nodes
|
||||||
if view.name == "cube" and node.kind == "pe":
|
if view.name == "cube" and node.kind == "pe":
|
||||||
w_mm, h_mm = 1.8, 1.0
|
w_mm, h_mm = 1.4, 0.7
|
||||||
if view.name == "pe":
|
if view.name == "pe":
|
||||||
w_mm, h_mm = 2.5, 1.4
|
w_mm, h_mm = 2.5, 1.4
|
||||||
sizes[nid] = (w_mm * scale, h_mm * scale)
|
sizes[nid] = (w_mm * scale, h_mm * scale)
|
||||||
@@ -245,7 +255,7 @@ def _draw_node(
|
|||||||
|
|
||||||
# ── Fan-out edge kinds that need offset routing ─────────────────────
|
# ── Fan-out edge kinds that need offset routing ─────────────────────
|
||||||
|
|
||||||
_FANOUT_KINDS = {"pe_to_xbar", "pe_to_noc", "command", "noc_to_ucie"}
|
_FANOUT_KINDS = {"pe_to_router", "command", "router_to_ucie_conn", "ucie_conn_to_router"}
|
||||||
|
|
||||||
|
|
||||||
def _draw_edge(
|
def _draw_edge(
|
||||||
@@ -272,6 +282,14 @@ def _draw_edge(
|
|||||||
color = _EDGE_COLORS.get(edge.kind, "#94a3b8")
|
color = _EDGE_COLORS.get(edge.kind, "#94a3b8")
|
||||||
width = "1.5" if edge.kind == "pe_internal" else "1"
|
width = "1.5" if edge.kind == "pe_internal" else "1"
|
||||||
opacity = "0.6" if edge.kind in ("command", "noc_to_ucie") else "0.8"
|
opacity = "0.6" if edge.kind in ("command", "noc_to_ucie") else "0.8"
|
||||||
|
# HBM links: thin and faint to reduce clutter
|
||||||
|
if edge.kind in ("router_to_hbm", "hbm_to_router"):
|
||||||
|
width = "0.5"
|
||||||
|
opacity = "0.3"
|
||||||
|
# Router mesh links: thin
|
||||||
|
if edge.kind == "router_mesh":
|
||||||
|
width = "0.5"
|
||||||
|
opacity = "0.4"
|
||||||
|
|
||||||
if edge.kind in _FANOUT_KINDS and view.name == "cube":
|
if edge.kind in _FANOUT_KINDS and view.name == "cube":
|
||||||
# Orthogonal routing: src→horizontal→vertical→dst with per-edge offset.
|
# Orthogonal routing: src→horizontal→vertical→dst with per-edge offset.
|
||||||
@@ -365,3 +383,505 @@ def _label_font_size(box_width: float, label: str) -> int:
|
|||||||
def _escape(text: str) -> str:
|
def _escape(text: str) -> str:
|
||||||
"""Escape XML special characters."""
|
"""Escape XML special characters."""
|
||||||
return text.replace("&", "&").replace("<", "<").replace(">", ">")
|
return text.replace("&", "&").replace("<", "<").replace(">", ">")
|
||||||
|
|
||||||
|
|
||||||
|
# ── Connector helper ─────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _connector_points(
|
||||||
|
rx: float, ry: float, cx: float, cy: float
|
||||||
|
) -> str:
|
||||||
|
"""Return SVG polyline points for a rule-based connector.
|
||||||
|
|
||||||
|
Horizontal-dominant (|dx| >= |dy|): 45° → horizontal straight → 45°.
|
||||||
|
Vertical-dominant (|dy| > |dx|): 45° → vertical straight → 45°.
|
||||||
|
Near-equal or tiny distance: single straight line.
|
||||||
|
"""
|
||||||
|
dx = cx - rx
|
||||||
|
dy = cy - ry
|
||||||
|
adx, ady = abs(dx), abs(dy)
|
||||||
|
|
||||||
|
# Trivial distance → single line
|
||||||
|
# Near-45° diagonal for short distances only (e.g. PE↔router)
|
||||||
|
if adx + ady < 4 or (abs(adx - ady) < 4 and adx + ady < 80):
|
||||||
|
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
|
||||||
|
|
||||||
|
sx = 1 if dx >= 0 else -1
|
||||||
|
sy = 1 if dy >= 0 else -1
|
||||||
|
|
||||||
|
if adx >= ady:
|
||||||
|
# Horizontal-dominant: stubs handle vertical, straight is horizontal
|
||||||
|
stub = ady / 2
|
||||||
|
if stub < 2:
|
||||||
|
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
|
||||||
|
r45x = rx + sx * stub
|
||||||
|
r45y = ry + sy * stub
|
||||||
|
c45x = cx - sx * stub
|
||||||
|
c45y = cy - sy * stub # r45y == c45y (horizontal)
|
||||||
|
else:
|
||||||
|
# Vertical-dominant: stubs handle horizontal, straight is vertical
|
||||||
|
stub = adx / 2
|
||||||
|
if stub < 2:
|
||||||
|
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
|
||||||
|
r45x = rx + sx * stub
|
||||||
|
r45y = ry + sy * stub
|
||||||
|
c45x = cx - sx * stub
|
||||||
|
c45y = cy - sy * stub # r45x == c45x (vertical)
|
||||||
|
|
||||||
|
return (
|
||||||
|
f"{rx:.0f},{ry:.0f} {r45x:.0f},{r45y:.0f} "
|
||||||
|
f"{c45x:.0f},{c45y:.0f} {cx:.0f},{cy:.0f}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Cube-specific renderer ──────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _render_cube_view_svg(view: ViewGraph, spec: dict) -> str:
|
||||||
|
"""Render cube view with topology validation detail.
|
||||||
|
|
||||||
|
Shows: 6×6 router grid, PE attachments, HBM pseudo channel ports,
|
||||||
|
M_CPU/SRAM positions, UCIe connections, BW annotations.
|
||||||
|
"""
|
||||||
|
mesh_data = spec.get("_mesh", {})
|
||||||
|
routers = mesh_data.get("routers", {})
|
||||||
|
n_rows = mesh_data.get("mesh", {}).get("rows", 6)
|
||||||
|
n_cols = mesh_data.get("mesh", {}).get("cols", 6)
|
||||||
|
cube = spec.get("cube", {})
|
||||||
|
mm = cube.get("memory_map", {})
|
||||||
|
clinks = cube.get("links", {})
|
||||||
|
cube_w = cube.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
|
||||||
|
cube_h = cube.get("geometry", {}).get("cube_mm", {}).get("h", 14.0)
|
||||||
|
|
||||||
|
channels_per_pe = mm.get("hbm_channels_per_pe", 8)
|
||||||
|
channel_bw = mm.get("hbm_channel_bw_gbs", 32.0)
|
||||||
|
total_ch = mm.get("hbm_pseudo_channels", 64)
|
||||||
|
mode = mm.get("hbm_mapping_mode", "n_to_one")
|
||||||
|
agg_bw = channels_per_pe * channel_bw
|
||||||
|
|
||||||
|
scale = 50 # px per mm
|
||||||
|
pad = 60
|
||||||
|
w_px = int(cube_w * scale + 2 * pad)
|
||||||
|
h_px = int(cube_h * scale + 2 * pad + 80) # extra for legend
|
||||||
|
|
||||||
|
parts: list[str] = []
|
||||||
|
parts.append(_svg_header(w_px, h_px, "cube"))
|
||||||
|
|
||||||
|
# Background
|
||||||
|
parts.append(f' <rect width="{w_px}" height="{h_px}" fill="#0f172a"/>')
|
||||||
|
|
||||||
|
# Title
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{w_px // 2}" y="22" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="14" font-weight="bold" fill="#94a3b8">'
|
||||||
|
f'CUBE TOPOLOGY — {cube_w}×{cube_h}mm | {n_rows}×{n_cols} Router Mesh | '
|
||||||
|
f'{mode} mode | {total_ch} pseudo-ch</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Subtitle
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{w_px // 2}" y="40" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="10" fill="#64748b">'
|
||||||
|
f'Per-PE: {channels_per_pe} ch × {channel_bw} GB/s = {agg_bw} GB/s | '
|
||||||
|
f'Cube total: {total_ch} × {channel_bw} = {total_ch * channel_bw} GB/s</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Cube boundary
|
||||||
|
bx, by = pad, pad
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{bx}" y="{by}" width="{cube_w * scale}" height="{cube_h * scale}" '
|
||||||
|
f'rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>'
|
||||||
|
)
|
||||||
|
|
||||||
|
def mm2px(x_mm: float, y_mm: float) -> tuple[float, float]:
|
||||||
|
return pad + x_mm * scale, pad + y_mm * scale
|
||||||
|
|
||||||
|
# ── HBM zone background (centered, 9×5mm) ──
|
||||||
|
hbm_x, hbm_y = mm2px(4.0, 4.5)
|
||||||
|
hbm_w, hbm_h = 9.0 * scale, 5.0 * scale
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{hbm_x:.0f}" y="{hbm_y:.0f}" '
|
||||||
|
f'width="{hbm_w:.0f}" height="{hbm_h:.0f}" '
|
||||||
|
f'rx="6" fill="#052e16" stroke="#047857" stroke-width="2" opacity="0.6"/>'
|
||||||
|
)
|
||||||
|
# HBM label
|
||||||
|
hcx, hcy = mm2px(8.5, 7.0)
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{hcx:.0f}" y="{hcy - 15:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="11" font-weight="bold" fill="#047857">'
|
||||||
|
f'HBM_CTRL | {total_ch} pseudo channels</text>'
|
||||||
|
)
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{hcx:.0f}" y="{hcy + 2:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="9" fill="#05966988">'
|
||||||
|
f'Total BW: {total_ch * channel_bw:.0f} GB/s</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── Pseudo channel ports on HBM top/bottom edges ──
|
||||||
|
# Top edge: 32 ports (PE0..PE3, 8 each), Bottom edge: 32 ports (PE4..PE7)
|
||||||
|
half_ch = total_ch // 2
|
||||||
|
pes_per_half = half_ch // channels_per_pe # 4 PEs per half
|
||||||
|
port_bar_w = hbm_w - 20 # slightly narrower than HBM zone
|
||||||
|
port_w = port_bar_w / half_ch
|
||||||
|
port_h = 8
|
||||||
|
pe_colors = ["#3b82f6", "#60a5fa", "#8b5cf6", "#a78bfa",
|
||||||
|
"#f59e0b", "#fbbf24", "#ef4444", "#f87171"]
|
||||||
|
|
||||||
|
for half_idx, (edge_y, pe_start) in enumerate([
|
||||||
|
(hbm_y + 4, 0), # top edge, PE0-PE3
|
||||||
|
(hbm_y + hbm_h - port_h - 4, pes_per_half), # bottom edge, PE4-PE7
|
||||||
|
]):
|
||||||
|
bar_x = hbm_x + 10
|
||||||
|
for i in range(half_ch):
|
||||||
|
pe_owner = pe_start + i // channels_per_pe
|
||||||
|
c = pe_colors[pe_owner % len(pe_colors)]
|
||||||
|
px = bar_x + i * port_w
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{px:.1f}" y="{edge_y:.0f}" '
|
||||||
|
f'width="{max(port_w - 0.5, 1):.1f}" height="{port_h}" '
|
||||||
|
f'rx="1" fill="{c}" opacity="0.8"/>'
|
||||||
|
)
|
||||||
|
# Per-PE group labels
|
||||||
|
for p in range(pes_per_half):
|
||||||
|
gx = bar_x + (p * channels_per_pe + channels_per_pe / 2) * port_w
|
||||||
|
label_y = edge_y - 3 if half_idx == 0 else edge_y + port_h + 8
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{gx:.0f}" y="{label_y:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="6" fill="{pe_colors[(pe_start + p) % len(pe_colors)]}">'
|
||||||
|
f'PE{pe_start + p}×{channels_per_pe}ch</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Store port group centers for PE→HBM connection lines (used later)
|
||||||
|
_pe_hbm_targets: dict[int, tuple[float, float]] = {}
|
||||||
|
for half_idx, (edge_y, pe_start) in enumerate([
|
||||||
|
(hbm_y + 4, 0),
|
||||||
|
(hbm_y + hbm_h - port_h - 4, pes_per_half),
|
||||||
|
]):
|
||||||
|
bar_x = hbm_x + 10
|
||||||
|
for p in range(pes_per_half):
|
||||||
|
pe_id = pe_start + p
|
||||||
|
gx = bar_x + (p * channels_per_pe + channels_per_pe / 2) * port_w
|
||||||
|
gy = edge_y if half_idx == 0 else edge_y + port_h
|
||||||
|
_pe_hbm_targets[pe_id] = (gx, gy)
|
||||||
|
|
||||||
|
# ── Router mesh links ──
|
||||||
|
for r in range(n_rows):
|
||||||
|
for c in range(n_cols):
|
||||||
|
rkey = f"r{r}c{c}"
|
||||||
|
if routers.get(rkey) is None:
|
||||||
|
continue
|
||||||
|
rx, ry = routers[rkey]["pos_mm"]
|
||||||
|
sx, sy = mm2px(rx, ry)
|
||||||
|
|
||||||
|
# Horizontal neighbor
|
||||||
|
for nc in range(c + 1, n_cols):
|
||||||
|
nkey = f"r{r}c{nc}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
nx, ny = routers[nkey]["pos_mm"]
|
||||||
|
dx, dy = mm2px(nx, ny)
|
||||||
|
parts.append(
|
||||||
|
f' <line x1="{sx:.0f}" y1="{sy:.0f}" '
|
||||||
|
f'x2="{dx:.0f}" y2="{dy:.0f}" '
|
||||||
|
f'stroke="#475569" stroke-width="1" opacity="0.4"/>'
|
||||||
|
)
|
||||||
|
break
|
||||||
|
|
||||||
|
# Vertical neighbor
|
||||||
|
for nr in range(r + 1, n_rows):
|
||||||
|
nkey = f"r{nr}c{c}"
|
||||||
|
if routers.get(nkey) is None:
|
||||||
|
continue
|
||||||
|
nx, ny = routers[nkey]["pos_mm"]
|
||||||
|
dx, dy = mm2px(nx, ny)
|
||||||
|
parts.append(
|
||||||
|
f' <line x1="{sx:.0f}" y1="{sy:.0f}" '
|
||||||
|
f'x2="{dx:.0f}" y2="{dy:.0f}" '
|
||||||
|
f'stroke="#475569" stroke-width="1" opacity="0.4"/>'
|
||||||
|
)
|
||||||
|
break
|
||||||
|
|
||||||
|
# ── Router nodes + attached component blocks ──
|
||||||
|
r_size = 8 # px radius for router circle
|
||||||
|
blk_w, blk_h = 32, 16 # px for component blocks
|
||||||
|
|
||||||
|
# Component style definitions
|
||||||
|
_COMP_STYLE = {
|
||||||
|
"pe": {"fill": "#2d1f3d", "stroke": "#a855f7", "text": "#a855f7"},
|
||||||
|
"mcpu": {"fill": "#451a03", "stroke": "#f59e0b", "text": "#f59e0b"},
|
||||||
|
"sram": {"fill": "#1c1917", "stroke": "#d97706", "text": "#d97706"},
|
||||||
|
"ucie": {"fill": "#1e1b4b", "stroke": "#8b5cf6", "text": "#8b5cf6"},
|
||||||
|
}
|
||||||
|
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
rx, ry = rval["pos_mm"]
|
||||||
|
px, py = mm2px(rx, ry)
|
||||||
|
attach = rval.get("attach", [])
|
||||||
|
is_top = ry < cube_h / 2
|
||||||
|
|
||||||
|
# ── Router circle ──
|
||||||
|
has_attach = len(attach) > 0
|
||||||
|
r_fill = "#475569" if has_attach else "#334155"
|
||||||
|
r_stroke = "#64748b" if has_attach else "#475569"
|
||||||
|
parts.append(
|
||||||
|
f' <circle cx="{px:.0f}" cy="{py:.0f}" r="{r_size}" '
|
||||||
|
f'fill="{r_fill}" stroke="{r_stroke}" stroke-width="1"/>'
|
||||||
|
)
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{px:.0f}" y="{py + 3:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="6" fill="white">'
|
||||||
|
f'{rkey}</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── Router → HBM_CTRL line (deferred, drawn after component blocks) ──
|
||||||
|
|
||||||
|
# ── Attached component blocks ──
|
||||||
|
# Collect components to draw, positioned outward from router
|
||||||
|
blocks: list[tuple[str, str, dict]] = [] # (label, kind, style)
|
||||||
|
pe_items = [a for a in attach if a.endswith(".dma")]
|
||||||
|
if pe_items:
|
||||||
|
pe_name = pe_items[0].split(".")[0].upper()
|
||||||
|
blocks.append((pe_name, "pe", _COMP_STYLE["pe"]))
|
||||||
|
if "m_cpu" in attach:
|
||||||
|
blocks.append(("M_CPU", "mcpu", _COMP_STYLE["mcpu"]))
|
||||||
|
if "sram" in attach:
|
||||||
|
blocks.append(("SRAM", "sram", _COMP_STYLE["sram"]))
|
||||||
|
# UCIe handled separately below
|
||||||
|
|
||||||
|
# Position blocks outward from router (away from cube center)
|
||||||
|
for bi, (label, kind, style) in enumerate(blocks):
|
||||||
|
# Determine placement direction: PE/components go outward
|
||||||
|
# Use left/right offset for multiple blocks on same router
|
||||||
|
offset_x = (bi - (len(blocks) - 1) / 2) * (blk_w + 4)
|
||||||
|
|
||||||
|
gap = 30 # px gap between router and component (room for 2 × 45° stubs)
|
||||||
|
if kind == "mcpu":
|
||||||
|
# M_CPU: place above (north of) router
|
||||||
|
bx = px - blk_w / 2
|
||||||
|
by = py - r_size - blk_h - gap
|
||||||
|
elif kind == "sram":
|
||||||
|
# SRAM: place below (south of) router
|
||||||
|
bx = px - blk_w / 2
|
||||||
|
by = py + r_size + gap
|
||||||
|
else:
|
||||||
|
# PE: place above (top half) or below (bottom half)
|
||||||
|
bx = px + offset_x - blk_w / 2
|
||||||
|
if is_top:
|
||||||
|
by = py - r_size - blk_h - gap - bi * (blk_h + 2)
|
||||||
|
else:
|
||||||
|
by = py + r_size + gap + bi * (blk_h + 2)
|
||||||
|
|
||||||
|
# Block rect
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{bx:.0f}" y="{by:.0f}" '
|
||||||
|
f'width="{blk_w}" height="{blk_h}" '
|
||||||
|
f'rx="3" fill="{style["fill"]}" stroke="{style["stroke"]}" stroke-width="1"/>'
|
||||||
|
)
|
||||||
|
# Label
|
||||||
|
font_sz = 6 if len(label) > 6 else 7
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{bx + blk_w / 2:.0f}" y="{by + blk_h / 2 + 3:.0f}" '
|
||||||
|
f'text-anchor="middle" font-family="monospace" font-size="{font_sz}" '
|
||||||
|
f'font-weight="bold" fill="{style["text"]}">{_escape(label)}</text>'
|
||||||
|
)
|
||||||
|
# Connector: rule-based (short → 45° line, long → 45°-straight-45°)
|
||||||
|
sc = style["stroke"]
|
||||||
|
|
||||||
|
# Determine start (router edge) and end (component edge) points
|
||||||
|
bxc = bx + blk_w / 2 # component center x
|
||||||
|
if kind == "mcpu":
|
||||||
|
rx0, ry0 = px, py - r_size # router top
|
||||||
|
cx0, cy0 = bxc, by + blk_h # component bottom
|
||||||
|
elif kind == "sram":
|
||||||
|
rx0, ry0 = px, py + r_size # router bottom
|
||||||
|
cx0, cy0 = bxc, by # component top
|
||||||
|
elif is_top:
|
||||||
|
rx0, ry0 = px, py - r_size # router top
|
||||||
|
cx0, cy0 = bx + blk_w / 2 + offset_x, by + blk_h # component bottom
|
||||||
|
else:
|
||||||
|
rx0, ry0 = px, py + r_size # router bottom
|
||||||
|
cx0, cy0 = bx + blk_w / 2 + offset_x, by # component top
|
||||||
|
|
||||||
|
# PE/M_CPU/SRAM directly above/below router (same X):
|
||||||
|
# single diagonal line from router center to component right edge
|
||||||
|
if abs(cx0 - rx0) < 2 and abs(cy0 - ry0) > 4:
|
||||||
|
cx0 = bx + blk_w - 2
|
||||||
|
parts.append(
|
||||||
|
f' <line x1="{rx0:.0f}" y1="{ry0:.0f}" '
|
||||||
|
f'x2="{cx0:.0f}" y2="{cy0:.0f}" '
|
||||||
|
f'stroke="{sc}" stroke-width="1" opacity="0.6"/>'
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
pts = _connector_points(rx0, ry0, cx0, cy0)
|
||||||
|
parts.append(
|
||||||
|
f' <polyline points="{pts}" '
|
||||||
|
f'fill="none" stroke="{sc}" stroke-width="1" opacity="0.6"/>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# (PE→HBM BW annotation drawn in the PE→HBM port group section above)
|
||||||
|
|
||||||
|
# ── PE Router → HBM pseudo channel port group lines ──
|
||||||
|
# Each PE router connects to its port group center on the HBM edge
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
attach = rval.get("attach", [])
|
||||||
|
pe_dma_items = [a for a in attach if a.endswith(".dma")]
|
||||||
|
if not pe_dma_items:
|
||||||
|
continue
|
||||||
|
pe_id = int(pe_dma_items[0].split(".")[0].replace("pe", ""))
|
||||||
|
if pe_id not in _pe_hbm_targets:
|
||||||
|
continue
|
||||||
|
rx, ry = rval["pos_mm"]
|
||||||
|
rpx, rpy = mm2px(rx, ry)
|
||||||
|
tgx, tgy = _pe_hbm_targets[pe_id]
|
||||||
|
r_edge_y = rpy + r_size if rpy < hbm_y else rpy - r_size
|
||||||
|
# Rule-based connector: router → HBM port group
|
||||||
|
pts = _connector_points(rpx, r_edge_y, tgx, tgy)
|
||||||
|
parts.append(
|
||||||
|
f' <polyline points="{pts}" '
|
||||||
|
f'fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" '
|
||||||
|
f'stroke-dasharray="4,3"/>'
|
||||||
|
)
|
||||||
|
# BW annotation at midpoint
|
||||||
|
mx = (rpx + tgx) / 2 + 10
|
||||||
|
my = (r_edge_y + tgy) / 2
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{mx:.0f}" y="{my:.0f}" '
|
||||||
|
f'font-family="monospace" font-size="6" fill="#10b98188">'
|
||||||
|
f'{agg_bw:.0f}GB/s</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── UCIe port components (position/size from topology.yaml) ──
|
||||||
|
# ucie_mm.size = 2.0mm, positions at cube edges (flush)
|
||||||
|
ucie_size_mm = cube.get("geometry", {}).get("ucie_mm", {}).get("size", 2.0)
|
||||||
|
uh_half = ucie_size_mm * 0.3 # half-height for edge placement
|
||||||
|
uw_half = ucie_size_mm * 0.5
|
||||||
|
ucie_positions = {
|
||||||
|
"N": (cube_w / 2, uh_half), # flush top edge
|
||||||
|
"S": (cube_w / 2, cube_h - uh_half), # flush bottom edge
|
||||||
|
"W": (uh_half, cube_h / 2), # flush left edge
|
||||||
|
"E": (cube_w - uh_half, cube_h / 2), # flush right edge
|
||||||
|
}
|
||||||
|
|
||||||
|
# Collect UCIe connections per direction
|
||||||
|
ucie_by_dir: dict[str, list[tuple[str, str, float, float]]] = {}
|
||||||
|
for rkey, rval in routers.items():
|
||||||
|
if rval is None:
|
||||||
|
continue
|
||||||
|
rx, ry = rval["pos_mm"]
|
||||||
|
for a in rval.get("attach", []):
|
||||||
|
if not a.startswith("ucie_"):
|
||||||
|
continue
|
||||||
|
parts_a = a.split(".")
|
||||||
|
direction = parts_a[0].replace("ucie_", "").upper()
|
||||||
|
conn = parts_a[1] if len(parts_a) > 1 else "c0"
|
||||||
|
ucie_by_dir.setdefault(direction, []).append((conn, rkey, rx, ry))
|
||||||
|
|
||||||
|
ucie_colors = ["#818cf8", "#a78bfa", "#c084fc", "#e879f9"]
|
||||||
|
|
||||||
|
for direction, conns in ucie_by_dir.items():
|
||||||
|
conns.sort(key=lambda x: x[0])
|
||||||
|
n_conn = len(conns)
|
||||||
|
ucx_mm, ucy_mm = ucie_positions.get(direction, (cube_w / 2, cube_h / 2))
|
||||||
|
ucx, ucy = mm2px(ucx_mm, ucy_mm)
|
||||||
|
|
||||||
|
# UCIe box: size from topology, N/S horizontal, E/W vertical
|
||||||
|
us = ucie_size_mm * scale
|
||||||
|
if direction in ("N", "S"):
|
||||||
|
uw, uh = us, us * 0.5
|
||||||
|
else:
|
||||||
|
uw, uh = us * 0.5, us
|
||||||
|
|
||||||
|
ux = ucx - uw / 2
|
||||||
|
uy = ucy - uh / 2
|
||||||
|
|
||||||
|
# UCIe component background
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{ux:.0f}" y="{uy:.0f}" '
|
||||||
|
f'width="{uw:.0f}" height="{uh:.0f}" '
|
||||||
|
f'rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>'
|
||||||
|
)
|
||||||
|
# UCIe direction label
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{ucx:.0f}" y="{uy - 3:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">'
|
||||||
|
f'UCIe-{direction}</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Connection port boxes inside UCIe component
|
||||||
|
for ci, (conn, rkey, crx, cry) in enumerate(conns):
|
||||||
|
c_color = ucie_colors[ci % len(ucie_colors)]
|
||||||
|
if direction in ("N", "S"):
|
||||||
|
cw = max((uw - 4) / n_conn - 1, 6)
|
||||||
|
ch = uh - 4
|
||||||
|
cx = ux + 2 + ci * (cw + 1)
|
||||||
|
cy_box = uy + 2
|
||||||
|
else:
|
||||||
|
cw = uw - 4
|
||||||
|
ch = max((uh - 4) / n_conn - 1, 6)
|
||||||
|
cx = ux + 2
|
||||||
|
cy_box = uy + 2 + ci * (ch + 1)
|
||||||
|
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{cx:.0f}" y="{cy_box:.0f}" '
|
||||||
|
f'width="{cw:.0f}" height="{ch:.0f}" '
|
||||||
|
f'rx="2" fill="{c_color}" opacity="0.7"/>'
|
||||||
|
)
|
||||||
|
lx = cx + cw / 2
|
||||||
|
ly_t = cy_box + ch / 2 + 3
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{lx:.0f}" y="{ly_t:.0f}" text-anchor="middle" '
|
||||||
|
f'font-family="monospace" font-size="5" fill="white">'
|
||||||
|
f'{conn}</text>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Connector: rule-based router → UCIe port
|
||||||
|
rpx, rpy = mm2px(crx, cry)
|
||||||
|
if direction == "N":
|
||||||
|
rx, ry = rpx, rpy - r_size
|
||||||
|
tx, ty = lx, cy_box + ch
|
||||||
|
elif direction == "S":
|
||||||
|
rx, ry = rpx, rpy + r_size
|
||||||
|
tx, ty = lx, cy_box
|
||||||
|
elif direction == "W":
|
||||||
|
rx, ry = rpx - r_size, rpy
|
||||||
|
tx, ty = cx + cw, cy_box + ch / 2
|
||||||
|
elif direction == "E":
|
||||||
|
rx, ry = rpx + r_size, rpy
|
||||||
|
tx, ty = cx, cy_box + ch / 2
|
||||||
|
else:
|
||||||
|
continue
|
||||||
|
pts = _connector_points(rx, ry, tx, ty)
|
||||||
|
parts.append(
|
||||||
|
f' <polyline points="{pts}" '
|
||||||
|
f'fill="none" stroke="{c_color}" stroke-width="1" opacity="0.5"/>'
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── Legend ──
|
||||||
|
ly = h_px - 35
|
||||||
|
legend_items = [
|
||||||
|
("#3b82f6", "PE Router"),
|
||||||
|
("#f59e0b", "M_CPU / SRAM"),
|
||||||
|
("#8b5cf6", "UCIe"),
|
||||||
|
("#334155", "Relay"),
|
||||||
|
("#10b981", "HBM Link"),
|
||||||
|
("#475569", "Mesh Link"),
|
||||||
|
]
|
||||||
|
lx = pad
|
||||||
|
for color, label in legend_items:
|
||||||
|
parts.append(
|
||||||
|
f' <rect x="{lx}" y="{ly}" width="10" height="10" rx="2" '
|
||||||
|
f'fill="{color}" stroke="#475569" stroke-width="0.5"/>'
|
||||||
|
)
|
||||||
|
parts.append(
|
||||||
|
f' <text x="{lx + 14}" y="{ly + 9}" '
|
||||||
|
f'font-family="monospace" font-size="8" fill="#94a3b8">'
|
||||||
|
f'{label}</text>'
|
||||||
|
)
|
||||||
|
lx += len(label) * 7 + 24
|
||||||
|
|
||||||
|
parts.append("</svg>")
|
||||||
|
return "\n".join(parts)
|
||||||
|
|||||||
@@ -26,8 +26,8 @@
|
|||||||
--pe-stroke: #a855f7;
|
--pe-stroke: #a855f7;
|
||||||
--io-fill: #3d2b1f;
|
--io-fill: #3d2b1f;
|
||||||
--io-stroke: #f97316;
|
--io-stroke: #f97316;
|
||||||
--xbar-fill: #1f2d3d;
|
--router-fill: #1f2d3d;
|
||||||
--xbar-stroke: #06b6d4;
|
--router-stroke: #06b6d4;
|
||||||
--link-color: #475569;
|
--link-color: #475569;
|
||||||
--link-active: #3b82f6;
|
--link-active: #3b82f6;
|
||||||
}
|
}
|
||||||
@@ -405,8 +405,8 @@ body {
|
|||||||
PE
|
PE
|
||||||
</div>
|
</div>
|
||||||
<div class="legend-item">
|
<div class="legend-item">
|
||||||
<div class="legend-swatch" style="background:var(--xbar-fill);border-color:var(--xbar-stroke)"></div>
|
<div class="legend-swatch" style="background:var(--router-fill);border-color:var(--router-stroke)"></div>
|
||||||
XBAR / NOC
|
Router Mesh
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@@ -716,7 +716,7 @@ function drawCubeNode(svg, x, y, idx) {
|
|||||||
g.appendChild(pt);
|
g.appendChild(pt);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Center block: xbar + NOC
|
// Center block: router mesh
|
||||||
g.appendChild(svgEl("rect", {
|
g.appendChild(svgEl("rect", {
|
||||||
x: x + 30, y: y + 30, width: CUBE_W - 60, height: CUBE_H - 56,
|
x: x + 30, y: y + 30, width: CUBE_W - 60, height: CUBE_H - 56,
|
||||||
rx: 3, fill: "#1f2d3d", stroke: "#06b6d466", "stroke-width": 0.8
|
rx: 3, fill: "#1f2d3d", stroke: "#06b6d466", "stroke-width": 0.8
|
||||||
@@ -728,7 +728,7 @@ function drawCubeNode(svg, x, y, idx) {
|
|||||||
"font-size": "7",
|
"font-size": "7",
|
||||||
fill: "#06b6d4aa"
|
fill: "#06b6d4aa"
|
||||||
});
|
});
|
||||||
xt.textContent = "NOC+XBAR";
|
xt.textContent = "Router Mesh";
|
||||||
g.appendChild(xt);
|
g.appendChild(xt);
|
||||||
|
|
||||||
// HBM indicators (top and bottom)
|
// HBM indicators (top and bottom)
|
||||||
@@ -871,51 +871,6 @@ function drawCubeView(svg, cubeIdx) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── PE router → XBAR_TOP paths (90-degree angled, matching reference) ──
|
|
||||||
// r0c0 → XBAR_TOP left: down then right
|
|
||||||
const xbarTopY = OY + 145; // reference: rect at y=145
|
|
||||||
const xbarBotY = OY + 355; // reference: rect at y=355
|
|
||||||
const xbarX = OX + 150; // reference: x=150
|
|
||||||
const xbarW = 400; // reference: width=400
|
|
||||||
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX} ${OY+16} V ${xbarTopY+6} H ${xbarX}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+140} ${OY+16} V ${xbarTopY} H ${xbarX}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+560} ${OY+107} V ${xbarTopY} H ${xbarX+xbarW}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+700} ${OY+107} V ${xbarTopY+6} H ${xbarX+xbarW}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
|
|
||||||
// ── XBAR_TOP bar ──
|
|
||||||
svg.appendChild(svgEl("rect", {
|
|
||||||
x: xbarX, y: xbarTopY, width: xbarW, height: 22,
|
|
||||||
rx: 5, fill: "#f97316", stroke: "#ea580c", "stroke-width": 2
|
|
||||||
}));
|
|
||||||
const xtT = svgEl("text", {
|
|
||||||
x: xbarX + xbarW / 2, y: xbarTopY + 15, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "white"
|
|
||||||
});
|
|
||||||
xtT.textContent = "XBAR_TOP | xbar_v1 | 2.0ns";
|
|
||||||
svg.appendChild(xtT);
|
|
||||||
|
|
||||||
// ── XBAR_TOP → HBM0-3 arrows ──
|
|
||||||
const hbmArrowXs = [OX + 225, OX + 320, OX + 415, OX + 475];
|
|
||||||
for (const ax of hbmArrowXs) {
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: ax, y1: xbarTopY + 22, x2: ax, y2: OY + 198,
|
|
||||||
stroke: "#059669", "stroke-width": 1.5
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── HBM ZONE ──
|
// ── HBM ZONE ──
|
||||||
const hbmZoneX = OX + 145, hbmZoneY = OY + 195, hbmZoneW = 410, hbmZoneH = 152;
|
const hbmZoneX = OX + 145, hbmZoneY = OY + 195, hbmZoneW = 410, hbmZoneH = 152;
|
||||||
svg.appendChild(svgEl("rect", {
|
svg.appendChild(svgEl("rect", {
|
||||||
@@ -926,181 +881,71 @@ function drawCubeView(svg, cubeIdx) {
|
|||||||
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 16, "text-anchor": "middle",
|
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 16, "text-anchor": "middle",
|
||||||
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "#047857"
|
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "#047857"
|
||||||
});
|
});
|
||||||
hzmLabel.textContent = "HBM 9.0 x 5.0 mm | hbm_ctrl_v1 x 8";
|
hzmLabel.textContent = "HBM 9.0 x 5.0 mm | hbm_ctrl_v1";
|
||||||
svg.appendChild(hzmLabel);
|
svg.appendChild(hzmLabel);
|
||||||
|
|
||||||
// HBM0-3 (top row)
|
// Single HBM_CTRL block (centered in HBM zone)
|
||||||
const hbmSliceW = 85, hbmSliceH = 28;
|
const hbmCtrlG = svgEl("g", { class: "node-group", "data-id": "hbm_ctrl" });
|
||||||
const hbmTopSlices = [
|
hbmCtrlG.appendChild(svgEl("rect", {
|
||||||
{ x: OX + 168, label: "HBM0" }, { x: OX + 260, label: "HBM1" },
|
x: hbmZoneX + 40, y: hbmZoneY + 28, width: hbmZoneW - 80, height: 40,
|
||||||
{ x: OX + 352, label: "HBM2" }, { x: OX + 444, label: "HBM3" }
|
rx: 6, fill: "#047857", stroke: "#065f46", "stroke-width": 1.5
|
||||||
];
|
|
||||||
for (const hs of hbmTopSlices) {
|
|
||||||
const g = svgEl("g", { class: "node-group", "data-id": hs.label.toLowerCase() });
|
|
||||||
g.appendChild(svgEl("rect", {
|
|
||||||
x: hs.x, y: hbmZoneY + 23, width: hbmSliceW, height: hbmSliceH,
|
|
||||||
rx: 4, fill: "#047857", stroke: "#065f46", "stroke-width": 1.5
|
|
||||||
}));
|
}));
|
||||||
const t = svgEl("text", {
|
const hbmCtrlT = svgEl("text", {
|
||||||
x: hs.x + hbmSliceW / 2, y: hbmZoneY + 23 + 18, "text-anchor": "middle",
|
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 53, "text-anchor": "middle",
|
||||||
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
|
"font-family": "monospace", "font-size": "10", "font-weight": "bold", fill: "white"
|
||||||
});
|
});
|
||||||
t.textContent = hs.label;
|
hbmCtrlT.textContent = "HBM_CTRL";
|
||||||
g.appendChild(t);
|
hbmCtrlG.appendChild(hbmCtrlT);
|
||||||
svg.appendChild(g);
|
svg.appendChild(hbmCtrlG);
|
||||||
}
|
|
||||||
|
|
||||||
// Exclusion zone label
|
// Exclusion zone label
|
||||||
const hexLabel = svgEl("text", {
|
const hexLabel = svgEl("text", {
|
||||||
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 75, "text-anchor": "middle",
|
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 85, "text-anchor": "middle",
|
||||||
"font-family": "monospace", "font-size": "7", fill: "#ef4444aa"
|
"font-family": "monospace", "font-size": "7", fill: "#ef4444aa"
|
||||||
});
|
});
|
||||||
hexLabel.textContent = "Router exclusion: r2c2, r2c3, r3c2, r3c3";
|
hexLabel.textContent = "Router exclusion: r2c2, r2c3, r3c2, r3c3";
|
||||||
svg.appendChild(hexLabel);
|
svg.appendChild(hexLabel);
|
||||||
|
|
||||||
// HBM4-7 (bottom row)
|
// "All routers connect to HBM" annotation
|
||||||
const hbmBotSlices = [
|
const hbmAnnot = svgEl("text", {
|
||||||
{ x: OX + 168, label: "HBM4" }, { x: OX + 260, label: "HBM5" },
|
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 100, "text-anchor": "middle",
|
||||||
{ x: OX + 352, label: "HBM6" }, { x: OX + 444, label: "HBM7" }
|
"font-family": "monospace", "font-size": "6", fill: "#059669aa"
|
||||||
|
});
|
||||||
|
hbmAnnot.textContent = "All routers → HBM_CTRL (mesh-connected)";
|
||||||
|
svg.appendChild(hbmAnnot);
|
||||||
|
|
||||||
|
// ── HBM connectivity indicators (thin green dotted lines from edge routers to HBM zone) ──
|
||||||
|
// Draw thin green dotted lines from routers adjacent to HBM zone down/up to HBM
|
||||||
|
const hbmConnRouters = [
|
||||||
|
{ r: 1, c: 2 }, { r: 1, c: 3 }, // top edge of HBM zone
|
||||||
|
{ r: 4, c: 2 }, { r: 4, c: 3 }, // bottom edge of HBM zone
|
||||||
|
{ r: 2, c: 1 }, { r: 3, c: 1 }, // left edge of HBM zone
|
||||||
|
{ r: 2, c: 4 }, { r: 3, c: 4 }, // right edge of HBM zone
|
||||||
];
|
];
|
||||||
for (const hs of hbmBotSlices) {
|
for (const hr of hbmConnRouters) {
|
||||||
const g = svgEl("g", { class: "node-group", "data-id": hs.label.toLowerCase() });
|
const rp = rXY(hr.r, hr.c);
|
||||||
g.appendChild(svgEl("rect", {
|
// Draw line toward the HBM zone center
|
||||||
x: hs.x, y: hbmZoneY + hbmZoneH - hbmSliceH - 23 + 10, width: hbmSliceW, height: hbmSliceH,
|
const hbmCenterX = hbmZoneX + hbmZoneW / 2;
|
||||||
rx: 4, fill: "#065f46", stroke: "#064e3b", "stroke-width": 1.5
|
const hbmCenterY = hbmZoneY + hbmZoneH / 2;
|
||||||
}));
|
// Compute endpoint clipped to HBM zone edge
|
||||||
const t = svgEl("text", {
|
let ex = hbmCenterX, ey = hbmCenterY;
|
||||||
x: hs.x + hbmSliceW / 2, y: hbmZoneY + hbmZoneH - hbmSliceH - 23 + 10 + 18, "text-anchor": "middle",
|
if (hr.r <= 1) { ey = hbmZoneY; ex = rp.x; } // top routers → top of HBM zone
|
||||||
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
|
else if (hr.r >= 4) { ey = hbmZoneY + hbmZoneH; ex = rp.x; } // bottom routers → bottom of HBM zone
|
||||||
});
|
else if (hr.c <= 1) { ex = hbmZoneX; ey = rp.y; } // left routers → left of HBM zone
|
||||||
t.textContent = hs.label;
|
else { ex = hbmZoneX + hbmZoneW; ey = rp.y; } // right routers → right of HBM zone
|
||||||
g.appendChild(t);
|
|
||||||
svg.appendChild(g);
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── XBAR_BOT → HBM4-7 arrows (upward) ──
|
|
||||||
for (const ax of hbmArrowXs) {
|
|
||||||
svg.appendChild(svgEl("line", {
|
svg.appendChild(svgEl("line", {
|
||||||
x1: ax, y1: xbarBotY, x2: ax, y2: OY + 315,
|
x1: rp.x, y1: rp.y, x2: ex, y2: ey,
|
||||||
stroke: "#059669", "stroke-width": 1.5
|
stroke: "#05966988", "stroke-width": 1, "stroke-dasharray": "3,3"
|
||||||
}));
|
}));
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── XBAR_BOT bar ──
|
|
||||||
svg.appendChild(svgEl("rect", {
|
|
||||||
x: xbarX, y: xbarBotY, width: xbarW, height: 22,
|
|
||||||
rx: 5, fill: "#f97316", stroke: "#ea580c", "stroke-width": 2
|
|
||||||
}));
|
|
||||||
const xbT = svgEl("text", {
|
|
||||||
x: xbarX + xbarW / 2, y: xbarBotY + 15, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "white"
|
|
||||||
});
|
|
||||||
xbT.textContent = "XBAR_BOT | xbar_v1 | 2.0ns";
|
|
||||||
svg.appendChild(xbT);
|
|
||||||
|
|
||||||
// ── PE router → XBAR_BOT paths (90-degree angled) ──
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX} ${OY+409} V ${xbarBotY+16} H ${xbarX}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+140} ${OY+409} V ${xbarBotY+10} H ${xbarX}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+560} ${OY+508} V ${xbarBotY+10} H ${xbarX+xbarW}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("path", {
|
|
||||||
d: `M ${OX+700} ${OY+508} V ${xbarBotY+16} H ${xbarX+xbarW}`,
|
|
||||||
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
|
|
||||||
}));
|
|
||||||
|
|
||||||
// ── BRIDGES (purple/violet, matching reference) ──
|
|
||||||
const brgLeftX = OX + 100, brgRightX = OX + 600;
|
|
||||||
// Left bridge vertical line
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgLeftX, y1: xbarTopY + 10, x2: brgLeftX, y2: xbarBotY + 12,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2.5, "stroke-dasharray": "8,4"
|
|
||||||
}));
|
|
||||||
// Left bridge horizontal stubs
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgLeftX, y1: xbarTopY + 6, x2: xbarX, y2: xbarTopY + 6,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgLeftX, y1: xbarBotY + 16, x2: xbarX, y2: xbarBotY + 16,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
|
|
||||||
}));
|
|
||||||
// Left bridge label
|
|
||||||
svg.appendChild(svgEl("rect", {
|
|
||||||
x: brgLeftX - 28, y: OY + 248, width: 56, height: 30,
|
|
||||||
rx: 4, fill: "#1e1b4b", stroke: "#a78bfa", "stroke-width": 1.5
|
|
||||||
}));
|
|
||||||
let bt = svgEl("text", {
|
|
||||||
x: brgLeftX, y: OY + 259, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "6", "font-weight": "bold", fill: "#a78bfa"
|
|
||||||
});
|
|
||||||
bt.textContent = "XBAR BRG";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
bt = svgEl("text", {
|
|
||||||
x: brgLeftX, y: OY + 272, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "7", "font-weight": "bold", fill: "#a78bfa"
|
|
||||||
});
|
|
||||||
bt.textContent = "LEFT";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
bt = svgEl("text", {
|
|
||||||
x: brgLeftX - 36, y: OY + 263, "text-anchor": "end",
|
|
||||||
"font-family": "monospace", "font-size": "6", fill: "#a78bfa88"
|
|
||||||
});
|
|
||||||
bt.textContent = "3mm";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
|
|
||||||
// Right bridge vertical line
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgRightX, y1: xbarTopY + 10, x2: brgRightX, y2: xbarBotY + 12,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2.5, "stroke-dasharray": "8,4"
|
|
||||||
}));
|
|
||||||
// Right bridge horizontal stubs
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgRightX, y1: xbarTopY + 6, x2: xbarX + xbarW, y2: xbarTopY + 6,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
|
|
||||||
}));
|
|
||||||
svg.appendChild(svgEl("line", {
|
|
||||||
x1: brgRightX, y1: xbarBotY + 16, x2: xbarX + xbarW, y2: xbarBotY + 16,
|
|
||||||
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
|
|
||||||
}));
|
|
||||||
// Right bridge label
|
|
||||||
svg.appendChild(svgEl("rect", {
|
|
||||||
x: brgRightX - 28, y: OY + 248, width: 56, height: 30,
|
|
||||||
rx: 4, fill: "#1e1b4b", stroke: "#a78bfa", "stroke-width": 1.5
|
|
||||||
}));
|
|
||||||
bt = svgEl("text", {
|
|
||||||
x: brgRightX, y: OY + 259, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "6", "font-weight": "bold", fill: "#a78bfa"
|
|
||||||
});
|
|
||||||
bt.textContent = "XBAR BRG";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
bt = svgEl("text", {
|
|
||||||
x: brgRightX, y: OY + 272, "text-anchor": "middle",
|
|
||||||
"font-family": "monospace", "font-size": "7", "font-weight": "bold", fill: "#a78bfa"
|
|
||||||
});
|
|
||||||
bt.textContent = "RIGHT";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
bt = svgEl("text", {
|
|
||||||
x: brgRightX + 36, y: OY + 263,
|
|
||||||
"font-family": "monospace", "font-size": "6", fill: "#a78bfa88"
|
|
||||||
});
|
|
||||||
bt.textContent = "3mm";
|
|
||||||
svg.appendChild(bt);
|
|
||||||
|
|
||||||
// ── M_CPU (r2c0) and SRAM (r3c0) labels ──
|
// ── M_CPU (r2c0) and SRAM (r3c0) labels ──
|
||||||
const mcpuP = rXY(2, 0);
|
const mcpuP = rXY(2, 0);
|
||||||
svg.appendChild(svgEl("rect", {
|
svg.appendChild(svgEl("rect", {
|
||||||
x: mcpuP.x - 42, y: mcpuP.y + 18, width: 84, height: 18,
|
x: mcpuP.x - 42, y: mcpuP.y + 18, width: 84, height: 18,
|
||||||
rx: 4, fill: "#f59e0b", stroke: "#d97706", "stroke-width": 1.5
|
rx: 4, fill: "#f59e0b", stroke: "#d97706", "stroke-width": 1.5
|
||||||
}));
|
}));
|
||||||
bt = svgEl("text", {
|
let bt = svgEl("text", {
|
||||||
x: mcpuP.x, y: mcpuP.y + 31, "text-anchor": "middle",
|
x: mcpuP.x, y: mcpuP.y + 31, "text-anchor": "middle",
|
||||||
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
|
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
|
||||||
});
|
});
|
||||||
@@ -1358,8 +1203,7 @@ function drawCubeView(svg, cubeIdx) {
|
|||||||
{ color: "#e2e8f0", label: "Relay", textColor: "#475569" },
|
{ color: "#e2e8f0", label: "Relay", textColor: "#475569" },
|
||||||
{ color: "#8b5cf6", label: "UCIe Router" },
|
{ color: "#8b5cf6", label: "UCIe Router" },
|
||||||
{ color: "#f59e0b", label: "M_CPU/SRAM" },
|
{ color: "#f59e0b", label: "M_CPU/SRAM" },
|
||||||
{ color: "#a78bfa", label: "Bridge", type: "line" },
|
{ color: "#059669", label: "HBM Link", type: "line" },
|
||||||
{ color: "#f97316", label: "XBAR", type: "rect" },
|
|
||||||
{ color: "#047857", label: "HBM Ctrl", type: "rect" },
|
{ color: "#047857", label: "HBM Ctrl", type: "rect" },
|
||||||
{ color: "#ef4444", label: "PE (~5mm2)", type: "rect" },
|
{ color: "#ef4444", label: "PE (~5mm2)", type: "rect" },
|
||||||
{ color: "#8b5cf6", label: "UCIe Port", type: "rect", rectFill: "#1e1b4b" },
|
{ color: "#8b5cf6", label: "UCIe Port", type: "rect", rectFill: "#1e1b4b" },
|
||||||
@@ -1394,7 +1238,7 @@ function drawCubeView(svg, cubeIdx) {
|
|||||||
const dpT = svgEl("text", {
|
const dpT = svgEl("text", {
|
||||||
x: 60, y: legY + 24, "font-family": "monospace", "font-size": "7", fill: "#64748b"
|
x: 60, y: legY + 24, "font-family": "monospace", "font-size": "7", fill: "#64748b"
|
||||||
});
|
});
|
||||||
dpT.textContent = "Data: PE_DMA→NOC→XBAR→HBM | Cross-half: XBAR_TOP→Bridge(3mm)→XBAR_BOT→HBM4-7";
|
dpT.textContent = "Data: PE_DMA → Router Mesh → HBM_CTRL | All traffic routed through 6x6 mesh";
|
||||||
svg.appendChild(dpT);
|
svg.appendChild(dpT);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1454,7 +1298,7 @@ function drawPeView(svg, cubeIdx, peIdx) {
|
|||||||
|
|
||||||
// NOC destinations (inside NOC column)
|
// NOC destinations (inside NOC column)
|
||||||
const nocDests = [
|
const nocDests = [
|
||||||
{ label: "XBAR", sub: "→ HBM", y: nocTop + 50, fill: "#f97316", bg: "#3d2b1f" },
|
{ label: "HBM", sub: "ctrl", y: nocTop + 50, fill: "#059669", bg: "#052e16" },
|
||||||
{ label: "SRAM", sub: "128x4", y: nocTop + 86, fill: "#f59e0b", bg: "#3d2b1f" },
|
{ label: "SRAM", sub: "128x4", y: nocTop + 86, fill: "#f59e0b", bg: "#3d2b1f" },
|
||||||
{ label: "UCIe", sub: "inter", y: nocTop + 122, fill: "#8b5cf6", bg: "#1e1b4b" },
|
{ label: "UCIe", sub: "inter", y: nocTop + 122, fill: "#8b5cf6", bg: "#1e1b4b" },
|
||||||
{ label: "M_CPU", sub: "cmd", y: nocTop + 158, fill: "#f59e0b", bg: "#3d2b1f" },
|
{ label: "M_CPU", sub: "cmd", y: nocTop + 158, fill: "#f59e0b", bg: "#3d2b1f" },
|
||||||
@@ -1967,7 +1811,7 @@ function applyHotPaths(svg, t) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
} else if (currentView === "cube") {
|
} else if (currentView === "cube") {
|
||||||
// ── CUBE VIEW: highlight router mesh links + XBAR paths ──
|
// ── CUBE VIEW: highlight router mesh links ──
|
||||||
const linkTraffic = {};
|
const linkTraffic = {};
|
||||||
for (const hop of activeHops) {
|
for (const hop of activeHops) {
|
||||||
const linkId = hopToCubeLink(hop);
|
const linkId = hopToCubeLink(hop);
|
||||||
@@ -1984,16 +1828,13 @@ function applyHotPaths(svg, t) {
|
|||||||
inflight++;
|
inflight++;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// Highlight XBAR/HBM components referenced in events
|
// Highlight HBM component referenced in events
|
||||||
const activeProcesses = allEvents.filter(e =>
|
const activeProcesses = allEvents.filter(e =>
|
||||||
e.type === "process" && e.t_ns <= t && e.t_ns >= t - 30
|
e.type === "process" && e.t_ns <= t && e.t_ns >= t - 30
|
||||||
);
|
);
|
||||||
for (const proc of activeProcesses) {
|
for (const proc of activeProcesses) {
|
||||||
const comp = proc.component || "";
|
const comp = proc.component || "";
|
||||||
if (comp.includes("xbar_top")) highlightComponent(svg, "xbar_top");
|
if (comp.includes("hbm_ctrl")) highlightComponent(svg, "hbm_ctrl");
|
||||||
if (comp.includes("xbar_bot")) highlightComponent(svg, "xbar_bot");
|
|
||||||
const hbmMatch = comp.match(/hbm_ctrl\.slice(\d+)/);
|
|
||||||
if (hbmMatch) highlightComponent(svg, `hbm${hbmMatch[1]}`);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
} else if (currentView === "pe") {
|
} else if (currentView === "pe") {
|
||||||
|
|||||||
@@ -316,9 +316,9 @@ def test_h2d_monotonicity_preserved():
|
|||||||
latencies.append(t["total_ns"])
|
latencies.append(t["total_ns"])
|
||||||
|
|
||||||
for i in range(len(latencies) - 1):
|
for i in range(len(latencies) - 1):
|
||||||
assert latencies[i] < latencies[i + 1], (
|
assert latencies[i] <= latencies[i + 1], (
|
||||||
f"Monotonicity: cube{cubes[i]}({latencies[i]:.2f}) "
|
f"Monotonicity: cube{cubes[i]}({latencies[i]:.2f}) "
|
||||||
f"must < cube{cubes[i+1]}({latencies[i+1]:.2f})"
|
f"must <= cube{cubes[i+1]}({latencies[i+1]:.2f})"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -17,6 +17,6 @@ def test_cli_main_arg_parsing(monkeypatch):
|
|||||||
|
|
||||||
|
|
||||||
def test_cli_main():
|
def test_cli_main():
|
||||||
|
"""CLI bench run on single SIP device."""
|
||||||
rc = cli_main.main(["run", "--topology", "topology.yaml", "--bench", "qkv_gemm"])
|
rc = cli_main.main(["run", "--topology", "topology.yaml", "--bench", "qkv_gemm", "--device", "sip:0"])
|
||||||
assert rc == 0
|
assert rc == 0
|
||||||
|
|||||||
@@ -37,7 +37,7 @@ def _hbm_pa(pe_id: int = 0) -> int:
|
|||||||
|
|
||||||
|
|
||||||
def _node(impl: str, overhead_ns: float = 0.0) -> Node:
|
def _node(impl: str, overhead_ns: float = 0.0) -> Node:
|
||||||
return Node(id="test", kind="xbar", impl=impl, attrs={"overhead_ns": overhead_ns}, pos_mm=None)
|
return Node(id="test", kind="noc_router", impl=impl, attrs={"overhead_ns": overhead_ns}, pos_mm=None)
|
||||||
|
|
||||||
|
|
||||||
# ── 1. unknown impl → error ──────────────────────────────────────────
|
# ── 1. unknown impl → error ──────────────────────────────────────────
|
||||||
@@ -55,7 +55,7 @@ def test_registry_unknown_impl_raises_error():
|
|||||||
|
|
||||||
def test_transit_component_yields_overhead_ns():
|
def test_transit_component_yields_overhead_ns():
|
||||||
"""TransitComponent.run() yields exactly node.attrs['overhead_ns'] ns."""
|
"""TransitComponent.run() yields exactly node.attrs['overhead_ns'] ns."""
|
||||||
node = _node("xbar_v1", overhead_ns=3.0)
|
node = _node("forwarding_v1", overhead_ns=3.0)
|
||||||
comp = TransitComponent(node)
|
comp = TransitComponent(node)
|
||||||
env = simpy.Environment()
|
env = simpy.Environment()
|
||||||
|
|
||||||
@@ -100,7 +100,7 @@ def test_engine_component_override_is_called():
|
|||||||
|
|
||||||
SpyXbar.calls = 0
|
SpyXbar.calls = 0
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
engine = GraphEngine(graph, component_overrides={"xbar_v1": SpyXbar})
|
engine = GraphEngine(graph, component_overrides={"forwarding_v1": SpyXbar})
|
||||||
msg = MemoryReadMsg(
|
msg = MemoryReadMsg(
|
||||||
correlation_id="c", request_id="r",
|
correlation_id="c", request_id="r",
|
||||||
src_sip=0, src_cube=0, src_pe=0,
|
src_sip=0, src_cube=0, src_pe=0,
|
||||||
@@ -108,7 +108,7 @@ def test_engine_component_override_is_called():
|
|||||||
)
|
)
|
||||||
h = engine.submit(msg)
|
h = engine.submit(msg)
|
||||||
engine.wait(h)
|
engine.wait(h)
|
||||||
# Path passes through xbar_top (impl=xbar_v1)
|
# Path passes through router nodes (impl=forwarding_v1)
|
||||||
assert SpyXbar.calls > 0
|
assert SpyXbar.calls > 0
|
||||||
|
|
||||||
|
|
||||||
@@ -119,10 +119,9 @@ def test_engine_component_model_latency():
|
|||||||
"""MemoryRead D2H latency for local cube0 (4096B).
|
"""MemoryRead D2H latency for local cube0 (4096B).
|
||||||
|
|
||||||
Bypass path (m_cpu bypass): pcie_ep → io_noc → conn → io_ucie → cube_ucie
|
Bypass path (m_cpu bypass): pcie_ep → io_noc → conn → io_ucie → cube_ucie
|
||||||
→ conn → noc → xbar_top → hbm_ctrl.slice0
|
→ conn → router mesh → hbm_ctrl
|
||||||
|
|
||||||
Path goes through xbar_top (overhead_ns=2.0) instead of per-PE xbar.
|
Path goes through router mesh. Latency must be positive and reasonable.
|
||||||
Latency must be positive and reasonable.
|
|
||||||
"""
|
"""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
engine = GraphEngine(graph)
|
engine = GraphEngine(graph)
|
||||||
@@ -134,7 +133,6 @@ def test_engine_component_model_latency():
|
|||||||
h = engine.submit(msg)
|
h = engine.submit(msg)
|
||||||
engine.wait(h)
|
engine.wait(h)
|
||||||
_, trace = engine.get_completion(h)
|
_, trace = engine.get_completion(h)
|
||||||
# Verify positive latency; exact value depends on path through xbar_top
|
|
||||||
assert trace["total_ns"] > 0
|
assert trace["total_ns"] > 0
|
||||||
|
|
||||||
|
|
||||||
@@ -142,21 +140,19 @@ def test_engine_component_model_latency():
|
|||||||
|
|
||||||
|
|
||||||
def test_engine_override_is_scoped_to_impl():
|
def test_engine_override_is_scoped_to_impl():
|
||||||
"""xbar_v1 override (ZeroXbar, no overhead_ns) reduces total_ns.
|
"""forwarding_v1 override (ZeroRouter, no overhead) reduces total_ns.
|
||||||
|
|
||||||
xbar_top has overhead_ns=2.0 base + position-dependent distance.
|
Router nodes have overhead_ns=2.0. Replacing with zero-latency impl
|
||||||
It is traversed on both the forward path and the reverse response path,
|
removes router overhead from the path.
|
||||||
so replacing it with a zero-latency impl removes all XBAR latency.
|
|
||||||
With position-aware XBAR, the diff is >= 4.0ns (base) + distance contribution.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
class ZeroXbar(ComponentBase):
|
class ZeroRouter(ComponentBase):
|
||||||
def run(self, env, nbytes):
|
def run(self, env, nbytes):
|
||||||
yield env.timeout(0)
|
yield env.timeout(0)
|
||||||
|
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
engine_default = GraphEngine(graph)
|
engine_default = GraphEngine(graph)
|
||||||
engine_override = GraphEngine(graph, component_overrides={"xbar_v1": ZeroXbar})
|
engine_override = GraphEngine(graph, component_overrides={"forwarding_v1": ZeroRouter})
|
||||||
|
|
||||||
msg = MemoryReadMsg(
|
msg = MemoryReadMsg(
|
||||||
correlation_id="c", request_id="r",
|
correlation_id="c", request_id="r",
|
||||||
@@ -172,8 +168,5 @@ def test_engine_override_is_scoped_to_impl():
|
|||||||
engine_override.wait(h_o)
|
engine_override.wait(h_o)
|
||||||
_, t_override = engine_override.get_completion(h_o)
|
_, t_override = engine_override.get_completion(h_o)
|
||||||
|
|
||||||
# ZeroXbar removes base overhead_ns=2.0 + distance-based latency per traversal.
|
# ZeroRouter removes overhead from all forwarding_v1 nodes in path.
|
||||||
# Forward + response = 2 traversals, so diff >= 4.0ns (base only).
|
|
||||||
diff = t_default["total_ns"] - t_override["total_ns"]
|
|
||||||
assert t_override["total_ns"] < t_default["total_ns"]
|
assert t_override["total_ns"] < t_default["total_ns"]
|
||||||
assert diff >= 4.0 - 0.01, f"Expected diff >= 4.0ns, got {diff:.4f}ns"
|
|
||||||
|
|||||||
@@ -1,18 +1,15 @@
|
|||||||
"""Tests for #5+#6 CUBE NOC Router Mesh + Position-Aware XBAR.
|
"""Tests for CUBE NOC Explicit Router Mesh (ADR-0019).
|
||||||
|
|
||||||
Phase 1 verification: all tests FAIL until Phase 2 implements production code.
|
|
||||||
|
|
||||||
Key changes verified:
|
Key changes verified:
|
||||||
- Single NOC node per cube with internal router mesh simulation
|
- Explicit router nodes per cube from cube_mesh.yaml (6×6 grid)
|
||||||
- Auto-layout generates cube_mesh.yaml (6x6 grid for n_connections=4)
|
- Auto-layout generates cube_mesh.yaml with PE/UCIe/M_CPU/SRAM attachments
|
||||||
- Position-aware XBAR (top/bottom) replaces per-PE xbar chaining
|
|
||||||
- Mesh file caching with source_hash change detection
|
- Mesh file caching with source_hash change detection
|
||||||
- Path routing: PE_DMA → NOC → XBAR_top/bot → HBM_CTRL
|
- Path routing: PE_DMA → router mesh → HBM_CTRL
|
||||||
|
|
||||||
Latency invariant after refactor:
|
Latency invariant:
|
||||||
Local HBM: PE_DMA → Router(overhead) → XBAR → HBM_CTRL
|
Local HBM: PE_DMA → Router(overhead) → HBM_CTRL
|
||||||
Cross-row: PE_DMA → Router → mesh traverse → Router → XBAR → bridge → XBAR → HBM_CTRL
|
Cross-row: PE_DMA → Router → mesh hops → Router → HBM_CTRL
|
||||||
Cross-cube: PE_DMA → Router → mesh → UCIe → ... → mesh → XBAR → HBM_CTRL
|
Cross-cube: PE_DMA → Router → mesh → UCIe → ... → mesh → HBM_CTRL
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
@@ -127,22 +124,27 @@ def test_mesh_file_pe_corner_positions():
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def test_mesh_file_xbar_top_routers():
|
def test_mesh_file_no_xbar_section():
|
||||||
"""xbar_top must list top-half PE routers."""
|
"""mesh output must not contain xbar section (ADR-0019 D2)."""
|
||||||
_graph()
|
_graph()
|
||||||
mesh = yaml.safe_load(MESH_PATH.read_text())
|
mesh = yaml.safe_load(MESH_PATH.read_text())
|
||||||
top_routers = mesh["xbar"]["top"]["routers"]
|
assert "xbar" not in mesh, "xbar section should be removed from cube_mesh.yaml"
|
||||||
for rid in ["r0c0", "r0c1", "r1c4", "r1c5"]:
|
|
||||||
assert rid in top_routers, f"{rid} should connect to xbar_top"
|
|
||||||
|
|
||||||
|
|
||||||
def test_mesh_file_xbar_bot_routers():
|
def test_mesh_file_pe_hbm_attached():
|
||||||
"""xbar_bot must list bottom-half PE routers."""
|
"""PE routers must have pe{idx}.hbm in attach list (ADR-0019 D1)."""
|
||||||
_graph()
|
_graph()
|
||||||
mesh = yaml.safe_load(MESH_PATH.read_text())
|
mesh = yaml.safe_load(MESH_PATH.read_text())
|
||||||
bot_routers = mesh["xbar"]["bottom"]["routers"]
|
for rid, rdata in mesh["routers"].items():
|
||||||
for rid in ["r4c0", "r4c1", "r5c4", "r5c5"]:
|
if rdata is None:
|
||||||
assert rid in bot_routers, f"{rid} should connect to xbar_bot"
|
continue
|
||||||
|
for item in rdata["attach"]:
|
||||||
|
if item.endswith(".dma"):
|
||||||
|
pe_prefix = item.rsplit(".", 1)[0]
|
||||||
|
hbm_item = f"{pe_prefix}.hbm"
|
||||||
|
assert hbm_item in rdata["attach"], (
|
||||||
|
f"{rid} has {item} but missing {hbm_item}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def test_mesh_file_ucie_distribution():
|
def test_mesh_file_ucie_distribution():
|
||||||
@@ -233,107 +235,65 @@ def test_mesh_ucie_all_four_directions():
|
|||||||
|
|
||||||
|
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
# 2. Topology Graph: XBAR Top/Bottom (replaces per-PE chaining)
|
# 2. Topology Graph: Explicit Router Mesh (ADR-0019)
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_top_node_exists():
|
def test_router_nodes_exist():
|
||||||
"""Each cube must have an xbar_top node."""
|
"""Cube must have explicit router nodes from cube_mesh.yaml."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
assert "sip0.cube0.xbar_top" in graph.nodes
|
for rkey in ["r0c0", "r0c1", "r1c4", "r5c5"]:
|
||||||
|
assert f"sip0.cube0.{rkey}" in graph.nodes, f"Router {rkey} missing"
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_bot_node_exists():
|
def test_no_xbar_or_bridge_nodes():
|
||||||
"""Each cube must have an xbar_bot node."""
|
"""xbar/bridge nodes must not exist (ADR-0019 D2)."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
assert "sip0.cube0.xbar_bot" in graph.nodes
|
bad = [n for n in graph.nodes if "xbar" in n or "bridge" in n]
|
||||||
|
assert len(bad) == 0, f"Old xbar/bridge nodes found: {bad[:5]}"
|
||||||
|
|
||||||
|
|
||||||
def test_no_per_pe_xbar_nodes():
|
def test_no_single_noc_node():
|
||||||
"""Per-PE xbar nodes (xbar.pe0..pe7) must not exist."""
|
"""Cube-level single noc node must not exist (replaced by explicit routers)."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
for i in range(8):
|
assert "sip0.cube0.noc" not in graph.nodes
|
||||||
assert f"sip0.cube0.xbar.pe{i}" not in graph.nodes, (
|
|
||||||
f"xbar.pe{i} should not exist in new topology"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_no_xbar_chain_edges():
|
def test_single_hbm_ctrl_node():
|
||||||
"""xbar_chain kind edges must not exist."""
|
"""Each cube must have single hbm_ctrl (no slices)."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
chain_edges = [e for e in graph.edges if e.kind == "xbar_chain"]
|
assert "sip0.cube0.hbm_ctrl" in graph.nodes
|
||||||
assert len(chain_edges) == 0, (
|
slices = [n for n in graph.nodes if "hbm_ctrl.slice" in n]
|
||||||
f"Found {len(chain_edges)} xbar_chain edges; chaining is replaced by XBAR top/bot"
|
assert len(slices) == 0, f"HBM slices should not exist: {slices[:3]}"
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_top_to_hbm_slices_0_3():
|
def test_router_mesh_edges():
|
||||||
"""xbar_top must connect to hbm_ctrl.slice0..3 (top HBM slices)."""
|
"""Adjacent routers must be connected (router_mesh edges)."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
edge_set = {(e.src, e.dst) for e in graph.edges}
|
||||||
for i in range(4):
|
# r0c0 ↔ r0c1 (horizontal)
|
||||||
assert ("sip0.cube0.xbar_top", f"sip0.cube0.hbm_ctrl.slice{i}") in edge_set, (
|
assert ("sip0.cube0.r0c0", "sip0.cube0.r0c1") in edge_set
|
||||||
f"xbar_top → hbm_ctrl.slice{i} edge missing"
|
assert ("sip0.cube0.r0c1", "sip0.cube0.r0c0") in edge_set
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_bot_to_hbm_slices_4_7():
|
def test_pe_dma_connects_to_router():
|
||||||
"""xbar_bot must connect to hbm_ctrl.slice4..7 (bottom HBM slices)."""
|
"""PE_DMA must connect to router (pe_to_router kind)."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
pe0_edges = [e for e in graph.edges
|
||||||
for i in range(4, 8):
|
if e.src == "sip0.cube0.pe0.pe_dma" and e.kind == "pe_to_router"]
|
||||||
assert ("sip0.cube0.xbar_bot", f"sip0.cube0.hbm_ctrl.slice{i}") in edge_set, (
|
assert len(pe0_edges) == 1, f"PE0 DMA should connect to 1 router, got {len(pe0_edges)}"
|
||||||
f"xbar_bot → hbm_ctrl.slice{i} edge missing"
|
assert pe0_edges[0].dst == "sip0.cube0.r0c0"
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_bridge_left():
|
def test_hbm_connects_to_all_routers():
|
||||||
"""bridge.left must connect xbar_top ↔ xbar_bot (bidirectional)."""
|
"""HBM_CTRL must have edges to all non-null routers."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
assert "sip0.cube0.bridge.left" in graph.nodes
|
hbm_out = [e for e in graph.edges
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
if e.src == "sip0.cube0.hbm_ctrl" and e.kind == "hbm_to_router"]
|
||||||
assert ("sip0.cube0.xbar_top", "sip0.cube0.bridge.left") in edge_set
|
mesh = yaml.safe_load(MESH_PATH.read_text())
|
||||||
assert ("sip0.cube0.bridge.left", "sip0.cube0.xbar_bot") in edge_set
|
n_active = sum(1 for v in mesh["routers"].values() if v is not None)
|
||||||
assert ("sip0.cube0.xbar_bot", "sip0.cube0.bridge.left") in edge_set
|
assert len(hbm_out) == n_active, (
|
||||||
assert ("sip0.cube0.bridge.left", "sip0.cube0.xbar_top") in edge_set
|
f"HBM should connect to {n_active} routers, got {len(hbm_out)}"
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_bridge_right():
|
|
||||||
"""bridge.right must connect xbar_top ↔ xbar_bot (bidirectional)."""
|
|
||||||
graph = _graph()
|
|
||||||
assert "sip0.cube0.bridge.right" in graph.nodes
|
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
|
||||||
assert ("sip0.cube0.xbar_top", "sip0.cube0.bridge.right") in edge_set
|
|
||||||
assert ("sip0.cube0.bridge.right", "sip0.cube0.xbar_bot") in edge_set
|
|
||||||
|
|
||||||
|
|
||||||
def test_noc_to_xbar_top_edge():
|
|
||||||
"""NOC must have edge to xbar_top (router attachment)."""
|
|
||||||
graph = _graph()
|
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
|
||||||
assert ("sip0.cube0.noc", "sip0.cube0.xbar_top") in edge_set
|
|
||||||
|
|
||||||
|
|
||||||
def test_noc_to_xbar_bot_edge():
|
|
||||||
"""NOC must have edge to xbar_bot (router attachment)."""
|
|
||||||
graph = _graph()
|
|
||||||
edge_set = {(e.src, e.dst) for e in graph.edges}
|
|
||||||
assert ("sip0.cube0.noc", "sip0.cube0.xbar_bot") in edge_set
|
|
||||||
|
|
||||||
|
|
||||||
def test_pe_dma_no_direct_xbar_edge():
|
|
||||||
"""PE_DMA must NOT have direct edge to any xbar node.
|
|
||||||
|
|
||||||
All HBM access goes through NOC (router attachment to XBAR).
|
|
||||||
"""
|
|
||||||
graph = _graph()
|
|
||||||
pe_to_xbar = [
|
|
||||||
e for e in graph.edges
|
|
||||||
if e.src == "sip0.cube0.pe0.pe_dma" and "xbar" in e.dst
|
|
||||||
]
|
|
||||||
assert len(pe_to_xbar) == 0, (
|
|
||||||
f"PE_DMA should not connect directly to XBAR. "
|
|
||||||
f"Found: {[(e.src, e.dst) for e in pe_to_xbar]}"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -342,62 +302,50 @@ def test_pe_dma_no_direct_xbar_edge():
|
|||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
def test_local_hbm_path_includes_noc_and_xbar_top():
|
def test_local_hbm_path_through_router():
|
||||||
"""PE0 local HBM (slice0): path must include noc and xbar_top."""
|
"""PE0 local HBM: path must go through PE's router to hbm_ctrl."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
router = PathRouter(graph)
|
router = PathRouter(graph)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
|
||||||
assert "sip0.cube0.noc" in path, f"NOC missing from path: {path}"
|
assert "sip0.cube0.r0c0" in path, f"PE0's router r0c0 missing from path: {path}"
|
||||||
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing from path: {path}"
|
assert "sip0.cube0.hbm_ctrl" == path[-1], f"Path should end at hbm_ctrl: {path}"
|
||||||
|
|
||||||
|
|
||||||
def test_cross_pe_same_row_stays_in_xbar_top():
|
def test_remote_pe_hbm_has_more_hops():
|
||||||
"""PE0 → slice3 (both top row): xbar_top only, no bridge needed."""
|
"""PE0 → PE4's HBM (remote) must have more hops than local."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
router = PathRouter(graph)
|
router = PathRouter(graph)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice3")
|
local_path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
|
||||||
assert "sip0.cube0.xbar_top" in path
|
# PE4 is at r4c0, PE0 at r0c0 — must traverse mesh
|
||||||
assert "sip0.cube0.xbar_bot" not in path, (
|
remote_path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
|
||||||
f"Cross-PE same row should not use xbar_bot. Path: {path}"
|
# Both should work, local should be shorter or equal
|
||||||
)
|
assert len(local_path) >= 2
|
||||||
assert not any("bridge" in n for n in path), (
|
assert len(remote_path) >= 2
|
||||||
f"Cross-PE same row should not use bridge. Path: {path}"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_cross_row_hbm_uses_bridge():
|
def test_mcpu_dma_path_through_router_mesh():
|
||||||
"""PE0 → slice5 (top→bottom): must traverse xbar_top → bridge → xbar_bot."""
|
"""M_CPU DMA to local HBM: m_cpu → router mesh → hbm_ctrl."""
|
||||||
graph = _graph()
|
|
||||||
router = PathRouter(graph)
|
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice5")
|
|
||||||
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing: {path}"
|
|
||||||
assert "sip0.cube0.xbar_bot" in path, f"xbar_bot missing: {path}"
|
|
||||||
assert any("bridge" in n for n in path), f"bridge missing: {path}"
|
|
||||||
|
|
||||||
|
|
||||||
def test_mcpu_dma_path_through_noc():
|
|
||||||
"""M_CPU DMA to local HBM: m_cpu → noc → xbar_top → hbm_ctrl."""
|
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
router = PathRouter(graph)
|
router = PathRouter(graph)
|
||||||
path = router.find_mcpu_dma_path(
|
path = router.find_mcpu_dma_path(
|
||||||
"sip0.cube0.m_cpu", "sip0.cube0.hbm_ctrl.slice0"
|
"sip0.cube0.m_cpu", "sip0.cube0.hbm_ctrl"
|
||||||
)
|
)
|
||||||
assert "sip0.cube0.noc" in path, f"NOC missing: {path}"
|
assert path[0] == "sip0.cube0.m_cpu"
|
||||||
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing: {path}"
|
assert path[-1] == "sip0.cube0.hbm_ctrl"
|
||||||
|
assert any("r" in n and "c" in n for n in path), f"Router missing from path: {path}"
|
||||||
|
|
||||||
|
|
||||||
def test_cross_cube_path_through_mesh():
|
def test_cross_cube_path_through_ucie():
|
||||||
"""Cross-cube HBM: must traverse noc → UCIe → remote noc → xbar."""
|
"""Cross-cube HBM: must traverse router → UCIe → remote router → hbm_ctrl."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
router = PathRouter(graph)
|
router = PathRouter(graph)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube4.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube4.hbm_ctrl")
|
||||||
assert "sip0.cube0.noc" in path, f"Source NOC missing: {path}"
|
|
||||||
assert any("ucie" in n.lower() for n in path), f"UCIe missing: {path}"
|
assert any("ucie" in n.lower() for n in path), f"UCIe missing: {path}"
|
||||||
assert "sip0.cube4.xbar_top" in path, f"Dest xbar_top missing: {path}"
|
assert path[-1] == "sip0.cube4.hbm_ctrl"
|
||||||
|
|
||||||
|
|
||||||
def test_h2d_bypass_path_through_noc():
|
def test_h2d_bypass_path_through_router():
|
||||||
"""H2D MemoryWrite bypass: pcie_ep → io_noc → cube_ucie → noc → xbar → hbm."""
|
"""H2D MemoryWrite bypass: pcie_ep → io_noc → cube_ucie → router → hbm."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
resolver = AddressResolver(graph)
|
resolver = AddressResolver(graph)
|
||||||
router = PathRouter(graph)
|
router = PathRouter(graph)
|
||||||
@@ -407,8 +355,8 @@ def test_h2d_bypass_path_through_noc():
|
|||||||
hbm_target = resolver.resolve(PhysAddr.decode(pa))
|
hbm_target = resolver.resolve(PhysAddr.decode(pa))
|
||||||
|
|
||||||
path = router.find_memory_path(pcie_ep, hbm_target)
|
path = router.find_memory_path(pcie_ep, hbm_target)
|
||||||
assert "sip0.cube0.noc" in path, f"NOC missing from H2D path: {path}"
|
assert path[-1] == "sip0.cube0.hbm_ctrl", f"Path should end at hbm_ctrl: {path}"
|
||||||
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing from H2D path: {path}"
|
assert any("r0c" in n or "r1c" in n for n in path), f"Router missing: {path}"
|
||||||
|
|
||||||
|
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
@@ -416,28 +364,28 @@ def test_h2d_bypass_path_through_noc():
|
|||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
def test_pe_dma_to_noc_bw():
|
def test_pe_dma_to_router_bw():
|
||||||
"""PE_DMA → NOC edge BW must be 256 GB/s (= HBM slice BW, no bottleneck)."""
|
"""PE_DMA → router edge BW must be 256 GB/s."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
for e in graph.edges:
|
for e in graph.edges:
|
||||||
if e.src == "sip0.cube0.pe0.pe_dma" and e.dst == "sip0.cube0.noc":
|
if e.src == "sip0.cube0.pe0.pe_dma" and e.kind == "pe_to_router":
|
||||||
assert e.bw_gbs == 256.0, (
|
assert e.bw_gbs == 256.0, (
|
||||||
f"PE_DMA→NOC BW should be 256 GB/s, got {e.bw_gbs}"
|
f"PE_DMA→router BW should be 256 GB/s, got {e.bw_gbs}"
|
||||||
)
|
)
|
||||||
return
|
return
|
||||||
pytest.fail("PE_DMA → NOC edge not found")
|
pytest.fail("PE_DMA → router edge not found")
|
||||||
|
|
||||||
|
|
||||||
def test_noc_to_xbar_bw():
|
def test_router_mesh_bw():
|
||||||
"""NOC → xbar_top edge BW must be 256 GB/s (= HBM slice BW)."""
|
"""Router-router mesh edge BW must be 256 GB/s."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
for e in graph.edges:
|
for e in graph.edges:
|
||||||
if e.src == "sip0.cube0.noc" and e.dst == "sip0.cube0.xbar_top":
|
if e.kind == "router_mesh" and "cube0" in e.src:
|
||||||
assert e.bw_gbs == 256.0, (
|
assert e.bw_gbs == 256.0, (
|
||||||
f"NOC→xbar_top BW should be 256 GB/s, got {e.bw_gbs}"
|
f"Router mesh BW should be 256 GB/s, got {e.bw_gbs}"
|
||||||
)
|
)
|
||||||
return
|
return
|
||||||
pytest.fail("NOC → xbar_top edge not found")
|
pytest.fail("Router mesh edge not found")
|
||||||
|
|
||||||
|
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
@@ -460,11 +408,8 @@ def test_local_hbm_read_completes():
|
|||||||
assert trace["total_ns"] > 0
|
assert trace["total_ns"] > 0
|
||||||
|
|
||||||
|
|
||||||
def test_cross_row_latency_greater_than_local():
|
def test_remote_pe_latency_greater_than_local():
|
||||||
"""Cross-row HBM access (PE0→slice5) must be slower than local (PE0→slice0).
|
"""Remote PE HBM access must be slower than local (more mesh hops)."""
|
||||||
|
|
||||||
Cross-row traverses mesh + bridge, local goes directly through router to XBAR.
|
|
||||||
"""
|
|
||||||
engine_local = _engine()
|
engine_local = _engine()
|
||||||
msg_local = MemoryReadMsg(
|
msg_local = MemoryReadMsg(
|
||||||
correlation_id="mesh", request_id="local",
|
correlation_id="mesh", request_id="local",
|
||||||
@@ -475,18 +420,19 @@ def test_cross_row_latency_greater_than_local():
|
|||||||
engine_local.wait(h_l)
|
engine_local.wait(h_l)
|
||||||
_, t_local = engine_local.get_completion(h_l)
|
_, t_local = engine_local.get_completion(h_l)
|
||||||
|
|
||||||
engine_cross = _engine()
|
# PE0 accessing PE5's HBM (remote, more mesh hops)
|
||||||
msg_cross = MemoryReadMsg(
|
engine_remote = _engine()
|
||||||
correlation_id="mesh", request_id="cross",
|
msg_remote = MemoryReadMsg(
|
||||||
|
correlation_id="mesh", request_id="remote",
|
||||||
src_sip=0, src_cube=0, src_pe=0,
|
src_sip=0, src_cube=0, src_pe=0,
|
||||||
src_pa=_hbm_pa(pe_id=5), nbytes=4096,
|
src_pa=_hbm_pa(pe_id=5), nbytes=4096,
|
||||||
)
|
)
|
||||||
h_c = engine_cross.submit(msg_cross)
|
h_r = engine_remote.submit(msg_remote)
|
||||||
engine_cross.wait(h_c)
|
engine_remote.wait(h_r)
|
||||||
_, t_cross = engine_cross.get_completion(h_c)
|
_, t_remote = engine_remote.get_completion(h_r)
|
||||||
|
|
||||||
assert t_cross["total_ns"] > t_local["total_ns"], (
|
assert t_remote["total_ns"] >= t_local["total_ns"], (
|
||||||
f"Cross-row ({t_cross['total_ns']:.2f}ns) must be > "
|
f"Remote ({t_remote['total_ns']:.2f}ns) must be >= "
|
||||||
f"local ({t_local['total_ns']:.2f}ns)"
|
f"local ({t_local['total_ns']:.2f}ns)"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -532,79 +478,34 @@ def test_mesh_data_in_context_spec():
|
|||||||
assert mesh["mesh"]["cols"] == 6
|
assert mesh["mesh"]["cols"] == 6
|
||||||
|
|
||||||
|
|
||||||
def test_noc_grid_from_mesh_routers():
|
def test_router_nodes_match_mesh():
|
||||||
"""NOC x_grid/y_grid must be derived from mesh router positions, not all nodes.
|
"""Topology router nodes must match active routers in cube_mesh.yaml."""
|
||||||
|
|
||||||
Mesh routers have 6 unique X values and 6 unique Y values.
|
|
||||||
The old approach (scanning all node positions) would produce many more grid lines
|
|
||||||
from UCIe, HBM, SRAM, etc. positions.
|
|
||||||
"""
|
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
mesh = yaml.safe_load(MESH_PATH.read_text())
|
mesh = yaml.safe_load(MESH_PATH.read_text())
|
||||||
|
active_routers = [k for k, v in mesh["routers"].items() if v is not None]
|
||||||
# Extract unique X and Y values from mesh routers (excluding HBM exclusions)
|
for rkey in active_routers:
|
||||||
mesh_xs = set()
|
assert f"sip0.cube0.{rkey}" in graph.nodes, f"Router {rkey} missing from graph"
|
||||||
mesh_ys = set()
|
|
||||||
for key, router in mesh["routers"].items():
|
|
||||||
if router is not None:
|
|
||||||
mesh_xs.add(router["pos_mm"][0])
|
|
||||||
mesh_ys.add(router["pos_mm"][1])
|
|
||||||
|
|
||||||
# The NOC component should use exactly these grid positions
|
|
||||||
# Access through engine internals for verification
|
|
||||||
engine = _engine()
|
|
||||||
noc_comp = engine._components["sip0.cube0.noc"]
|
|
||||||
assert len(noc_comp._x_grid) == len(mesh_xs), (
|
|
||||||
f"NOC x_grid has {len(noc_comp._x_grid)} values, "
|
|
||||||
f"expected {len(mesh_xs)} from mesh routers"
|
|
||||||
)
|
|
||||||
assert len(noc_comp._y_grid) == len(mesh_ys), (
|
|
||||||
f"NOC y_grid has {len(noc_comp._y_grid)} values, "
|
|
||||||
f"expected {len(mesh_ys)} from mesh routers"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_noc_grid_excludes_hbm_zone():
|
def test_null_routers_excluded():
|
||||||
"""NOC grid must not include positions from HBM-excluded routers.
|
"""HBM exclusion zone routers (null in mesh) must not be in graph."""
|
||||||
|
|
||||||
HBM exclusion zone routers (r2c2, r2c3, r3c2, r3c3) are None in the mesh.
|
|
||||||
Their positions must not appear as router grid points in the NOC.
|
|
||||||
"""
|
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
mesh = yaml.safe_load(MESH_PATH.read_text())
|
mesh = yaml.safe_load(MESH_PATH.read_text())
|
||||||
|
null_routers = [k for k, v in mesh["routers"].items() if v is None]
|
||||||
# Get positions of active routers only
|
for rkey in null_routers:
|
||||||
active_positions = set()
|
assert f"sip0.cube0.{rkey}" not in graph.nodes, f"Null router {rkey} in graph"
|
||||||
for key, router in mesh["routers"].items():
|
|
||||||
if router is not None:
|
|
||||||
active_positions.add(tuple(router["pos_mm"]))
|
|
||||||
|
|
||||||
# NOC should only use active router positions
|
|
||||||
engine = _engine()
|
|
||||||
noc_comp = engine._components["sip0.cube0.noc"]
|
|
||||||
noc_grid_points = {(x, y) for x in noc_comp._x_grid for y in noc_comp._y_grid}
|
|
||||||
|
|
||||||
# All active router positions should be representable in the grid
|
|
||||||
for pos in active_positions:
|
|
||||||
x, y = pos
|
|
||||||
assert any(abs(gx - x) < 0.01 for gx in noc_comp._x_grid), (
|
|
||||||
f"Active router X={x} not in NOC x_grid"
|
|
||||||
)
|
|
||||||
assert any(abs(gy - y) < 0.01 for gy in noc_comp._y_grid), (
|
|
||||||
f"Active router Y={y} not in NOC y_grid"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
# 7. XBAR Position-Aware Latency (Change 2)
|
# 7. Router Mesh Latency (ADR-0019)
|
||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
def _pe_dma_latency(pe_id: int, target_pe_id: int, nbytes: int = 4096) -> float:
|
def _pe_dma_latency(pe_id: int, target_pe_id: int, nbytes: int = 4096) -> float:
|
||||||
"""Run PeDmaMsg from pe_id targeting target_pe_id's HBM slice, return total_ns."""
|
"""Run PeDmaMsg from pe_id targeting target_pe_id's HBM, return total_ns."""
|
||||||
engine = _engine()
|
engine = _engine()
|
||||||
msg = PeDmaMsg(
|
msg = PeDmaMsg(
|
||||||
correlation_id="xbar", request_id=f"pe{pe_id}_slice{target_pe_id}",
|
correlation_id="mesh_lat", request_id=f"pe{pe_id}_t{target_pe_id}",
|
||||||
src_sip=0, src_cube=0, src_pe=pe_id,
|
src_sip=0, src_cube=0, src_pe=pe_id,
|
||||||
dst_pa=_hbm_pa(pe_id=target_pe_id), nbytes=nbytes,
|
dst_pa=_hbm_pa(pe_id=target_pe_id), nbytes=nbytes,
|
||||||
)
|
)
|
||||||
@@ -614,78 +515,25 @@ def _pe_dma_latency(pe_id: int, target_pe_id: int, nbytes: int = 4096) -> float:
|
|||||||
return trace["total_ns"]
|
return trace["total_ns"]
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_pe0_slice0_lower_than_pe0_slice3():
|
def test_local_hbm_latency_positive():
|
||||||
"""PE0 (NW, left) → slice0 (left) must be faster than PE0 → slice3 (right).
|
"""Local HBM access must have positive latency."""
|
||||||
|
t = _pe_dma_latency(pe_id=0, target_pe_id=0)
|
||||||
Position-aware XBAR: PE0's router (r0c0, x=1.5) is closer to slice0 (left end)
|
assert t > 0, f"Local HBM latency must be > 0, got {t}"
|
||||||
than slice3 (right end). The XBAR internal latency should reflect this distance.
|
|
||||||
"""
|
|
||||||
t_near = _pe_dma_latency(pe_id=0, target_pe_id=0) # PE0 → slice0
|
|
||||||
t_far = _pe_dma_latency(pe_id=0, target_pe_id=3) # PE0 → slice3
|
|
||||||
assert t_near < t_far, (
|
|
||||||
f"PE0→slice0 ({t_near:.4f}ns) should be < PE0→slice3 ({t_far:.4f}ns) "
|
|
||||||
f"with position-aware XBAR"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_pe2_slice3_lower_than_pe2_slice0():
|
def test_pe_dma_latency_deterministic():
|
||||||
"""PE2 (NE, right) → slice3 (right) must be faster than PE2 → slice0 (left).
|
"""Same PE DMA request must produce identical latency."""
|
||||||
|
t1 = _pe_dma_latency(pe_id=1, target_pe_id=1)
|
||||||
Mirror of test_xbar_pe0_slice0_lower_than_pe0_slice3.
|
t2 = _pe_dma_latency(pe_id=1, target_pe_id=1)
|
||||||
PE2's router (r1c4, x=12.5) is closer to slice3 (right end).
|
assert t1 == t2, f"Non-deterministic latency: {t1} vs {t2}"
|
||||||
"""
|
|
||||||
t_near = _pe_dma_latency(pe_id=2, target_pe_id=3) # PE2 → slice3
|
|
||||||
t_far = _pe_dma_latency(pe_id=2, target_pe_id=0) # PE2 → slice0
|
|
||||||
assert t_near < t_far, (
|
|
||||||
f"PE2→slice3 ({t_near:.4f}ns) should be < PE2→slice0 ({t_far:.4f}ns) "
|
|
||||||
f"with position-aware XBAR"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_symmetric_latency():
|
def test_remote_pe_dma_latency_greater():
|
||||||
"""PE0→slice0 ≈ PE2→slice3 (symmetric positions in the crossbar).
|
"""Remote PE HBM access (more mesh hops) should be >= local."""
|
||||||
|
t_local = _pe_dma_latency(pe_id=0, target_pe_id=0)
|
||||||
PE0 (NW, x=1.5) distance to slice0 (left) should equal
|
t_remote = _pe_dma_latency(pe_id=0, target_pe_id=5)
|
||||||
PE2 (NE, x=12.5) distance to slice3 (right), within tolerance.
|
assert t_remote >= t_local, (
|
||||||
"""
|
f"Remote ({t_remote:.4f}ns) must be >= local ({t_local:.4f}ns)"
|
||||||
t_pe0_s0 = _pe_dma_latency(pe_id=0, target_pe_id=0)
|
|
||||||
t_pe2_s3 = _pe_dma_latency(pe_id=2, target_pe_id=3)
|
|
||||||
diff = abs(t_pe0_s0 - t_pe2_s3)
|
|
||||||
# Allow small tolerance for different NOC paths
|
|
||||||
assert diff < 1.0, (
|
|
||||||
f"Symmetric latency mismatch: PE0→slice0={t_pe0_s0:.4f}ns, "
|
|
||||||
f"PE2→slice3={t_pe2_s3:.4f}ns, diff={diff:.4f}ns"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_position_aware_latency_positive():
|
|
||||||
"""All XBAR-routed paths must have positive latency (ADR-0002 D4)."""
|
|
||||||
for pe_id in range(4):
|
|
||||||
for target in range(4):
|
|
||||||
t = _pe_dma_latency(pe_id=pe_id, target_pe_id=target)
|
|
||||||
assert t > 0, (
|
|
||||||
f"PE{pe_id}→slice{target} latency must be > 0, got {t}"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_latency_deterministic():
|
|
||||||
"""Same (pe, slice) pair must always produce the same XBAR latency."""
|
|
||||||
t1 = _pe_dma_latency(pe_id=1, target_pe_id=2)
|
|
||||||
t2 = _pe_dma_latency(pe_id=1, target_pe_id=2)
|
|
||||||
assert t1 == t2, (
|
|
||||||
f"Non-deterministic XBAR latency: {t1} vs {t2}"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_cross_row_still_greater():
|
|
||||||
"""Cross-row HBM (PE0→slice5, via bridge) must still be > local (PE0→slice0).
|
|
||||||
|
|
||||||
Position-aware XBAR must not break the cross-row > local invariant.
|
|
||||||
"""
|
|
||||||
t_local = _pe_dma_latency(pe_id=0, target_pe_id=0) # same-half
|
|
||||||
t_cross = _pe_dma_latency(pe_id=0, target_pe_id=5) # cross-half via bridge
|
|
||||||
assert t_cross > t_local, (
|
|
||||||
f"Cross-row ({t_cross:.4f}ns) must be > local ({t_local:.4f}ns)"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -694,60 +542,11 @@ def test_xbar_cross_row_still_greater():
|
|||||||
# ══════════════════════════════════════════════════════════════════
|
# ══════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
def test_pe_noc_distance_reflects_physical_position():
|
def test_pe_router_edges_exist():
|
||||||
"""PE→NOC edge distance must reflect actual PE-to-router physical distance.
|
"""Each PE must have pe_to_router edges to its assigned router."""
|
||||||
|
|
||||||
NW PE0 (y=1.5) → router r0c0 (y=1.5): distance ≈ 0
|
|
||||||
NE PE2 (y=1.5) → router r1c4 (y=5.5): distance ≈ 4.0mm
|
|
||||||
SW PE4 (y=12.5) → router r4c0 (y=8.5): distance ≈ 4.0mm
|
|
||||||
SE PE6 (y=12.5) → router r5c4 (y=12.5): distance ≈ 0
|
|
||||||
"""
|
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
pe_noc_edges = {}
|
pe_router_edges = [e for e in graph.edges
|
||||||
for e in graph.edges:
|
if e.kind == "pe_to_router" and "sip0.cube0" in e.src]
|
||||||
if e.kind == "pe_to_noc" and "cube0" in e.src:
|
assert len(pe_router_edges) == 8, (
|
||||||
# Extract pe index from "sip0.cube0.pe2.pe_dma"
|
f"Expected 8 PE→router edges, got {len(pe_router_edges)}"
|
||||||
pe_name = e.src.split(".")[-2] # "pe2"
|
|
||||||
pe_noc_edges[pe_name] = e.distance_mm
|
|
||||||
|
|
||||||
# NW (PE0,1) and SE (PE6,7): router at same position → distance ≈ 0
|
|
||||||
assert pe_noc_edges["pe0"] < 0.1, (
|
|
||||||
f"NW PE0 should be near its router, got distance={pe_noc_edges['pe0']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe1"] < 0.1, (
|
|
||||||
f"NW PE1 should be near its router, got distance={pe_noc_edges['pe1']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe6"] < 0.1, (
|
|
||||||
f"SE PE6 should be near its router, got distance={pe_noc_edges['pe6']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe7"] < 0.1, (
|
|
||||||
f"SE PE7 should be near its router, got distance={pe_noc_edges['pe7']}"
|
|
||||||
)
|
|
||||||
|
|
||||||
# NE (PE2,3) and SW (PE4,5): 4.0mm from router → distance > 3.5
|
|
||||||
assert pe_noc_edges["pe2"] > 3.5, (
|
|
||||||
f"NE PE2 should be ~4mm from router, got distance={pe_noc_edges['pe2']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe3"] > 3.5, (
|
|
||||||
f"NE PE3 should be ~4mm from router, got distance={pe_noc_edges['pe3']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe4"] > 3.5, (
|
|
||||||
f"SW PE4 should be ~4mm from router, got distance={pe_noc_edges['pe4']}"
|
|
||||||
)
|
|
||||||
assert pe_noc_edges["pe5"] > 3.5, (
|
|
||||||
f"SW PE5 should be ~4mm from router, got distance={pe_noc_edges['pe5']}"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_ne_pe_latency_greater_than_nw_pe():
|
|
||||||
"""NE PE2 → local HBM must be slower than NW PE0 → local HBM.
|
|
||||||
|
|
||||||
PE2 has 4mm extra wire to its router vs PE0 (0mm).
|
|
||||||
Both access their respective local HBM slice.
|
|
||||||
"""
|
|
||||||
t_nw = _pe_dma_latency(pe_id=0, target_pe_id=0) # PE0 → slice0
|
|
||||||
t_ne = _pe_dma_latency(pe_id=2, target_pe_id=2) # PE2 → slice2
|
|
||||||
assert t_ne > t_nw, (
|
|
||||||
f"NE PE2→slice2 ({t_ne:.4f}ns) should be > "
|
|
||||||
f"NW PE0→slice0 ({t_nw:.4f}ns) due to extra wire distance"
|
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ Validates:
|
|||||||
"""
|
"""
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
import simpy
|
import simpy
|
||||||
|
|
||||||
from kernbench.common.pe_commands import (
|
from kernbench.common.pe_commands import (
|
||||||
|
|||||||
@@ -24,7 +24,6 @@ from kernbench.components.builtin import (
|
|||||||
IoCpuComponent,
|
IoCpuComponent,
|
||||||
MCpuComponent,
|
MCpuComponent,
|
||||||
PcieEpComponent,
|
PcieEpComponent,
|
||||||
PositionAwareXbarComponent,
|
|
||||||
SramComponent,
|
SramComponent,
|
||||||
TransitComponent,
|
TransitComponent,
|
||||||
)
|
)
|
||||||
@@ -232,7 +231,6 @@ def test_m_cpu_terminal_no_ctx_completes():
|
|||||||
("forwarding_v1", TransitComponent),
|
("forwarding_v1", TransitComponent),
|
||||||
("noc_v1", TransitComponent),
|
("noc_v1", TransitComponent),
|
||||||
("ucie_v1", TransitComponent),
|
("ucie_v1", TransitComponent),
|
||||||
("xbar_v1", PositionAwareXbarComponent),
|
|
||||||
("pcie_ep_v1", PcieEpComponent),
|
("pcie_ep_v1", PcieEpComponent),
|
||||||
("io_cpu_v1", IoCpuComponent),
|
("io_cpu_v1", IoCpuComponent),
|
||||||
("m_cpu_v1", MCpuComponent),
|
("m_cpu_v1", MCpuComponent),
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
"""Tests for H2D writes and PE DMA probe latency invariants.
|
"""Tests for H2D writes and PE DMA probe latency invariants.
|
||||||
|
|
||||||
H2D tests use MemoryWriteMsg (pcie_ep → io_cpu → m_cpu → hbm_ctrl → response).
|
H2D tests use MemoryWriteMsg (pcie_ep → io_cpu → m_cpu → hbm_ctrl → response).
|
||||||
PE DMA tests use PeDmaMsg (direct pe_dma → xbar → hbm_ctrl injection).
|
PE DMA tests use PeDmaMsg (direct pe_dma → router mesh → hbm_ctrl injection).
|
||||||
"""
|
"""
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
@@ -118,7 +118,7 @@ def test_h2d_local_cube_cut_through():
|
|||||||
"""H2D to local cube with cut-through should be < 50ns for 4096B.
|
"""H2D to local cube with cut-through should be < 50ns for 4096B.
|
||||||
|
|
||||||
Full command path: pcie_ep → io_cpu → ucie → noc → m_cpu
|
Full command path: pcie_ep → io_cpu → ucie → noc → m_cpu
|
||||||
DMA: m_cpu → noc → xbar → hbm_ctrl (drain once at terminal)
|
DMA: m_cpu → router mesh → hbm_ctrl (drain once at terminal)
|
||||||
Plus response path back.
|
Plus response path back.
|
||||||
With store-and-forward each hop would serialize; cut-through keeps it low.
|
With store-and-forward each hop would serialize; cut-through keeps it low.
|
||||||
"""
|
"""
|
||||||
@@ -133,7 +133,7 @@ def test_h2d_remote_cube_cut_through():
|
|||||||
With cut-through, drain happens once at bottleneck.
|
With cut-through, drain happens once at bottleneck.
|
||||||
"""
|
"""
|
||||||
lat = _h2d_latency(dst_cube=4, dst_pe=0)
|
lat = _h2d_latency(dst_cube=4, dst_pe=0)
|
||||||
assert lat < 80.0, f"Remote H2D {lat:.2f}ns; cut-through expects < 80ns"
|
assert lat < 120.0, f"Remote H2D {lat:.2f}ns; cut-through expects < 120ns"
|
||||||
|
|
||||||
|
|
||||||
# ── 6. PE DMA: direct injection tests ─────────────────────────
|
# ── 6. PE DMA: direct injection tests ─────────────────────────
|
||||||
@@ -144,9 +144,9 @@ def _graph():
|
|||||||
|
|
||||||
|
|
||||||
def _hbm_effective_bw() -> float:
|
def _hbm_effective_bw() -> float:
|
||||||
"""Compute HBM effective BW from topology spec: xbar_to_hbm_bw_gbs * efficiency."""
|
"""Compute HBM effective BW from topology spec: hbm_to_router_bw_gbs * efficiency."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
raw_bw = g.spec["cube"]["links"]["xbar_to_hbm_bw_gbs"]
|
raw_bw = g.spec["cube"]["links"]["hbm_to_router_bw_gbs"]
|
||||||
eff = g.spec["cube"]["components"]["hbm_ctrl"].get("attrs", {}).get("efficiency", 1.0)
|
eff = g.spec["cube"]["components"]["hbm_ctrl"].get("attrs", {}).get("efficiency", 1.0)
|
||||||
return raw_bw * eff
|
return raw_bw * eff
|
||||||
|
|
||||||
@@ -205,7 +205,7 @@ def test_pe_dma_local_bottleneck_hbm():
|
|||||||
|
|
||||||
|
|
||||||
def test_pe_dma_same_half_bottleneck_hbm():
|
def test_pe_dma_same_half_bottleneck_hbm():
|
||||||
"""PE DMA pe0→slice1 (same half via xbar_top): bottleneck = HBM effective BW."""
|
"""PE DMA pe0→pe1 HBM (same row via router mesh): bottleneck = HBM effective BW."""
|
||||||
bn = _pe_dma_bottleneck(src_cube=0, src_pe=0, dst_pe=1)
|
bn = _pe_dma_bottleneck(src_cube=0, src_pe=0, dst_pe=1)
|
||||||
expected = _hbm_effective_bw()
|
expected = _hbm_effective_bw()
|
||||||
assert bn == expected, f"Same-half PE DMA bottleneck {bn}, expected {expected}"
|
assert bn == expected, f"Same-half PE DMA bottleneck {bn}, expected {expected}"
|
||||||
@@ -323,11 +323,15 @@ def test_d2h_latency_gte_h2d():
|
|||||||
def test_hbm_efficiency_applied():
|
def test_hbm_efficiency_applied():
|
||||||
"""HBM edge BW should reflect efficiency factor from topology spec."""
|
"""HBM edge BW should reflect efficiency factor from topology spec."""
|
||||||
graph = _graph()
|
graph = _graph()
|
||||||
edge_map = {(e.src, e.dst): e for e in graph.edges}
|
# Find any router_to_hbm edge for cube0
|
||||||
e = edge_map.get(("sip0.cube0.xbar_top", "sip0.cube0.hbm_ctrl.slice0"))
|
hbm_edge = None
|
||||||
assert e is not None, "xbar_top -> hbm_ctrl.slice0 edge missing"
|
for e in graph.edges:
|
||||||
|
if e.kind == "router_to_hbm" and "cube0" in e.src:
|
||||||
|
hbm_edge = e
|
||||||
|
break
|
||||||
|
assert hbm_edge is not None, "router → hbm_ctrl edge missing"
|
||||||
expected = _hbm_effective_bw()
|
expected = _hbm_effective_bw()
|
||||||
assert e.bw_gbs == expected, f"HBM edge BW {e.bw_gbs}, expected {expected}"
|
assert hbm_edge.bw_gbs == expected, f"HBM edge BW {hbm_edge.bw_gbs}, expected {expected}"
|
||||||
|
|
||||||
|
|
||||||
# ── 11. Sweep saturation ──────────────────────────────────────
|
# ── 11. Sweep saturation ──────────────────────────────────────
|
||||||
@@ -336,8 +340,9 @@ def test_hbm_efficiency_applied():
|
|||||||
def test_probe_sweep_saturation():
|
def test_probe_sweep_saturation():
|
||||||
"""Utilization at 1MB must exceed utilization at 4KB for pe-local-hbm."""
|
"""Utilization at 1MB must exceed utilization at 4KB for pe-local-hbm."""
|
||||||
from kernbench.cli.probe import _sweep_util
|
from kernbench.cli.probe import _sweep_util
|
||||||
# pe-local-hbm: ovhd=2ns (xbar), wire~0.03ns, bn=204.8 GB/s
|
# pe-local-hbm: ovhd=2ns (router), wire~0.03ns, bn from topology
|
||||||
u = _sweep_util(2.0, 0.03, 204.8)
|
bn = _hbm_effective_bw()
|
||||||
|
u = _sweep_util(2.0, 0.03, bn)
|
||||||
assert u[-1] > u[0], (
|
assert u[-1] > u[0], (
|
||||||
f"1MB util ({u[-1]:.1f}%) must exceed 4KB util ({u[0]:.1f}%)"
|
f"1MB util ({u[-1]:.1f}%) must exceed 4KB util ({u[0]:.1f}%)"
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -17,21 +17,19 @@ def _graph():
|
|||||||
|
|
||||||
|
|
||||||
def test_resolve_hbm_addr():
|
def test_resolve_hbm_addr():
|
||||||
"""HBM address -> sip{S}.cube{C}.hbm_ctrl.slice{P}"""
|
"""HBM address -> sip{S}.cube{C}.hbm_ctrl (single controller per cube)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
resolver = AddressResolver(g)
|
resolver = AddressResolver(g)
|
||||||
# hbm_offset=0x1000, slice_size=6GB -> slice 0
|
|
||||||
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=3, hbm_offset=0x1000)
|
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=3, hbm_offset=0x1000)
|
||||||
assert resolver.resolve(pa) == "sip0.cube3.hbm_ctrl.slice0"
|
assert resolver.resolve(pa) == "sip0.cube3.hbm_ctrl"
|
||||||
|
|
||||||
|
|
||||||
def test_resolve_hbm_addr_slice4():
|
def test_resolve_hbm_addr_high_offset():
|
||||||
"""HBM address in PE4's slice range -> slice4."""
|
"""HBM address with large offset still resolves to same hbm_ctrl."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
resolver = AddressResolver(g)
|
resolver = AddressResolver(g)
|
||||||
# slice_size = 6GB; PE4 offset starts at 4*6GB = 24GB = 0x600000000
|
|
||||||
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=0, hbm_offset=0x600000000)
|
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=0, hbm_offset=0x600000000)
|
||||||
assert resolver.resolve(pa) == "sip0.cube0.hbm_ctrl.slice4"
|
assert resolver.resolve(pa) == "sip0.cube0.hbm_ctrl"
|
||||||
|
|
||||||
|
|
||||||
def test_resolve_pe_tcm_addr():
|
def test_resolve_pe_tcm_addr():
|
||||||
@@ -71,120 +69,98 @@ def test_resolve_nonexistent_node():
|
|||||||
resolver.resolve(pa)
|
resolver.resolve(pa)
|
||||||
|
|
||||||
|
|
||||||
# ── PathRouter: local HBM (same xbar half) ──────────────────────────
|
# ── PathRouter: local HBM via router mesh ────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
def test_path_local_hbm_same_half():
|
def test_path_local_hbm():
|
||||||
"""PE0 -> slice0 (local): pe_dma -> noc -> xbar_top -> hbm_ctrl.slice0."""
|
"""PE0 -> hbm_ctrl: pe_dma → router → hbm_ctrl (through router mesh)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
||||||
assert "sip0.cube0.noc" in path
|
assert path[-1] == "sip0.cube0.hbm_ctrl"
|
||||||
assert "sip0.cube0.xbar_top" in path
|
# Path must go through at least one router node
|
||||||
assert path[-1] == "sip0.cube0.hbm_ctrl.slice0"
|
assert any(n.startswith("sip0.cube0.r") for n in path), \
|
||||||
assert not any("bridge" in n for n in path)
|
"HBM path must traverse router mesh"
|
||||||
assert len(path) == 4 # pe_dma → noc → xbar_top → slice0
|
# No xbar or bridge nodes in the new topology
|
||||||
|
assert not any("xbar" in n or "bridge" in n for n in path)
|
||||||
|
|
||||||
|
|
||||||
# ── PathRouter: same-half remote HBM ────────────────────────────────
|
# ── PathRouter: remote PE HBM (different corner, same cube) ──────────
|
||||||
|
|
||||||
|
|
||||||
def test_path_same_half_remote_hbm():
|
def test_path_remote_pe_hbm():
|
||||||
"""PE0 -> slice1: same-half via noc → xbar_top, no bridge."""
|
"""PE4 (bottom half) -> hbm_ctrl: routes through router mesh."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice1")
|
path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
assert path[0] == "sip0.cube0.pe4.pe_dma"
|
||||||
assert "sip0.cube0.noc" in path
|
assert path[-1] == "sip0.cube0.hbm_ctrl"
|
||||||
assert "sip0.cube0.xbar_top" in path
|
assert any(n.startswith("sip0.cube0.r") for n in path)
|
||||||
assert path[-1] == "sip0.cube0.hbm_ctrl.slice1"
|
assert not any("xbar" in n or "bridge" in n for n in path)
|
||||||
assert not any("bridge" in n for n in path)
|
|
||||||
assert len(path) == 4 # pe_dma → noc → xbar_top → slice1
|
|
||||||
|
|
||||||
|
|
||||||
# ── PathRouter: cross-half HBM ──────────────────────────────────────
|
# ── PathRouter: all PEs equidistant to HBM (n_to_one routing weight) ─
|
||||||
|
|
||||||
|
|
||||||
def test_path_cross_half_hbm():
|
def test_all_pe_hbm_equidistant():
|
||||||
"""PE0 -> slice4 (cross-half): pe_dma → noc → xbar_top → bridge → xbar_bot → slice4."""
|
"""All PEs in a cube have equal routing distance to hbm_ctrl.
|
||||||
g = _graph()
|
|
||||||
router = PathRouter(g)
|
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice4")
|
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
|
||||||
assert "sip0.cube0.xbar_top" in path
|
|
||||||
assert any("bridge" in n for n in path), "cross-half HBM must traverse bridge"
|
|
||||||
assert "sip0.cube0.xbar_bot" in path
|
|
||||||
assert path[-1] == "sip0.cube0.hbm_ctrl.slice4"
|
|
||||||
assert len(path) == 6 # pe_dma → noc → xbar_top → bridge → xbar_bot → slice4
|
|
||||||
|
|
||||||
|
With n_to_one mapping and high routing weight on HBM edges,
|
||||||
def test_path_cross_half_via_xbar_top():
|
all PE→hbm_ctrl paths have the same accumulated distance.
|
||||||
"""PE4 (bottom) -> slice2 (top) goes through xbar_top via NOC.
|
|
||||||
|
|
||||||
NOC connects directly to xbar_top (low routing weight), so
|
|
||||||
bottom PEs access top-half HBM through noc → xbar_top.
|
|
||||||
"""
|
"""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl.slice2")
|
distances = []
|
||||||
assert "sip0.cube0.xbar_top" in path
|
for pe in range(8):
|
||||||
assert path[-1] == "sip0.cube0.hbm_ctrl.slice2"
|
_, dist = router.find_path_with_distance(
|
||||||
|
f"sip0.cube0.pe{pe}", "sip0.cube0.hbm_ctrl")
|
||||||
|
distances.append(dist)
|
||||||
def test_cross_half_distance_greater():
|
# All distances should be equal
|
||||||
"""Cross-half HBM access must have greater distance than local-half."""
|
assert all(d == distances[0] for d in distances), (
|
||||||
g = _graph()
|
f"expected equal distances, got: {distances}"
|
||||||
router = PathRouter(g)
|
|
||||||
_, dist_local = router.find_path_with_distance(
|
|
||||||
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
|
|
||||||
_, dist_cross = router.find_path_with_distance(
|
|
||||||
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice4")
|
|
||||||
assert dist_cross > dist_local
|
|
||||||
|
|
||||||
|
|
||||||
def test_path_same_half_same_distance():
|
|
||||||
"""Same-half HBM slices (PE0->slice0 vs PE0->slice3) have same distance.
|
|
||||||
|
|
||||||
With xbar_top/bot, all top-half slices are equidistant via noc → xbar_top.
|
|
||||||
"""
|
|
||||||
g = _graph()
|
|
||||||
router = PathRouter(g)
|
|
||||||
_, dist_local = router.find_path_with_distance(
|
|
||||||
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
|
|
||||||
_, dist_remote = router.find_path_with_distance(
|
|
||||||
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice3")
|
|
||||||
assert dist_remote == dist_local, (
|
|
||||||
f"same-half slices should have equal distance: "
|
|
||||||
f"slice0={dist_local:.2f}mm, slice3={dist_remote:.2f}mm"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_remote_pe_distance_not_less_than_local():
|
||||||
|
"""Remote PE HBM distance >= local PE HBM distance (mesh topology)."""
|
||||||
|
g = _graph()
|
||||||
|
router = PathRouter(g)
|
||||||
|
_, dist_pe0 = router.find_path_with_distance(
|
||||||
|
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
|
||||||
|
_, dist_pe4 = router.find_path_with_distance(
|
||||||
|
"sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
|
||||||
|
assert dist_pe4 >= dist_pe0
|
||||||
|
|
||||||
|
|
||||||
def test_path_remote_cube_hbm():
|
def test_path_remote_cube_hbm():
|
||||||
"""PE0 in cube0 can reach HBM in cube1 via UCIe (ADR-0004 D4)."""
|
"""PE0 in cube0 can reach HBM in cube1 via UCIe (ADR-0004 D4)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
||||||
assert path[-1] == "sip0.cube1.hbm_ctrl.slice0"
|
assert path[-1] == "sip0.cube1.hbm_ctrl"
|
||||||
# inter-cube path must cross a UCIe link
|
# inter-cube path must cross a UCIe link
|
||||||
assert any("ucie" in n for n in path), "remote cube path must traverse UCIe"
|
assert any("ucie" in n.lower() for n in path), \
|
||||||
# must not be trivially short (needs noc + ucie + remote noc + xbar)
|
"remote cube path must traverse UCIe"
|
||||||
|
# must not be trivially short (needs router + ucie + remote router + hbm)
|
||||||
assert len(path) >= 5
|
assert len(path) >= 5
|
||||||
|
|
||||||
|
|
||||||
# ── PathRouter: SRAM via NOC ────────────────────────────────────────
|
# ── PathRouter: SRAM via router mesh ─────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
def test_path_sram_via_noc():
|
def test_path_sram_via_router_mesh():
|
||||||
"""PE → SRAM must go through NOC (non-HBM data path)."""
|
"""PE → SRAM must go through router mesh nodes."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.sram")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.sram")
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
||||||
assert "sip0.cube0.noc" in path
|
|
||||||
assert path[-1] == "sip0.cube0.sram"
|
assert path[-1] == "sip0.cube0.sram"
|
||||||
# should NOT go through xbar (SRAM is non-HBM path)
|
# Must traverse at least one router node
|
||||||
|
assert any(n.startswith("sip0.cube0.r") for n in path), \
|
||||||
|
"SRAM path must traverse router mesh"
|
||||||
|
# No xbar nodes
|
||||||
assert not any("xbar" in n for n in path)
|
assert not any("xbar" in n for n in path)
|
||||||
|
|
||||||
|
|
||||||
@@ -192,14 +168,14 @@ def test_path_sram_via_noc():
|
|||||||
|
|
||||||
|
|
||||||
def test_path_local_tcm():
|
def test_path_local_tcm():
|
||||||
"""PE0 → own TCM is PE-internal, not via xbar or noc."""
|
"""PE0 → own TCM is PE-internal, not via router mesh."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.pe0.pe_tcm")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.pe0.pe_tcm")
|
||||||
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
assert path[0] == "sip0.cube0.pe0.pe_dma"
|
||||||
assert path[-1] == "sip0.cube0.pe0.pe_tcm"
|
assert path[-1] == "sip0.cube0.pe0.pe_tcm"
|
||||||
# PE-internal path, no fabric
|
# PE-internal path, no fabric
|
||||||
assert not any("xbar" in n or "noc" in n for n in path)
|
assert not any("xbar" in n or n.startswith("sip0.cube0.r") for n in path)
|
||||||
|
|
||||||
|
|
||||||
# ── PathRouter: distance monotonic ──────────────────────────────────
|
# ── PathRouter: distance monotonic ──────────────────────────────────
|
||||||
@@ -209,7 +185,8 @@ def test_path_distance_positive():
|
|||||||
"""All routed paths must have accumulated distance > 0 (ADR-0002 D4)."""
|
"""All routed paths must have accumulated distance > 0 (ADR-0002 D4)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
_, dist = router.find_path_with_distance("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
|
_, dist = router.find_path_with_distance(
|
||||||
|
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
|
||||||
assert dist > 0
|
assert dist > 0
|
||||||
|
|
||||||
|
|
||||||
@@ -218,8 +195,8 @@ def test_path_deterministic():
|
|||||||
g = _graph()
|
g = _graph()
|
||||||
r1 = PathRouter(g)
|
r1 = PathRouter(g)
|
||||||
r2 = PathRouter(g)
|
r2 = PathRouter(g)
|
||||||
p1 = r1.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl.slice3")
|
p1 = r1.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl")
|
||||||
p2 = r2.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl.slice3")
|
p2 = r2.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl")
|
||||||
assert p1 == p2
|
assert p1 == p2
|
||||||
|
|
||||||
|
|
||||||
@@ -227,6 +204,6 @@ def test_remote_cube_path_no_routing_error():
|
|||||||
"""Routing to remote cube HBM must not raise RoutingError (ADR-0004 D4)."""
|
"""Routing to remote cube HBM must not raise RoutingError (ADR-0004 D4)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
# cube0.PE0 -> cube1.slice0 (adjacent cube, E direction)
|
# cube0.PE0 -> cube1.hbm_ctrl (adjacent cube, E direction)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
|
||||||
assert len(path) >= 1 # succeeds without exception
|
assert len(path) >= 1 # succeeds without exception
|
||||||
|
|||||||
@@ -10,42 +10,28 @@ def _graph():
|
|||||||
return load_topology(TOPOLOGY_PATH)
|
return load_topology(TOPOLOGY_PATH)
|
||||||
|
|
||||||
|
|
||||||
# ── Full graph: node counts ──────────────────────────────────────────
|
# -- Full graph: node counts --------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_full_graph_node_count():
|
def test_full_graph_node_count():
|
||||||
g = _graph()
|
g = _graph()
|
||||||
# 1 switch
|
# 1 switch
|
||||||
# + 2 SIPs × (1 IO × (3 comps + 4 io_ucie + 16 io_conn)
|
# + 2 SIPs x (1 IO x 23 io_nodes
|
||||||
# + 16 cubes × (cube_comps + 8 PEs × 7 pe_comps))
|
# + 16 cubes x (32 routers + 1 hbm_ctrl + 1 m_cpu + 1 sram
|
||||||
# IO: pcie_ep + io_cpu + io_noc + 4 io_ucie + 4*4 io_conn = 23
|
# + 20 ucie (4 ports x (1 port + 4 conn))
|
||||||
# cube_comps: 9 (noc, m_cpu, sram, 2 bridge, 4 ucie)
|
# + 8 PEs x 7 pe_comps))
|
||||||
# + 16 ucie_conn (4 ports × 4 connections)
|
# IO: pcie_ep + io_cpu + noc + 4 io_ucie_ports + 4*4 io_ucie_conn = 23
|
||||||
# + 2 xbar_top/bot
|
# cube: 32 + 3 + 20 + 56 = 111
|
||||||
# + 8 hbm_slices = 35
|
# = 1 + 2*(23 + 16*111) = 1 + 2*(23+1776) = 1 + 3598 = 3599
|
||||||
# pe_comps: 7 (pe_cpu, pe_scheduler, pe_dma, pe_gemm, pe_math, pe_mmu, pe_tcm)
|
assert len(g.nodes) == 3599
|
||||||
# = 1 + 2*(23 + 16*(35+56)) = 1 + 2*(23+1456) = 1 + 2958 = 2959
|
|
||||||
assert len(g.nodes) == 2959
|
|
||||||
|
|
||||||
|
|
||||||
def test_full_graph_edge_count():
|
def test_full_graph_edge_count():
|
||||||
g = _graph()
|
g = _graph()
|
||||||
# Per cube: 192
|
assert len(g.edges) == 10874
|
||||||
# PE-internal: 56
|
|
||||||
# PE_DMA→noc: 8, noc→pe_dma: 8, noc→pe_cpu: 8, pe_cpu→noc: 8, noc→pe_mmu: 8
|
|
||||||
# xbar_top→hbm{0..3}: 4+4=8, xbar_bot→hbm{4..7}: 4+4=8
|
|
||||||
# noc↔xbar_top: 2, noc↔xbar_bot: 2
|
|
||||||
# xbar_top↔bridge.left: 2, bridge.left↔xbar_bot: 2
|
|
||||||
# xbar_top↔bridge.right: 2, bridge.right↔xbar_bot: 2
|
|
||||||
# ucie: 64, m_cpu↔noc: 2, noc↔sram: 2
|
|
||||||
# Total: 56+8+8+8+8+8+8+8+2+2+2+2+2+2+64+2+2 = 192
|
|
||||||
# IO edges per SIP: 77
|
|
||||||
# Per SIP: 16*192 + 48 inter-cube + 77 IO = 3197
|
|
||||||
# Total: 2 * 3197 = 6394
|
|
||||||
assert len(g.edges) == 6394
|
|
||||||
|
|
||||||
|
|
||||||
# ── Full graph: specific nodes exist ─────────────────────────────────
|
# -- Full graph: specific nodes exist -----------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_system_switch_exists():
|
def test_system_switch_exists():
|
||||||
@@ -65,18 +51,27 @@ def test_io_chiplet_nodes_exist():
|
|||||||
def test_cube_component_nodes_exist():
|
def test_cube_component_nodes_exist():
|
||||||
g = _graph()
|
g = _graph()
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
for name in ("noc", "m_cpu",
|
# Core cube components (no more noc, xbar, bridge)
|
||||||
"bridge.left", "bridge.right",
|
for name in ("m_cpu", "sram", "hbm_ctrl",
|
||||||
"ucie-N", "ucie-S", "ucie-E", "ucie-W",
|
"ucie-N", "ucie-S", "ucie-E", "ucie-W"):
|
||||||
"sram", "xbar_top", "xbar_bot"):
|
|
||||||
assert f"{cp}.{name}" in g.nodes
|
assert f"{cp}.{name}" in g.nodes
|
||||||
# Per-PE xbar entry nodes no longer exist
|
# Old nodes must not exist
|
||||||
for pe in range(8):
|
for old in ("noc", "xbar_top", "xbar_bot", "bridge.left", "bridge.right"):
|
||||||
assert f"{cp}.xbar.pe{pe}" not in g.nodes
|
assert f"{cp}.{old}" not in g.nodes
|
||||||
# HBM slices
|
# Router mesh nodes (32 routers in 6x6 grid minus 4 null holes)
|
||||||
|
router_nodes = [n for n in g.nodes if n.startswith(f"{cp}.r")]
|
||||||
|
assert len(router_nodes) == 32
|
||||||
|
# Spot-check specific routers
|
||||||
|
assert f"{cp}.r0c0" in g.nodes
|
||||||
|
assert g.nodes[f"{cp}.r0c0"].kind == "noc_router"
|
||||||
|
assert f"{cp}.r5c5" in g.nodes
|
||||||
|
# Null holes must not exist
|
||||||
|
for null_rc in ("r2c2", "r2c3", "r3c2", "r3c3"):
|
||||||
|
assert f"{cp}.{null_rc}" not in g.nodes
|
||||||
|
# Single hbm_ctrl (no more slices)
|
||||||
|
assert g.nodes[f"{cp}.hbm_ctrl"].kind == "hbm_ctrl"
|
||||||
for s in range(8):
|
for s in range(8):
|
||||||
assert f"{cp}.hbm_ctrl.slice{s}" in g.nodes
|
assert f"{cp}.hbm_ctrl.slice{s}" not in g.nodes
|
||||||
assert g.nodes[f"{cp}.hbm_ctrl.slice{s}"].kind == "hbm_ctrl"
|
|
||||||
|
|
||||||
|
|
||||||
def test_pe_component_nodes_exist():
|
def test_pe_component_nodes_exist():
|
||||||
@@ -86,23 +81,21 @@ def test_pe_component_nodes_exist():
|
|||||||
assert f"sip1.cube15.pe7.{comp}" in g.nodes
|
assert f"sip1.cube15.pe7.{comp}" in g.nodes
|
||||||
|
|
||||||
|
|
||||||
# ── Full graph: positions ────────────────────────────────────────────
|
# -- Full graph: positions ----------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_hbm_ctrl_slices_at_cube_center():
|
def test_hbm_ctrl_at_cube_center():
|
||||||
g = _graph()
|
g = _graph()
|
||||||
# cube0 origin = (0, 0), cx=8.5, cy=7.0, hbm_ctrl at (cx-2, cy)
|
# Single hbm_ctrl per cube; cube0 origin = (0, 0), hbm at (6.5, 7.0)
|
||||||
# all slices share the same physical position
|
node = g.nodes["sip0.cube0.hbm_ctrl"]
|
||||||
for s in range(8):
|
|
||||||
node = g.nodes[f"sip0.cube0.hbm_ctrl.slice{s}"]
|
|
||||||
assert node.pos_mm == (6.5, 7.0)
|
assert node.pos_mm == (6.5, 7.0)
|
||||||
|
|
||||||
|
|
||||||
def test_hbm_ctrl_slices_cube5_position():
|
def test_hbm_ctrl_cube5_position():
|
||||||
g = _graph()
|
g = _graph()
|
||||||
# cube5 = col=1, row=1 -> origin = (1*18, 1*15) = (18, 15)
|
# cube5 = col=1, row=1 -> origin = (1*18, 1*15) = (18, 15)
|
||||||
# hbm_ctrl = (18 + 6.5, 15 + 7.0) = (24.5, 22.0)
|
# hbm_ctrl = (18 + 6.5, 15 + 7.0) = (24.5, 22.0)
|
||||||
node = g.nodes["sip0.cube5.hbm_ctrl.slice0"]
|
node = g.nodes["sip0.cube5.hbm_ctrl"]
|
||||||
assert node.pos_mm == (24.5, 22.0)
|
assert node.pos_mm == (24.5, 22.0)
|
||||||
|
|
||||||
|
|
||||||
@@ -116,7 +109,7 @@ def test_ucie_ports_at_cube_edges():
|
|||||||
assert g.nodes["sip0.cube0.ucie-E"].pos_mm == (16.0, 7.0)
|
assert g.nodes["sip0.cube0.ucie-E"].pos_mm == (16.0, 7.0)
|
||||||
|
|
||||||
|
|
||||||
# ── Full graph: edges ────────────────────────────────────────────────
|
# -- Full graph: edges --------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def _edge_set(g):
|
def _edge_set(g):
|
||||||
@@ -125,9 +118,9 @@ def _edge_set(g):
|
|||||||
|
|
||||||
def test_inter_cube_ucie_edges():
|
def test_inter_cube_ucie_edges():
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
# cube0 (0,0) E → cube1 (1,0) W
|
# cube0 (0,0) E -> cube1 (1,0) W
|
||||||
assert ("sip0.cube0.ucie-E", "sip0.cube1.ucie-W") in es
|
assert ("sip0.cube0.ucie-E", "sip0.cube1.ucie-W") in es
|
||||||
# cube0 (0,0) S → cube4 (0,1) N
|
# cube0 (0,0) S -> cube4 (0,1) N
|
||||||
assert ("sip0.cube0.ucie-S", "sip0.cube4.ucie-N") in es
|
assert ("sip0.cube0.ucie-S", "sip0.cube4.ucie-N") in es
|
||||||
|
|
||||||
|
|
||||||
@@ -144,26 +137,33 @@ def test_switch_to_io_edges():
|
|||||||
assert ("fabric.switch0", "sip1.io0.pcie_ep") in es
|
assert ("fabric.switch0", "sip1.io0.pcie_ep") in es
|
||||||
|
|
||||||
|
|
||||||
def test_pe_dma_to_noc_only():
|
def test_pe_dma_to_router():
|
||||||
"""PE_DMA connects only to NOC (no direct xbar connection)."""
|
"""PE_DMA connects to its local router (pe_to_router kind)."""
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
for pe in range(8):
|
# PE0 at r0c0, PE1 at r0c1
|
||||||
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.noc") in es
|
assert (f"{cp}.pe0.pe_dma", f"{cp}.r0c0") in es
|
||||||
# No direct pe_dma → xbar edges
|
assert (f"{cp}.pe1.pe_dma", f"{cp}.r0c1") in es
|
||||||
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.xbar_top") not in es
|
# PE2 at r1c4, PE3 at r1c5
|
||||||
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.xbar_bot") not in es
|
assert (f"{cp}.pe2.pe_dma", f"{cp}.r1c4") in es
|
||||||
|
assert (f"{cp}.pe3.pe_dma", f"{cp}.r1c5") in es
|
||||||
|
# PE4 at r4c0, PE5 at r4c1
|
||||||
|
assert (f"{cp}.pe4.pe_dma", f"{cp}.r4c0") in es
|
||||||
|
assert (f"{cp}.pe5.pe_dma", f"{cp}.r4c1") in es
|
||||||
|
# PE6 at r5c4, PE7 at r5c5
|
||||||
|
assert (f"{cp}.pe6.pe_dma", f"{cp}.r5c4") in es
|
||||||
|
assert (f"{cp}.pe7.pe_dma", f"{cp}.r5c5") in es
|
||||||
|
|
||||||
|
|
||||||
def test_command_path_m_cpu_noc_pe_cpu():
|
def test_command_path_m_cpu_router_pe_cpu():
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
# m_cpu ↔ noc (bidirectional)
|
# m_cpu <-> r1c2 (bidirectional command)
|
||||||
assert (f"{cp}.m_cpu", f"{cp}.noc") in es
|
assert (f"{cp}.m_cpu", f"{cp}.r1c2") in es
|
||||||
assert (f"{cp}.noc", f"{cp}.m_cpu") in es
|
assert (f"{cp}.r1c2", f"{cp}.m_cpu") in es
|
||||||
# noc → pe_cpu for each PE
|
# router -> pe_cpu for each PE (command kind)
|
||||||
assert (f"{cp}.noc", f"{cp}.pe0.pe_cpu") in es
|
assert (f"{cp}.r0c0", f"{cp}.pe0.pe_cpu") in es
|
||||||
assert (f"{cp}.noc", f"{cp}.pe7.pe_cpu") in es
|
assert (f"{cp}.r5c5", f"{cp}.pe7.pe_cpu") in es
|
||||||
|
|
||||||
|
|
||||||
def test_pe_internal_edges():
|
def test_pe_internal_edges():
|
||||||
@@ -178,20 +178,32 @@ def test_pe_internal_edges():
|
|||||||
assert (f"{pp}.pe_math", f"{pp}.pe_tcm") in es
|
assert (f"{pp}.pe_math", f"{pp}.pe_tcm") in es
|
||||||
|
|
||||||
|
|
||||||
def test_xbar_top_bot_to_hbm_slice_edges():
|
def test_hbm_ctrl_connects_all_routers():
|
||||||
"""xbar_top connects to slices 0-3, xbar_bot to slices 4-7."""
|
"""HBM_CTRL connects to every router (router_to_hbm / hbm_to_router)."""
|
||||||
es = _edge_set(_graph())
|
g = _graph()
|
||||||
|
es = _edge_set(g)
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
for i in range(4):
|
routers = sorted(n for n in g.nodes if n.startswith(f"{cp}.r"))
|
||||||
assert (f"{cp}.xbar_top", f"{cp}.hbm_ctrl.slice{i}") in es
|
assert len(routers) == 32
|
||||||
for i in range(4, 8):
|
for r in routers:
|
||||||
assert (f"{cp}.xbar_bot", f"{cp}.hbm_ctrl.slice{i}") in es
|
assert (r, f"{cp}.hbm_ctrl") in es, f"missing {r}->hbm_ctrl"
|
||||||
# Negative: xbar_top must NOT connect to bottom slices
|
assert (f"{cp}.hbm_ctrl", r) in es, f"missing hbm_ctrl->{r}"
|
||||||
assert (f"{cp}.xbar_top", f"{cp}.hbm_ctrl.slice4") not in es
|
|
||||||
assert (f"{cp}.xbar_bot", f"{cp}.hbm_ctrl.slice0") not in es
|
|
||||||
|
|
||||||
|
|
||||||
# ── Views: system ────────────────────────────────────────────────────
|
def test_router_mesh_edges():
|
||||||
|
"""Adjacent routers are connected by router_mesh edges."""
|
||||||
|
g = _graph()
|
||||||
|
edge_kinds = {(e.src, e.dst): e.kind for e in g.edges}
|
||||||
|
cp = "sip0.cube0"
|
||||||
|
# r0c0 <-> r0c1 (horizontal neighbors)
|
||||||
|
assert edge_kinds.get((f"{cp}.r0c0", f"{cp}.r0c1")) == "router_mesh"
|
||||||
|
assert edge_kinds.get((f"{cp}.r0c1", f"{cp}.r0c0")) == "router_mesh"
|
||||||
|
# r0c0 <-> r1c0 (vertical neighbors)
|
||||||
|
assert edge_kinds.get((f"{cp}.r0c0", f"{cp}.r1c0")) == "router_mesh"
|
||||||
|
assert edge_kinds.get((f"{cp}.r1c0", f"{cp}.r0c0")) == "router_mesh"
|
||||||
|
|
||||||
|
|
||||||
|
# -- Views: system ------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_system_view_nodes():
|
def test_system_view_nodes():
|
||||||
@@ -203,7 +215,7 @@ def test_system_view_nodes():
|
|||||||
assert "sip1.io0" in v.nodes
|
assert "sip1.io0" in v.nodes
|
||||||
|
|
||||||
|
|
||||||
# ── Views: SIP ───────────────────────────────────────────────────────
|
# -- Views: SIP ---------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_sip_view_cube_count():
|
def test_sip_view_cube_count():
|
||||||
@@ -229,17 +241,21 @@ def test_sip_view_cube_positions():
|
|||||||
assert y1 == 13.0
|
assert y1 == 13.0
|
||||||
|
|
||||||
|
|
||||||
# ── Views: cube ──────────────────────────────────────────────────────
|
# -- Views: cube ---------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_cube_view_has_all_components():
|
def test_cube_view_has_all_components():
|
||||||
v = _graph().cube_view
|
v = _graph().cube_view
|
||||||
expected = {"ucie-N", "ucie-S", "ucie-W", "ucie-E",
|
expected = {"ucie-N", "ucie-S", "ucie-W", "ucie-E",
|
||||||
"m_cpu", "hbm_ctrl",
|
"m_cpu", "hbm_ctrl", "sram",
|
||||||
"bridge.left", "bridge.right", "noc", "sram",
|
"pe0", "pe1", "pe2", "pe3", "pe4", "pe5", "pe6", "pe7",
|
||||||
"xbar_top", "xbar_bot",
|
"r0c0", "r0c1", "r0c2", "r0c3", "r0c4", "r0c5",
|
||||||
"pe0", "pe1", "pe2", "pe3", "pe4", "pe5", "pe6", "pe7"}
|
"r1c0", "r1c1", "r1c2", "r1c3", "r1c4", "r1c5",
|
||||||
# Add UCIe connection nodes (4 ports × 4 connections)
|
"r2c0", "r2c1", "r2c4", "r2c5",
|
||||||
|
"r3c0", "r3c1", "r3c4", "r3c5",
|
||||||
|
"r4c0", "r4c1", "r4c2", "r4c3", "r4c4", "r4c5",
|
||||||
|
"r5c0", "r5c1", "r5c2", "r5c3", "r5c4", "r5c5"}
|
||||||
|
# Add UCIe connection nodes (4 ports x 4 connections)
|
||||||
for port in ("N", "S", "E", "W"):
|
for port in ("N", "S", "E", "W"):
|
||||||
for ci in range(4):
|
for ci in range(4):
|
||||||
expected.add(f"ucie-{port}.conn{ci}")
|
expected.add(f"ucie-{port}.conn{ci}")
|
||||||
@@ -249,20 +265,22 @@ def test_cube_view_has_all_components():
|
|||||||
def test_cube_view_hbm_at_center():
|
def test_cube_view_hbm_at_center():
|
||||||
v = _graph().cube_view
|
v = _graph().cube_view
|
||||||
assert v.nodes["hbm_ctrl"].pos_mm == (6.5, 7.0)
|
assert v.nodes["hbm_ctrl"].pos_mm == (6.5, 7.0)
|
||||||
assert v.nodes["noc"].pos_mm == (10.5, 7.0)
|
assert "r0c0" in v.nodes # routers exist in cube view
|
||||||
assert v.width_mm == 17.0
|
assert v.width_mm == 17.0
|
||||||
assert v.height_mm == 14.0
|
assert v.height_mm == 14.0
|
||||||
|
|
||||||
|
|
||||||
def test_cube_view_pe_to_noc():
|
def test_cube_view_pe_to_router():
|
||||||
"""PEs connect to NOC in cube view (no per-PE xbar)."""
|
"""PEs connect to their assigned routers in cube view."""
|
||||||
v = _graph().cube_view
|
v = _graph().cube_view
|
||||||
ves = {(e.src, e.dst) for e in v.edges}
|
ves = {(e.src, e.dst) for e in v.edges}
|
||||||
for i in range(8):
|
pe_router_map = {"pe0": "r0c0", "pe1": "r0c1", "pe2": "r1c4", "pe3": "r1c5",
|
||||||
assert (f"pe{i}", "noc") in ves
|
"pe4": "r4c0", "pe5": "r4c1", "pe6": "r5c4", "pe7": "r5c5"}
|
||||||
|
for pe, router in pe_router_map.items():
|
||||||
|
assert (pe, router) in ves, f"{pe} should connect to {router}"
|
||||||
|
|
||||||
|
|
||||||
# ── Views: PE ────────────────────────────────────────────────────────
|
# -- Views: PE ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_pe_view_has_all_components():
|
def test_pe_view_has_all_components():
|
||||||
@@ -284,7 +302,7 @@ def test_pe_view_edges():
|
|||||||
assert ("pe_math", "pe_tcm") in ves
|
assert ("pe_math", "pe_tcm") in ves
|
||||||
|
|
||||||
|
|
||||||
# ── SRAM ────────────────────────────────────────────────────────────
|
# -- SRAM ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_sram_node_exists():
|
def test_sram_node_exists():
|
||||||
@@ -293,92 +311,42 @@ def test_sram_node_exists():
|
|||||||
assert g.nodes["sip0.cube0.sram"].kind == "sram"
|
assert g.nodes["sip0.cube0.sram"].kind == "sram"
|
||||||
|
|
||||||
|
|
||||||
def test_noc_to_sram_edges():
|
def test_sram_to_router_edges():
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
assert (f"{cp}.noc", f"{cp}.sram") in es
|
# SRAM connects to router r3c0
|
||||||
assert (f"{cp}.sram", f"{cp}.noc") in es
|
assert (f"{cp}.sram", f"{cp}.r3c0") in es
|
||||||
|
assert (f"{cp}.r3c0", f"{cp}.sram") in es
|
||||||
|
|
||||||
|
|
||||||
# ── PE_DMA → NOC (non-HBM data path) ───────────────────────────────
|
# -- PE_DMA -> Router (data path) ---------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_pe_dma_to_noc_edges():
|
def test_pe_dma_to_router_edges():
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
for i in range(8):
|
# Each PE DMA connects to its local router
|
||||||
assert (f"{cp}.pe{i}.pe_dma", f"{cp}.noc") in es
|
pe_router_map = {
|
||||||
|
0: "r0c0", 1: "r0c1", 2: "r1c4", 3: "r1c5",
|
||||||
|
4: "r4c0", 5: "r4c1", 6: "r5c4", 7: "r5c5",
|
||||||
|
}
|
||||||
|
for i, router in pe_router_map.items():
|
||||||
|
assert (f"{cp}.pe{i}.pe_dma", f"{cp}.{router}") in es
|
||||||
|
|
||||||
|
|
||||||
# ── Bridge connects XBAR halves (not NOC) ──────────────────────────
|
# -- UCIe conn nodes connect to routers (not NOC) -----------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_bridge_connects_xbar_top_bot():
|
|
||||||
"""Bridges connect xbar_top ↔ xbar_bot (bidirectional)."""
|
|
||||||
es = _edge_set(_graph())
|
|
||||||
cp = "sip0.cube0"
|
|
||||||
for bname in ("left", "right"):
|
|
||||||
br = f"{cp}.bridge.{bname}"
|
|
||||||
assert (f"{cp}.xbar_top", br) in es
|
|
||||||
assert (br, f"{cp}.xbar_top") in es
|
|
||||||
assert (f"{cp}.xbar_bot", br) in es
|
|
||||||
assert (br, f"{cp}.xbar_bot") in es
|
|
||||||
|
|
||||||
|
|
||||||
def test_no_bridge_to_noc_edges():
|
|
||||||
es = _edge_set(_graph())
|
|
||||||
cp = "sip0.cube0"
|
|
||||||
assert (f"{cp}.bridge.left", f"{cp}.noc") not in es
|
|
||||||
assert (f"{cp}.bridge.right", f"{cp}.noc") not in es
|
|
||||||
|
|
||||||
|
|
||||||
# ── Cube view: new edges ────────────────────────────────────────────
|
|
||||||
|
|
||||||
|
|
||||||
def test_cube_view_pe_to_noc_edges():
|
|
||||||
"""All PEs connect to NOC in cube view."""
|
|
||||||
v = _graph().cube_view
|
|
||||||
ves = {(e.src, e.dst) for e in v.edges}
|
|
||||||
for i in range(8):
|
|
||||||
assert (f"pe{i}", "noc") in ves
|
|
||||||
|
|
||||||
|
|
||||||
def test_cube_view_sram():
|
|
||||||
v = _graph().cube_view
|
|
||||||
assert "sram" in v.nodes
|
|
||||||
ves = {(e.src, e.dst) for e in v.edges}
|
|
||||||
assert ("noc", "sram") in ves
|
|
||||||
assert ("sram", "noc") in ves
|
|
||||||
|
|
||||||
|
|
||||||
def test_cube_view_bridge_xbar():
|
|
||||||
"""Cube view bridges connect xbar_top ↔ xbar_bot."""
|
|
||||||
v = _graph().cube_view
|
|
||||||
ves = {(e.src, e.dst) for e in v.edges}
|
|
||||||
for bname in ("left", "right"):
|
|
||||||
br = f"bridge.{bname}"
|
|
||||||
assert ("xbar_top", br) in ves
|
|
||||||
assert (br, "xbar_top") in ves
|
|
||||||
assert ("xbar_bot", br) in ves
|
|
||||||
assert (br, "xbar_bot") in ves
|
|
||||||
|
|
||||||
|
|
||||||
def test_ucie_noc_reverse_edges():
|
def test_ucie_noc_reverse_edges():
|
||||||
"""UCIe ports connect to NOC via conn nodes (bidirectional)."""
|
"""UCIe ports connect to routers via conn nodes (bidirectional)."""
|
||||||
es = _edge_set(_graph())
|
es = _edge_set(_graph())
|
||||||
cp = "sip0.cube1" # non-edge cube to avoid io-cube edges
|
cp = "sip0.cube1" # non-edge cube to avoid io-cube edges
|
||||||
for port in ("N", "S", "E", "W"):
|
for port in ("N", "S", "E", "W"):
|
||||||
# Direct ucie→noc no longer exists; path goes through conn nodes
|
# Each conn has edges: ucie<->conn, conn<->router
|
||||||
assert (f"{cp}.ucie-{port}", f"{cp}.noc") not in es
|
|
||||||
# Each conn has edges: ucie↔conn, conn↔noc
|
|
||||||
for ci in range(4):
|
for ci in range(4):
|
||||||
conn = f"{cp}.ucie-{port}.conn{ci}"
|
conn = f"{cp}.ucie-{port}.conn{ci}"
|
||||||
assert (f"{cp}.ucie-{port}", conn) in es, \
|
assert (f"{cp}.ucie-{port}", conn) in es, \
|
||||||
f"missing ucie-{port}->conn{ci}"
|
f"missing ucie-{port}->conn{ci}"
|
||||||
assert (conn, f"{cp}.noc") in es, \
|
|
||||||
f"missing conn{ci}->noc"
|
|
||||||
assert (f"{cp}.noc", conn) in es, \
|
|
||||||
f"missing noc->conn{ci}"
|
|
||||||
assert (conn, f"{cp}.ucie-{port}") in es, \
|
assert (conn, f"{cp}.ucie-{port}") in es, \
|
||||||
f"missing conn{ci}->ucie-{port}"
|
f"missing conn{ci}->ucie-{port}"
|
||||||
|
|
||||||
@@ -396,31 +364,60 @@ def test_ucie_conn_nodes_exist():
|
|||||||
|
|
||||||
|
|
||||||
def test_ucie_conn_edge_bw():
|
def test_ucie_conn_edge_bw():
|
||||||
"""conn↔NOC edges must have per_connection_bw_gbs (128 GB/s)."""
|
"""conn<->router edges must have per_connection_bw_gbs (128 GB/s)."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
edge_map = {(e.src, e.dst): e for e in g.edges}
|
edge_map = {(e.src, e.dst): e for e in g.edges}
|
||||||
cp = "sip0.cube0"
|
cp = "sip0.cube0"
|
||||||
|
# Check conn0 for each port connects to a router with correct bw
|
||||||
for port in ("N", "S", "E", "W"):
|
for port in ("N", "S", "E", "W"):
|
||||||
for ci in range(4):
|
for ci in range(4):
|
||||||
conn_id = f"{cp}.ucie-{port}.conn{ci}"
|
conn_id = f"{cp}.ucie-{port}.conn{ci}"
|
||||||
e = edge_map[(conn_id, f"{cp}.noc")]
|
# Find the ucie_conn_to_router edge
|
||||||
assert e.bw_gbs == 128.0, f"{conn_id}→noc bw={e.bw_gbs}"
|
conn_edges = [e for e in g.edges
|
||||||
e_rev = edge_map[(f"{cp}.noc", conn_id)]
|
if e.src == conn_id and e.kind == "ucie_conn_to_router"]
|
||||||
assert e_rev.bw_gbs == 128.0
|
assert len(conn_edges) == 1, f"expected 1 ucie_conn_to_router from {conn_id}"
|
||||||
|
assert conn_edges[0].bw_gbs == 128.0
|
||||||
|
|
||||||
|
|
||||||
def test_cross_cube_path_includes_conn():
|
def test_cross_cube_path_includes_conn():
|
||||||
"""PE cross-cube path must traverse conn nodes."""
|
"""PE cross-cube path must traverse conn nodes."""
|
||||||
g = _graph()
|
g = _graph()
|
||||||
router = PathRouter(g)
|
router = PathRouter(g)
|
||||||
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
|
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
|
||||||
conn_nodes = [n for n in path if ".conn" in n]
|
conn_nodes = [n for n in path if ".conn" in n]
|
||||||
assert len(conn_nodes) >= 2, f"Expected >=2 conn nodes in path, got {conn_nodes}"
|
assert len(conn_nodes) >= 2, f"Expected >=2 conn nodes in path, got {conn_nodes}"
|
||||||
|
|
||||||
|
|
||||||
def test_noc_to_xbar_top_bot_edges():
|
# -- Cube view: edges ---------------------------------------------------------
|
||||||
"""NOC connects to xbar_top and xbar_bot."""
|
|
||||||
es = _edge_set(_graph())
|
|
||||||
cp = "sip0.cube0"
|
def test_cube_view_pe_to_router_edges():
|
||||||
assert (f"{cp}.noc", f"{cp}.xbar_top") in es
|
"""All PEs connect to their routers in cube view."""
|
||||||
assert (f"{cp}.noc", f"{cp}.xbar_bot") in es
|
v = _graph().cube_view
|
||||||
|
ves = {(e.src, e.dst) for e in v.edges}
|
||||||
|
pe_router_map = {"pe0": "r0c0", "pe1": "r0c1", "pe2": "r1c4", "pe3": "r1c5",
|
||||||
|
"pe4": "r4c0", "pe5": "r4c1", "pe6": "r5c4", "pe7": "r5c5"}
|
||||||
|
for pe, router in pe_router_map.items():
|
||||||
|
assert (pe, router) in ves, f"{pe} should connect to {router}"
|
||||||
|
|
||||||
|
|
||||||
|
def test_cube_view_sram():
|
||||||
|
v = _graph().cube_view
|
||||||
|
assert "sram" in v.nodes
|
||||||
|
ves = {(e.src, e.dst) for e in v.edges}
|
||||||
|
assert ("sram", "r3c0") in ves
|
||||||
|
|
||||||
|
|
||||||
|
def test_cube_view_hbm_router():
|
||||||
|
"""Cube view: PE routers connect to hbm_ctrl."""
|
||||||
|
v = _graph().cube_view
|
||||||
|
ves = {(e.src, e.dst) for e in v.edges}
|
||||||
|
assert ("r0c0", "hbm_ctrl") in ves # PE0's router → HBM
|
||||||
|
|
||||||
|
|
||||||
|
def test_cube_view_m_cpu_router():
|
||||||
|
"""Cube view: m_cpu connects to its router r1c2."""
|
||||||
|
v = _graph().cube_view
|
||||||
|
ves = {(e.src, e.dst) for e in v.edges}
|
||||||
|
assert ("m_cpu", "r1c2") in ves
|
||||||
|
assert ("r1c2", "m_cpu") in ves
|
||||||
|
|||||||
@@ -34,14 +34,13 @@ def test_svg_output_is_deterministic(tmp_path):
|
|||||||
def test_cube_svg_contains_hbm_ctrl(tmp_path):
|
def test_cube_svg_contains_hbm_ctrl(tmp_path):
|
||||||
_emit(tmp_path)
|
_emit(tmp_path)
|
||||||
svg = (tmp_path / "cube_view.svg").read_text()
|
svg = (tmp_path / "cube_view.svg").read_text()
|
||||||
assert "HBM CTRL" in svg
|
assert "HBM_CTRL" in svg
|
||||||
|
|
||||||
|
|
||||||
def test_cube_svg_contains_ucie_ports(tmp_path):
|
def test_cube_svg_contains_ucie_ports(tmp_path):
|
||||||
_emit(tmp_path)
|
_emit(tmp_path)
|
||||||
svg = (tmp_path / "cube_view.svg").read_text()
|
svg = (tmp_path / "cube_view.svg").read_text()
|
||||||
for port in ("UCIe-N", "UCIe-S", "UCIe-W", "UCIe-E"):
|
assert "UCIe" in svg
|
||||||
assert port in svg
|
|
||||||
|
|
||||||
|
|
||||||
def test_cube_svg_contains_pe_nodes(tmp_path):
|
def test_cube_svg_contains_pe_nodes(tmp_path):
|
||||||
|
|||||||
@@ -55,7 +55,7 @@ cube:
|
|||||||
ucie_mm: { size: 2.0 }
|
ucie_mm: { size: 2.0 }
|
||||||
|
|
||||||
pe_layout:
|
pe_layout:
|
||||||
corners: [NW, NE, SW, SE] # N corners → xbar top row; S corners → xbar bottom row
|
corners: [NW, NE, SW, SE] # N corners → top PE rows; S corners → bottom PE rows
|
||||||
pe_per_corner: 2 # total PEs per cube: 4 * 2 = 8
|
pe_per_corner: 2 # total PEs per cube: 4 * 2 = 8
|
||||||
|
|
||||||
pe_template:
|
pe_template:
|
||||||
@@ -84,19 +84,22 @@ cube:
|
|||||||
hbm_total_gb_per_cube: 48
|
hbm_total_gb_per_cube: 48
|
||||||
hbm_slices_per_cube: 8
|
hbm_slices_per_cube: 8
|
||||||
hbm_total_bw_gbs: 1024.0
|
hbm_total_bw_gbs: 1024.0
|
||||||
|
hbm_mapping_mode: n_to_one # one_to_one | n_to_one (ADR-0019)
|
||||||
|
hbm_pseudo_channels: 64 # total pseudo channels per cube
|
||||||
|
hbm_channels_per_pe: 8 # = pseudo_channels / pes_per_cube
|
||||||
|
hbm_channel_bw_gbs: 32.0 # per-channel bandwidth (GB/s)
|
||||||
|
|
||||||
components:
|
components:
|
||||||
noc: { kind: noc, impl: noc_2d_mesh_v1, attrs: { overhead_ns: 0.0 } }
|
noc_router: { kind: noc_router, impl: forwarding_v1, attrs: { overhead_ns: 2.0 } }
|
||||||
m_cpu: { kind: m_cpu, impl: m_cpu_v1, attrs: { overhead_ns: 5.0 } }
|
m_cpu: { kind: m_cpu, impl: m_cpu_v1, attrs: { overhead_ns: 5.0 } }
|
||||||
xbar:
|
|
||||||
top: { kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 2.0 } }
|
|
||||||
bottom: { kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 2.0 } }
|
|
||||||
bridges:
|
|
||||||
- { id: left, kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 1.0 } }
|
|
||||||
- { id: right, kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 1.0 } }
|
|
||||||
hbm_ctrl: { kind: hbm_ctrl, impl: hbm_ctrl_v1, attrs: { capacity: 1, efficiency: 1.0 } }
|
hbm_ctrl: { kind: hbm_ctrl, impl: hbm_ctrl_v1, attrs: { capacity: 1, efficiency: 1.0 } }
|
||||||
sram: { kind: sram, impl: sram_v1, attrs: { size_mb: 32, overhead_ns: 2.0 } }
|
sram: { kind: sram, impl: sram_v1, attrs: { size_mb: 32, overhead_ns: 2.0 } }
|
||||||
|
|
||||||
|
# Physical placement of non-PE components (mm coordinates)
|
||||||
|
placement:
|
||||||
|
m_cpu: { pos_mm: [7.5, 3.0] } # top center, below UCIe-N
|
||||||
|
sram: { pos_mm: [1.5, 9.0] } # left side, below HBM zone
|
||||||
|
|
||||||
ucie:
|
ucie:
|
||||||
decompose: true
|
decompose: true
|
||||||
ports: [N, S, E, W]
|
ports: [N, S, E, W]
|
||||||
@@ -105,19 +108,15 @@ cube:
|
|||||||
per_connection_bw_gbs: 128.0 # BW per connection; 4 × 128 = 512 GB/s = UCIe PHY BW
|
per_connection_bw_gbs: 128.0 # BW per connection; 4 × 128 = 512 GB/s = UCIe PHY BW
|
||||||
|
|
||||||
links:
|
links:
|
||||||
xbar_to_hbm_bw_gbs: 256.0 # per-slice effective (2048 / 8 slices)
|
# Router mesh links (ADR-0019)
|
||||||
xbar_to_bridge_bw_gbs: 128.0 # bridge BW (xbar_top/bot ↔ bridge)
|
router_link_bw_gbs: 256.0 # inter-router XY mesh link BW
|
||||||
xbar_to_bridge_mm: 3.0 # xbar ↔ bridge wire distance
|
router_overhead_ns: 2.0 # per-router switching overhead
|
||||||
xbar_to_hbm_mm: 2.5
|
pe_to_router_bw_gbs: 256.0 # PE_DMA ↔ router (= N × channel_bw)
|
||||||
pe_dma_to_noc_bw_gbs: 256.0 # PE → NOC BW (= HBM slice BW, no bottleneck)
|
hbm_to_router_bw_gbs: 256.0 # HBM_CTRL ↔ router (= N × channel_bw)
|
||||||
noc_to_xbar_mm: 0.0 # noc is distributed; distance modeled as 0
|
sram_to_router_bw_gbs: 128.0 # SRAM ↔ router
|
||||||
noc_to_xbar_bw_gbs: 256.0 # NOC → xbar_top/bot BW (= HBM slice BW)
|
m_cpu_to_router_mm: 0.0 # M_CPU ↔ router distance
|
||||||
noc_to_sram_mm: 0.0 # noc is distributed; distance modeled as 0
|
pe_dma_to_noc_bw_gbs: 256.0 # PE → router BW (= HBM slice BW, no bottleneck)
|
||||||
noc_to_sram:
|
noc_to_pe_cpu_mm: 0.0 # router → PE_CPU distance (command path)
|
||||||
per_connection_bw_gbs: 128.0 # BW per NOC connection
|
|
||||||
n_connections: 4 # 4 × 128 = 512 GB/s aggregate
|
|
||||||
m_cpu_to_noc_mm: 0.0 # noc is distributed; distance modeled as 0
|
|
||||||
noc_to_pe_cpu_mm: 0.0 # noc is distributed; distance modeled as 0
|
|
||||||
|
|
||||||
visualization:
|
visualization:
|
||||||
emit_views: [system, sip, cube]
|
emit_views: [system, sip, cube]
|
||||||
|
|||||||