21 Commits

Author SHA1 Message Date
ywkang eb792e6212 Remove xbar/noc remnants, rule-based cube-view connectors
- Delete xbar.py and noc.py (TwoDMeshNocComponent) — unused since router mesh
- Remove xbar_v1/noc_2d_mesh_v1 from components.yaml
- Fix pe_to_xbar → pe_to_router in routing exclusion set
- Fix xbar_to_hbm_bw_gbs → hbm_to_router_bw_gbs in report.py
- Update all docstrings/comments referencing xbar/bridge → router mesh
- Cube-view connectors: rule-based _connector_points helper
  - PE↔router: single diagonal line (not chevron)
  - UCIe N/S: 45°→horizontal→45°
  - UCIe E/W: 45°→vertical→45°
  - HBM ports: 45°→horizontal→45°

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:59:12 -07:00
ywkang 7640635f90 M_CPU/SRAM placement via pos_mm in topology.yaml (nearest router)
Component placement uses mm coordinates in topology.yaml, mesh_gen
finds the nearest router automatically. M_CPU moved to pos_mm=[7.5,2.0]
(→ r0c2), SRAM at pos_mm=[1.5,9.0] (→ r3c0).

No hardcoded router references in topology config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:48:20 -07:00
ywkang 3ea4fa90f8 Cube-view: increase 45° stub length and component gap for visibility
Stub length increased to 12px (PE/HBM) and 10px (UCIe).
Gap between router and component increased to 30px so both
45° stubs (router end + component end) are clearly visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:38:27 -07:00
ywkang 5125d92c17 Cube-view: M_CPU north, 45° stub-straight-stub connector pattern
- M_CPU placed north (above) its router
- All connectors: 45° stub from router → straight → 45° stub to component
- Consistent 4-point polyline pattern for PE, M_CPU, SRAM, HBM, UCIe

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:34:48 -07:00
ywkang 72acc5c8bb Cube-view: UCIe flush against cube edges
UCIe position calculated with minimal inset (0.3 × size) to
place components flush against cube boundary edges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:28:58 -07:00
ywkang bde76ec959 Cube-view: 45° diagonal from router, then straight to component
All connectors now start with 45° diagonal from router edge,
then go straight (vertical/horizontal) to the component block.
Applies to PE, M_CPU/SRAM, PE→HBM, and UCIe connectors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:25:41 -07:00
ywkang d3de982ea4 Cube-view: 90° router mesh links, 45° component connectors
Router-router mesh links remain straight (horizontal/vertical).
All component→router connectors use 45° L-bend polylines:
- PE blocks: vertical then 45° diagonal to router
- M_CPU/SRAM: horizontal then 45° diagonal to router
- PE→HBM port group: vertical then 45° diagonal
- UCIe port→router: direction-aware 45° bend

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:20:28 -07:00
ywkang df81835d84 Cube-view: UCIe position/size from topology.yaml (ucie_mm.size=2.0)
UCIe components placed at defined positions from _cube_local_positions
with size from cube.geometry.ucie_mm.size. N/S horizontal, E/W vertical.
Connection ports rendered as color-coded boxes inside UCIe component.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:11:11 -07:00
ywkang 66ec6cd40c Cube-view: UCIe components inside cube boundary with port boxes
- UCIe-N/S/E/W drawn as component blocks inside cube boundary
  (inset 3mm from edge)
- Each UCIe has c0-c3 connection ports as color-coded boxes inside
- Connector lines from each port box to its attached router
- Removed old UCIe rendering that placed blocks outside cube

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:58:32 -07:00
ywkang e766163a25 Cube-view: HBM pseudo channel ports on edges, UCIe flush to cube border
- HBM pseudo channel ports split to top/bottom edges of HBM zone
  (32 ports each, 8 per PE, color-coded)
- PE→HBM lines connect router to its port group center
- Per-PE label: "PE0×8ch" with BW annotation
- UCIe blocks flush against cube edges at router positions
- UCIe blocks smaller (22×10px)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:38:10 -07:00
ywkang 24faf2e1d4 Cube-view: angle HBM lines, offset M_CPU/SRAM blocks
- HBM connection lines angled 30% toward HBM center (not vertical)
  to distinguish from mesh links
- M_CPU/SRAM blocks placed to the left of their router
  with horizontal connector lines (avoid mesh overlap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:30:56 -07:00
ywkang 7cd30e106e Fix Router→HBM_CTRL lines visibility in cube_view
Draw HBM connection lines last (on top of component blocks).
PE routers: thicker (1.5px, opacity 0.6) with dashed style.
Relay routers: thinner (0.7px, opacity 0.2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:25:40 -07:00
ywkang 109c9b4483 Cube-view: draw all attached components as separate blocks
All router-attached components (PE, M_CPU, SRAM, UCIe) rendered as
labeled blocks with explicit connector lines to their router.
UCIe blocks positioned at cube edges matching port direction.
Router→HBM_CTRL lines shown for all 32 routers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:09:08 -07:00
ywkang e94f1de078 Cube-view SVG: detailed topology validation rendering
- Dedicated cube_view renderer showing 6×6 router grid with attachments
- PE blocks drawn next to their router (above/below)
- HBM pseudo channel port bar (64 ports, color-coded by PE owner)
- Per-PE BW annotations on HBM links
- Router color-coded by type (PE/M_CPU/SRAM/UCIe/relay)
- Title shows mode, channel count, per-PE and total BW
- Legend for all component types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:03:38 -07:00
ywkang 5c6abe6d12 Reduce SRAM/UCIe/M_CPU/HBM node sizes, thin HBM and mesh links
Shrink cube-view component nodes to avoid clutter.
HBM and router_mesh edge lines made thinner and more transparent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:51:41 -07:00
ywkang f298e3c7cc Offset PE nodes in cube_view to avoid overlapping routers
PE nodes are shifted 1.2mm above (top half) or below (bottom half)
their assigned router position. PE size reduced to 1.4x0.7mm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:50:32 -07:00
ywkang 91085733ba Show individual routers in cube_view SVG, fix row Y overlap
- cube_view now renders all 32 router nodes from cube_mesh.yaml
  instead of collapsed "router_mesh" placeholder
- Fix mesh_gen row Y position overlap (r1/r2 and r3/r4 had same Y)
  by adding hbm_gap spacing between PE rows and HBM zone
- Add noc_router to visualizer KIND_SIZE for proper sizing
- Update cube view tests for individual router nodes

339 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:22:38 -07:00
ywkang d2c92b8a18 Wire PE_MMU to router mesh for MmuMapMsg delivery
Add router → PE_MMU edge so MmuMapMsg can reach PE_MMU via
the router mesh. Unskip all PE_MMU fabric tests.

339 passed, 0 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:10:42 -07:00
ywkang 08256c1326 Fix cross-SIP PE_TCM access by scoping deploy to target_device SIP
RuntimeContext._ensure_allocators() now limits SIP range to
target_device (single SIP or all). Prevents cross-SIP tensor
deployment that caused PE_TCM routing errors.
Also accept 'sip0' format (without colon) in DeviceSelector.

331 passed, 8 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:03:11 -07:00
ywkang 624161f52f Update web viewer for router mesh topology (ADR-0019)
Remove all xbar/bridge rendering from cube detail view.
Replace 8 HBM slices with single HBM_CTRL block.
Add green dotted lines showing router-to-HBM connectivity.
Update legend, event animation, and PE view NOC destinations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:56:05 -07:00
ywkang 5917b3497c Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019)
- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)

326 passed, 13 skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:51:28 -07:00
44 changed files with 1883 additions and 2066 deletions
+3 -4
@@ -104,7 +104,7 @@ The simulator MUST accept multiple topologies (YAML / JSON / dict), varying:
- SIP count,
- CUBE count per SIP,
- PE count per CUBE,
- on-chip fabric structure (e.g., mesh / NoC / XBAR),
- on-chip fabric structure (e.g., mesh / NoC router grid),
- IO chiplets and interconnects,
- link bandwidth, latency, and capacity parameters.
@@ -119,8 +119,7 @@ Given a topology:
All components MUST be replaceable behind stable interfaces, including:
- routers and fabrics (NoC, bridges, switches),
- XBAR-like selectors,
- routers and fabrics (NoC router mesh, switches),
- DMA engines and queues,
- memory controllers and services (HBM, TCM, queues),
- management and control processors (modeled components).
@@ -226,7 +225,7 @@ No implicit translation or hidden latency is allowed.
### 2.1 Graph Execution Model
- Nodes represent modeled components (PE blocks, XBAR, NoC, bridges,
- Nodes represent modeled components (PE blocks, NoC routers,
HBM controllers, IO components, etc.).
- Directed edges represent interconnect links with latency and bandwidth attributes.
- Execution model:
-3
@@ -28,9 +28,6 @@ components:
switch_v1: kernbench.components.builtin.forwarding:TransitComponent
noc_v1: kernbench.components.builtin.forwarding:TransitComponent
ucie_v1: kernbench.components.builtin.forwarding:TransitComponent
noc_2d_mesh_v1: kernbench.components.builtin.noc:TwoDMeshNocComponent
xbar_v1: kernbench.components.builtin.xbar:PositionAwareXbarComponent
# IO / Host interface
pcie_ep_v1: kernbench.components.builtin.pcie_ep:PcieEpComponent
io_cpu_v1: kernbench.components.builtin.io_cpu:IoCpuComponent
+5 -6
@@ -34,12 +34,11 @@ shortcuts that obscure control paths.
(topology + policy + request).
### D3. Bypass is explicit and graph-represented
- Any bypass (e.g., local cube HBM access via XBAR instead of NOC) must be:
- explicitly represented as a graph path, and
- subject to latency accumulation like any other path.
- Example: PE_DMA has dual egress — one to XBAR (HBM path) and one to NOC (non-HBM path).
Both are explicit graph edges; neither is a “bypass” — they are distinct data paths
serving different memory domains.
- All paths must be explicitly represented in the graph and subject to latency accumulation.
- Example: PE_DMA connects to the NOC router mesh (ADR-0019). All destinations
(HBM, shared SRAM, inter-cube UCIe) are reached via explicit mesh hops.
Local HBM access has minimal hops (switching overhead only); remote access
traverses additional routers.
- Implicit or “magic” bypass paths are disallowed.
### D4. No zero-latency end-to-end paths
+5 -6
@@ -35,12 +35,11 @@ We model the system hierarchy explicitly:
- A CUBE contains:
- HBM + memory controller (HBM_CTRL)
- XBAR (top/bottom): HBM pseudo-channel crossbar, PE's dedicated path to HBM
- Bridge (left/right): connects XBAR.top ↔ XBAR.bottom for cross-half HBM access
- NOC: 2D mesh router grid spanning the entire cube with XY routing and
per-segment contention modeling; carries all intra-cube traffic including
PE DMA to xbar (HBM), inter-cube (UCIe), command (M_CPU↔PE_CPU), and
shared SRAM access. See ADR-0017 for full NOC architecture.
- NOC router mesh: 2D grid of explicit routers (from cube_mesh.yaml) with XY routing;
carries all intra-cube traffic including HBM data, inter-cube (UCIe),
command (M_CPU↔PE_CPU), and shared SRAM access.
HBM_CTRL is attached to PE routers (local HBM = 0 hop).
See ADR-0017 and ADR-0019 for full architecture.
- Shared SRAM: cube-level shared memory accessible by all PEs via NOC
- management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
- multiple PEs
@@ -14,9 +14,9 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
### D1. Local HBM definition
- Each PE is assigned a logically defined “local HBM” region.
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE's DMA path
via the XBAR (top or bottom, depending on PE corner placement).
- The path is: PE_DMA → XBAR.top/bottom → HBM_CTRL.
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE's
router in the NOC mesh (ADR-0019).
- The path is: PE_DMA → local router → HBM_CTRL (switching overhead only, 0 mesh hops).
- The mapping (HBM pseudo-channels → PE local regions) is derived from topology configuration.
### D2. Local HBM bandwidth guarantee contract
@@ -27,19 +27,18 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
The efficiency factor (configured via `hbm_ctrl.attrs.efficiency`, default 0.8)
models real-world DRAM inefficiencies (refresh cycles, bank conflicts, page
misses). For example: 256 GB/s spec x 0.8 = 204.8 GB/s effective.
- The topology builder applies the efficiency factor to xbar-to-hbm edge
- The topology builder applies the efficiency factor to router-to-hbm edge
bandwidth at graph construction time, so all downstream routing and latency
computation uses the effective value.
- This guarantee is modeled by:
- a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
- while still incurring non-zero latency along explicitly modeled components.
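The efficiency arithmetic in D2 can be sketched as follows. This is a minimal illustration, not the topology builder's actual code: the helper name `effective_bw_gbs` is hypothetical, and only the 0.8 default and the 256 GB/s spec figure come from the text above.

```python
import math


def effective_bw_gbs(spec_bw_gbs: float, efficiency: float = 0.8) -> float:
    """Scale a spec bandwidth by the DRAM efficiency factor (hypothetical helper)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return spec_bw_gbs * efficiency


# Applied once at graph construction time, so routing and latency
# computation only ever see the effective value.
router_to_hbm_bw = effective_bw_gbs(256.0)
print(f"{router_to_hbm_bw:.1f} GB/s effective")
```

Applying the factor once on the edge, rather than at every lookup, keeps all downstream consumers consistent with each other.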
### D3. Cross-half HBM semantics
### D3. Remote PE HBM semantics (intra-cube)
- A PE connected to XBAR.bottom that accesses HBM pseudo-channels on the XBAR.top half
(or vice versa) traverses a bridge:
- PE_DMA → XBAR.bottom → bridge → XBAR.top → HBM_CTRL
- Bridge bandwidth may limit cross-half HBM access relative to local-half access.
- A PE that accesses another PE's local HBM traverses the router mesh:
- PE_DMA → local router → (mesh hops) → target PE's router → HBM_CTRL
- Router mesh bandwidth and hop count may limit remote HBM access relative to local access.
### D4. Non-local HBM semantics (inter-cube / inter-SIP)
@@ -61,7 +60,7 @@ Each PE has a notion of “local HBM” that must guarantee full HBM bandwidth,
Tests should cover:
- local-HBM case: BW matches HBM BW regardless of fabric BW parameter
- cross-half HBM case: latency includes bridge traversal
- remote PE HBM case: latency includes mesh hop traversal
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
- shared SRAM case: access via NOC with correct BW
@@ -82,9 +82,8 @@ Explain cube-internal structure and data/control flow.
**Visible elements**
- XBAR (top/bottom): HBM pseudo-channel crossbar
- Bridge (left/right): cross-half HBM connectors between XBAR.top and XBAR.bottom
- NOC: distributed on-die fabric for non-HBM traffic
- Router mesh: 2D grid of NOC routers (from cube_mesh.yaml), all traffic routes through mesh
- HBM_CTRL attached to PE routers (local HBM = 0 hop)
- HBM subsystem (HBM_CTRL)
- Shared SRAM: cube-level shared memory
- Management CPU (M_CPU)
@@ -97,14 +96,13 @@ Explain cube-internal structure and data/control flow.
**Visible links**
- PE → XBAR (HBM data path, top or bottom by corner placement)
- PE → NOC (non-HBM data path)
- XBAR ↔ bridge ↔ XBAR (cross-half HBM access)
- XBAR → HBM_CTRL
- NOC ↔ UCIe endpoints
- NOC ↔ shared SRAM
- M_CPU ↔ NOC (command path)
- NOC → PE_CPU (command delivery, collapsed into PE block)
- PE → router (HBM + non-HBM data path via mesh)
- Router ↔ HBM_CTRL (local HBM access)
- Router ↔ Router (mesh hops for remote access)
- Router ↔ UCIe endpoints
- Router ↔ shared SRAM
- M_CPU ↔ router (command path)
- Router → PE_CPU (command delivery, collapsed into PE block)
---
@@ -61,9 +61,9 @@ For each view (SIP / CUBE / PE):
- preserve connectivity semantics relevant to that view,
- compute distance buckets and assign layout layers deterministically.
- CUBE-level projection MUST include:
- XBAR (top/bottom), bridge (left/right), NOC, HBM_CTRL, shared SRAM, M_CPU, UCIe ports,
- Router mesh (from cube_mesh.yaml), HBM_CTRL, shared SRAM, M_CPU, UCIe ports,
and PEs as opaque blocks.
- Distinct edge kinds for HBM path (PE→XBAR) vs non-HBM path (PE→NOC).
- All paths (HBM, non-HBM, command) route through the same router mesh (ADR-0019).
- Default anchors are implicit (ADR-0005) and MUST NOT require instance indices.
### D6. Output formats and determinism
@@ -44,14 +44,15 @@ Each PE contains the following logical components.
**PE_DMA**
- Handles memory transfers between PE_TCM and external memory domains.
- PE_DMA has **dual egress** at the CUBE level:
- **→ XBAR**: dedicated path to HBM (local and cross-half via bridge)
- **→ NOC**: path to non-HBM destinations (shared SRAM, inter-cube UCIe, etc.)
- PE_DMA connects to the NOC router mesh at the CUBE level (ADR-0019):
- All destinations (HBM, shared SRAM, inter-cube UCIe) are reached via the router mesh
- Local HBM access: PE_DMA → local router → hbm_ctrl (switching overhead only)
- Remote/shared: PE_DMA → local router → (mesh hops) → destination
- Supported directions include:
- HBM → PE_TCM (via XBAR)
- PE_TCM → HBM (via XBAR)
- PE_TCM → shared SRAM (via NOC)
- PE_TCM → other memory domains (via NOC, if supported by topology)
- HBM → PE_TCM (via router mesh)
- PE_TCM → HBM (via router mesh)
- PE_TCM → shared SRAM (via router mesh)
- PE_TCM → other memory domains (via router mesh, if supported by topology)
**PE_GEMM**
@@ -251,7 +252,7 @@ Compute operations use a TCM-centric dataflow model.
**Input path (HBM)**
```text
HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
HBM → router mesh → PE_DMA (DMA_READ) → PE_TCM
```
**Input path (shared SRAM)**
@@ -268,14 +269,14 @@ Compute engines read input tensors from PE_TCM.
PE_TCM → GEMM / MATH
```
Weights for GEMM may optionally stream directly from HBM (via XBAR).
Weights for GEMM may optionally stream directly from HBM (via router mesh).
**Output path (HBM)**
Compute results are written to PE_TCM, then DMA writes to HBM.
```text
PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
PE_TCM → PE_DMA (DMA_WRITE) → router mesh → HBM
```
**Output path (shared SRAM)**
@@ -347,9 +348,9 @@ PE instances are derived from `cube.pe_layout`.
External connectivity such as:
- PE_DMA → XBAR (HBM data path)
- PE_DMA → NOC (non-HBM data path: shared SRAM, inter-cube UCIe)
- NOC → PE_CPU (command path from M_CPU)
- PE_DMA → router mesh → HBM (data path, ADR-0019)
- PE_DMA → router mesh → shared SRAM, inter-cube UCIe (non-HBM data path)
- router mesh → PE_CPU (command path from M_CPU)
is modeled at the CUBE level (see ADR-0003 D3).
@@ -104,13 +104,13 @@ Kernel Launch routes through M_CPU for PE fan-out.
```text
pcie_ep → io_noc → io_ucie
→ [transit cubes: ucie_in → noc → ucie_out] (zero or more)
→ target cube: ucie_in → noc → xbar → hbm_ctrl
→ target cube: ucie_in → router mesh → hbm_ctrl
```
**Memory R/W completion path:**
```text
hbm_ctrl → xbar → noc → [transit cubes: ucie → noc → ucie]
hbm_ctrl → router mesh → [transit cubes: ucie → router mesh → ucie]
→ io_ucie → io_noc → pcie_ep
```
@@ -49,7 +49,7 @@ Memory operations (MemoryWrite, MemoryRead) are routed directly from pcie_ep
through io_noc to the target cube, bypassing io_cpu entirely:
```text
pcie_ep → io_noc → conn → io_ucie → [cube UCIe] → noc → xbar → hbm_ctrl
pcie_ep → io_noc → conn → io_ucie → [cube UCIe] → router mesh → hbm_ctrl
```
This avoids the 10ns io_cpu overhead for pure data transfers. The simulation
+18 -18
@@ -16,9 +16,10 @@ architecture.
### D1. NOC node and router grid
Each cube contains a single NOC topology node (`sip{S}.cube{C}.noc`)
implemented as `noc_2d_mesh_v1`. Internally, the NOC models a 2D router
grid generated by `mesh_gen.py`.
Each cube contains a 2D router mesh generated by `mesh_gen.py`.
Each router is a separate topology node (`sip{S}.cube{C}.r{row}c{col}`)
implemented as `forwarding_v1`. (Supersedes the original single-node
`noc_2d_mesh_v1` design — see ADR-0019.)
Grid properties:
@@ -82,8 +83,8 @@ PE4.cpu <--+ | | +--< PE6.cpu
|
UCIe-S (conn x4)
xbar_top attached to: r0c0, r0c1, r1c4, r1c5 (top-half PE routers)
xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
HBM attach: hbm_ctrl is also connected to every router that hosts a PE (ADR-0019 D1)
(xbar_top/xbar_bot were removed by ADR-0019)
```
### D5. NOC edge bandwidths and distances
@@ -92,8 +93,7 @@ xbar_bot attached to: r4c0, r4c1, r5c4, r5c5 (bottom-half PE routers)
| --- | --- | --- | --- |
| PE_DMA -> NOC | 256.0 | Physical (PE pos) | Matches HBM slice BW |
| NOC -> PE_CPU | - | 0.0 mm | Command path only |
| NOC <-> xbar_top | 256.0 | 0.0 mm | Per xbar half |
| NOC <-> xbar_bot | 256.0 | 0.0 mm | Per xbar half |
| Router <-> HBM_CTRL | 256.0 | 0.0 mm | Per PE router (ADR-0019) |
| NOC <-> M_CPU | - | 0.0 mm | Command path |
| NOC <-> SRAM | 128.0 x4 | 0.0 mm | 512 GB/s aggregate |
| NOC <-> UCIe conn | 128.0 | 0.0 mm | Per connection, 4 per port |
@@ -117,7 +117,7 @@ Inter-cube traffic path:
```text
Source: PE_DMA -> NOC -> conn{i} -> ucie-{PORT}
[UCIe link: 512 GB/s, 1.0mm seam distance]
Target: ucie-{PORT} -> conn{i} -> NOC -> xbar -> HBM
Target: ucie-{PORT} -> conn{i} -> r{x}c{y} -> (mesh hops) -> hbm_ctrl
```
UCIe overhead (8.0 ns) is applied at each ucie-{PORT} node, so a
@@ -128,31 +128,31 @@ full crossing incurs 16 ns (TX port + RX port).
**PE DMA to local HBM (same half):**
```text
PE_DMA -> NOC -> xbar_top -> HBM_CTRL.slice{0-3}
PE_DMA -> r{x}c{y} -> hbm_ctrl (local: 0 mesh hops, switching overhead only)
```
**PE DMA to cross-half HBM:**
**PE DMA to remote PE's HBM:**
```text
PE_DMA -> NOC -> xbar_top -> bridge -> xbar_bot -> HBM_CTRL.slice{4-7}
PE_DMA -> r{x}c{y} -> (mesh hops) -> r{x'}c{y'} -> hbm_ctrl
```
**PE DMA to remote cube HBM:**
```text
PE_DMA -> NOC -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> NOC -> xbar -> HBM
PE_DMA -> r{x}c{y} -> conn -> ucie-E -> [seam] -> ucie-W -> conn -> r{x'}c{y'} -> hbm_ctrl
```
**Kernel Launch command to PE:**
```text
[from io_noc] -> ucie -> conn -> NOC -> M_CPU -> NOC -> PE_CPU
[from io_noc] -> ucie -> conn -> r{x}c{y} -> (mesh hops) -> M_CPU -> (mesh hops) -> PE_CPU
```
**Shared SRAM access:**
```text
PE_DMA -> NOC -> SRAM
PE_DMA -> r{x}c{y} -> (mesh hops) -> SRAM
```
### D8. Mesh generation
@@ -169,7 +169,7 @@ The generator produces a `mesh_data` dictionary containing:
- PE-to-router attachments (pe_dma, pe_cpu per PE)
- UCIe-to-router attachments (N/S/E/W, distributed across edge routers)
- M_CPU and SRAM router attachments
- xbar_top/bot router assignments (top-half vs bottom-half PE routers)
- HBM attachment per PE router (ADR-0019)
## Consequences
@@ -182,8 +182,8 @@ The generator produces a `mesh_data` dictionary containing:
## Links
- ADR-0003 D3 (cube-level NOC definition — extended by this ADR)
- ADR-0004 D1 (PE DMA to local HBM path via xbar)
- ADR-0004 D3 (cross-half HBM via bridge)
- ADR-0014 D1 (PE_DMA dual egress: xbar for HBM, NOC for non-HBM)
- ADR-0004 D1 (PE DMA to local HBM path via router mesh)
- ADR-0014 D1 (PE_DMA egress via router mesh)
- ADR-0019 (NOC-Local HBM: xbar/bridge removed, explicit router mesh)
- ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
- ADR-0016 D1 (IOChiplet io_noc — analogous pattern at IO chiplet level)
+1 -1
@@ -247,7 +247,7 @@ request directly usable by the simulator's routing and resource models
DmaReadCmd.src_addr (VA)
→ MMU.translate(VA) → PA
→ PhysAddr.decode(PA) → PhysAddr object
→ resolver.resolve(PhysAddr) → dst_node_id (e.g., "sip0.cube0.hbm_ctrl.slice3")
→ resolver.resolve(PhysAddr) → dst_node_id (e.g., "sip0.cube0.hbm_ctrl")
→ router.find_path(pe_prefix, dst_node_id) → path
→ create one sub-Transaction → fabric inject
```
+82 -164
@@ -36,16 +36,14 @@ is determined by topology parameters.
## Decision
### D1. The HBM controller is defined as a single endpoint per CUBE
### D1. HBM is attached to PE routers
The current `hbm_ctrl.slice{0-7}` (8 nodes) are consolidated into a **single `hbm_ctrl` node**.
The current `hbm_ctrl.slice{0-7}` (8 nodes) are consolidated into a **single `hbm_ctrl` node**, and
an HBM access point is also attached to every router that has a PE attached.
- A pseudo channel is represented not as the HBM controller node itself
but as the **unit of the links** connected to the controller
- The read/write resource model inside the HBM controller is kept, but
the contention unit depends on the mode:
- 1:1 mode: the per-channel link is the BW contention point (the controller is a terminal)
- n:1 mode: the aggregated link is the BW contention point (the controller is a terminal)
- n:1 mode: a PE reaches its local HBM directly from its own router (switching overhead only, 0 hops)
- Remote PE HBM access: reaches the target PE's router via mesh hops
- The read/write resource model inside the HBM controller is kept
Node naming change:
@@ -53,198 +51,127 @@ is determined by topology parameters.
| ---- | ------- |
| `sip0.cube0.hbm_ctrl.slice0` ~ `slice7` | `sip0.cube0.hbm_ctrl` (single) |
`mesh_gen.py` adds `pe{idx}.hbm` to each PE attachment so that
the builder creates an edge between that router and hbm_ctrl.
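The attachment rule in D1 can be sketched as below. This is an illustration under assumptions, not the builder's real code: the function name `hbm_attach_edges` and the shape of the `attach` mapping (router id to list of attachment names) are hypothetical; only the `pe{idx}.hbm` attachment name and the `hbm_ctrl` node name come from the ADR.

```python
def hbm_attach_edges(cube: str, attach: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Emit router <-> hbm_ctrl edges for every router carrying a 'pe{idx}.hbm' attachment."""
    hbm = f"{cube}.hbm_ctrl"
    edges = []
    for router, items in sorted(attach.items()):
        if any(item.startswith("pe") and item.endswith(".hbm") for item in items):
            node = f"{cube}.{router}"
            # Both directions, since reads and writes use the same link.
            edges += [(node, hbm), (hbm, node)]
    return edges


# Hypothetical attach data: one PE router and one M_CPU router.
attach = {"r0c0": ["pe0.dma", "pe0.cpu", "pe0.hbm"], "r0c2": ["m_cpu"]}
edges = hbm_attach_edges("sip0.cube0", attach)
print(edges)
```

Only the PE router receives HBM edges here; the M_CPU router reaches `hbm_ctrl` through mesh hops instead.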
---
### D2. Remove xbar and bridge completely
### D2. Remove xbar, bridge, and the single NOC node completely
The following existing nodes and their related edges are all removed:
- `{cube}.xbar_top`, `{cube}.xbar_bot`
- `{cube}.bridge.left`, `{cube}.bridge.right`
- `{cube}.noc` (the single TwoDMeshNocComponent node)
- edges of kind `noc_to_xbar`, `xbar_to_noc`, `xbar_to_hbm`, `hbm_to_xbar`
- edges of kind `xbar_to_bridge`, `bridge_to_xbar`
- edges that reference the single noc node, such as `pe_to_noc`, `noc_to_pe`, `noc_to_pe_cpu`
Their roles (PE→HBM routing, cross-half connection) are
taken over by channel routers and horizontal line connections (see D3, D4).
Their roles are taken over by the **explicit router mesh based on cube_mesh.yaml**.
Each router (r0c0, r0c1, ...) of the 6×6 router grid generated by the existing `mesh_gen.py`
is created as a separate SimPy node in the topology graph,
and adjacent routers are connected by XY mesh edges.
---
### D3. 1:1 mode: per-channel router based connection
### D3. Explicit router mesh (common base for n:1 / 1:1)
#### channel router definition
#### Router nodes based on cube_mesh.yaml
In 1:1 mode, the graph compiler creates one **channel router** node
per pseudo-channel. A channel router is part of the NOC.
Each non-null router in the cube_mesh.yaml generated by `mesh_gen.py`
is created as a **separate SimPy node** in the topology graph.
```text
Example parameters: hbm_pseudo_channels=64, pes_per_cube=8
→ channels_per_pe = 8, 64 channel routers created in total
```
- Node ID: `{cube}.r{row}c{col}` (e.g., `sip0.cube0.r0c0`)
- kind: `noc_router`, impl: `forwarding_v1`
- pos_mm: taken from cube_mesh.yaml
Node naming: `{cube}.ch_r{global_channel_id}`
Components are connected to each router according to the attach information in the existing cube_mesh.yaml:
- `pe{p}.dma` → PE_DMA ↔ router edge
- `pe{p}.cpu` → PE_CPU ↔ router edge
- `pe{p}.hbm` → HBM_CTRL ↔ router edge (added in n:1)
- `m_cpu` → M_CPU ↔ router edge
- `sram` → SRAM ↔ router edge
- `ucie_{dir}.c{i}` → UCIe conn ↔ router edge
| PE | Owned channel routers |
| -- | -------------------- |
| PE0 | ch_r0, ch_r1, ..., ch_r7 |
| PE1 | ch_r8, ch_r9, ..., ch_r15 |
| ... | ... |
| PE7 | ch_r56, ch_r57, ..., ch_r63 |
XY mesh edges between routers: bidirectional edges between adjacent routers.
Null routers (HBM exclusion zone) are skipped.
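The mesh-edge rule above (bidirectional edges between adjacent routers, null routers skipped) can be sketched as follows. This is a minimal sketch under assumptions: the function name `mesh_edges` and the boolean grid representation are hypothetical stand-ins for whatever cube_mesh.yaml actually contains; only the `{cube}.r{row}c{col}` node ID format is taken from the ADR.

```python
def mesh_edges(cube: str, grid: list[list[bool]]) -> list[tuple[str, str]]:
    """Return directed edges between adjacent non-null routers.

    grid[row][col] is True for a real router and False for a null slot.
    """
    def node(r: int, c: int) -> str:
        return f"{cube}.r{r}c{c}"

    edges = []
    for r, row in enumerate(grid):
        for c, present in enumerate(row):
            if not present:
                continue  # null router: no node, no edges
            # Look east and south only, then add both directions.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < len(grid) and nc < len(grid[nr]) and grid[nr][nc]:
                    edges.append((node(r, c), node(nr, nc)))
                    edges.append((node(nr, nc), node(r, c)))
    return edges


# Hypothetical 3x3 grid with a null slot in the centre.
grid = [
    [True, True, True],
    [True, False, True],
    [True, True, True],
]
edges = mesh_edges("sip0.cube0", grid)
print(len(edges), "directed edges")
```

Scanning only east and south neighbours visits each adjacent pair once, so no edge is emitted twice.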
In general: PE `p` owns channels `p * channels_per_pe` through `(p+1) * channels_per_pe - 1`.
#### 1:1 mode extension (to be implemented later)
#### PE_DMA ↔ channel router connection
Each PE_DMA is connected to its N local channel routers by bidirectional links:
```text
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r0 (bw: channel_bw_gbs)
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r1 (bw: channel_bw_gbs)
...
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.ch_r7 (bw: channel_bw_gbs)
```
- edge kind: `pe_to_ch_router` / `ch_router_to_pe`
- BW: `hbm_channel_bw_gbs` (e.g., 32 GB/s)
- distance: physical distance from the PE to the channel router (layout-based)
#### channel router ↔ HBM controller connection
Each channel router is connected to the cube's hbm_ctrl by a bidirectional link:
```text
sip0.cube0.ch_r0 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
sip0.cube0.ch_r1 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
...
sip0.cube0.ch_r63 ←→ sip0.cube0.hbm_ctrl (bw: channel_bw_gbs)
```
- edge kind: `ch_router_to_hbm` / `hbm_to_ch_router`
- BW: `hbm_channel_bw_gbs` (e.g., 32 GB/s)
#### 1:1 mode full data path
```text
PE0.pe_dma
├→ ch_r0 → hbm_ctrl (32 GB/s)
├→ ch_r1 → hbm_ctrl (32 GB/s)
├→ ...
└→ ch_r7 → hbm_ctrl (32 GB/s)
Total PE0 local BW = N × channel_bw_gbs
```
In 1:1 mode, each router splits into N channel mini-routers.
This requires introducing per-channel routing and a ChannelSplitter (LA → per-channel PA).
N GEMM engines per PE are also added at that point.
---
### D4. 1:1 mode: horizontal line connection (cross-PE channel access)
### D4. Cross-PE HBM access (n:1 mode)
#### Placement rule
In n:1 mode, when a PE accesses another PE's local HBM,
it hops through the cube_mesh.yaml XY mesh to the target PE's router.
Channel routers with the same **logical index** are placed in the same horizontal row.
Logical index definition: `logical_idx = global_channel_id % channels_per_pe`
Example: PE0 (r0c0) accesses the HBM of PE2 (r1c4):
```text
Example parameters: channels_per_pe=8, pes_per_cube=8
Row 0: ch_r0 (PE0) ↔ ch_r8 (PE1) ↔ ch_r16 (PE2) ↔ ... ↔ ch_r56 (PE7)
Row 1: ch_r1 (PE0) ↔ ch_r9 (PE1) ↔ ch_r17 (PE2) ↔ ... ↔ ch_r57 (PE7)
Row 2: ch_r2 (PE0) ↔ ch_r10 (PE1) ↔ ch_r18 (PE2) ↔ ... ↔ ch_r58 (PE7)
...
Row 7: ch_r7 (PE0) ↔ ch_r15 (PE1) ↔ ch_r23 (PE2) ↔ ... ↔ ch_r63 (PE7)
PE0.pe_dma → r0c0 → r0c1 → r0c2 → r0c3 → r0c4 → r1c4 → hbm_ctrl
```
In general: Row `r` holds `{ch_r(p * N + r) | p ∈ 0..pes_per_cube-1}`,
where `N = channels_per_pe`.
The Dijkstra router searches for the shortest path in the mesh.
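The shortest-path search over the mesh can be sketched with a textbook Dijkstra. This is an illustration, not the project's PathRouter: the function name `shortest_path`, the unit hop cost, and the fully populated 6×6 grid (no null routers) are all assumptions made to keep the example small.

```python
import heapq


def shortest_path(adj, src, dst):
    """Dijkstra over a dict {node: {neighbor: cost}}; returns the node list or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical fully populated 6x6 mesh with unit cost per hop.
adj = {}
for r in range(6):
    for c in range(6):
        nbrs = {}
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 6 and 0 <= nc < 6:
                nbrs[f"r{nr}c{nc}"] = 1.0
        adj[f"r{r}c{c}"] = nbrs

path = shortest_path(adj, "r0c0", "r1c4")
print(len(path) - 1, "mesh hops:", " -> ".join(path))
```

With uniform costs several routes tie at five hops, so the route returned here may differ from the one listed in the example while having the same length. In the real mesh, null routers and per-edge costs would decide the result.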
#### horizontal line edge
Adjacent channel routers in the same row are connected by bidirectional edges:
```text
ch_r0 ↔ ch_r8 ↔ ch_r16 ↔ ... ↔ ch_r56
```
- edge kind: `ch_horizontal`
- BW: `hbm_channel_bw_gbs` (or configurable inter-PE channel BW)
- distance: physical distance between PEs
#### Cross-PE HBM access path (1:1 mode)
When PE0 accesses PE1's local channel (ch_r8):
```text
PE0.pe_dma → ch_r0 → ch_r8 (horizontal hop) → hbm_ctrl
```
The Dijkstra router searches for the shortest path over the horizontal lines.
#### Design intent
This placement rule:
- simplifies routing rules: horizontal = cross-PE, vertical = PE-local
- simplifies distance calculation: hops within a row = |src_pe - dst_pe|
- ensures structural regularity: every row has the same structure
Cross-PE channel access in 1:1 mode will be defined when D3 is extended to 1:1.
---
### D5. n:1 mode: aggregated router based connection
### D5. n:1 mode: use the cube_mesh.yaml router mesh
#### aggregated router definition
In n:1 mode, the graph compiler creates one **aggregated router** node per PE.
An aggregated router is part of the NOC.
Node naming: `{cube}.pe{p}.agg_router`
In n:1 mode, no separate "aggregated router" is created.
The existing cube_mesh.yaml router grid fills that role.
#### Connection structure
```text
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.pe0.agg_router (bw: N × channel_bw_gbs)
sip0.cube0.pe0.agg_router ←→ sip0.cube0.hbm_ctrl (bw: N × channel_bw_gbs)
```
- edge kind: `pe_to_agg_router` / `agg_router_to_pe`, `agg_to_hbm` / `hbm_to_agg`
- BW: `channels_per_pe × hbm_channel_bw_gbs` (e.g., 8 × 32 = 256 GB/s)
#### Cross-PE access (n:1 mode)
When PE0 accesses PE1's local HBM:
PE_DMA, PE_CPU, and HBM are all connected to the router each PE is attached to:
```text
PE0.pe_dma → PE0.agg_router → PE1.agg_router → hbm_ctrl
sip0.cube0.pe0.pe_dma ←→ sip0.cube0.r0c0 (bw: N × channel_bw_gbs)
sip0.cube0.hbm_ctrl ←→ sip0.cube0.r0c0 (bw: N × channel_bw_gbs)
```
Connections between aggregated routers:
```text
pe0.agg_router ↔ pe1.agg_router ↔ pe2.agg_router ↔ ... ↔ pe7.agg_router
```
- edge kind: `agg_horizontal`
- BW: configurable (inter-PE aggregated BW)
Routers are connected by XY mesh edges. A PE's local HBM access
goes directly from its own router (switching overhead only).
#### n:1 mode full data path
**local HBM (0 hop):**
```text
PE0.pe_dma → PE0.agg_router → hbm_ctrl
(BW = N × channel_bw_gbs = 256 GB/s)
PE0.pe_dma → r0c0 → hbm_ctrl (switching overhead only)
```
**remote HBM (mesh hops):**
```text
PE0.pe_dma → r0c0 → r0c1 → ... → r1c4 → hbm_ctrl
```
**M_CPU DMA:**
```text
M_CPU → r2c0 → (mesh hops) → r{x}c{y} → hbm_ctrl
```
---
### D6. local / remote access를 NOC로 통일한다
### D6. 모든 트래픽을 동일 router mesh로 통일한다
- 모든 memory access는 NOC(channel router 또는 aggregated router)를 통해 전달된
- 모든 memory access (DMA data)와 command (PE_CPU)가 동일 router mesh를 사용한
- local access도 별도의 fast path(xbar)를 사용하지 않는다
- cross-cube (remote) access 경로:
```text
1:1 mode: PE_DMA → ch_r{local} → ch_r{...} → UCIe → remote_ch_r → remote_hbm_ctrl
n:1 mode: PE_DMA → agg_router → UCIe → remote_agg_router → remote_hbm_ctrl
PE_DMA → r{x}c{y} → (mesh hops) → ucie_conn → ucie-{PORT}
→ [UCIe link] → remote ucie → remote conn → remote r{x}c{y} → hbm_ctrl
```
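UCIe connections keep the existing structure,
but both endpoints become channel routers or aggregated routers instead of the xbar.
but both endpoints become mesh routers instead of the xbar.
The number of UCIe lines is determined by the BW ratio: `ucie_lines_per_side = ceil(ucie_bw / noc_line_bw)`.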
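As a minimal sketch of the line-count rule, the function below applies the `ceil(ucie_bw / noc_line_bw)` formula directly. The function signature and the bandwidth values in the usage lines are illustrative assumptions, not values taken from the project configuration.

```python
import math


def ucie_lines_per_side(ucie_bw_gbs: float, noc_line_bw_gbs: float) -> int:
    """Number of NOC lines needed per cube side to carry the UCIe bandwidth."""
    if ucie_bw_gbs <= 0 or noc_line_bw_gbs <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(ucie_bw_gbs / noc_line_bw_gbs)


# Illustrative values only (assumed, not from the config files).
print(ucie_lines_per_side(512.0, 256.0))  # exact multiple
print(ucie_lines_per_side(600.0, 256.0))  # rounds up to the next whole line
```

Rounding up means the mesh side never offers less bandwidth than the UCIe link, at the cost of some unused capacity when the ratio is not an integer.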
---
@@ -266,9 +193,7 @@ return f"sip{s}.cube{c}.hbm_ctrl"
```
The pe_slice computation is removed.
Because BAAW already determines dst_node, in 1:1 mode the PE_DMA
bypasses the resolver and BAAW returns the channel router node_id directly.
In n:1 mode, BAAW likewise returns the aggregated router node_id.
In n:1 mode, PE_DMA directly accesses the hbm_ctrl attached to its own router.
resolver.resolve() is kept for external access (M_CPU DMA, etc.) and for backward compatibility.
@@ -305,16 +230,10 @@ links:
```yaml
links:
  pe_to_ch_router_bw_gbs: 32.0 # PE_DMA ↔ channel router
  pe_to_ch_router_mm: 1.0 # physical distance
  ch_router_to_hbm_bw_gbs: 32.0 # channel router ↔ hbm_ctrl
  ch_router_to_hbm_mm: 2.0 # physical distance
  ch_horizontal_bw_gbs: 32.0 # horizontal link between channel routers
  ch_horizontal_mm: 1.5 # horizontal distance between PEs
  # for n:1 mode
  pe_to_agg_router_bw_gbs: 256.0 # PE_DMA ↔ aggregated router
  agg_to_hbm_bw_gbs: 256.0 # aggregated router ↔ hbm_ctrl
  agg_horizontal_bw_gbs: 256.0 # link between aggregated routers
  router_link_bw_gbs: 256.0 # XY mesh link BW between routers
  router_overhead_ns: 2.0 # router switching overhead
  pe_to_router_bw_gbs: 256.0 # PE_DMA ↔ router
  hbm_to_router_bw_gbs: 256.0 # HBM ↔ router (= N × channel_bw)
```
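To show how these link parameters relate, here is a rough sketch that checks the `N × channel_bw` relation and estimates a transfer time from the router overhead and link bandwidths. The latency formula (routers traversed × overhead plus serialization on the slowest link) is a simplifying assumption that ignores contention and wire delay. The names `aggregated_bw_gbs` and `estimate_transfer_ns` are hypothetical and not part of the simulator.

```python
# Values copied from the n:1 mode link parameters above.
LINKS = {
    "router_link_bw_gbs": 256.0,
    "router_overhead_ns": 2.0,
    "pe_to_router_bw_gbs": 256.0,
    "hbm_to_router_bw_gbs": 256.0,
}


def aggregated_bw_gbs(channels_per_pe: int, channel_bw_gbs: float) -> float:
    """Per-PE HBM bandwidth in n:1 mode: N × channel_bw."""
    return channels_per_pe * channel_bw_gbs


def estimate_transfer_ns(n_bytes: int, routers_traversed: int, links=LINKS) -> float:
    """Assumed model: per-router switching overhead + serialization time."""
    bottleneck_gbs = min(
        links["pe_to_router_bw_gbs"],
        links["router_link_bw_gbs"],
        links["hbm_to_router_bw_gbs"],
    )
    # 1 GB/s equals 1 byte/ns, so bytes / (GB/s) gives nanoseconds.
    serialization_ns = n_bytes / bottleneck_gbs
    return routers_traversed * links["router_overhead_ns"] + serialization_ns


print(aggregated_bw_gbs(8, 32.0))     # 8 channels × 32 GB/s
print(estimate_transfer_ns(4096, 1))  # local HBM: own router only
print(estimate_transfer_ns(4096, 6))  # remote HBM: six routers on the path
```

Under this model the only difference between a local and a remote access is the number of routers traversed, which is the behavior the "switching overhead only" note describes for the local case.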
---
@@ -341,19 +260,18 @@ links:
### Positive
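- Pseudo-channel-level BW contention modeling is natural in 1:1 mode
- The aggregated bandwidth model is simple in n:1 mode
- Local / remote access paths are unified on the NOC
- The cube_mesh.yaml-based router mesh accurately reflects physical placement
- n:1 mode keeps the existing VA scheme, so the migration cost is low
- Local / remote / command traffic is unified on the same mesh, which keeps the model simple
- Fits well with graph-compiler-based topology generation
- The channel count and PE count are both parameters, so a variety of configurations can be tested
- Extending to 1:1 mode is naturally possible by subdividing routers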
### Negative
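- In 1:1 mode the number of routers and links increases significantly
(64 channel routers + 64 edges to HBM + 56 horizontal edges per cube)
- Local accesses also use the NOC path, so the model becomes more generalized
- Existing xbar-based tests must be completely rewritten
- Simulation performance may be affected by the increased number of SimPy nodes
- Explicit router nodes increase the SimPy node count (6×6 grid, up to 32 routers per cube)
- Existing tests based on the xbar / bridge / single NOC must be completely rewritten
- TwoDMeshNocComponent's internal contention model must be replaced with a per-router model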
---
+310 -154
@@ -1,156 +1,312 @@
<svg xmlns="http://www.w3.org/2000/svg" width="556" height="472" viewBox="0 0 556 472">
<svg xmlns="http://www.w3.org/2000/svg" width="970" height="900" viewBox="0 0 970 900">
<title>cube</title>
<rect width="556" height="472" fill="#f8fafc"/>
<text x="278" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">CUBE VIEW</text>
<rect x="40.0" y="40.0" width="476.0" height="392.0" rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>
<rect x="152.0" y="166.0" width="252.0" height="140.0" rx="4" fill="#d1fae5" stroke="#10b981" stroke-width="1.5" stroke-dasharray="6,3" opacity="0.5"/>
<text x="278.0" y="278.0" text-anchor="middle" font-family="monospace" font-size="11" fill="#047857" opacity="0.7">HBM</text>
<polyline points="82.0,82.0 82.0,95.0 82.0,95.0 82.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="82.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="82.0,82.0 82.0,144.0 334.0,144.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 82.0,144.0 82.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="166.0,82.0 166.0,95.0 166.0,95.0 166.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="166.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="166.0,82.0 166.0,154.0 334.0,154.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 166.0,144.0 166.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="390.0,82.0 390.0,95.0 390.0,95.0 390.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="390.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="390.0,82.0 390.0,164.0 334.0,164.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 390.0,144.0 390.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="474.0,82.0 474.0,95.0 474.0,95.0 474.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="474.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="474.0,82.0 474.0,174.0 334.0,174.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 474.0,144.0 474.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="82.0,390.0 82.0,347.0 82.0,347.0 82.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="82.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="82.0,390.0 82.0,338.0 334.0,338.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 82.0,298.0 82.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="166.0,390.0 166.0,347.0 166.0,347.0 166.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="166.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="166.0,390.0 166.0,348.0 334.0,348.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 166.0,298.0 166.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="390.0,390.0 390.0,347.0 390.0,347.0 390.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="390.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="390.0,390.0 390.0,358.0 334.0,358.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 390.0,298.0 390.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="474.0,390.0 474.0,347.0 474.0,347.0 474.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="474.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="474.0,390.0 474.0,368.0 334.0,368.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 474.0,298.0 474.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="82.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="152.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="166.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="194.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="390.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="306.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="474.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="348.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="82.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="152.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="166.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="194.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="390.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="306.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="474.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="348.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<line x1="82.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="138.0" x2="82.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="138.0" x2="474.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="474.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="82.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="334.0" x2="82.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="334.0" x2="474.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="474.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<polyline points="82.0,138.0 110.0,138.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="110.0,292.0 82.0,292.0 82.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="82.0,334.0 110.0,334.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="110.0,292.0 82.0,292.0 82.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="474.0,138.0 446.0,138.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="446.0,292.0 474.0,292.0 474.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="474.0,334.0 446.0,334.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="446.0,292.0 474.0,292.0 474.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="334.0,236.0 334.0,131.4 278.0,131.4 278.0,56.8" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,310.6 278.0,310.6 278.0,415.2" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,221.0 488.0,221.0 488.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,221.0 68.0,221.0 68.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="446.0,194.0 446.0,200.0 334.0,200.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,200.0 446.0,200.0 446.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 110.0,236.0 110.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
<polyline points="110.0,194.0 334.0,194.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
<rect x="250.0" y="40.0" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="278.0" y="60.8" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-N</text>
<rect x="250.0" y="398.4" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="278.0" y="419.2" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-S</text>
<rect x="460.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="488.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-E</text>
<rect x="40.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="68.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-W</text>
<rect x="306.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#a78bfa" stroke="#475569" stroke-width="1"/>
<text x="334.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">NOC</text>
<rect x="418.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
<text x="446.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">M CPU</text>
<rect x="194.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
<text x="222.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#ffffff">HBM CTRL</text>
<rect x="82.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
<text x="110.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">SRAM</text>
<rect x="82.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="110.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge LEFT</text>
<rect x="418.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="446.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge RIGHT</text>
<rect x="56.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE0</text>
<rect x="54.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE0</text>
<rect x="140.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE1</text>
<rect x="138.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE1</text>
<rect x="364.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE2</text>
<rect x="362.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE2</text>
<rect x="448.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE3</text>
<rect x="446.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE3</text>
<rect x="56.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE4</text>
<rect x="54.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE4</text>
<rect x="140.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE5</text>
<rect x="138.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE5</text>
<rect x="364.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE6</text>
<rect x="362.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE6</text>
<rect x="448.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE7</text>
<rect x="446.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE7</text>
<rect width="970" height="900" fill="#0f172a"/>
<text x="485" y="22" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#94a3b8">CUBE TOPOLOGY — 17.0×14.0mm | 6×6 Router Mesh | n_to_one mode | 64 pseudo-ch</text>
<text x="485" y="40" text-anchor="middle" font-family="monospace" font-size="10" fill="#64748b">Per-PE: 8 ch × 32.0 GB/s = 256.0 GB/s | Cube total: 64 × 32.0 = 2048.0 GB/s</text>
<rect x="60" y="60" width="850.0" height="700.0" rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>
<rect x="260" y="285" width="450" height="250" rx="6" fill="#052e16" stroke="#047857" stroke-width="2" opacity="0.6"/>
<text x="485" y="395" text-anchor="middle" font-family="monospace" font-size="11" font-weight="bold" fill="#047857">HBM_CTRL | 64 pseudo channels</text>
<text x="485" y="412" text-anchor="middle" font-family="monospace" font-size="9" fill="#05966988">Total BW: 2048 GB/s</text>
<rect x="270.0" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="283.4" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="296.9" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="310.3" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="323.8" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="337.2" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="350.6" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="364.1" y="289" width="12.9" height="8" rx="1" fill="#3b82f6" opacity="0.8"/>
<rect x="377.5" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="390.9" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="404.4" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="417.8" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="431.2" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="444.7" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="458.1" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="471.6" y="289" width="12.9" height="8" rx="1" fill="#60a5fa" opacity="0.8"/>
<rect x="485.0" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="498.4" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="511.9" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="525.3" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="538.8" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="552.2" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="565.6" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="579.1" y="289" width="12.9" height="8" rx="1" fill="#8b5cf6" opacity="0.8"/>
<rect x="592.5" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="605.9" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="619.4" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="632.8" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="646.2" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="659.7" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="673.1" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<rect x="686.6" y="289" width="12.9" height="8" rx="1" fill="#a78bfa" opacity="0.8"/>
<text x="324" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#3b82f6">PE0×8ch</text>
<text x="431" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#60a5fa">PE1×8ch</text>
<text x="539" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#8b5cf6">PE2×8ch</text>
<text x="646" y="286" text-anchor="middle" font-family="monospace" font-size="6" fill="#a78bfa">PE3×8ch</text>
<rect x="270.0" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="283.4" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="296.9" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="310.3" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="323.8" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="337.2" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="350.6" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="364.1" y="523" width="12.9" height="8" rx="1" fill="#f59e0b" opacity="0.8"/>
<rect x="377.5" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="390.9" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="404.4" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="417.8" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="431.2" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="444.7" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="458.1" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="471.6" y="523" width="12.9" height="8" rx="1" fill="#fbbf24" opacity="0.8"/>
<rect x="485.0" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="498.4" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="511.9" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="525.3" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="538.8" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="552.2" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="565.6" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="579.1" y="523" width="12.9" height="8" rx="1" fill="#ef4444" opacity="0.8"/>
<rect x="592.5" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="605.9" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="619.4" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="632.8" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="646.2" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="659.7" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="673.1" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<rect x="686.6" y="523" width="12.9" height="8" rx="1" fill="#f87171" opacity="0.8"/>
<text x="324" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#f59e0b">PE4×8ch</text>
<text x="431" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#fbbf24">PE5×8ch</text>
<text x="539" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#ef4444">PE6×8ch</text>
<text x="646" y="539" text-anchor="middle" font-family="monospace" font-size="6" fill="#f87171">PE7×8ch</text>
<line x1="135" y1="135" x2="285" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="135" x2="135" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="135" x2="435" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="135" x2="285" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="135" x2="585" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="135" x2="435" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="135" x2="685" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="135" x2="585" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="135" x2="835" y2="135" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="135" x2="685" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="835" y1="135" x2="835" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="260" x2="285" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="260" x2="135" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="260" x2="435" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="260" x2="285" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="260" x2="585" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="260" x2="435" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="260" x2="685" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="260" x2="585" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="260" x2="835" y2="260" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="260" x2="685" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="835" y1="260" x2="835" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="335" x2="285" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="335" x2="135" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="335" x2="685" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="335" x2="285" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="335" x2="835" y2="335" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="335" x2="685" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="835" y1="335" x2="835" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="485" x2="285" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="485" x2="135" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="485" x2="685" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="485" x2="285" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="485" x2="835" y2="485" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="485" x2="685" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="835" y1="485" x2="835" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="560" x2="285" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="560" x2="135" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="560" x2="435" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="560" x2="285" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="560" x2="585" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="560" x2="435" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="560" x2="685" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="560" x2="585" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="560" x2="835" y2="560" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="560" x2="685" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="835" y1="560" x2="835" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="135" y1="685" x2="285" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="285" y1="685" x2="435" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="435" y1="685" x2="585" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="585" y1="685" x2="685" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<line x1="685" y1="685" x2="835" y2="685" stroke="#475569" stroke-width="1" opacity="0.4"/>
<circle cx="135" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="135" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c0</text>
<rect x="119" y="81" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="135" y="92" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE0</text>
<line x1="135" y1="127" x2="149" y2="97" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="285" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="285" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c1</text>
<rect x="269" y="81" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="285" y="92" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE1</text>
<line x1="285" y1="127" x2="299" y2="97" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="435" cy="135" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="435" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c2</text>
<circle cx="585" cy="135" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="585" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c3</text>
<circle cx="685" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="685" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c4</text>
<circle cx="835" cy="135" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="835" y="138" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r0c5</text>
<circle cx="135" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="135" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c0</text>
<circle cx="285" cy="260" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="285" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c1</text>
<circle cx="435" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="435" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c2</text>
<rect x="419" y="206" width="32" height="16" rx="3" fill="#451a03" stroke="#f59e0b" stroke-width="1"/>
<text x="435" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#f59e0b">M_CPU</text>
<line x1="435" y1="252" x2="449" y2="222" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<circle cx="585" cy="260" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="585" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c3</text>
<circle cx="685" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="685" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c4</text>
<rect x="669" y="206" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="685" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE2</text>
<line x1="685" y1="252" x2="699" y2="222" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="835" cy="260" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="835" y="263" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r1c5</text>
<rect x="819" y="206" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="835" y="217" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE3</text>
<line x1="835" y1="252" x2="849" y2="222" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="135" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="135" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c0</text>
<circle cx="285" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="285" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c1</text>
<circle cx="685" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="685" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c4</text>
<circle cx="835" cy="335" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="835" y="338" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r2c5</text>
<circle cx="135" cy="485" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="135" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c0</text>
<rect x="119" y="523" width="32" height="16" rx="3" fill="#1c1917" stroke="#d97706" stroke-width="1"/>
<text x="135" y="534" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#d97706">SRAM</text>
<line x1="135" y1="493" x2="149" y2="523" stroke="#d97706" stroke-width="1" opacity="0.6"/>
<circle cx="285" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="285" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c1</text>
<circle cx="685" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="685" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c4</text>
<circle cx="835" cy="485" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="835" y="488" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r3c5</text>
<circle cx="135" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="135" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c0</text>
<rect x="119" y="598" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="135" y="609" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE4</text>
<line x1="135" y1="568" x2="149" y2="598" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="285" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="285" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c1</text>
<rect x="269" y="598" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="285" y="609" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE5</text>
<line x1="285" y1="568" x2="299" y2="598" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="435" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="435" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c2</text>
<circle cx="585" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="585" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c3</text>
<circle cx="685" cy="560" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="685" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c4</text>
<circle cx="835" cy="560" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="835" y="563" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r4c5</text>
<circle cx="135" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="135" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c0</text>
<circle cx="285" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="285" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c1</text>
<circle cx="435" cy="685" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="435" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c2</text>
<circle cx="585" cy="685" r="8" fill="#334155" stroke="#475569" stroke-width="1"/>
<text x="585" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c3</text>
<circle cx="685" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="685" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c4</text>
<rect x="669" y="723" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="685" y="734" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE6</text>
<line x1="685" y1="693" x2="699" y2="723" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<circle cx="835" cy="685" r="8" fill="#475569" stroke="#64748b" stroke-width="1"/>
<text x="835" y="688" text-anchor="middle" font-family="monospace" font-size="6" fill="white">r5c5</text>
<rect x="819" y="723" width="32" height="16" rx="3" fill="#2d1f3d" stroke="#a855f7" stroke-width="1"/>
<text x="835" y="734" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#a855f7">PE7</text>
<line x1="835" y1="693" x2="849" y2="723" stroke="#a855f7" stroke-width="1" opacity="0.6"/>
<polyline points="135,143 208,216 251,216 324,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="239" y="216" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="285,143 358,216 358,216 431,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="368" y="216" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="685,268 674,278 549,278 539,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="622" y="278" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="835,268 824,278 657,278 646,289" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="751" y="278" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="135,552 146,542 313,542 324,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="239" y="542" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="285,552 296,542 421,542 431,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="368" y="542" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="685,677 612,604 612,604 539,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="622" y="604" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<polyline points="835,677 762,604 719,604 646,531" fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" stroke-dasharray="4,3"/>
<text x="751" y="604" font-family="monospace" font-size="6" fill="#10b98188">256GB/s</text>
<rect x="65" y="360" width="50" height="100" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
<text x="90" y="357" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-W</text>
<rect x="67" y="362" width="46" height="23" rx="2" fill="#818cf8" opacity="0.7"/>
<text x="90" y="376" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
<polyline points="127,135 120,142 120,366 113,374" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
<rect x="67" y="386" width="46" height="23" rx="2" fill="#a78bfa" opacity="0.7"/>
<text x="90" y="400" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
<polyline points="127,260 120,267 120,390 113,398" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
<rect x="67" y="410" width="46" height="23" rx="2" fill="#c084fc" opacity="0.7"/>
<text x="90" y="424" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
<polyline points="127,560 120,553 120,428 113,422" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
<rect x="67" y="434" width="46" height="23" rx="2" fill="#e879f9" opacity="0.7"/>
<text x="90" y="448" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
<polyline points="127,685 120,678 120,452 113,446" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
<rect x="435" y="65" width="100" height="50" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
<text x="485" y="62" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-N</text>
<rect x="437" y="67" width="23" height="46" rx="2" fill="#818cf8" opacity="0.7"/>
<text x="448" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
<polyline points="135,127 142,120 442,120 448,113" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
<rect x="461" y="67" width="23" height="46" rx="2" fill="#a78bfa" opacity="0.7"/>
<text x="472" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
<polyline points="285,127 292,120 466,120 472,113" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
<rect x="485" y="67" width="23" height="46" rx="2" fill="#c084fc" opacity="0.7"/>
<text x="496" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
<polyline points="685,127 678,120 504,120 496,113" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
<rect x="509" y="67" width="23" height="46" rx="2" fill="#e879f9" opacity="0.7"/>
<text x="520" y="93" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
<polyline points="835,127 828,120 528,120 520,113" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
<rect x="855" y="360" width="50" height="100" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
<text x="880" y="357" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-E</text>
<rect x="857" y="362" width="46" height="23" rx="2" fill="#818cf8" opacity="0.7"/>
<text x="880" y="376" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
<polyline points="843,135 850,142 850,367 857,374" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
<rect x="857" y="386" width="46" height="23" rx="2" fill="#a78bfa" opacity="0.7"/>
<text x="880" y="400" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
<polyline points="843,260 850,267 850,391 857,398" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
<rect x="857" y="410" width="46" height="23" rx="2" fill="#c084fc" opacity="0.7"/>
<text x="880" y="424" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
<polyline points="843,560 850,553 850,428 857,422" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
<rect x="857" y="434" width="46" height="23" rx="2" fill="#e879f9" opacity="0.7"/>
<text x="880" y="448" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
<polyline points="843,685 850,678 850,452 857,446" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
<rect x="435" y="705" width="100" height="50" rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>
<text x="485" y="702" text-anchor="middle" font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">UCIe-S</text>
<rect x="437" y="707" width="23" height="46" rx="2" fill="#818cf8" opacity="0.7"/>
<text x="448" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c0</text>
<polyline points="135,693 142,700 442,700 448,707" fill="none" stroke="#818cf8" stroke-width="1" opacity="0.5"/>
<rect x="461" y="707" width="23" height="46" rx="2" fill="#a78bfa" opacity="0.7"/>
<text x="472" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c1</text>
<polyline points="285,693 292,700 466,700 472,707" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.5"/>
<rect x="485" y="707" width="23" height="46" rx="2" fill="#c084fc" opacity="0.7"/>
<text x="496" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c2</text>
<polyline points="685,693 678,700 504,700 496,707" fill="none" stroke="#c084fc" stroke-width="1" opacity="0.5"/>
<rect x="509" y="707" width="23" height="46" rx="2" fill="#e879f9" opacity="0.7"/>
<text x="520" y="733" text-anchor="middle" font-family="monospace" font-size="5" fill="white">c3</text>
<polyline points="835,693 828,700 528,700 520,707" fill="none" stroke="#e879f9" stroke-width="1" opacity="0.5"/>
<rect x="60" y="865" width="10" height="10" rx="2" fill="#3b82f6" stroke="#475569" stroke-width="0.5"/>
<text x="74" y="874" font-family="monospace" font-size="8" fill="#94a3b8">PE Router</text>
<rect x="147" y="865" width="10" height="10" rx="2" fill="#f59e0b" stroke="#475569" stroke-width="0.5"/>
<text x="161" y="874" font-family="monospace" font-size="8" fill="#94a3b8">M_CPU / SRAM</text>
<rect x="255" y="865" width="10" height="10" rx="2" fill="#8b5cf6" stroke="#475569" stroke-width="0.5"/>
<text x="269" y="874" font-family="monospace" font-size="8" fill="#94a3b8">UCIe</text>
<rect x="307" y="865" width="10" height="10" rx="2" fill="#334155" stroke="#475569" stroke-width="0.5"/>
<text x="321" y="874" font-family="monospace" font-size="8" fill="#94a3b8">Relay</text>
<rect x="366" y="865" width="10" height="10" rx="2" fill="#10b981" stroke="#475569" stroke-width="0.5"/>
<text x="380" y="874" font-family="monospace" font-size="8" fill="#94a3b8">HBM Link</text>
<rect x="446" y="865" width="10" height="10" rx="2" fill="#475569" stroke="#475569" stroke-width="0.5"/>
<text x="460" y="874" font-family="monospace" font-size="8" fill="#94a3b8">Mesh Link</text>
</svg>

+2
@@ -26,6 +26,8 @@
<text x="285.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE GEMM</text>
<rect x="241.2" y="243.0" width="87.5" height="49.0" rx="4" fill="#ec4899" stroke="#475569" stroke-width="1"/>
<text x="285.0" y="271.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE MATH</text>
+<rect x="136.2" y="68.0" width="87.5" height="49.0" rx="4" fill="#e2e8f0" stroke="#475569" stroke-width="1"/>
+<text x="180.0" y="96.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE MMU</text>
<rect x="346.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE TCM</text>
</svg>

+4 -4
@@ -51,13 +51,13 @@
<line x1="396.0" y1="504.0" x2="540.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="468.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<polyline points="324.0,56.0 108.0,56.0 108.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
-<text x="216.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
+<text x="216.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
<polyline points="324.0,56.0 252.0,56.0 252.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
-<text x="288.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
+<text x="288.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
<polyline points="324.0,56.0 396.0,56.0 396.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
-<text x="360.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
+<text x="360.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
<polyline points="324.0,56.0 540.0,56.0 540.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
-<text x="432.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
+<text x="432.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 512GB/s</text>
<rect x="84.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
<text x="108.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,0)</text>
<rect x="228.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>

+2 -2
@@ -3,9 +3,9 @@
<rect width="768" height="396" fill="#f8fafc"/>
<text x="384" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">SYSTEM VIEW</text>
<polyline points="384.0,60.0 182.0,60.0 182.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
-<text x="283.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
+<text x="283.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 768GB/s</text>
<polyline points="384.0,60.0 586.0,60.0 586.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
-<text x="485.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
+<text x="485.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 768GB/s</text>
<rect x="374.0" y="57.0" width="20.0" height="6.0" rx="4" fill="#6366f1" stroke="#475569" stroke-width="1"/>
<text x="384.0" y="64.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">Fabric Switch</text>
<rect x="62.0" y="138.0" width="240.0" height="200.0" rx="4" fill="#e0e7ff" stroke="#475569" stroke-width="1"/>

+2 -2
@@ -116,7 +116,7 @@ def _fmt_util(eff: float, bn: float | None) -> str:
def _short_name(node_id: str) -> str:
-"""Shorten node id: keep last 2 segments to avoid ambiguity (xbar.pe0 vs pe0)."""
+"""Shorten node id: keep last 2 segments to avoid ambiguity (router.pe0 vs pe0)."""
parts = node_id.split(".")
return ".".join(parts[-2:]) if len(parts) >= 2 else node_id
@@ -366,7 +366,7 @@ def run_probe(topology_path: str, case_filter: str | None = None) -> int:
# --- PE DMA Summary Table ---
print()
-print(f"=== PE DMA Latency (pe_dma -> xbar -> HBM, data={nbytes}B) ===")
+print(f"=== PE DMA Latency (pe_dma -> router -> HBM, data={nbytes}B) ===")
print(f" {'Case':<26} {'Target':<28} {'Actual':>8}"
f" {'Ovhd':>6} {'Drain':>6} {'Wire':>5} {'Ovhd%':>6} {'Drain%':>7}"
f" {'Eff.BW':>8} {'BN.BW':>8} {'Util%':>6}")
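Reviewer note: the updated `_short_name` docstring is easier to follow with a concrete case. The snippet below is a standalone copy of the helper body shown in the hunk above; the node ids are hypothetical examples, not taken from the topology files.

```python
def short_name(node_id: str) -> str:
    """Keep the last two dot-separated segments of a node id."""
    parts = node_id.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else node_id


# Hypothetical node ids, chosen only to illustrate the behaviour.
print(short_name("sip0.cube0.router.pe0"))  # router.pe0
print(short_name("pe0"))                    # pe0
```

Keeping two segments means a PE attached under a router is not confused with a bare `pe0` id.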
+1 -1
@@ -137,7 +137,7 @@ def _extract_peaks(spec: dict | None) -> tuple[float, float]:
gemm_attrs = comps.get("pe_gemm", {}).get("attrs", {})
peak_tflops = float(gemm_attrs.get("peak_tflops_f16", 0.0))
cube_links = cube.get("links", {})
-hbm_bw = float(cube_links.get("xbar_to_hbm_bw_gbs", 0.0))
+hbm_bw = float(cube_links.get("hbm_to_router_bw_gbs", 0.0))
return peak_tflops, hbm_bw
+1 -1
@@ -114,7 +114,7 @@ class HbmCtrlComponent(ComponentBase):
parts = self.node.id.split(".")
cube_id = int(parts[1].replace("cube", ""))
-pe_id = int(parts[3].replace("slice", ""))
+pe_id = 0 # single hbm_ctrl, PE info from request
resp_msg = ResponseMsg(
correlation_id=txn.request.correlation_id,
request_id=txn.request.request_id,
+4 -11
@@ -238,14 +238,11 @@ class MCpuComponent(ComponentBase):
def _resolve_dma_destinations(self, request: Any, target_pe: int | str) -> list[str]:
"""Return list of HBM destination node_ids for DMA fan-out.
-Uses PA-based resolution to determine the actual target cube and slice,
-enabling cross-cube DMA routing when the PA points to a remote cube.
+With single hbm_ctrl per cube (ADR-0019), always returns one node.
+PA-based resolution still used for cross-cube routing.
"""
cube_prefix = self.node.id.rsplit(".", 1)[0] # e.g. "sip0.cube0"
-if isinstance(target_pe, int):
-return [f"{cube_prefix}.hbm_ctrl.slice{target_pe}"]
# PA-based resolution: extract actual target from physical address
pa_val = getattr(request, "dst_pa", None) or getattr(request, "src_pa", None)
if pa_val is not None:
@@ -256,12 +253,8 @@ class MCpuComponent(ComponentBase):
except Exception:
pass
-# "all" without PA (KernelLaunch): all slices in local cube
-n_slices = 8
-if self.ctx and self.ctx.spec:
-mm = self.ctx.spec.get("cube", {}).get("memory_map", {})
-n_slices = mm.get("hbm_slices_per_cube", 8)
-return [f"{cube_prefix}.hbm_ctrl.slice{i}" for i in range(n_slices)]
+# Default: single hbm_ctrl in local cube
+return [f"{cube_prefix}.hbm_ctrl"]
def _mmu_msg_fanout(self, env: simpy.Environment, txn: Any) -> Generator:
"""Fan out MmuMapMsg/MmuUnmapMsg to target PE_MMU(s) via NOC.
-224
@@ -1,224 +0,0 @@
from __future__ import annotations
from collections.abc import Generator
from typing import TYPE_CHECKING, Any
import simpy
from kernbench.components.base import ComponentBase
if TYPE_CHECKING:
from kernbench.components.context import ComponentContext
from kernbench.topology.types import Node
class TwoDMeshNocComponent(ComponentBase):
"""2D mesh NOC modeled as a single smart node.
Latency model:
- Traversal latency = Manhattan distance between prev_hop and next_hop
node positions, split into XY segments, traversed with pipeline.
- overhead_ns (from node.attrs) is added once per traversal.
Contention model:
- Each directed XY segment is a simpy.Resource(capacity=1).
- Pipeline: next segment's resource is requested before the current
segment's timeout completes, so a free downstream segment is acquired
immediately (wormhole-style cut-through).
- Two transactions sharing a segment (same row or column band) contend.
Concurrency:
- _worker spawns an independent SimPy process per transaction, so the
NOC is never serialized at the node level — only at segment resources.
"""
def __init__(self, node: Node, ctx: ComponentContext | None = None) -> None:
super().__init__(node, ctx)
self._env: simpy.Environment | None = None
self._links: dict[tuple, simpy.Resource] = {}
self._x_grid: list[float] = []
self._y_grid: list[float] = []
def start(self, env: simpy.Environment) -> None:
self._env = env
self._build_grid()
super().start(env)
def run(self, env: simpy.Environment, nbytes: int) -> Generator:
yield env.timeout(0)
# ── Grid construction ────────────────────────────────────────────
def _build_grid(self) -> None:
if not self.ctx:
return
mesh = self.ctx.spec.get("_mesh") if self.ctx.spec else None
if mesh:
self._build_grid_from_mesh(mesh)
else:
self._build_grid_from_positions()
def _build_grid_from_mesh(self, mesh: dict) -> None:
"""Build XY grid from cube_mesh.yaml router positions (authoritative)."""
origin_x, origin_y = self._cube_origin()
xs: set[float] = set()
ys: set[float] = set()
for key, router in mesh.get("routers", {}).items():
if router is not None:
xs.add(round(origin_x + router["pos_mm"][0], 2))
ys.add(round(origin_y + router["pos_mm"][1], 2))
self._x_grid = sorted(xs)
self._y_grid = sorted(ys)
def _build_grid_from_positions(self) -> None:
"""Fallback: infer grid from all node positions in the cube."""
cube_prefix = self.node.id.rsplit(".", 1)[0]
xs: set[float] = set()
ys: set[float] = set()
for node_id, pos in self.ctx.positions.items():
if node_id.startswith(cube_prefix + ".") and pos is not None:
xs.add(round(pos[0], 2))
ys.add(round(pos[1], 2))
self._x_grid = sorted(xs)
self._y_grid = sorted(ys)
def _cube_origin(self) -> tuple[float, float]:
"""Compute absolute origin (top-left) of this cube from cube_id."""
parts = self.node.id.split(".")
cube_str = [p for p in parts if p.startswith("cube")][0]
cube_id = int(cube_str[4:])
spec = self.ctx.spec
sip_spec = spec.get("sip", {})
cube_spec = spec.get("cube", {})
mesh_w = sip_spec.get("cube_mesh", {}).get("w", 4)
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
cube_h = cube_spec.get("geometry", {}).get("cube_mm", {}).get("h", 14.0)
seam = sip_spec.get("links", {}).get("inter_cube_mesh", {}).get(
"distance_mm_across_seam", 1.0)
col = cube_id % mesh_w
row = cube_id // mesh_w
return (col * (cube_w + seam), row * (cube_h + seam))
def _get_link(self, key: tuple) -> simpy.Resource:
if key not in self._links:
assert self._env is not None
self._links[key] = simpy.Resource(self._env, capacity=1)
return self._links[key]
# ── Worker ───────────────────────────────────────────────────────
def _worker(self, env: simpy.Environment) -> Generator:
while True:
txn: Any = yield self._inbox.get()
env.process(self._route(env, txn))
def _route(self, env: simpy.Environment, txn: Any) -> Generator:
prev_hop = txn.path[txn.step - 1] if txn.step > 0 else None
next_hop = txn.next_hop
overhead_ns = float(self.node.attrs.get("overhead_ns", 0.0))
links: list[tuple[tuple, float]] = []
if prev_hop and next_hop and self.ctx:
src_pos = self.ctx.positions.get(prev_hop)
dst_pos = self.ctx.positions.get(next_hop)
if src_pos and dst_pos:
links = self._xy_links(src_pos, dst_pos)
if links:
yield from self._traverse(env, links, overhead_ns)
else:
yield env.timeout(overhead_ns)
if next_hop:
yield self.out_ports[next_hop].put(txn.advance())
else:
drain = getattr(txn, "drain_ns", 0.0)
if drain > 0:
yield env.timeout(drain)
txn.done.succeed()
# ── XY routing and pipelined link traversal ──────────────────────
def _traverse(
self,
env: simpy.Environment,
links: list[tuple[tuple, float]],
overhead_ns: float,
) -> Generator:
"""Pipeline: request next segment before current timeout finishes."""
ns_per_mm = self.ctx.ns_per_mm # type: ignore[union-attr]
# Acquire first link
first_key, _ = links[0]
current_resource = self._get_link(first_key)
current_req = current_resource.request()
yield current_req
for i, (_, dist_mm) in enumerate(links):
# Request next link before current timeout (pipeline)
if i + 1 < len(links):
next_key, _ = links[i + 1]
next_resource = self._get_link(next_key)
next_req = next_resource.request()
yield env.timeout(dist_mm * ns_per_mm + (overhead_ns if i == 0 else 0.0))
current_resource.release(current_req)
if i + 1 < len(links):
yield next_req # usually already fulfilled (pipeline)
current_resource = next_resource
current_req = next_req
def _xy_links(
self,
src: tuple[float, float],
dst: tuple[float, float],
) -> list[tuple[tuple, float]]:
"""XY routing: horizontal segment first, then vertical.
Returns list of (link_key, dist_mm) pairs, where link_key uniquely
identifies a directed segment shared across concurrent transactions.
"""
x0, y0 = src
x1, y1 = dst
links: list[tuple[tuple, float]] = []
# Horizontal segment at y≈y0
if abs(x0 - x1) > 1e-9:
y_band = self._snap(y0, self._y_grid)
for xa, xb in self._segments(x0, x1, self._x_grid):
d = abs(xb - xa)
if d > 1e-9:
lo, hi = (xa, xb) if xa < xb else (xb, xa)
dir_h = "E" if xb > xa else "W"
links.append((("H", round(y_band, 2), round(lo, 2), round(hi, 2), dir_h), d))
# Vertical segment at x≈x1
if abs(y0 - y1) > 1e-9:
x_band = self._snap(x1, self._x_grid)
for ya, yb in self._segments(y0, y1, self._y_grid):
d = abs(yb - ya)
if d > 1e-9:
lo, hi = (ya, yb) if ya < yb else (yb, ya)
dir_v = "S" if yb > ya else "N"
links.append((("V", round(x_band, 2), round(lo, 2), round(hi, 2), dir_v), d))
return links
@staticmethod
def _snap(val: float, grid: list[float]) -> float:
if not grid:
return val
return min(grid, key=lambda g: abs(g - val))
@staticmethod
def _segments(a: float, b: float, grid: list[float]) -> list[tuple[float, float]]:
"""Consecutive (p_i, p_{i+1}) pairs covering range [a, b] using grid waypoints."""
if abs(a - b) < 1e-9:
return []
lo, hi = (a, b) if a < b else (b, a)
pts = [lo] + [g for g in grid if lo + 1e-9 < g < hi - 1e-9] + [hi]
pairs = [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
if a > b:
pairs = [(p2, p1) for p1, p2 in reversed(pairs)]
return pairs
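The grid helpers above can be tried in isolation. The following is a minimal standalone sketch of `_snap` and `_segments`, not the component code itself: the free-function names, the `EPS` constant, and the sample grid are assumptions for illustration, and the grid is assumed to be sorted in ascending order.

```python
from __future__ import annotations

EPS = 1e-9  # same tolerance the helpers above use inline


def snap(val: float, grid: list[float]) -> float:
    """Return the grid value closest to val (val itself if the grid is empty)."""
    if not grid:
        return val
    return min(grid, key=lambda g: abs(g - val))


def segments(a: float, b: float, grid: list[float]) -> list[tuple[float, float]]:
    """Split the span a..b at interior grid waypoints, keeping travel direction."""
    if abs(a - b) < EPS:
        return []
    lo, hi = (a, b) if a < b else (b, a)
    # Only waypoints strictly inside (lo, hi) become split points.
    pts = [lo] + [g for g in grid if lo + EPS < g < hi - EPS] + [hi]
    pairs = [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    if a > b:
        # Travelling toward smaller coordinates: reverse order and endpoints.
        pairs = [(p2, p1) for p1, p2 in reversed(pairs)]
    return pairs


x_grid = [0.0, 4.0, 8.0, 12.0]  # hypothetical router X positions in mm
east = segments(2.0, 10.0, x_grid)
west = segments(10.0, 2.0, x_grid)
print(east)
print(west)
```

Each returned pair maps to one directed link key in `_xy_links`, so eastbound and westbound traffic over the same span would contend for different resources.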
+1 -1
@@ -96,7 +96,7 @@ class PeDmaComponent(PeEngineBase):
request=sub_request, path=path, step=0,
nbytes=cmd.nbytes, done=sub_done, drain_ns=drain_ns,
)
# Send to next hop (path[0] is pe_dma itself, path[1] is xbar)
# Send to next hop (path[0] is pe_dma itself, path[1] is router)
if len(path) > 1:
yield self.out_ports[path[1]].put(sub_txn.advance())
# DMA channel released after issue
-168
@@ -1,168 +0,0 @@
"""Position-aware XBAR component.
Models crossbar latency as base_overhead_ns + internal_distance * ns_per_mm,
where internal_distance is the Manhattan distance between the entry port
(PE router attachment) and exit port (HBM slice logical position) within
the crossbar matrix.
PE router positions come from cube_mesh.yaml (via ctx.spec["_mesh"]).
HBM slice positions are uniformly distributed across the HBM physical width.
"""
from __future__ import annotations
from collections.abc import Generator
from typing import TYPE_CHECKING, Any
import simpy
from kernbench.components.base import ComponentBase
if TYPE_CHECKING:
from kernbench.components.context import ComponentContext
from kernbench.topology.types import Node
class PositionAwareXbarComponent(ComponentBase):
"""XBAR with position-dependent latency based on PE-to-slice distance.
Latency = base_overhead_ns + |entry_port_x - exit_port_x| * ns_per_mm
Entry/exit port X positions are determined from the transaction path:
- PE_DMA nodes: router X from cube_mesh.yaml
- HBM slices: uniformly distributed across HBM physical width
- Bridge nodes: physical X from topology positions
- NOC: resolved by scanning path for PE_DMA node
"""
def __init__(self, node: Node, ctx: ComponentContext | None = None) -> None:
super().__init__(node, ctx)
self._base_overhead_ns = float(node.attrs.get("overhead_ns", 0.0))
self._pe_router_xs: dict[str, float] = {}
self._slice_xs: dict[str, float] = {}
self._bridge_xs: dict[str, float] = {}
self._ns_per_mm: float = 0.0
def start(self, env: simpy.Environment) -> None:
self._build_position_map()
super().start(env)
def run(self, env: simpy.Environment, nbytes: int) -> Generator:
yield env.timeout(self._base_overhead_ns)
# ── Position map construction ─────────────────────────────────
def _build_position_map(self) -> None:
if not self.ctx or not self.ctx.spec:
return
mesh = self.ctx.spec.get("_mesh")
if not mesh:
return
self._ns_per_mm = self.ctx.ns_per_mm
cube_prefix = self.node.id.rsplit(".", 1)[0]
xbar_name = self.node.id.rsplit(".", 1)[1]
is_top = xbar_name == "xbar_top"
xbar_key = "top" if is_top else "bottom"
# PE router X positions from mesh attachments
routers_list = mesh.get("xbar", {}).get(xbar_key, {}).get("routers", [])
for router_id in routers_list:
router_data = mesh["routers"].get(router_id)
if router_data is None:
continue
router_x = router_data["pos_mm"][0]
for attach in router_data.get("attach", []):
if attach.endswith(".dma"):
pe_name = attach.split(".")[0]
pe_dma_id = f"{cube_prefix}.{pe_name}.pe_dma"
self._pe_router_xs[pe_dma_id] = router_x
# HBM slice X positions: uniformly distributed across HBM width
cube_spec = self.ctx.spec.get("cube", {})
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
hbm_w = cube_spec.get("geometry", {}).get("hbm_mm", {}).get("w", 9.0)
n_slices = cube_spec.get("memory_map", {}).get("hbm_slices_per_cube", 8)
half = n_slices // 2
hbm_left = (cube_w - hbm_w) / 2
if is_top:
slice_range = range(half)
else:
slice_range = range(half, n_slices)
n = len(list(slice_range))
for i, sl in enumerate(slice_range):
if n > 1:
x = hbm_left + i * hbm_w / (n - 1)
else:
x = cube_w / 2
self._slice_xs[f"{cube_prefix}.hbm_ctrl.slice{sl}"] = x
# Bridge X positions from topology positions
for node_id, pos in self.ctx.positions.items():
if node_id.startswith(cube_prefix + ".bridge.") and pos is not None:
origin_x = self._cube_origin_x()
self._bridge_xs[node_id] = pos[0] - origin_x
def _cube_origin_x(self) -> float:
"""Compute absolute X origin of this cube."""
parts = self.node.id.split(".")
cube_str = [p for p in parts if p.startswith("cube")][0]
cube_id = int(cube_str[4:])
spec = self.ctx.spec
sip_spec = spec.get("sip", {})
cube_spec = spec.get("cube", {})
mesh_w = sip_spec.get("cube_mesh", {}).get("w", 4)
cube_w = cube_spec.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
seam = sip_spec.get("links", {}).get("inter_cube_mesh", {}).get(
"distance_mm_across_seam", 1.0)
col = cube_id % mesh_w
return col * (cube_w + seam)
# ── Worker override ───────────────────────────────────────────
def _worker(self, env: simpy.Environment) -> Generator:
while True:
txn: Any = yield self._inbox.get()
env.process(self._position_aware_forward(env, txn))
def _position_aware_forward(
self, env: simpy.Environment, txn: Any,
) -> Generator:
prev_hop = txn.path[txn.step - 1] if txn.step > 0 else None
next_hop = txn.next_hop
overhead = self._base_overhead_ns
if prev_hop and next_hop and self._ns_per_mm > 0:
entry_x = self._get_port_x(prev_hop, txn.path)
exit_x = self._get_port_x(next_hop, txn.path)
if entry_x is not None and exit_x is not None:
overhead = self._base_overhead_ns + abs(entry_x - exit_x) * self._ns_per_mm
yield env.timeout(overhead)
if next_hop:
yield self.out_ports[next_hop].put(txn.advance())
else:
drain = getattr(txn, "drain_ns", 0.0)
if drain > 0:
yield env.timeout(drain)
txn.done.succeed()
def _get_port_x(self, node_id: str, path: list[str]) -> float | None:
"""Resolve the X position of an XBAR port from node context."""
# Direct lookup: PE DMA
if node_id in self._pe_router_xs:
return self._pe_router_xs[node_id]
# Direct lookup: HBM slice
if node_id in self._slice_xs:
return self._slice_xs[node_id]
# Direct lookup: bridge
if node_id in self._bridge_xs:
return self._bridge_xs[node_id]
# NOC: scan path for PE DMA node
if "noc" in node_id:
for p in path:
if p in self._pe_router_xs:
return self._pe_router_xs[p]
return None
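For reference, the latency model of the deleted XBAR component can be restated as a small sketch. It assumes the same formula as the docstring above (base overhead plus horizontal distance times `ns_per_mm`) and the same uniform slice spacing; the function names and the numeric values are illustrative assumptions, not values taken from the repository's configuration.

```python
from __future__ import annotations


def slice_positions(cube_w: float, hbm_w: float, n: int) -> list[float]:
    """Spread n slice ports uniformly across the HBM width, centred in the cube."""
    hbm_left = (cube_w - hbm_w) / 2
    if n > 1:
        return [hbm_left + i * hbm_w / (n - 1) for i in range(n)]
    return [cube_w / 2]  # a single slice sits at the cube centre


def xbar_latency_ns(base_ns: float, entry_x: float, exit_x: float,
                    ns_per_mm: float) -> float:
    """Base overhead plus wire delay over the horizontal port distance."""
    return base_ns + abs(entry_x - exit_x) * ns_per_mm


# Assumed geometry: 17 mm cube, 9 mm HBM, 4 slices served by one crossbar half.
xs = slice_positions(cube_w=17.0, hbm_w=9.0, n=4)
# Assumed timing: 2 ns base overhead, 0.5 ns per mm, PE router at x = 5.5 mm.
lat = xbar_latency_ns(base_ns=2.0, entry_x=5.5, exit_x=xs[-1], ns_per_mm=0.5)
print(xs, lat)
```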
+18 -22
@@ -22,8 +22,6 @@ class AddressResolver:
def __init__(self, graph: TopologyGraph) -> None:
self._node_ids = set(graph.nodes)
mm = graph.spec["cube"]["memory_map"]
self._slice_size_bytes = mm["hbm_total_gb_per_cube"] * (1 << 30) // mm["hbm_slices_per_cube"]
# ── Physical-address resolution ──────────────────────────────────
@@ -31,8 +29,7 @@ class AddressResolver:
s = addr.sip_id
c = addr.cube_id
if addr.kind == "hbm":
pe_slice = PhysAddr.hbm_pe_id(addr.hbm_offset, self._slice_size_bytes)
node_id = f"sip{s}.cube{c}.hbm_ctrl.slice{pe_slice}"
node_id = f"sip{s}.cube{c}.hbm_ctrl"
elif addr.kind == "pe_resource":
if addr.unit_type == UnitType.PE:
node_id = f"sip{s}.cube{c}.pe{addr.pe_id}.pe_tcm"
@@ -84,12 +81,17 @@ class PathRouter:
# Edge kinds excluded from M_CPU DMA adjacency: prevents routing through
# PE-internal pipeline nodes when computing DMA paths.
_MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_xbar"}
_MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"}
_UCIE_KINDS = {"ucie_internal", "ucie_conn_to_router", "router_to_ucie_conn",
"ucie_conn_to_noc", "noc_to_ucie_conn", "ucie_mesh",
"io_to_cube", "cube_to_io"}
def __init__(self, graph: TopologyGraph) -> None:
self._adj: dict[str, list[tuple[str, float]]] = defaultdict(list)
self._adj_all: dict[str, list[tuple[str, float]]] = defaultdict(list)
self._adj_mcpu_dma: dict[str, list[tuple[str, float]]] = defaultdict(list)
self._adj_local: dict[str, list[tuple[str, float]]] = defaultdict(list)
for e in graph.edges:
w = e.routing_weight_mm if e.routing_weight_mm is not None else e.distance_mm
self._adj_all[e.src].append((e.dst, w))
@@ -97,6 +99,8 @@ class PathRouter:
self._adj[e.src].append((e.dst, w))
if e.kind not in self._MCPU_DMA_EXCLUDE:
self._adj_mcpu_dma[e.src].append((e.dst, w))
if e.kind not in self._UCIE_KINDS:
self._adj_local[e.src].append((e.dst, w))
def find_path(self, src_pe: str, dst_node: str) -> list[str]:
"""PE DMA routing: prepends .pe_dma, excludes command edges."""
@@ -107,30 +111,22 @@ class PathRouter:
start = f"{src_pe}.pe_dma"
return self._run_dijkstra_with_dist(self._adj, start, dst_node)
def find_mcpu_dma_path(self, m_cpu_id: str, dst_hbm_slice_id: str) -> list[str]:
"""M_CPU DMA path: never routes through PE-internal nodes (ADR-0015 D5).
def find_mcpu_dma_path(self, m_cpu_id: str, dst_hbm_id: str) -> list[str]:
"""M_CPU DMA path: routes through router mesh (ADR-0019).
Same-cube: deterministic [m_cpu, noc, xbar_top/bot, hbm_ctrl.slice_i].
Cross-cube: Dijkstra via _adj_mcpu_dma (pe_internal/pe_to_xbar excluded)
routes through NOC → UCIe → target cube NOC → xbar → HBM.
Same-cube: uses _adj_local (no UCIe) to stay within mesh.
Cross-cube: uses _adj_all to route via UCIe.
"""
m_cube = ".".join(m_cpu_id.split(".")[:2])
d_cube = ".".join(dst_hbm_slice_id.split(".")[:2])
d_cube = ".".join(dst_hbm_id.split(".")[:2])
if m_cube == d_cube:
slice_idx = int(dst_hbm_slice_id.rsplit("slice", 1)[1])
xbar = "xbar_top" if slice_idx < 4 else "xbar_bot"
return [
m_cpu_id,
f"{m_cube}.noc",
f"{m_cube}.{xbar}",
dst_hbm_slice_id,
]
return self._run_dijkstra(self._adj_mcpu_dma, m_cpu_id, dst_hbm_slice_id)
return self._run_dijkstra(self._adj_local, m_cpu_id, dst_hbm_id)
return self._run_dijkstra(self._adj_all, m_cpu_id, dst_hbm_id)
def find_memory_path(self, src: str, dst: str) -> list[str]:
"""Direct memory path: pcie_ep → io_noc → cube → xbar → hbm_ctrl.
"""Direct memory path: pcie_ep → io_noc → cube → router mesh → hbm_ctrl.
Uses _adj_mcpu_dma which excludes pe_internal and pe_to_xbar edges,
Uses _adj_mcpu_dma which excludes pe_internal and pe_to_router edges,
preventing routing through PE pipeline nodes.
"""
return self._run_dijkstra(self._adj_mcpu_dma, src, dst)
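The per-kind adjacency tables in this file can be illustrated with a toy graph. The sketch below is a simplified stand-in, not the repository's `PathRouter`: the `Edge` tuple, `build_adj`, `shortest_path`, the reduced `UCIE_KINDS` set, and the five-edge topology are all assumptions made for the example. It shows how the same-cube case stays on a UCIe-free adjacency while the cross-cube case uses the full one.

```python
from __future__ import annotations

import heapq
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):  # hypothetical stand-in for the topology Edge type
    src: str
    dst: str
    distance_mm: float
    kind: str
    routing_weight_mm: float | None = None


UCIE_KINDS = {"router_to_ucie_conn", "ucie_conn_to_router", "ucie_mesh"}


def build_adj(edges, exclude=frozenset()):
    """Adjacency list that skips excluded kinds; routing weight overrides distance."""
    adj = defaultdict(list)
    for e in edges:
        if e.kind in exclude:
            continue
        w = e.routing_weight_mm if e.routing_weight_mm is not None else e.distance_mm
        adj[e.src].append((e.dst, w))
    return adj


def shortest_path(adj, start, goal):
    """Plain Dijkstra over (neighbour, weight) lists; raises if goal is unreachable."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj.get(node, ()):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        raise ValueError(f"no path from {start} to {goal}")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def find_mcpu_dma_path(adj_local, adj_all, m_cpu_id, dst_hbm_id):
    """Same cube: UCIe-free adjacency. Different cube: full adjacency."""
    same_cube = m_cpu_id.split(".")[:2] == dst_hbm_id.split(".")[:2]
    return shortest_path(adj_local if same_cube else adj_all, m_cpu_id, dst_hbm_id)


edges = [  # toy two-cube topology, one router per cube
    Edge("sip0.cube0.m_cpu", "sip0.cube0.r0c0", 0.0, "command"),
    Edge("sip0.cube0.r0c0", "sip0.cube0.hbm_ctrl", 0.0, "router_to_hbm", 1000.0),
    Edge("sip0.cube0.r0c0", "sip0.cube0.ucie-E.conn0", 0.0, "router_to_ucie_conn"),
    Edge("sip0.cube0.ucie-E.conn0", "sip0.cube1.r0c0", 1.0, "ucie_mesh"),
    Edge("sip0.cube1.r0c0", "sip0.cube1.hbm_ctrl", 0.0, "router_to_hbm", 1000.0),
]
adj_all = build_adj(edges)
adj_local = build_adj(edges, exclude=UCIE_KINDS)
same = find_mcpu_dma_path(adj_local, adj_all, "sip0.cube0.m_cpu", "sip0.cube0.hbm_ctrl")
cross = find_mcpu_dma_path(adj_local, adj_all, "sip0.cube0.m_cpu", "sip0.cube1.hbm_ctrl")
print(same)
print(cross)
```

In this toy graph the remote HBM controller is unreachable through `adj_local`, which is the property the same-cube branch relies on to keep paths inside the local mesh.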
+14 -3
@@ -173,7 +173,7 @@ class RuntimeContext:
pe_comps = pe_template.get("components", {})
tcm_cfg = pe_comps.get("pe_tcm", {}).get("attrs", {})
sip_count = system.get("sips", {}).get("count", 1)
total_sip_count = system.get("sips", {}).get("count", 1)
cubes_per_sip = system.get("sips", {}).get("cubes_per_sip", 16)
pes_per_cube = (
cube.get("pe_layout", {}).get("pe_per_corner", 2)
@@ -183,6 +183,17 @@ class RuntimeContext:
hbm_slices = mm.get("hbm_slices_per_cube", 8)
tcm_mb = tcm_cfg.get("size_mb", 16)
# Scope to target_device: single SIP or all SIPs
from kernbench.runtime_api.types import DeviceSelector, resolve_device
td = self.target_device if isinstance(self.target_device, DeviceSelector) else resolve_device(str(self.target_device))
if td.is_all:
sip_range = range(total_sip_count)
sip_count = total_sip_count
else:
sip_idx = td.sip_index
sip_range = range(sip_idx, sip_idx + 1)
sip_count = 1
cfg = AddressConfig(
sip_count=sip_count,
cubes_per_sip=cubes_per_sip,
@@ -193,13 +204,13 @@ class RuntimeContext:
tcm_scheduler_reserved_bytes=4 * (1 << 20),
sram_bytes_per_cube=32 * (1 << 20),
)
# Create allocators for all SIPs × cubes × PEs
# Create allocators scoped to target SIP(s) only
# Flat index: sip_id * cubes_per_sip * pes_per_cube + cube_id * pes_per_cube + pe_id
self._pes_per_cube = pes_per_cube
self._num_cubes = cubes_per_sip
self._num_sips = sip_count
cubes_x_pes = cubes_per_sip * pes_per_cube
for sip_id in range(sip_count):
for sip_id in sip_range:
for cube_id in range(cubes_per_sip):
for pe_id in range(pes_per_cube):
flat_idx = sip_id * cubes_x_pes + cube_id * pes_per_cube + pe_id
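The flat allocator index used above keeps its global form even when allocation is scoped to a single SIP. A minimal sketch of that arithmetic follows; the helper names and the sizes (16 cubes per SIP, 8 PEs per cube) are assumptions chosen for the example, not values read from the system spec.

```python
def flat_pe_index(sip_id: int, cube_id: int, pe_id: int,
                  cubes_per_sip: int, pes_per_cube: int) -> int:
    """Global flat index: SIP-major, then cube, then PE."""
    return sip_id * cubes_per_sip * pes_per_cube + cube_id * pes_per_cube + pe_id


def scoped_indices(sip_range: range, cubes_per_sip: int, pes_per_cube: int) -> list[int]:
    """Flat indices for every PE inside the selected SIP range."""
    return [
        flat_pe_index(s, c, p, cubes_per_sip, pes_per_cube)
        for s in sip_range
        for c in range(cubes_per_sip)
        for p in range(pes_per_cube)
    ]


# Target device sip:1 only, so the range covers a single SIP.
indices = scoped_indices(range(1, 2), cubes_per_sip=16, pes_per_cube=8)
print(indices[0], indices[-1], len(indices))
```

Because the index is computed from the real `sip_id`, a run scoped to `sip:1` creates allocators at indices 128 through 255 in this example rather than starting again from 0.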
+3 -2
@@ -41,7 +41,7 @@ class DeviceSelector:
def sip_index(self) -> int:
if self.is_all:
raise ValueError("DeviceSelector is 'all'; no single sip_index.")
m = re.fullmatch(r"sip:(\d+)", self.raw)
m = re.fullmatch(r"sip:?(\d+)", self.raw)
if not m:
raise ValueError(
f"Invalid device '{self.raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0)."
@@ -64,8 +64,9 @@ def resolve_device(raw: str | None) -> DeviceSelector:
if raw == "all":
return DeviceSelector(raw="all")
m = re.fullmatch(r"sip:(\d+)", raw)
m = re.fullmatch(r"sip:?(\d+)", raw)
if not m:
raise ValueError(f"Invalid device '{raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0).")
raw = f"sip:{m.group(1)}" # normalize to sip:N format
return DeviceSelector(raw=raw)
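The relaxed pattern accepts both `sip:N` and `sipN` and normalizes to the colon form. A minimal sketch of just that string handling, assuming a hypothetical `normalize_device` helper that returns a plain string instead of a `DeviceSelector`:

```python
import re


def normalize_device(raw: str) -> str:
    """Return 'all' unchanged, or the canonical 'sip:N' form of a SIP selector."""
    if raw == "all":
        return "all"
    # The colon is optional, so 'sip0' and 'sip:0' both match.
    m = re.fullmatch(r"sip:?(\d+)", raw)
    if not m:
        raise ValueError(
            f"Invalid device '{raw}'. Expected 'all' or 'sip:<N>' (e.g., sip:0)."
        )
    return f"sip:{m.group(1)}"


print(normalize_device("sip0"))
print(normalize_device("sip:12"))
```

Normalizing at resolution time means later code that reads the raw string only ever sees one spelling, although the relaxed pattern in `sip_index` would tolerate either.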
+3 -3
@@ -19,9 +19,9 @@ class GraphEngine:
"""simpy-based discrete-event simulation engine.
Request routing:
MemoryWrite/Read: pcie_ep → io_noc → cube → xbar → hbm_ctrl (m_cpu bypass)
MemoryWrite/Read: pcie_ep → io_noc → cube → router mesh → hbm_ctrl (m_cpu bypass)
KernelLaunch: pcie_ep → io_noc → io_cpu → io_noc → cube → m_cpu → PE
PeDmaMsg: pe_dma → xbar → hbm_ctrl (direct probe)
PeDmaMsg: pe_dma → router mesh → hbm_ctrl (direct probe)
Component implementations are DI-injectable via component_overrides (ADR-0007 D3).
"""
@@ -261,7 +261,7 @@ class GraphEngine:
done.succeed()
def _process_memory_direct(self, key: str, request: Any, done: simpy.Event):
"""Direct memory path: pcie_ep → io_noc → cube → xbar → hbm_ctrl.
"""Direct memory path: pcie_ep → io_noc → cube → router mesh → hbm_ctrl.
MemoryWrite: data flows forward (nbytes on wires), drain at hbm_ctrl terminal.
MemoryRead: command flows forward (nbytes=0), hbm_ctrl sends data back on
+3 -3
@@ -287,7 +287,7 @@ def _generate_probe_d2h(graph, edge_map) -> list[dict]:
def _generate_probe_pe_dma(graph, edge_map) -> list[dict]:
"""PE DMA probes: pe_dma → xbar → HBM."""
"""PE DMA probes: pe_dma → router mesh → HBM."""
from kernbench.policy.address.phyaddr import PhysAddr
from kernbench.policy.routing.router import AddressResolver, PathRouter
@@ -399,7 +399,7 @@ def _generate_bench_qkv_gemm(graph, edge_map) -> list[dict]:
# Find pe0 → HBM path
pe_ref = "sip0.cube0.pe0"
try:
dma_path = router.find_path(pe_ref, f"sip0.cube0.hbm_ctrl.slice0")
dma_path = router.find_path(pe_ref, f"sip0.cube0.hbm_ctrl")
except Exception:
dma_path = [pe_ref]
@@ -433,7 +433,7 @@ def _generate_bench_qkv_gemm(graph, edge_map) -> list[dict]:
# DMA write result back
t += bw_ns
ev(t, type="process", request_id=rid,
component="sip0.cube0.hbm_ctrl.slice0",
component="sip0.cube0.hbm_ctrl",
latency_ns=round(bw_ns, 3), metadata={"op": "write", "cmd": "dma_write_out"})
ev(t, type="complete", request_id=rid,
+318 -328
@@ -155,12 +155,7 @@ def _cube_local_positions(cube_w: float, cube_h: float) -> dict[str, tuple[float
"ucie-W": (uw, cy),
"ucie-E": (cube_w - uw, cy),
"m_cpu": (cube_w - 2.5, cy - 1.5),
"xbar_top": (cx, 3.5),
"hbm_ctrl": (cx - 2.0, cy),
"xbar_bot": (cx, cube_h - 3.5),
"bridge.left": (2.5, cy + 2.0),
"bridge.right": (cube_w - 2.5, cy + 2.0),
"noc": (cx + 2.0, cy),
"sram": (2.5, cy - 1.5),
}
@@ -359,16 +354,21 @@ def _instantiate_cube(
) -> None:
"""Add all cube-internal nodes and edges, including PE instances.
Topology: PE_DMA → NOC → xbar_top/bot → HBM_CTRL.
No per-PE xbar nodes; position-aware XBAR top/bottom replaces chaining.
Topology: explicit router mesh from cube_mesh.yaml (ADR-0019).
Each router is a separate SimPy node. Components attach to routers
based on cube_mesh.yaml attachment lists.
"""
cube_w = cube["geometry"]["cube_mm"]["w"]
cube_h = cube["geometry"]["cube_mm"]["h"]
ox, oy = origin
local_pos = _cube_local_positions(cube_w, cube_h)
clinks = cube["links"]
n_slices = cube["memory_map"]["hbm_slices_per_cube"]
half = n_slices // 2
mm = cube["memory_map"]
# ── Mode branch (ADR-0019) ──
mode = mm.get("hbm_mapping_mode", "n_to_one")
if mode == "one_to_one":
raise NotImplementedError("1:1 mode: ADR-0019 D3")
# ── UCIe ports + connection nodes ──
ucie_cfg = cube["ucie"]
@@ -391,8 +391,8 @@ def _instantiate_cube(
label=f"UCIe-{port} C{ci}",
)
# ── Named components: noc, m_cpu, sram ──
for name in ("noc", "m_cpu", "sram"):
# ── Named components: m_cpu, sram (noc is now explicit routers) ──
for name in ("m_cpu", "sram"):
c = cube["components"][name]
nid = f"{cp}.{name}"
lx, ly = local_pos[name]
@@ -402,49 +402,96 @@ def _instantiate_cube(
label=name.upper().replace("_", " "),
)
# ── xbar_top and xbar_bot (position-aware XBAR) ──
xbar_spec = cube["components"]["xbar"]
for xbar_name, xbar_cfg in [("xbar_top", xbar_spec["top"]),
("xbar_bot", xbar_spec["bottom"])]:
nid = f"{cp}.{xbar_name}"
lx, ly = local_pos[xbar_name]
nodes[nid] = Node(
id=nid, kind=xbar_cfg["kind"], impl=xbar_cfg["impl"],
attrs=xbar_cfg["attrs"], pos_mm=(ox + lx, oy + ly),
label=xbar_name.upper().replace("_", " "),
)
# ── HBM controller slices ──
# ── HBM controller (single node, ADR-0019 D1) ──
hbm_spec = cube["components"]["hbm_ctrl"]
hbm_lx, hbm_ly = local_pos["hbm_ctrl"]
for sl in range(n_slices):
sid = f"{cp}.hbm_ctrl.slice{sl}"
nodes[sid] = Node(
id=sid, kind=hbm_spec["kind"], impl=hbm_spec["impl"],
attrs=hbm_spec["attrs"], pos_mm=(ox + hbm_lx, oy + hbm_ly),
label=f"HBM SLICE{sl}",
hbm_id = f"{cp}.hbm_ctrl"
nodes[hbm_id] = Node(
id=hbm_id, kind=hbm_spec["kind"], impl=hbm_spec["impl"],
attrs=hbm_spec["attrs"], pos_mm=(ox + hbm_lx, oy + hbm_ly),
label="HBM CTRL",
)
# ── Router mesh from cube_mesh.yaml (ADR-0019 D3) ──
routers = mesh_data["routers"]
router_spec = cube["components"]["noc_router"]
router_bw = clinks.get("router_link_bw_gbs", 256.0)
pe_to_router_bw = clinks.get("pe_to_router_bw_gbs", 256.0)
hbm_eff = float(hbm_spec.get("attrs", {}).get("efficiency", 1.0))
hbm_to_router_bw = clinks.get("hbm_to_router_bw_gbs", 256.0) * hbm_eff
sram_to_router_bw = clinks.get("sram_to_router_bw_gbs", 128.0)
ucie_conn_bw = ucie_cfg.get("per_connection_bw_gbs", 128.0)
n_rows = mesh_data["mesh"]["rows"]
n_cols = mesh_data["mesh"]["cols"]
# Create router nodes
for rkey, rval in routers.items():
if rval is None:
continue
rid = f"{cp}.{rkey}"
rx, ry = rval["pos_mm"]
nodes[rid] = Node(
id=rid, kind=router_spec["kind"], impl=router_spec["impl"],
attrs=router_spec["attrs"], pos_mm=(ox + rx, oy + ry),
label=rkey.upper(),
)
# ── Bridges ──
for br in xbar_spec["bridges"]:
bname = br["id"]
nid = f"{cp}.bridge.{bname}"
lx, ly = local_pos[f"bridge.{bname}"]
nodes[nid] = Node(
id=nid, kind=br["kind"], impl=br["impl"],
attrs=br["attrs"], pos_mm=(ox + lx, oy + ly),
label=f"Bridge {bname.upper()}",
)
# Router ↔ router XY mesh edges (adjacent non-null routers)
for r in range(n_rows):
for c in range(n_cols):
rkey = f"r{r}c{c}"
if routers.get(rkey) is None:
continue
src_id = f"{cp}.{rkey}"
src_pos = routers[rkey]["pos_mm"]
# ── PE instances (no per-PE xbar nodes) ──
# Horizontal neighbor (same row, next col)
for nc in range(c + 1, n_cols):
nkey = f"r{r}c{nc}"
if routers.get(nkey) is None:
continue
dst_id = f"{cp}.{nkey}"
dst_pos = routers[nkey]["pos_mm"]
dist = abs(dst_pos[0] - src_pos[0])
edges.append(Edge(
src=src_id, dst=dst_id,
distance_mm=round(dist, 2), bw_gbs=router_bw,
kind="router_mesh",
))
edges.append(Edge(
src=dst_id, dst=src_id,
distance_mm=round(dist, 2), bw_gbs=router_bw,
kind="router_mesh",
))
break # only immediate neighbor
# Vertical neighbor (same col, next row)
for nr in range(r + 1, n_rows):
nkey = f"r{nr}c{c}"
if routers.get(nkey) is None:
continue
dst_id = f"{cp}.{nkey}"
dst_pos = routers[nkey]["pos_mm"]
dist = abs(dst_pos[1] - src_pos[1])
edges.append(Edge(
src=src_id, dst=dst_id,
distance_mm=round(dist, 2), bw_gbs=router_bw,
kind="router_mesh",
))
edges.append(Edge(
src=dst_id, dst=src_id,
distance_mm=round(dist, 2), bw_gbs=router_bw,
kind="router_mesh",
))
break # only immediate neighbor
# ── PE instances ──
corners = cube["pe_layout"]["corners"]
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
corner_pos = _corner_pe_positions(cube_w, cube_h)
pe_tmpl = cube["pe_template"]
pe_links = pe_tmpl["links"]
pe_noc_distances = _compute_pe_noc_distances(
mesh_data, corner_pos, corners, pe_per_corner,
)
pe_idx = 0
for corner in corners:
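The router-to-router edge loops in the hunk above link each router to its nearest non-null neighbour to the east and to the south, which means a null grid slot is bridged rather than leaving a gap. A minimal sketch of that neighbour search, assuming a hypothetical `mesh_links` helper and a made-up 2×3 grid with one empty slot:

```python
from __future__ import annotations


def mesh_links(routers: dict, n_rows: int, n_cols: int) -> list[tuple[str, str, float]]:
    """Return (src, dst, distance_mm) for nearest non-null east and south neighbours."""
    links = []
    for r in range(n_rows):
        for c in range(n_cols):
            key = f"r{r}c{c}"
            if routers.get(key) is None:
                continue
            x, y = routers[key]["pos_mm"]
            # East: walk along the row until the first populated slot.
            for nc in range(c + 1, n_cols):
                nkey = f"r{r}c{nc}"
                if routers.get(nkey) is None:
                    continue
                links.append((key, nkey, round(abs(routers[nkey]["pos_mm"][0] - x), 2)))
                break
            # South: walk down the column until the first populated slot.
            for nr in range(r + 1, n_rows):
                nkey = f"r{nr}c{c}"
                if routers.get(nkey) is None:
                    continue
                links.append((key, nkey, round(abs(routers[nkey]["pos_mm"][1] - y), 2)))
                break
    return links


routers = {  # hypothetical positions in mm; r0c1 is an empty slot
    "r0c0": {"pos_mm": [0.0, 0.0]}, "r0c1": None, "r0c2": {"pos_mm": [8.0, 0.0]},
    "r1c0": {"pos_mm": [0.0, 3.0]}, "r1c1": {"pos_mm": [4.0, 3.0]},
    "r1c2": {"pos_mm": [8.0, 3.0]},
}
links = mesh_links(routers, n_rows=2, n_cols=3)
print(links)
```

The real builder emits one `Edge` per direction for every such pair; the sketch lists each pair once to keep the output short.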
@@ -465,166 +512,129 @@ def _instantiate_cube(
# PE-internal edges
_add_pe_internal_edges(edges, pp, pe_links)
# PE_DMA → noc (distance auto-computed from PE physical position)
edges.append(Edge(
src=f"{pp}.pe_dma", dst=f"{cp}.noc",
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
kind="pe_to_noc",
))
# noc → PE_DMA (response delivery, reverse of pe_to_noc)
edges.append(Edge(
src=f"{cp}.noc", dst=f"{pp}.pe_dma",
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
kind="noc_to_pe",
))
# noc → PE_CPU (command delivery)
edges.append(Edge(
src=f"{cp}.noc", dst=f"{pp}.pe_cpu",
distance_mm=clinks["noc_to_pe_cpu_mm"],
kind="command",
))
# PE_CPU → noc (response delivery, reverse of command)
edges.append(Edge(
src=f"{pp}.pe_cpu", dst=f"{cp}.noc",
distance_mm=clinks["noc_to_pe_cpu_mm"],
kind="pe_response",
))
# noc → PE_MMU (MMU mapping install)
pe_mmu_id = f"{pp}.pe_mmu"
if pe_mmu_id in nodes:
edges.append(Edge(
src=f"{cp}.noc", dst=pe_mmu_id,
distance_mm=clinks.get("noc_to_pe_mmu_mm", 0.0),
kind="command",
))
pe_idx += 1
# ── xbar_top/bot → HBM slices ──
hbm_eff = float(hbm_spec.get("attrs", {}).get("efficiency", 1.0))
hbm_bw = clinks["xbar_to_hbm_bw_gbs"] * hbm_eff
for i in range(half):
edges.append(Edge(
src=f"{cp}.xbar_top", dst=f"{cp}.hbm_ctrl.slice{i}",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=hbm_bw,
kind="xbar_to_hbm",
))
edges.append(Edge(
src=f"{cp}.hbm_ctrl.slice{i}", dst=f"{cp}.xbar_top",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=hbm_bw,
kind="hbm_to_xbar",
))
for i in range(half, n_slices):
edges.append(Edge(
src=f"{cp}.xbar_bot", dst=f"{cp}.hbm_ctrl.slice{i}",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=hbm_bw,
kind="xbar_to_hbm",
))
edges.append(Edge(
src=f"{cp}.hbm_ctrl.slice{i}", dst=f"{cp}.xbar_bot",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=hbm_bw,
kind="hbm_to_xbar",
))
# ── Component ↔ router edges (based on cube_mesh.yaml attach) ──
for rkey, rval in routers.items():
if rval is None:
continue
rid = f"{cp}.{rkey}"
for item in rval.get("attach", []):
if item.endswith(".dma"):
# PE_DMA ↔ router
pe_prefix = item.rsplit(".", 1)[0]
dma_id = f"{cp}.{pe_prefix}.pe_dma"
if dma_id in nodes:
edges.append(Edge(
src=dma_id, dst=rid,
distance_mm=0.0, bw_gbs=pe_to_router_bw,
kind="pe_to_router",
))
edges.append(Edge(
src=rid, dst=dma_id,
distance_mm=0.0, bw_gbs=pe_to_router_bw,
kind="router_to_pe",
))
elif item.endswith(".cpu"):
# PE_CPU ↔ router (command path)
pe_prefix = item.rsplit(".", 1)[0]
cpu_id = f"{cp}.{pe_prefix}.pe_cpu"
if cpu_id in nodes:
edges.append(Edge(
src=rid, dst=cpu_id,
distance_mm=clinks.get("noc_to_pe_cpu_mm", 0.0),
kind="command",
))
edges.append(Edge(
src=cpu_id, dst=rid,
distance_mm=clinks.get("noc_to_pe_cpu_mm", 0.0),
kind="pe_response",
))
# PE_MMU ↔ router (mapping install path)
mmu_id = f"{cp}.{pe_prefix}.pe_mmu"
if mmu_id in nodes:
edges.append(Edge(
src=rid, dst=mmu_id,
distance_mm=0.0,
kind="command",
))
elif item.endswith(".hbm"):
pass # HBM edges handled below (all routers)
elif item == "m_cpu":
# M_CPU ↔ router
mcpu_id = f"{cp}.m_cpu"
edges.append(Edge(
src=mcpu_id, dst=rid,
distance_mm=clinks.get("m_cpu_to_router_mm", 0.0),
kind="command",
))
edges.append(Edge(
src=rid, dst=mcpu_id,
distance_mm=clinks.get("m_cpu_to_router_mm", 0.0),
kind="command",
))
elif item == "sram":
# SRAM ↔ router
sram_id = f"{cp}.sram"
edges.append(Edge(
src=sram_id, dst=rid,
distance_mm=0.0, bw_gbs=sram_to_router_bw,
kind="sram_to_router",
))
edges.append(Edge(
src=rid, dst=sram_id,
distance_mm=0.0, bw_gbs=sram_to_router_bw,
kind="router_to_sram",
))
elif item.startswith("ucie_"):
# UCIe conn ↔ router
# item format: "ucie_{dir}.c{i}" e.g. "ucie_n.c0"
parts = item.split(".")
direction = parts[0].replace("ucie_", "").upper()
conn_num = parts[1].replace("c", "") # "0", "1", etc.
conn_id = f"{cp}.ucie-{direction}.conn{conn_num}"
ucie_id = f"{cp}.ucie-{direction}"
# conn ↔ ucie port
if conn_id in nodes:
edges.append(Edge(
src=ucie_id, dst=conn_id,
distance_mm=0.0, kind="ucie_internal",
))
edges.append(Edge(
src=conn_id, dst=ucie_id,
distance_mm=0.0, kind="ucie_internal",
))
# conn ↔ router
edges.append(Edge(
src=conn_id, dst=rid,
distance_mm=0.0, bw_gbs=ucie_conn_bw,
kind="ucie_conn_to_router",
))
edges.append(Edge(
src=rid, dst=conn_id,
distance_mm=0.0, bw_gbs=ucie_conn_bw,
kind="router_to_ucie_conn",
))
# ── NOC ↔ xbar_top/bot ──
# xbar_top: primary (low routing weight), xbar_bot: secondary (high routing weight
# steers Dijkstra through xbar_top→bridge→xbar_bot for cross-half access)
noc_xbar_bw = clinks.get("noc_to_xbar_bw_gbs", 256.0)
noc_xbar_mm = clinks.get("noc_to_xbar_mm", 0.0)
for xbar_name, rw in [("xbar_top", None), ("xbar_bot", 100.0)]:
# ── HBM_CTRL ↔ all routers (ADR-0019 D1) ──
# High routing weight prevents Dijkstra from using HBM as transit shortcut
for rkey, rval in routers.items():
if rval is None:
continue
rid = f"{cp}.{rkey}"
edges.append(Edge(
src=f"{cp}.noc", dst=f"{cp}.{xbar_name}",
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
routing_weight_mm=rw, kind="noc_to_xbar",
src=rid, dst=hbm_id,
distance_mm=0.0, bw_gbs=hbm_to_router_bw,
routing_weight_mm=1000.0,
kind="router_to_hbm",
))
edges.append(Edge(
src=f"{cp}.{xbar_name}", dst=f"{cp}.noc",
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
routing_weight_mm=rw, kind="xbar_to_noc",
src=hbm_id, dst=rid,
distance_mm=0.0, bw_gbs=hbm_to_router_bw,
routing_weight_mm=1000.0,
kind="hbm_to_router",
))
# ── Bridge connections: xbar_top ↔ bridge ↔ xbar_bot ──
bridge_mm = clinks.get("xbar_to_bridge_mm", 3.0)
bridge_bw = clinks.get("xbar_to_bridge_bw_gbs", 128.0)
for bname in ("left", "right"):
br_node = f"{cp}.bridge.{bname}"
for xbar_name in ("xbar_top", "xbar_bot"):
edges.append(Edge(
src=f"{cp}.{xbar_name}", dst=br_node,
distance_mm=bridge_mm, bw_gbs=bridge_bw,
kind="xbar_to_bridge",
))
edges.append(Edge(
src=br_node, dst=f"{cp}.{xbar_name}",
distance_mm=bridge_mm, bw_gbs=bridge_bw,
kind="bridge_to_xbar",
))
# ── UCIe ↔ conn ↔ NOC ──
ucie_conn_bw = ucie_cfg.get("per_connection_bw_gbs", 128.0)
for port in ucie_cfg["ports"]:
ucie_id = f"{cp}.ucie-{port}"
for ci in range(ucie_n_conn):
conn_id = f"{cp}.ucie-{port}.conn{ci}"
edges.append(Edge(
src=ucie_id, dst=conn_id,
distance_mm=0.0, kind="ucie_internal",
))
edges.append(Edge(
src=conn_id, dst=ucie_id,
distance_mm=0.0, kind="ucie_internal",
))
edges.append(Edge(
src=conn_id, dst=f"{cp}.noc",
distance_mm=0.0, bw_gbs=ucie_conn_bw,
kind="ucie_conn_to_noc",
))
edges.append(Edge(
src=f"{cp}.noc", dst=conn_id,
distance_mm=0.0, bw_gbs=ucie_conn_bw,
kind="noc_to_ucie_conn",
))
# ── m_cpu ↔ noc (command dispatch) ──
edges.append(Edge(
src=f"{cp}.m_cpu", dst=f"{cp}.noc",
distance_mm=clinks["m_cpu_to_noc_mm"],
kind="command",
))
edges.append(Edge(
src=f"{cp}.noc", dst=f"{cp}.m_cpu",
distance_mm=clinks["m_cpu_to_noc_mm"],
kind="command",
))
# ── noc ↔ sram ──
_noc_sram = clinks["noc_to_sram"]
edges.append(Edge(
src=f"{cp}.noc", dst=f"{cp}.sram",
distance_mm=clinks["noc_to_sram_mm"],
bw_gbs=_noc_sram["per_connection_bw_gbs"],
n_connections=_noc_sram["n_connections"],
kind="noc_to_sram",
))
edges.append(Edge(
src=f"{cp}.sram", dst=f"{cp}.noc",
distance_mm=clinks["noc_to_sram_mm"],
bw_gbs=_noc_sram["per_connection_bw_gbs"],
n_connections=_noc_sram["n_connections"],
kind="noc_to_sram",
))
def _add_pe_internal_edges(edges: list[Edge], pp: str, pe_links: dict) -> None:
"""Add PE-internal edges for a single PE instance."""
@@ -901,8 +911,8 @@ def _build_cube_view(spec: dict) -> ViewGraph:
label=f"UCIe-{port} C{ci}",
)
# Named components (hbm_ctrl as single representative node in view)
for name in ("noc", "m_cpu", "hbm_ctrl", "sram"):
# Named components (hbm_ctrl as single node in view)
for name in ("m_cpu", "hbm_ctrl", "sram"):
c = cube["components"][name]
lx, ly = local_pos.get(name, local_pos.get("hbm_ctrl"))
nodes[name] = Node(
@@ -911,159 +921,139 @@ def _build_cube_view(spec: dict) -> ViewGraph:
label=name.upper().replace("_", " "),
)
# xbar_top, xbar_bot
xbar_spec = cube["components"]["xbar"]
for xbar_name, xbar_cfg in [("xbar_top", xbar_spec["top"]),
("xbar_bot", xbar_spec["bottom"])]:
lx, ly = local_pos[xbar_name]
nodes[xbar_name] = Node(
id=xbar_name, kind=xbar_cfg["kind"], impl=xbar_cfg["impl"],
attrs=xbar_cfg["attrs"], pos_mm=(lx, ly),
label=xbar_name.upper().replace("_", " "),
# Load mesh data early (needed for router nodes + PE distances)
mesh_data = spec.get("_mesh", {})
# Router nodes from cube_mesh.yaml (explicit in view)
router_spec = cube["components"]["noc_router"]
routers = mesh_data.get("routers", {})
for rkey, rval in routers.items():
if rval is None:
continue
rx, ry = rval["pos_mm"]
nodes[rkey] = Node(
id=rkey, kind=router_spec["kind"], impl=router_spec["impl"],
attrs=router_spec["attrs"], pos_mm=(rx, ry),
label=rkey.upper(),
)
# Bridges
for br in xbar_spec["bridges"]:
bname = br["id"]
bid = f"bridge.{bname}"
lx, ly = local_pos[bid]
nodes[bid] = Node(
id=bid, kind=br["kind"], impl=br["impl"],
attrs=br["attrs"], pos_mm=(lx, ly),
label=f"Bridge {bname.upper()}",
)
# PEs as opaque blocks (no per-PE xbar nodes)
# PEs as opaque blocks
corners = cube["pe_layout"]["corners"]
pe_per_corner = cube["pe_layout"]["pe_per_corner"]
corner_pos = _corner_pe_positions(cube_w, cube_h)
mesh_data = spec.get("_mesh", {})
pe_noc_distances = _compute_pe_noc_distances(
mesh_data, corner_pos, corners, pe_per_corner,
) if mesh_data else {}
pe_idx = 0
pe_offset_y = 1.2 # mm offset to avoid overlapping router node
for corner in corners:
is_top = corner in ("NW", "NE")
for ci in range(pe_per_corner):
pid = f"pe{pe_idx}"
px, py = corner_pos[corner][ci]
# Offset PE above (top) or below (bottom) its router
py_view = py - pe_offset_y if is_top else py + pe_offset_y
nodes[pid] = Node(
id=pid, kind="pe", impl="",
attrs={"corner": corner}, pos_mm=(px, py),
attrs={"corner": corner}, pos_mm=(px, py_view),
label=f"PE{pe_idx}",
)
# PE → noc (distance auto-computed from PE physical position)
view_edges.append(Edge(
src=pid, dst="noc",
distance_mm=pe_noc_distances.get(pe_idx, 0.0),
bw_gbs=clinks["pe_dma_to_noc_bw_gbs"],
kind="pe_to_noc",
))
# noc → PE (command delivery)
view_edges.append(Edge(
src="noc", dst=pid,
distance_mm=clinks["noc_to_pe_cpu_mm"],
kind="command",
))
pe_idx += 1
# xbar_top/bot → hbm_ctrl
view_edges.append(Edge(
src="xbar_top", dst="hbm_ctrl",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=clinks["xbar_to_hbm_bw_gbs"],
kind="xbar_to_hbm",
))
view_edges.append(Edge(
src="xbar_bot", dst="hbm_ctrl",
distance_mm=clinks["xbar_to_hbm_mm"],
bw_gbs=clinks["xbar_to_hbm_bw_gbs"],
kind="xbar_to_hbm",
))
# noc ↔ xbar_top/bot
noc_xbar_bw = clinks.get("noc_to_xbar_bw_gbs", 256.0)
noc_xbar_mm = clinks.get("noc_to_xbar_mm", 0.0)
for xbar_name in ("xbar_top", "xbar_bot"):
view_edges.append(Edge(
src="noc", dst=xbar_name,
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
kind="noc_to_xbar",
))
view_edges.append(Edge(
src=xbar_name, dst="noc",
distance_mm=noc_xbar_mm, bw_gbs=noc_xbar_bw,
kind="xbar_to_noc",
))
# bridge connections: xbar_top ↔ bridge ↔ xbar_bot
bridge_mm = clinks.get("xbar_to_bridge_mm", 3.0)
bridge_bw = clinks.get("xbar_to_bridge_bw_gbs", 128.0)
for bname in ("left", "right"):
br_id = f"bridge.{bname}"
for xbar_name in ("xbar_top", "xbar_bot"):
view_edges.append(Edge(
src=xbar_name, dst=br_id,
distance_mm=bridge_mm, bw_gbs=bridge_bw,
kind="xbar_to_bridge",
))
view_edges.append(Edge(
src=br_id, dst=xbar_name,
distance_mm=bridge_mm, bw_gbs=bridge_bw,
kind="bridge_to_xbar",
))
# View edges based on cube_mesh.yaml attach (mirrors _instantiate_cube logic)
pe_to_router_bw = clinks.get("pe_to_router_bw_gbs", 256.0)
hbm_to_router_bw = clinks.get("hbm_to_router_bw_gbs", 256.0)
sram_bw = clinks.get("sram_to_router_bw_gbs", 128.0)
ucie_conn_bw_v = ucie_cfg.get("per_connection_bw_gbs", 128.0)
for port in ucie_cfg["ports"]:
for ci in range(ucie_n_conn):
conn_id = f"ucie-{port}.conn{ci}"
view_edges.append(Edge(
src="noc", dst=conn_id,
distance_mm=0.0, bw_gbs=ucie_conn_bw_v,
kind="noc_to_ucie_conn",
))
view_edges.append(Edge(
src=conn_id, dst=f"ucie-{port}",
distance_mm=0.0, kind="ucie_internal",
))
view_edges.append(Edge(
src=f"ucie-{port}", dst=conn_id,
distance_mm=0.0, kind="ucie_internal",
))
view_edges.append(Edge(
src=conn_id, dst="noc",
distance_mm=0.0, bw_gbs=ucie_conn_bw_v,
kind="ucie_conn_to_noc",
))
n_rows = mesh_data.get("mesh", {}).get("rows", 6)
n_cols = mesh_data.get("mesh", {}).get("cols", 6)
# m_cpu ↔ noc
view_edges.append(Edge(
src="m_cpu", dst="noc",
distance_mm=clinks["m_cpu_to_noc_mm"],
kind="command",
))
view_edges.append(Edge(
src="noc", dst="m_cpu",
distance_mm=clinks["m_cpu_to_noc_mm"],
kind="command",
))
# Router ↔ router mesh edges
for r in range(n_rows):
for c in range(n_cols):
rkey = f"r{r}c{c}"
if routers.get(rkey) is None:
continue
src_pos = routers[rkey]["pos_mm"]
# Horizontal neighbor
for nc in range(c + 1, n_cols):
nkey = f"r{r}c{nc}"
if routers.get(nkey) is None:
continue
dist = abs(routers[nkey]["pos_mm"][0] - src_pos[0])
view_edges.append(Edge(
src=rkey, dst=nkey, distance_mm=round(dist, 2),
kind="router_mesh",
))
break
# Vertical neighbor
for nr in range(r + 1, n_rows):
nkey = f"r{nr}c{c}"
if routers.get(nkey) is None:
continue
dist = abs(routers[nkey]["pos_mm"][1] - src_pos[1])
view_edges.append(Edge(
src=rkey, dst=nkey, distance_mm=round(dist, 2),
kind="router_mesh",
))
break
# noc ↔ sram
_noc_sram_v = clinks["noc_to_sram"]
view_edges.append(Edge(
src="noc", dst="sram",
distance_mm=clinks["noc_to_sram_mm"],
bw_gbs=_noc_sram_v["per_connection_bw_gbs"],
n_connections=_noc_sram_v["n_connections"],
kind="noc_to_sram",
))
view_edges.append(Edge(
src="sram", dst="noc",
distance_mm=clinks["noc_to_sram_mm"],
bw_gbs=_noc_sram_v["per_connection_bw_gbs"],
n_connections=_noc_sram_v["n_connections"],
kind="noc_to_sram",
))
# Component ↔ router edges from attach lists
for rkey, rval in routers.items():
if rval is None:
continue
for item in rval.get("attach", []):
if item.endswith(".dma"):
pe_prefix = item.rsplit(".", 1)[0]
pid = pe_prefix  # e.g. "pe0.dma" → "pe0"
if pid in nodes:
view_edges.append(Edge(
src=pid, dst=rkey, distance_mm=0.0,
bw_gbs=pe_to_router_bw, kind="pe_to_router",
))
view_edges.append(Edge(
src=rkey, dst=pid, distance_mm=0.0,
kind="command",
))
elif item.endswith(".hbm"):
view_edges.append(Edge(
src=rkey, dst="hbm_ctrl", distance_mm=0.0,
bw_gbs=hbm_to_router_bw, kind="router_to_hbm",
))
elif item == "m_cpu":
view_edges.append(Edge(
src="m_cpu", dst=rkey, distance_mm=0.0, kind="command",
))
view_edges.append(Edge(
src=rkey, dst="m_cpu", distance_mm=0.0, kind="command",
))
elif item == "sram":
view_edges.append(Edge(
src="sram", dst=rkey, distance_mm=0.0,
bw_gbs=sram_bw, kind="router_to_sram",
))
elif item.startswith("ucie_"):
parts = item.split(".")
direction = parts[0].replace("ucie_", "").upper()
conn_num = parts[1].replace("c", "")
conn_id = f"ucie-{direction}.conn{conn_num}"
view_edges.append(Edge(
src=rkey, dst=conn_id, distance_mm=0.0,
bw_gbs=ucie_conn_bw_v, kind="router_to_ucie_conn",
))
view_edges.append(Edge(
src=conn_id, dst=rkey, distance_mm=0.0,
bw_gbs=ucie_conn_bw_v, kind="ucie_conn_to_router",
))
view_edges.append(Edge(
src=conn_id, dst=f"ucie-{direction}",
distance_mm=0.0, kind="ucie_internal",
))
view_edges.append(Edge(
src=f"ucie-{direction}", dst=conn_id,
distance_mm=0.0, kind="ucie_internal",
))
return ViewGraph(
name="cube", nodes=nodes, edges=view_edges,
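Reviewer note: the router ↔ router loop in this hunk links each router to its nearest live neighbor to the east and to the south, skipping `None` entries (holes such as the HBM exclusion zone). A minimal standalone sketch of that scan follows; the helper name `mesh_links` and the three-router sample are hypothetical, not part of the commit.

```python
def mesh_links(routers: dict, n_rows: int, n_cols: int) -> list[tuple[str, str, float]]:
    """Return (src, dst, distance_mm) links to the nearest east and south routers."""
    links = []
    for r in range(n_rows):
        for c in range(n_cols):
            src_key = f"r{r}c{c}"
            src = routers.get(src_key)
            if src is None:
                continue  # hole in the mesh: no router here
            # East: walk right past holes, link to the first live router, then stop.
            for nc in range(c + 1, n_cols):
                dst = routers.get(f"r{r}c{nc}")
                if dst is not None:
                    dist = abs(dst["pos_mm"][0] - src["pos_mm"][0])
                    links.append((src_key, f"r{r}c{nc}", round(dist, 2)))
                    break
            # South: same scan down the column.
            for nr in range(r + 1, n_rows):
                dst = routers.get(f"r{nr}c{c}")
                if dst is not None:
                    dist = abs(dst["pos_mm"][1] - src["pos_mm"][1])
                    links.append((src_key, f"r{nr}c{c}", round(dist, 2)))
                    break
    return links


# Hypothetical single row with a hole at r0c1.
sample = {
    "r0c0": {"pos_mm": [1.0, 1.0]},
    "r0c1": None,
    "r0c2": {"pos_mm": [4.0, 1.0]},
}
links = mesh_links(sample, n_rows=1, n_cols=3)
```

Because each link is emitted only from the west/north end, every router pair appears once; a consumer that needs both directions has to mirror the edge itself.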
+34 -13
@@ -50,6 +50,10 @@ def _compute_source_hash(cube_spec: dict) -> str:
"geometry": cube_spec["geometry"],
"pe_layout": cube_spec["pe_layout"],
"ucie_n_connections": cube_spec["ucie"]["n_connections"],
"hbm_mapping_mode": cube_spec.get("memory_map", {}).get(
"hbm_mapping_mode", "n_to_one"
),
"placement": cube_spec.get("placement", {}),
}
raw = yaml.dump(relevant, sort_keys=True)
return hashlib.sha256(raw.encode()).hexdigest()[:16]
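Reviewer note: adding `hbm_mapping_mode` and `placement` to the hashed dict means a change to either one invalidates a previously generated mesh. The sketch below shows the idea with a reduced key set. It serializes with `json.dumps` instead of `yaml.dump` purely to stay within the standard library, so the digests will not match the real function; `source_hash` and the sample specs are hypothetical.

```python
import hashlib
import json


def source_hash(cube_spec: dict) -> str:
    """Hash only the spec fields that affect mesh generation (reduced set)."""
    relevant = {
        "geometry": cube_spec["geometry"],
        "hbm_mapping_mode": cube_spec.get("memory_map", {}).get(
            "hbm_mapping_mode", "n_to_one"
        ),
        "placement": cube_spec.get("placement", {}),
    }
    raw = json.dumps(relevant, sort_keys=True)  # stable key order
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


base = {"geometry": {"cube_mm": {"w": 17.0, "h": 14.0}}}
moved = {**base, "placement": {"m_cpu": {"pos_mm": [7.5, 2.0]}}}
explicit_default = {**base, "memory_map": {"hbm_mapping_mode": "n_to_one"}}
```

Here `base` and `explicit_default` hash identically because the missing key falls back to `"n_to_one"`, while `moved` produces a different digest.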
@@ -108,6 +112,7 @@ def _compute_row_positions(
# Top half: evenly spaced from top PE y to just above HBM zone
top_pe_y = 1.5
hbm_gap = 1.5 # minimum gap between PE rows and HBM rows
hbm_top_y = cube_h / 2 - 1.5 # ~5.5 for h=14
hbm_bot_y = cube_h / 2 + 1.5 # ~8.5 for h=14
bot_pe_y = cube_h - 1.5
@@ -116,21 +121,24 @@ def _compute_row_positions(
if rows_per_half == 1:
top_rows = [top_pe_y]
else:
step = (hbm_top_y - top_pe_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
# End before HBM zone with gap
top_end = hbm_top_y - hbm_gap
step = (top_end - top_pe_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
for i in range(rows_per_half):
top_rows.append(round(top_pe_y + i * step, 1))
# HBM rows
hbm_rows = [round(hbm_top_y, 1), round(hbm_bot_y, 1)]
# Bottom half: mirror of top
# Bottom half: mirror of top, start after HBM zone with gap
bot_rows: list[float] = []
if rows_per_half == 1:
bot_rows = [bot_pe_y]
else:
step = (bot_pe_y - hbm_bot_y) / (rows_per_half - 1) if rows_per_half > 1 else 0
bot_start = hbm_bot_y + hbm_gap
step = (bot_pe_y - bot_start) / (rows_per_half - 1) if rows_per_half > 1 else 0
for i in range(rows_per_half):
bot_rows.append(round(hbm_bot_y + i * step, 1))
bot_rows.append(round(bot_start + i * step, 1))
return top_rows + hbm_rows + bot_rows, rows_per_half
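Reviewer note: a worked example of what `hbm_gap` changes. The sketch below condenses the row computation from this hunk into one hypothetical function, `row_positions`, with the gap as a parameter; passing `hbm_gap=0.0` reproduces the previous step formula.

```python
def row_positions(cube_h: float, rows_per_half: int, hbm_gap: float = 1.5) -> list[float]:
    """Return router row y-positions (mm): top PE rows, two HBM rows, bottom PE rows."""
    top_pe_y = 1.5
    bot_pe_y = cube_h - 1.5
    hbm_top_y = cube_h / 2 - 1.5
    hbm_bot_y = cube_h / 2 + 1.5

    if rows_per_half == 1:
        top_rows, bot_rows = [top_pe_y], [bot_pe_y]
    else:
        # Top half ends hbm_gap above the HBM zone; bottom half starts hbm_gap below it.
        top_end = hbm_top_y - hbm_gap
        bot_start = hbm_bot_y + hbm_gap
        top_step = (top_end - top_pe_y) / (rows_per_half - 1)
        bot_step = (bot_pe_y - bot_start) / (rows_per_half - 1)
        top_rows = [round(top_pe_y + i * top_step, 1) for i in range(rows_per_half)]
        bot_rows = [round(bot_start + i * bot_step, 1) for i in range(rows_per_half)]

    return top_rows + [round(hbm_top_y, 1), round(hbm_bot_y, 1)] + bot_rows


with_gap = row_positions(14.0, 2)                 # new behavior
without_gap = row_positions(14.0, 2, hbm_gap=0.0)  # previous behavior
```

For a 14 mm cube with two rows per half, the gap version yields `[1.5, 4.0, 5.5, 8.5, 10.0, 12.5]`. Without the gap the inner PE rows land on 5.5 and 8.5, the same y as the HBM rows, which is the overlap this hunk removes.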
@@ -206,6 +214,7 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
if router is not None:
router["attach"].append(f"pe{pe_idx}.dma")
router["attach"].append(f"pe{pe_idx}.cpu")
router["attach"].append(f"pe{pe_idx}.hbm")
if is_top:
top_pe_routers.append(key)
else:
@@ -213,13 +222,29 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
pe_idx += 1
# M_CPU and SRAM attachments (HBM row, leftmost available)
mcpu_key = f"r{hbm_row_start}c0"
if routers.get(mcpu_key) is not None:
# M_CPU and SRAM attachments: find nearest router to configured position
placement = cube_spec.get("placement", {})
def _nearest_router(target_mm: list[float]) -> str | None:
best_key, best_dist = None, float("inf")
for rk, rv in routers.items():
if rv is None:
continue
rx, ry = rv["pos_mm"]
dist = math.sqrt((rx - target_mm[0]) ** 2 + (ry - target_mm[1]) ** 2)
if dist < best_dist:
best_dist = dist
best_key = rk
return best_key
mcpu_pos = placement.get("m_cpu", {}).get("pos_mm", [1.5, 5.5])
mcpu_key = _nearest_router(mcpu_pos)
if mcpu_key and routers.get(mcpu_key) is not None:
routers[mcpu_key]["attach"].append("m_cpu")
sram_key = f"r{hbm_row_end}c0"
if routers.get(sram_key) is not None:
sram_pos = placement.get("sram", {}).get("pos_mm", [1.5, 8.5])
sram_key = _nearest_router(sram_pos)
if sram_key and routers.get(sram_key) is not None:
routers[sram_key]["attach"].append("sram")
# UCIe PE rows: top-half rows + bottom-half rows (1 per PE row)
@@ -277,8 +302,4 @@ def _generate_mesh(cube_spec: dict, source_hash: str) -> dict:
"cols": n_cols,
},
"routers": routers,
"xbar": {
"top": {"routers": sorted(set(top_pe_routers))},
"bottom": {"routers": sorted(set(bot_pe_routers))},
},
}
+528 -8
@@ -22,7 +22,7 @@ _KIND_COLORS: dict[str, str] = {
"ucie_port": "#3b82f6", # blue
"noc": "#a78bfa", # purple
"m_cpu": "#f59e0b", # amber
"xbar": "#f97316", # orange
"noc_router": "#f97316", # orange
"hbm_ctrl": "#10b981", # emerald
"pe": "#94a3b8", # slate
"pe_cpu": "#ef4444", # red
@@ -40,10 +40,11 @@ _EDGE_COLORS: dict[str, str] = {
"io_internal": "#0ea5e9",
"io_to_cube": "#0ea5e9",
"ucie_mesh": "#3b82f6",
"pe_to_xbar": "#f97316",
"xbar_to_hbm": "#10b981",
"xbar_to_bridge": "#a78bfa",
"bridge_to_xbar": "#a78bfa",
"pe_to_router": "#f97316",
"router_to_hbm": "#10b981",
"hbm_to_router": "#10b981",
"router_mesh": "#a78bfa",
"router_to_sram": "#a78bfa",
"noc_to_ucie": "#a78bfa",
"pe_to_noc": "#a78bfa",
"noc_to_sram": "#f59e0b",
@@ -61,6 +62,12 @@ _KIND_SIZE: dict[str, tuple[float, float]] = {
"cube": (6.0, 4.0),
"iochiplet": (4.0, 1.5),
"switch": (5.0, 1.5),
"noc_router": (1.0, 0.7),
"ucie_port": (1.2, 0.7),
"ucie_conn": (0.8, 0.5),
"sram": (1.4, 0.7),
"m_cpu": (1.4, 0.7),
"hbm_ctrl": (1.8, 0.8),
}
@@ -82,7 +89,10 @@ def emit_diagrams(graph: TopologyGraph, out_dir: Path) -> list[Path]:
for name, view in views:
if view is None:
continue
svg = _render_view_svg(view)
if name == "cube_view":
svg = _render_cube_view_svg(view, graph.spec)
else:
svg = _render_view_svg(view)
path = out_dir / f"{name}.svg"
path.write_text(svg, encoding="utf-8")
created.append(path)
@@ -155,7 +165,7 @@ def _compute_node_sizes(
w_mm, h_mm = _KIND_SIZE.get(node.kind, (_DEFAULT_NODE_W, _DEFAULT_NODE_H))
# For cube view, use smaller PE nodes
if view.name == "cube" and node.kind == "pe":
w_mm, h_mm = 1.8, 1.0
w_mm, h_mm = 1.4, 0.7
if view.name == "pe":
w_mm, h_mm = 2.5, 1.4
sizes[nid] = (w_mm * scale, h_mm * scale)
@@ -245,7 +255,7 @@ def _draw_node(
# ── Fan-out edge kinds that need offset routing ─────────────────────
_FANOUT_KINDS = {"pe_to_xbar", "pe_to_noc", "command", "noc_to_ucie"}
_FANOUT_KINDS = {"pe_to_router", "command", "router_to_ucie_conn", "ucie_conn_to_router"}
def _draw_edge(
@@ -272,6 +282,14 @@ def _draw_edge(
color = _EDGE_COLORS.get(edge.kind, "#94a3b8")
width = "1.5" if edge.kind == "pe_internal" else "1"
opacity = "0.6" if edge.kind in ("command", "noc_to_ucie") else "0.8"
# HBM links: thin and faint to reduce clutter
if edge.kind in ("router_to_hbm", "hbm_to_router"):
width = "0.5"
opacity = "0.3"
# Router mesh links: thin
if edge.kind == "router_mesh":
width = "0.5"
opacity = "0.4"
if edge.kind in _FANOUT_KINDS and view.name == "cube":
# Orthogonal routing: src→horizontal→vertical→dst with per-edge offset.
@@ -365,3 +383,505 @@ def _label_font_size(box_width: float, label: str) -> int:
def _escape(text: str) -> str:
"""Escape XML special characters."""
return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
# ── Connector helper ─────────────────────────────────────────────────
def _connector_points(
rx: float, ry: float, cx: float, cy: float
) -> str:
"""Return SVG polyline points for a rule-based connector.
Horizontal-dominant (|dx| >= |dy|): 45° stub, horizontal run, 45° stub.
Vertical-dominant (|dy| > |dx|): 45° stub, vertical run, 45° stub.
Tiny distance, or a short near-45° diagonal: single straight line.
"""
dx = cx - rx
dy = cy - ry
adx, ady = abs(dx), abs(dy)
# Trivial distance → single line
# Near-45° diagonal for short distances only (e.g. PE↔router)
if adx + ady < 4 or (abs(adx - ady) < 4 and adx + ady < 80):
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
sx = 1 if dx >= 0 else -1
sy = 1 if dy >= 0 else -1
if adx >= ady:
# Horizontal-dominant: stubs handle vertical, straight is horizontal
stub = ady / 2
if stub < 2:
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
r45x = rx + sx * stub
r45y = ry + sy * stub
c45x = cx - sx * stub
c45y = cy - sy * stub # r45y == c45y (horizontal)
else:
# Vertical-dominant: stubs handle horizontal, straight is vertical
stub = adx / 2
if stub < 2:
return f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"
r45x = rx + sx * stub
r45y = ry + sy * stub
c45x = cx - sx * stub
c45y = cy - sy * stub # r45x == c45x (vertical)
return (
f"{rx:.0f},{ry:.0f} {r45x:.0f},{r45y:.0f} "
f"{c45x:.0f},{c45y:.0f} {cx:.0f},{cy:.0f}"
)
# ── Cube-specific renderer ──────────────────────────────────────────
def _render_cube_view_svg(view: ViewGraph, spec: dict) -> str:
"""Render cube view with topology validation detail.
Shows: 6×6 router grid, PE attachments, HBM pseudo channel ports,
M_CPU/SRAM positions, UCIe connections, BW annotations.
"""
mesh_data = spec.get("_mesh", {})
routers = mesh_data.get("routers", {})
n_rows = mesh_data.get("mesh", {}).get("rows", 6)
n_cols = mesh_data.get("mesh", {}).get("cols", 6)
cube = spec.get("cube", {})
mm = cube.get("memory_map", {})
clinks = cube.get("links", {})
cube_w = cube.get("geometry", {}).get("cube_mm", {}).get("w", 17.0)
cube_h = cube.get("geometry", {}).get("cube_mm", {}).get("h", 14.0)
channels_per_pe = mm.get("hbm_channels_per_pe", 8)
channel_bw = mm.get("hbm_channel_bw_gbs", 32.0)
total_ch = mm.get("hbm_pseudo_channels", 64)
mode = mm.get("hbm_mapping_mode", "n_to_one")
agg_bw = channels_per_pe * channel_bw
scale = 50 # px per mm
pad = 60
w_px = int(cube_w * scale + 2 * pad)
h_px = int(cube_h * scale + 2 * pad + 80) # extra for legend
parts: list[str] = []
parts.append(_svg_header(w_px, h_px, "cube"))
# Background
parts.append(f' <rect width="{w_px}" height="{h_px}" fill="#0f172a"/>')
# Title
parts.append(
f' <text x="{w_px // 2}" y="22" text-anchor="middle" '
f'font-family="monospace" font-size="14" font-weight="bold" fill="#94a3b8">'
f'CUBE TOPOLOGY — {cube_w}×{cube_h}mm | {n_rows}×{n_cols} Router Mesh | '
f'{mode} mode | {total_ch} pseudo-ch</text>'
)
# Subtitle
parts.append(
f' <text x="{w_px // 2}" y="40" text-anchor="middle" '
f'font-family="monospace" font-size="10" fill="#64748b">'
f'Per-PE: {channels_per_pe} ch × {channel_bw} GB/s = {agg_bw} GB/s | '
f'Cube total: {total_ch} × {channel_bw} = {total_ch * channel_bw} GB/s</text>'
)
# Cube boundary
bx, by = pad, pad
parts.append(
f' <rect x="{bx}" y="{by}" width="{cube_w * scale}" height="{cube_h * scale}" '
f'rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>'
)
def mm2px(x_mm: float, y_mm: float) -> tuple[float, float]:
return pad + x_mm * scale, pad + y_mm * scale
# ── HBM zone background (centered, 9×5mm) ──
hbm_x, hbm_y = mm2px(4.0, 4.5)
hbm_w, hbm_h = 9.0 * scale, 5.0 * scale
parts.append(
f' <rect x="{hbm_x:.0f}" y="{hbm_y:.0f}" '
f'width="{hbm_w:.0f}" height="{hbm_h:.0f}" '
f'rx="6" fill="#052e16" stroke="#047857" stroke-width="2" opacity="0.6"/>'
)
# HBM label
hcx, hcy = mm2px(8.5, 7.0)
parts.append(
f' <text x="{hcx:.0f}" y="{hcy - 15:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="11" font-weight="bold" fill="#047857">'
f'HBM_CTRL | {total_ch} pseudo channels</text>'
)
parts.append(
f' <text x="{hcx:.0f}" y="{hcy + 2:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="9" fill="#05966988">'
f'Total BW: {total_ch * channel_bw:.0f} GB/s</text>'
)
# ── Pseudo channel ports on HBM top/bottom edges ──
# Top edge: 32 ports (PE0..PE3, 8 each), Bottom edge: 32 ports (PE4..PE7)
half_ch = total_ch // 2
pes_per_half = half_ch // channels_per_pe # 4 PEs per half
port_bar_w = hbm_w - 20 # slightly narrower than HBM zone
port_w = port_bar_w / half_ch
port_h = 8
pe_colors = ["#3b82f6", "#60a5fa", "#8b5cf6", "#a78bfa",
"#f59e0b", "#fbbf24", "#ef4444", "#f87171"]
for half_idx, (edge_y, pe_start) in enumerate([
(hbm_y + 4, 0), # top edge, PE0-PE3
(hbm_y + hbm_h - port_h - 4, pes_per_half), # bottom edge, PE4-PE7
]):
bar_x = hbm_x + 10
for i in range(half_ch):
pe_owner = pe_start + i // channels_per_pe
c = pe_colors[pe_owner % len(pe_colors)]
px = bar_x + i * port_w
parts.append(
f' <rect x="{px:.1f}" y="{edge_y:.0f}" '
f'width="{max(port_w - 0.5, 1):.1f}" height="{port_h}" '
f'rx="1" fill="{c}" opacity="0.8"/>'
)
# Per-PE group labels
for p in range(pes_per_half):
gx = bar_x + (p * channels_per_pe + channels_per_pe / 2) * port_w
label_y = edge_y - 3 if half_idx == 0 else edge_y + port_h + 8
parts.append(
f' <text x="{gx:.0f}" y="{label_y:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="6" fill="{pe_colors[(pe_start + p) % len(pe_colors)]}">'
f'PE{pe_start + p}×{channels_per_pe}ch</text>'
)
# Store port group centers for PE→HBM connection lines (used later)
_pe_hbm_targets: dict[int, tuple[float, float]] = {}
for half_idx, (edge_y, pe_start) in enumerate([
(hbm_y + 4, 0),
(hbm_y + hbm_h - port_h - 4, pes_per_half),
]):
bar_x = hbm_x + 10
for p in range(pes_per_half):
pe_id = pe_start + p
gx = bar_x + (p * channels_per_pe + channels_per_pe / 2) * port_w
gy = edge_y if half_idx == 0 else edge_y + port_h
_pe_hbm_targets[pe_id] = (gx, gy)
# ── Router mesh links ──
for r in range(n_rows):
for c in range(n_cols):
rkey = f"r{r}c{c}"
if routers.get(rkey) is None:
continue
rx, ry = routers[rkey]["pos_mm"]
sx, sy = mm2px(rx, ry)
# Horizontal neighbor
for nc in range(c + 1, n_cols):
nkey = f"r{r}c{nc}"
if routers.get(nkey) is None:
continue
nx, ny = routers[nkey]["pos_mm"]
dx, dy = mm2px(nx, ny)
parts.append(
f' <line x1="{sx:.0f}" y1="{sy:.0f}" '
f'x2="{dx:.0f}" y2="{dy:.0f}" '
f'stroke="#475569" stroke-width="1" opacity="0.4"/>'
)
break
# Vertical neighbor
for nr in range(r + 1, n_rows):
nkey = f"r{nr}c{c}"
if routers.get(nkey) is None:
continue
nx, ny = routers[nkey]["pos_mm"]
dx, dy = mm2px(nx, ny)
parts.append(
f' <line x1="{sx:.0f}" y1="{sy:.0f}" '
f'x2="{dx:.0f}" y2="{dy:.0f}" '
f'stroke="#475569" stroke-width="1" opacity="0.4"/>'
)
break
# ── Router nodes + attached component blocks ──
r_size = 8 # px radius for router circle
blk_w, blk_h = 32, 16 # px for component blocks
# Component style definitions
_COMP_STYLE = {
"pe": {"fill": "#2d1f3d", "stroke": "#a855f7", "text": "#a855f7"},
"mcpu": {"fill": "#451a03", "stroke": "#f59e0b", "text": "#f59e0b"},
"sram": {"fill": "#1c1917", "stroke": "#d97706", "text": "#d97706"},
"ucie": {"fill": "#1e1b4b", "stroke": "#8b5cf6", "text": "#8b5cf6"},
}
for rkey, rval in routers.items():
if rval is None:
continue
rx, ry = rval["pos_mm"]
px, py = mm2px(rx, ry)
attach = rval.get("attach", [])
is_top = ry < cube_h / 2
# ── Router circle ──
has_attach = len(attach) > 0
r_fill = "#475569" if has_attach else "#334155"
r_stroke = "#64748b" if has_attach else "#475569"
parts.append(
f' <circle cx="{px:.0f}" cy="{py:.0f}" r="{r_size}" '
f'fill="{r_fill}" stroke="{r_stroke}" stroke-width="1"/>'
)
parts.append(
f' <text x="{px:.0f}" y="{py + 3:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="6" fill="white">'
f'{rkey}</text>'
)
# ── Router → HBM_CTRL line (deferred, drawn after component blocks) ──
# ── Attached component blocks ──
# Collect components to draw, positioned outward from router
blocks: list[tuple[str, str, dict]] = [] # (label, kind, style)
pe_items = [a for a in attach if a.endswith(".dma")]
if pe_items:
pe_name = pe_items[0].split(".")[0].upper()
blocks.append((pe_name, "pe", _COMP_STYLE["pe"]))
if "m_cpu" in attach:
blocks.append(("M_CPU", "mcpu", _COMP_STYLE["mcpu"]))
if "sram" in attach:
blocks.append(("SRAM", "sram", _COMP_STYLE["sram"]))
# UCIe handled separately below
# Position blocks outward from router (away from cube center)
for bi, (label, kind, style) in enumerate(blocks):
# Determine placement direction: PE/components go outward
# Use left/right offset for multiple blocks on same router
offset_x = (bi - (len(blocks) - 1) / 2) * (blk_w + 4)
gap = 30 # px gap between router and component (room for 2 × 45° stubs)
if kind == "mcpu":
# M_CPU: place above (north of) router
bx = px - blk_w / 2
by = py - r_size - blk_h - gap
elif kind == "sram":
# SRAM: place below (south of) router
bx = px - blk_w / 2
by = py + r_size + gap
else:
# PE: place above (top half) or below (bottom half)
bx = px + offset_x - blk_w / 2
if is_top:
by = py - r_size - blk_h - gap - bi * (blk_h + 2)
else:
by = py + r_size + gap + bi * (blk_h + 2)
# Block rect
parts.append(
f' <rect x="{bx:.0f}" y="{by:.0f}" '
f'width="{blk_w}" height="{blk_h}" '
f'rx="3" fill="{style["fill"]}" stroke="{style["stroke"]}" stroke-width="1"/>'
)
# Label
font_sz = 6 if len(label) > 6 else 7
parts.append(
f' <text x="{bx + blk_w / 2:.0f}" y="{by + blk_h / 2 + 3:.0f}" '
f'text-anchor="middle" font-family="monospace" font-size="{font_sz}" '
f'font-weight="bold" fill="{style["text"]}">{_escape(label)}</text>'
)
# Connector: rule-based (short → 45° line, long → 45°-straight-45°)
sc = style["stroke"]
# Determine start (router edge) and end (component edge) points
bxc = bx + blk_w / 2 # component center x
if kind == "mcpu":
rx0, ry0 = px, py - r_size # router top
cx0, cy0 = bxc, by + blk_h # component bottom
elif kind == "sram":
rx0, ry0 = px, py + r_size # router bottom
cx0, cy0 = bxc, by # component top
elif is_top:
rx0, ry0 = px, py - r_size # router top
cx0, cy0 = bxc, by + blk_h # component bottom (bx already includes offset_x)
else:
rx0, ry0 = px, py + r_size # router bottom
cx0, cy0 = bxc, by # component top (bx already includes offset_x)
# PE/M_CPU/SRAM directly above/below router (same X):
# single diagonal line from the router edge to a point 2 px inside the component's right edge
if abs(cx0 - rx0) < 2 and abs(cy0 - ry0) > 4:
cx0 = bx + blk_w - 2
parts.append(
f' <line x1="{rx0:.0f}" y1="{ry0:.0f}" '
f'x2="{cx0:.0f}" y2="{cy0:.0f}" '
f'stroke="{sc}" stroke-width="1" opacity="0.6"/>'
)
else:
pts = _connector_points(rx0, ry0, cx0, cy0)
parts.append(
f' <polyline points="{pts}" '
f'fill="none" stroke="{sc}" stroke-width="1" opacity="0.6"/>'
)
# (PE→HBM BW annotation drawn in the PE→HBM port group section below)
# ── PE Router → HBM pseudo channel port group lines ──
# Each PE router connects to its port group center on the HBM edge
for rkey, rval in routers.items():
if rval is None:
continue
attach = rval.get("attach", [])
pe_dma_items = [a for a in attach if a.endswith(".dma")]
if not pe_dma_items:
continue
pe_id = int(pe_dma_items[0].split(".")[0].replace("pe", ""))
if pe_id not in _pe_hbm_targets:
continue
rx, ry = rval["pos_mm"]
rpx, rpy = mm2px(rx, ry)
tgx, tgy = _pe_hbm_targets[pe_id]
r_edge_y = rpy + r_size if rpy < hbm_y else rpy - r_size
# Rule-based connector: router → HBM port group
pts = _connector_points(rpx, r_edge_y, tgx, tgy)
parts.append(
f' <polyline points="{pts}" '
f'fill="none" stroke="#10b981" stroke-width="1.5" opacity="0.6" '
f'stroke-dasharray="4,3"/>'
)
# BW annotation at midpoint
mx = (rpx + tgx) / 2 + 10
my = (r_edge_y + tgy) / 2
parts.append(
f' <text x="{mx:.0f}" y="{my:.0f}" '
f'font-family="monospace" font-size="6" fill="#10b98188">'
f'{agg_bw:.0f}GB/s</text>'
)
# ── UCIe port components (position/size from topology.yaml) ──
# ucie_mm.size = 2.0mm, positions at cube edges (flush)
ucie_size_mm = cube.get("geometry", {}).get("ucie_mm", {}).get("size", 2.0)
uh_half = ucie_size_mm * 0.3 # inset of the UCIe box center from the cube edge (0.3 × size)
uw_half = ucie_size_mm * 0.5
ucie_positions = {
"N": (cube_w / 2, uh_half), # flush top edge
"S": (cube_w / 2, cube_h - uh_half), # flush bottom edge
"W": (uh_half, cube_h / 2), # flush left edge
"E": (cube_w - uh_half, cube_h / 2), # flush right edge
}
# Collect UCIe connections per direction
ucie_by_dir: dict[str, list[tuple[str, str, float, float]]] = {}
for rkey, rval in routers.items():
if rval is None:
continue
rx, ry = rval["pos_mm"]
for a in rval.get("attach", []):
if not a.startswith("ucie_"):
continue
parts_a = a.split(".")
direction = parts_a[0].replace("ucie_", "").upper()
conn = parts_a[1] if len(parts_a) > 1 else "c0"
ucie_by_dir.setdefault(direction, []).append((conn, rkey, rx, ry))
ucie_colors = ["#818cf8", "#a78bfa", "#c084fc", "#e879f9"]
for direction, conns in ucie_by_dir.items():
conns.sort(key=lambda x: x[0])
n_conn = len(conns)
ucx_mm, ucy_mm = ucie_positions.get(direction, (cube_w / 2, cube_h / 2))
ucx, ucy = mm2px(ucx_mm, ucy_mm)
# UCIe box: size from topology, N/S horizontal, E/W vertical
us = ucie_size_mm * scale
if direction in ("N", "S"):
uw, uh = us, us * 0.5
else:
uw, uh = us * 0.5, us
ux = ucx - uw / 2
uy = ucy - uh / 2
# UCIe component background
parts.append(
f' <rect x="{ux:.0f}" y="{uy:.0f}" '
f'width="{uw:.0f}" height="{uh:.0f}" '
f'rx="3" fill="#1e1b4b" stroke="#8b5cf6" stroke-width="1.5" opacity="0.9"/>'
)
# UCIe direction label
parts.append(
f' <text x="{ucx:.0f}" y="{uy - 3:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="7" font-weight="bold" fill="#8b5cf6">'
f'UCIe-{direction}</text>'
)
# Connection port boxes inside UCIe component
for ci, (conn, rkey, crx, cry) in enumerate(conns):
c_color = ucie_colors[ci % len(ucie_colors)]
if direction in ("N", "S"):
cw = max((uw - 4) / n_conn - 1, 6)
ch = uh - 4
cx = ux + 2 + ci * (cw + 1)
cy_box = uy + 2
else:
cw = uw - 4
ch = max((uh - 4) / n_conn - 1, 6)
cx = ux + 2
cy_box = uy + 2 + ci * (ch + 1)
parts.append(
f' <rect x="{cx:.0f}" y="{cy_box:.0f}" '
f'width="{cw:.0f}" height="{ch:.0f}" '
f'rx="2" fill="{c_color}" opacity="0.7"/>'
)
lx = cx + cw / 2
ly_t = cy_box + ch / 2 + 3
parts.append(
f' <text x="{lx:.0f}" y="{ly_t:.0f}" text-anchor="middle" '
f'font-family="monospace" font-size="5" fill="white">'
f'{conn}</text>'
)
# Connector: rule-based router → UCIe port
rpx, rpy = mm2px(crx, cry)
if direction == "N":
rx, ry = rpx, rpy - r_size
tx, ty = lx, cy_box + ch
elif direction == "S":
rx, ry = rpx, rpy + r_size
tx, ty = lx, cy_box
elif direction == "W":
rx, ry = rpx - r_size, rpy
tx, ty = cx + cw, cy_box + ch / 2
elif direction == "E":
rx, ry = rpx + r_size, rpy
tx, ty = cx, cy_box + ch / 2
else:
continue
pts = _connector_points(rx, ry, tx, ty)
parts.append(
f' <polyline points="{pts}" '
f'fill="none" stroke="{c_color}" stroke-width="1" opacity="0.5"/>'
)
# ── Legend ──
ly = h_px - 35
legend_items = [
("#3b82f6", "PE Router"),
("#f59e0b", "M_CPU / SRAM"),
("#8b5cf6", "UCIe"),
("#334155", "Relay"),
("#10b981", "HBM Link"),
("#475569", "Mesh Link"),
]
lx = pad
for color, label in legend_items:
parts.append(
f' <rect x="{lx}" y="{ly}" width="10" height="10" rx="2" '
f'fill="{color}" stroke="#475569" stroke-width="0.5"/>'
)
parts.append(
f' <text x="{lx + 14}" y="{ly + 9}" '
f'font-family="monospace" font-size="8" fill="#94a3b8">'
f'{label}</text>'
)
lx += len(label) * 7 + 24
parts.append("</svg>")
return "\n".join(parts)
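Reviewer note: to make the connector rules easier to check, here is a condensed restatement of `_connector_points` with a few sample inputs. Both branches of the original compute the same stub points, so the sketch uses `min(adx, ady) / 2` for the stub length. The name `connector_points` and the sample coordinates are illustrative only.

```python
def connector_points(rx: float, ry: float, cx: float, cy: float) -> str:
    """Return SVG polyline points: 45° stub, straight run, 45° stub (or one line)."""
    dx, dy = cx - rx, cy - ry
    adx, ady = abs(dx), abs(dy)
    straight = f"{rx:.0f},{ry:.0f} {cx:.0f},{cy:.0f}"

    # Tiny distance, or a short near-45° diagonal: single straight line.
    if adx + ady < 4 or (abs(adx - ady) < 4 and adx + ady < 80):
        return straight

    # Each stub covers half of the minor-axis offset at 45°.
    stub = min(adx, ady) / 2
    if stub < 2:
        return straight

    sx = 1 if dx >= 0 else -1
    sy = 1 if dy >= 0 else -1
    ax, ay = rx + sx * stub, ry + sy * stub  # end of the stub leaving the router
    bx, by = cx - sx * stub, cy - sy * stub  # start of the stub entering the component
    return (
        f"{rx:.0f},{ry:.0f} {ax:.0f},{ay:.0f} "
        f"{bx:.0f},{by:.0f} {cx:.0f},{cy:.0f}"
    )


horizontal = connector_points(0.0, 0.0, 100.0, 20.0)
vertical = connector_points(0.0, 0.0, 20.0, 100.0)
diagonal = connector_points(0.0, 0.0, 30.0, 30.0)
shallow = connector_points(0.0, 0.0, 100.0, 2.0)
```

In the horizontal-dominant case the two middle points share a y value (`10`), so the middle segment is horizontal; in the vertical-dominant case they share an x value. The shallow case falls back to a single line because its stub would be shorter than 2 px.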
+56 -215
@@ -26,8 +26,8 @@
--pe-stroke: #a855f7;
--io-fill: #3d2b1f;
--io-stroke: #f97316;
--xbar-fill: #1f2d3d;
--xbar-stroke: #06b6d4;
--router-fill: #1f2d3d;
--router-stroke: #06b6d4;
--link-color: #475569;
--link-active: #3b82f6;
}
@@ -405,8 +405,8 @@ body {
PE
</div>
<div class="legend-item">
<div class="legend-swatch" style="background:var(--xbar-fill);border-color:var(--xbar-stroke)"></div>
XBAR / NOC
<div class="legend-swatch" style="background:var(--router-fill);border-color:var(--router-stroke)"></div>
Router Mesh
</div>
</div>
@@ -716,7 +716,7 @@ function drawCubeNode(svg, x, y, idx) {
g.appendChild(pt);
}
// Center block: xbar + NOC
// Center block: router mesh
g.appendChild(svgEl("rect", {
x: x + 30, y: y + 30, width: CUBE_W - 60, height: CUBE_H - 56,
rx: 3, fill: "#1f2d3d", stroke: "#06b6d466", "stroke-width": 0.8
@@ -728,7 +728,7 @@ function drawCubeNode(svg, x, y, idx) {
"font-size": "7",
fill: "#06b6d4aa"
});
xt.textContent = "NOC+XBAR";
xt.textContent = "Router Mesh";
g.appendChild(xt);
// HBM indicators (top and bottom)
@@ -871,51 +871,6 @@ function drawCubeView(svg, cubeIdx) {
}
}
// ── PE router → XBAR_TOP paths (90-degree angled, matching reference) ──
// r0c0 → XBAR_TOP left: down then right
const xbarTopY = OY + 145; // reference: rect at y=145
const xbarBotY = OY + 355; // reference: rect at y=355
const xbarX = OX + 150; // reference: x=150
const xbarW = 400; // reference: width=400
svg.appendChild(svgEl("path", {
d: `M ${OX} ${OY+16} V ${xbarTopY+6} H ${xbarX}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+140} ${OY+16} V ${xbarTopY} H ${xbarX}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+560} ${OY+107} V ${xbarTopY} H ${xbarX+xbarW}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+700} ${OY+107} V ${xbarTopY+6} H ${xbarX+xbarW}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
// ── XBAR_TOP bar ──
svg.appendChild(svgEl("rect", {
x: xbarX, y: xbarTopY, width: xbarW, height: 22,
rx: 5, fill: "#f97316", stroke: "#ea580c", "stroke-width": 2
}));
const xtT = svgEl("text", {
x: xbarX + xbarW / 2, y: xbarTopY + 15, "text-anchor": "middle",
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "white"
});
xtT.textContent = "XBAR_TOP | xbar_v1 | 2.0ns";
svg.appendChild(xtT);
// ── XBAR_TOP → HBM0-3 arrows ──
const hbmArrowXs = [OX + 225, OX + 320, OX + 415, OX + 475];
for (const ax of hbmArrowXs) {
svg.appendChild(svgEl("line", {
x1: ax, y1: xbarTopY + 22, x2: ax, y2: OY + 198,
stroke: "#059669", "stroke-width": 1.5
}));
}
// ── HBM ZONE ──
const hbmZoneX = OX + 145, hbmZoneY = OY + 195, hbmZoneW = 410, hbmZoneH = 152;
svg.appendChild(svgEl("rect", {
@@ -926,181 +881,71 @@ function drawCubeView(svg, cubeIdx) {
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 16, "text-anchor": "middle",
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "#047857"
});
hzmLabel.textContent = "HBM 9.0 x 5.0 mm | hbm_ctrl_v1 x 8";
hzmLabel.textContent = "HBM 9.0 x 5.0 mm | hbm_ctrl_v1";
svg.appendChild(hzmLabel);
// HBM0-3 (top row)
const hbmSliceW = 85, hbmSliceH = 28;
const hbmTopSlices = [
{ x: OX + 168, label: "HBM0" }, { x: OX + 260, label: "HBM1" },
{ x: OX + 352, label: "HBM2" }, { x: OX + 444, label: "HBM3" }
];
for (const hs of hbmTopSlices) {
const g = svgEl("g", { class: "node-group", "data-id": hs.label.toLowerCase() });
g.appendChild(svgEl("rect", {
x: hs.x, y: hbmZoneY + 23, width: hbmSliceW, height: hbmSliceH,
rx: 4, fill: "#047857", stroke: "#065f46", "stroke-width": 1.5
}));
const t = svgEl("text", {
x: hs.x + hbmSliceW / 2, y: hbmZoneY + 23 + 18, "text-anchor": "middle",
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
});
t.textContent = hs.label;
g.appendChild(t);
svg.appendChild(g);
}
// Single HBM_CTRL block (centered in HBM zone)
const hbmCtrlG = svgEl("g", { class: "node-group", "data-id": "hbm_ctrl" });
hbmCtrlG.appendChild(svgEl("rect", {
x: hbmZoneX + 40, y: hbmZoneY + 28, width: hbmZoneW - 80, height: 40,
rx: 6, fill: "#047857", stroke: "#065f46", "stroke-width": 1.5
}));
const hbmCtrlT = svgEl("text", {
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 53, "text-anchor": "middle",
"font-family": "monospace", "font-size": "10", "font-weight": "bold", fill: "white"
});
hbmCtrlT.textContent = "HBM_CTRL";
hbmCtrlG.appendChild(hbmCtrlT);
svg.appendChild(hbmCtrlG);
// Exclusion zone label
const hexLabel = svgEl("text", {
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 75, "text-anchor": "middle",
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 85, "text-anchor": "middle",
"font-family": "monospace", "font-size": "7", fill: "#ef4444aa"
});
hexLabel.textContent = "Router exclusion: r2c2, r2c3, r3c2, r3c3";
svg.appendChild(hexLabel);
// HBM4-7 (bottom row)
const hbmBotSlices = [
{ x: OX + 168, label: "HBM4" }, { x: OX + 260, label: "HBM5" },
{ x: OX + 352, label: "HBM6" }, { x: OX + 444, label: "HBM7" }
// "All routers connect to HBM" annotation
const hbmAnnot = svgEl("text", {
x: hbmZoneX + hbmZoneW / 2, y: hbmZoneY + 100, "text-anchor": "middle",
"font-family": "monospace", "font-size": "6", fill: "#059669aa"
});
hbmAnnot.textContent = "All routers → HBM_CTRL (mesh-connected)";
svg.appendChild(hbmAnnot);
// ── HBM connectivity indicators (thin green dotted lines from edge routers to HBM zone) ──
// Draw thin green dotted lines from routers adjacent to HBM zone down/up to HBM
const hbmConnRouters = [
{ r: 1, c: 2 }, { r: 1, c: 3 }, // top edge of HBM zone
{ r: 4, c: 2 }, { r: 4, c: 3 }, // bottom edge of HBM zone
{ r: 2, c: 1 }, { r: 3, c: 1 }, // left edge of HBM zone
{ r: 2, c: 4 }, { r: 3, c: 4 }, // right edge of HBM zone
];
for (const hs of hbmBotSlices) {
const g = svgEl("g", { class: "node-group", "data-id": hs.label.toLowerCase() });
g.appendChild(svgEl("rect", {
x: hs.x, y: hbmZoneY + hbmZoneH - hbmSliceH - 23 + 10, width: hbmSliceW, height: hbmSliceH,
rx: 4, fill: "#065f46", stroke: "#064e3b", "stroke-width": 1.5
}));
const t = svgEl("text", {
x: hs.x + hbmSliceW / 2, y: hbmZoneY + hbmZoneH - hbmSliceH - 23 + 10 + 18, "text-anchor": "middle",
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
});
t.textContent = hs.label;
g.appendChild(t);
svg.appendChild(g);
}
// ── XBAR_BOT → HBM4-7 arrows (upward) ──
for (const ax of hbmArrowXs) {
for (const hr of hbmConnRouters) {
const rp = rXY(hr.r, hr.c);
// Draw line toward the HBM zone center
const hbmCenterX = hbmZoneX + hbmZoneW / 2;
const hbmCenterY = hbmZoneY + hbmZoneH / 2;
// Compute endpoint clipped to HBM zone edge
let ex = hbmCenterX, ey = hbmCenterY;
if (hr.r <= 1) { ey = hbmZoneY; ex = rp.x; } // top routers → top of HBM zone
else if (hr.r >= 4) { ey = hbmZoneY + hbmZoneH; ex = rp.x; } // bottom routers → bottom of HBM zone
else if (hr.c <= 1) { ex = hbmZoneX; ey = rp.y; } // left routers → left of HBM zone
else { ex = hbmZoneX + hbmZoneW; ey = rp.y; } // right routers → right of HBM zone
svg.appendChild(svgEl("line", {
x1: ax, y1: xbarBotY, x2: ax, y2: OY + 315,
stroke: "#059669", "stroke-width": 1.5
x1: rp.x, y1: rp.y, x2: ex, y2: ey,
stroke: "#05966988", "stroke-width": 1, "stroke-dasharray": "3,3"
}));
}
// ── XBAR_BOT bar ──
svg.appendChild(svgEl("rect", {
x: xbarX, y: xbarBotY, width: xbarW, height: 22,
rx: 5, fill: "#f97316", stroke: "#ea580c", "stroke-width": 2
}));
const xbT = svgEl("text", {
x: xbarX + xbarW / 2, y: xbarBotY + 15, "text-anchor": "middle",
"font-family": "monospace", "font-size": "9", "font-weight": "bold", fill: "white"
});
xbT.textContent = "XBAR_BOT | xbar_v1 | 2.0ns";
svg.appendChild(xbT);
// ── PE router → XBAR_BOT paths (90-degree angled) ──
svg.appendChild(svgEl("path", {
d: `M ${OX} ${OY+409} V ${xbarBotY+16} H ${xbarX}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+140} ${OY+409} V ${xbarBotY+10} H ${xbarX}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+560} ${OY+508} V ${xbarBotY+10} H ${xbarX+xbarW}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
svg.appendChild(svgEl("path", {
d: `M ${OX+700} ${OY+508} V ${xbarBotY+16} H ${xbarX+xbarW}`,
fill: "none", stroke: "#f97316", "stroke-width": 1.5, "stroke-dasharray": "4,3"
}));
// ── BRIDGES (purple/violet, matching reference) ──
const brgLeftX = OX + 100, brgRightX = OX + 600;
// Left bridge vertical line
svg.appendChild(svgEl("line", {
x1: brgLeftX, y1: xbarTopY + 10, x2: brgLeftX, y2: xbarBotY + 12,
stroke: "#a78bfa", "stroke-width": 2.5, "stroke-dasharray": "8,4"
}));
// Left bridge horizontal stubs
svg.appendChild(svgEl("line", {
x1: brgLeftX, y1: xbarTopY + 6, x2: xbarX, y2: xbarTopY + 6,
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
}));
svg.appendChild(svgEl("line", {
x1: brgLeftX, y1: xbarBotY + 16, x2: xbarX, y2: xbarBotY + 16,
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
}));
// Left bridge label
svg.appendChild(svgEl("rect", {
x: brgLeftX - 28, y: OY + 248, width: 56, height: 30,
rx: 4, fill: "#1e1b4b", stroke: "#a78bfa", "stroke-width": 1.5
}));
let bt = svgEl("text", {
x: brgLeftX, y: OY + 259, "text-anchor": "middle",
"font-family": "monospace", "font-size": "6", "font-weight": "bold", fill: "#a78bfa"
});
bt.textContent = "XBAR BRG";
svg.appendChild(bt);
bt = svgEl("text", {
x: brgLeftX, y: OY + 272, "text-anchor": "middle",
"font-family": "monospace", "font-size": "7", "font-weight": "bold", fill: "#a78bfa"
});
bt.textContent = "LEFT";
svg.appendChild(bt);
bt = svgEl("text", {
x: brgLeftX - 36, y: OY + 263, "text-anchor": "end",
"font-family": "monospace", "font-size": "6", fill: "#a78bfa88"
});
bt.textContent = "3mm";
svg.appendChild(bt);
// Right bridge vertical line
svg.appendChild(svgEl("line", {
x1: brgRightX, y1: xbarTopY + 10, x2: brgRightX, y2: xbarBotY + 12,
stroke: "#a78bfa", "stroke-width": 2.5, "stroke-dasharray": "8,4"
}));
// Right bridge horizontal stubs
svg.appendChild(svgEl("line", {
x1: brgRightX, y1: xbarTopY + 6, x2: xbarX + xbarW, y2: xbarTopY + 6,
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
}));
svg.appendChild(svgEl("line", {
x1: brgRightX, y1: xbarBotY + 16, x2: xbarX + xbarW, y2: xbarBotY + 16,
stroke: "#a78bfa", "stroke-width": 2, "stroke-dasharray": "6,3"
}));
// Right bridge label
svg.appendChild(svgEl("rect", {
x: brgRightX - 28, y: OY + 248, width: 56, height: 30,
rx: 4, fill: "#1e1b4b", stroke: "#a78bfa", "stroke-width": 1.5
}));
bt = svgEl("text", {
x: brgRightX, y: OY + 259, "text-anchor": "middle",
"font-family": "monospace", "font-size": "6", "font-weight": "bold", fill: "#a78bfa"
});
bt.textContent = "XBAR BRG";
svg.appendChild(bt);
bt = svgEl("text", {
x: brgRightX, y: OY + 272, "text-anchor": "middle",
"font-family": "monospace", "font-size": "7", "font-weight": "bold", fill: "#a78bfa"
});
bt.textContent = "RIGHT";
svg.appendChild(bt);
bt = svgEl("text", {
x: brgRightX + 36, y: OY + 263,
"font-family": "monospace", "font-size": "6", fill: "#a78bfa88"
});
bt.textContent = "3mm";
svg.appendChild(bt);
// ── M_CPU (r2c0) and SRAM (r3c0) labels ──
const mcpuP = rXY(2, 0);
svg.appendChild(svgEl("rect", {
x: mcpuP.x - 42, y: mcpuP.y + 18, width: 84, height: 18,
rx: 4, fill: "#f59e0b", stroke: "#d97706", "stroke-width": 1.5
}));
bt = svgEl("text", {
let bt = svgEl("text", {
x: mcpuP.x, y: mcpuP.y + 31, "text-anchor": "middle",
"font-family": "monospace", "font-size": "8", "font-weight": "bold", fill: "white"
});
@@ -1358,8 +1203,7 @@ function drawCubeView(svg, cubeIdx) {
{ color: "#e2e8f0", label: "Relay", textColor: "#475569" },
{ color: "#8b5cf6", label: "UCIe Router" },
{ color: "#f59e0b", label: "M_CPU/SRAM" },
{ color: "#a78bfa", label: "Bridge", type: "line" },
{ color: "#f97316", label: "XBAR", type: "rect" },
{ color: "#059669", label: "HBM Link", type: "line" },
{ color: "#047857", label: "HBM Ctrl", type: "rect" },
{ color: "#ef4444", label: "PE (~5mm2)", type: "rect" },
{ color: "#8b5cf6", label: "UCIe Port", type: "rect", rectFill: "#1e1b4b" },
@@ -1394,7 +1238,7 @@ function drawCubeView(svg, cubeIdx) {
const dpT = svgEl("text", {
x: 60, y: legY + 24, "font-family": "monospace", "font-size": "7", fill: "#64748b"
});
dpT.textContent = "Data: PE_DMA→NOC→XBAR→HBM | Cross-half: XBAR_TOP→Bridge(3mm)→XBAR_BOT→HBM4-7";
dpT.textContent = "Data: PE_DMA → Router Mesh → HBM_CTRL | All traffic routed through 6x6 mesh";
svg.appendChild(dpT);
}
@@ -1454,7 +1298,7 @@ function drawPeView(svg, cubeIdx, peIdx) {
// NOC destinations (inside NOC column)
const nocDests = [
{ label: "XBAR", sub: "→ HBM", y: nocTop + 50, fill: "#f97316", bg: "#3d2b1f" },
{ label: "HBM", sub: "ctrl", y: nocTop + 50, fill: "#059669", bg: "#052e16" },
{ label: "SRAM", sub: "128x4", y: nocTop + 86, fill: "#f59e0b", bg: "#3d2b1f" },
{ label: "UCIe", sub: "inter", y: nocTop + 122, fill: "#8b5cf6", bg: "#1e1b4b" },
{ label: "M_CPU", sub: "cmd", y: nocTop + 158, fill: "#f59e0b", bg: "#3d2b1f" },
@@ -1967,7 +1811,7 @@ function applyHotPaths(svg, t) {
}
} else if (currentView === "cube") {
// ── CUBE VIEW: highlight router mesh links + XBAR paths ──
// ── CUBE VIEW: highlight router mesh links ──
const linkTraffic = {};
for (const hop of activeHops) {
const linkId = hopToCubeLink(hop);
@@ -1984,16 +1828,13 @@ function applyHotPaths(svg, t) {
inflight++;
}
}
// Highlight XBAR/HBM components referenced in events
// Highlight HBM component referenced in events
const activeProcesses = allEvents.filter(e =>
e.type === "process" && e.t_ns <= t && e.t_ns >= t - 30
);
for (const proc of activeProcesses) {
const comp = proc.component || "";
if (comp.includes("xbar_top")) highlightComponent(svg, "xbar_top");
if (comp.includes("xbar_bot")) highlightComponent(svg, "xbar_bot");
const hbmMatch = comp.match(/hbm_ctrl\.slice(\d+)/);
if (hbmMatch) highlightComponent(svg, `hbm${hbmMatch[1]}`);
if (comp.includes("hbm_ctrl")) highlightComponent(svg, "hbm_ctrl");
}
} else if (currentView === "pe") {
+2 -2
@@ -316,9 +316,9 @@ def test_h2d_monotonicity_preserved():
latencies.append(t["total_ns"])
for i in range(len(latencies) - 1):
assert latencies[i] < latencies[i + 1], (
assert latencies[i] <= latencies[i + 1], (
f"Monotonicity: cube{cubes[i]}({latencies[i]:.2f}) "
f"must < cube{cubes[i+1]}({latencies[i+1]:.2f})"
f"must <= cube{cubes[i+1]}({latencies[i+1]:.2f})"
)
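The relaxed comparison above (`<` to `<=`) lets adjacent cubes report equal latency. A minimal standalone sketch of that check follows. The helper name `assert_non_decreasing` and the sample values are hypothetical, not taken from the repository.

```python
def assert_non_decreasing(latencies, labels):
    """Raise AssertionError if any latency is lower than its predecessor."""
    for i in range(len(latencies) - 1):
        assert latencies[i] <= latencies[i + 1], (
            f"Monotonicity: {labels[i]}({latencies[i]:.2f}) "
            f"must <= {labels[i + 1]}({latencies[i + 1]:.2f})"
        )


# Hypothetical values: two cubes at the same mesh distance share a latency,
# which a strict `<` comparison would have rejected.
assert_non_decreasing(
    [10.0, 12.5, 12.5, 15.0],
    ["cube0", "cube1", "cube2", "cube3"],
)
```

A tie passes, while any real decrease still fails with the same message format the test uses.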
+2 -2
@@ -17,6 +17,6 @@ def test_cli_main_arg_parsing(monkeypatch):
def test_cli_main():
rc = cli_main.main(["run", "--topology", "topology.yaml", "--bench", "qkv_gemm"])
"""CLI bench run on single SIP device."""
rc = cli_main.main(["run", "--topology", "topology.yaml", "--bench", "qkv_gemm", "--device", "sip:0"])
assert rc == 0
+12 -19
@@ -37,7 +37,7 @@ def _hbm_pa(pe_id: int = 0) -> int:
def _node(impl: str, overhead_ns: float = 0.0) -> Node:
return Node(id="test", kind="xbar", impl=impl, attrs={"overhead_ns": overhead_ns}, pos_mm=None)
return Node(id="test", kind="noc_router", impl=impl, attrs={"overhead_ns": overhead_ns}, pos_mm=None)
# ── 1. unknown impl → error ──────────────────────────────────────────
@@ -55,7 +55,7 @@ def test_registry_unknown_impl_raises_error():
def test_transit_component_yields_overhead_ns():
"""TransitComponent.run() yields exactly node.attrs['overhead_ns'] ns."""
node = _node("xbar_v1", overhead_ns=3.0)
node = _node("forwarding_v1", overhead_ns=3.0)
comp = TransitComponent(node)
env = simpy.Environment()
@@ -100,7 +100,7 @@ def test_engine_component_override_is_called():
SpyXbar.calls = 0
graph = _graph()
engine = GraphEngine(graph, component_overrides={"xbar_v1": SpyXbar})
engine = GraphEngine(graph, component_overrides={"forwarding_v1": SpyXbar})
msg = MemoryReadMsg(
correlation_id="c", request_id="r",
src_sip=0, src_cube=0, src_pe=0,
@@ -108,7 +108,7 @@ def test_engine_component_override_is_called():
)
h = engine.submit(msg)
engine.wait(h)
# Path passes through xbar_top (impl=xbar_v1)
# Path passes through router nodes (impl=forwarding_v1)
assert SpyXbar.calls > 0
@@ -119,10 +119,9 @@ def test_engine_component_model_latency():
"""MemoryRead D2H latency for local cube0 (4096B).
Bypass path (m_cpu bypass): pcie_ep → io_noc → conn → io_ucie → cube_ucie
→ conn → noc → xbar_top → hbm_ctrl.slice0
→ conn → router mesh → hbm_ctrl
Path goes through xbar_top (overhead_ns=2.0) instead of per-PE xbar.
Latency must be positive and reasonable.
Path goes through router mesh. Latency must be positive and reasonable.
"""
graph = _graph()
engine = GraphEngine(graph)
@@ -134,7 +133,6 @@ def test_engine_component_model_latency():
h = engine.submit(msg)
engine.wait(h)
_, trace = engine.get_completion(h)
# Verify positive latency; exact value depends on path through xbar_top
assert trace["total_ns"] > 0
@@ -142,21 +140,19 @@ def test_engine_component_model_latency():
def test_engine_override_is_scoped_to_impl():
"""xbar_v1 override (ZeroXbar, no overhead_ns) reduces total_ns.
"""forwarding_v1 override (ZeroRouter, no overhead) reduces total_ns.
xbar_top has overhead_ns=2.0 base + position-dependent distance.
It is traversed on both the forward path and the reverse response path,
so replacing it with a zero-latency impl removes all XBAR latency.
With position-aware XBAR, the diff is >= 4.0ns (base) + distance contribution.
Router nodes have overhead_ns=2.0. Replacing with zero-latency impl
removes router overhead from the path.
"""
class ZeroXbar(ComponentBase):
class ZeroRouter(ComponentBase):
def run(self, env, nbytes):
yield env.timeout(0)
graph = _graph()
engine_default = GraphEngine(graph)
engine_override = GraphEngine(graph, component_overrides={"xbar_v1": ZeroXbar})
engine_override = GraphEngine(graph, component_overrides={"forwarding_v1": ZeroRouter})
msg = MemoryReadMsg(
correlation_id="c", request_id="r",
@@ -172,8 +168,5 @@ def test_engine_override_is_scoped_to_impl():
engine_override.wait(h_o)
_, t_override = engine_override.get_completion(h_o)
# ZeroXbar removes base overhead_ns=2.0 + distance-based latency per traversal.
# Forward + response = 2 traversals, so diff >= 4.0ns (base only).
diff = t_default["total_ns"] - t_override["total_ns"]
# ZeroRouter removes overhead from all forwarding_v1 nodes in path.
assert t_override["total_ns"] < t_default["total_ns"]
assert diff >= 4.0 - 0.01, f"Expected diff >= 4.0ns, got {diff:.4f}ns"
+141 -342
@@ -1,18 +1,15 @@
"""Tests for #5+#6 CUBE NOC Router Mesh + Position-Aware XBAR.
Phase 1 verification: all tests FAIL until Phase 2 implements production code.
"""Tests for CUBE NOC Explicit Router Mesh (ADR-0019).
Key changes verified:
- Single NOC node per cube with internal router mesh simulation
- Auto-layout generates cube_mesh.yaml (6x6 grid for n_connections=4)
- Position-aware XBAR (top/bottom) replaces per-PE xbar chaining
- Explicit router nodes per cube from cube_mesh.yaml (6×6 grid)
- Auto-layout generates cube_mesh.yaml with PE/UCIe/M_CPU/SRAM attachments
- Mesh file caching with source_hash change detection
- Path routing: PE_DMA → NOC → XBAR_top/bot → HBM_CTRL
- Path routing: PE_DMA → router mesh → HBM_CTRL
Latency invariant after refactor:
Local HBM: PE_DMA → Router(overhead) → XBAR → HBM_CTRL
Cross-row: PE_DMA → Router → mesh traverse → Router → XBAR → bridge → XBAR → HBM_CTRL
Cross-cube: PE_DMA → Router → mesh → UCIe → ... → mesh → XBAR → HBM_CTRL
Latency invariant:
Local HBM: PE_DMA → Router(overhead) → HBM_CTRL
Cross-row: PE_DMA → Router → mesh hops → Router → HBM_CTRL
Cross-cube: PE_DMA → Router → mesh → UCIe → ... → mesh → HBM_CTRL
"""
import pytest
@@ -127,22 +124,27 @@ def test_mesh_file_pe_corner_positions():
)
def test_mesh_file_xbar_top_routers():
"""xbar_top must list top-half PE routers."""
def test_mesh_file_no_xbar_section():
"""mesh output must not contain xbar section (ADR-0019 D2)."""
_graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
top_routers = mesh["xbar"]["top"]["routers"]
for rid in ["r0c0", "r0c1", "r1c4", "r1c5"]:
assert rid in top_routers, f"{rid} should connect to xbar_top"
assert "xbar" not in mesh, "xbar section should be removed from cube_mesh.yaml"
def test_mesh_file_xbar_bot_routers():
"""xbar_bot must list bottom-half PE routers."""
def test_mesh_file_pe_hbm_attached():
"""PE routers must have pe{idx}.hbm in attach list (ADR-0019 D1)."""
_graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
bot_routers = mesh["xbar"]["bottom"]["routers"]
for rid in ["r4c0", "r4c1", "r5c4", "r5c5"]:
assert rid in bot_routers, f"{rid} should connect to xbar_bot"
for rid, rdata in mesh["routers"].items():
if rdata is None:
continue
for item in rdata["attach"]:
if item.endswith(".dma"):
pe_prefix = item.rsplit(".", 1)[0]
hbm_item = f"{pe_prefix}.hbm"
assert hbm_item in rdata["attach"], (
f"{rid} has {item} but missing {hbm_item}"
)
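The attachment rule checked by `test_mesh_file_pe_hbm_attached` can be tried on an in-memory dictionary without the generated mesh file. This is only a sketch: the `sample` fragment is hypothetical and only mirrors the `routers` / `attach` layout the test reads, and `missing_hbm_attachments` is an illustrative helper, not a function from the repository.

```python
def missing_hbm_attachments(routers):
    """Return (router_id, missing_item) pairs where a PE DMA lacks its HBM port."""
    missing = []
    for rid, rdata in routers.items():
        if rdata is None:  # null entry: no router at this grid position
            continue
        attach = rdata["attach"]
        for item in attach:
            if item.endswith(".dma"):
                # "pe3.dma" -> "pe3.hbm"
                hbm_item = f"{item.rsplit('.', 1)[0]}.hbm"
                if hbm_item not in attach:
                    missing.append((rid, hbm_item))
    return missing


# Hypothetical fragment, not copied from a real cube_mesh.yaml.
sample = {
    "r0c0": {"attach": ["pe0.dma", "pe0.hbm"]},
    "r0c1": {"attach": ["pe1.dma"]},
    "r2c2": None,
}
print(missing_hbm_attachments(sample))  # [('r0c1', 'pe1.hbm')]
```

Returning the offending pairs instead of asserting inside the loop reports every violation in one run.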
def test_mesh_file_ucie_distribution():
@@ -233,107 +235,65 @@ def test_mesh_ucie_all_four_directions():
# ══════════════════════════════════════════════════════════════════
# 2. Topology Graph: XBAR Top/Bottom (replaces per-PE chaining)
# 2. Topology Graph: Explicit Router Mesh (ADR-0019)
# ══════════════════════════════════════════════════════════════════
def test_xbar_top_node_exists():
"""Each cube must have an xbar_top node."""
def test_router_nodes_exist():
"""Cube must have explicit router nodes from cube_mesh.yaml."""
graph = _graph()
assert "sip0.cube0.xbar_top" in graph.nodes
for rkey in ["r0c0", "r0c1", "r1c4", "r5c5"]:
assert f"sip0.cube0.{rkey}" in graph.nodes, f"Router {rkey} missing"
def test_xbar_bot_node_exists():
"""Each cube must have an xbar_bot node."""
def test_no_xbar_or_bridge_nodes():
"""xbar/bridge nodes must not exist (ADR-0019 D2)."""
graph = _graph()
assert "sip0.cube0.xbar_bot" in graph.nodes
bad = [n for n in graph.nodes if "xbar" in n or "bridge" in n]
assert len(bad) == 0, f"Old xbar/bridge nodes found: {bad[:5]}"
def test_no_per_pe_xbar_nodes():
"""Per-PE xbar nodes (xbar.pe0..pe7) must not exist."""
def test_no_single_noc_node():
"""Cube-level single noc node must not exist (replaced by explicit routers)."""
graph = _graph()
for i in range(8):
assert f"sip0.cube0.xbar.pe{i}" not in graph.nodes, (
f"xbar.pe{i} should not exist in new topology"
)
assert "sip0.cube0.noc" not in graph.nodes
def test_no_xbar_chain_edges():
"""xbar_chain kind edges must not exist."""
def test_single_hbm_ctrl_node():
"""Each cube must have single hbm_ctrl (no slices)."""
graph = _graph()
chain_edges = [e for e in graph.edges if e.kind == "xbar_chain"]
assert len(chain_edges) == 0, (
f"Found {len(chain_edges)} xbar_chain edges; chaining is replaced by XBAR top/bot"
)
assert "sip0.cube0.hbm_ctrl" in graph.nodes
slices = [n for n in graph.nodes if "hbm_ctrl.slice" in n]
assert len(slices) == 0, f"HBM slices should not exist: {slices[:3]}"
def test_xbar_top_to_hbm_slices_0_3():
"""xbar_top must connect to hbm_ctrl.slice0..3 (top HBM slices)."""
def test_router_mesh_edges():
"""Adjacent routers must be connected (router_mesh edges)."""
graph = _graph()
edge_set = {(e.src, e.dst) for e in graph.edges}
for i in range(4):
assert ("sip0.cube0.xbar_top", f"sip0.cube0.hbm_ctrl.slice{i}") in edge_set, (
f"xbar_top → hbm_ctrl.slice{i} edge missing"
)
# r0c0 ↔ r0c1 (horizontal)
assert ("sip0.cube0.r0c0", "sip0.cube0.r0c1") in edge_set
assert ("sip0.cube0.r0c1", "sip0.cube0.r0c0") in edge_set
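For reference, the bidirectional adjacency that `test_router_mesh_edges` expects can be sketched for a grid with excluded positions. This is an assumption about how such edges might be derived, not the repository's `mesh_gen` logic. The function name `mesh_edges` is hypothetical, and the exclusion set is taken from the "Router exclusion" label in the viewer (r2c2, r2c3, r3c2, r3c3).

```python
def mesh_edges(rows, cols, excluded):
    """Build directed edges between 4-neighbour routers, skipping excluded cells."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in excluded:
                continue
            # Only look right and down; add both directions for each pair.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and (nr, nc) not in excluded:
                    a, b = f"r{r}c{c}", f"r{nr}c{nc}"
                    edges.add((a, b))
                    edges.add((b, a))
    return edges


# Assumed 6x6 grid with the 2x2 centre block excluded.
hbm_zone = {(2, 2), (2, 3), (3, 2), (3, 3)}
edges = mesh_edges(6, 6, hbm_zone)
assert ("r0c0", "r0c1") in edges and ("r0c1", "r0c0") in edges
assert not any("r2c2" in pair for pair in edges)
```

Under these assumptions no link crosses the excluded block, so traffic between, say, r2c1 and r2c4 has to route around it.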
def test_xbar_bot_to_hbm_slices_4_7():
"""xbar_bot must connect to hbm_ctrl.slice4..7 (bottom HBM slices)."""
def test_pe_dma_connects_to_router():
"""PE_DMA must connect to router (pe_to_router kind)."""
graph = _graph()
edge_set = {(e.src, e.dst) for e in graph.edges}
for i in range(4, 8):
assert ("sip0.cube0.xbar_bot", f"sip0.cube0.hbm_ctrl.slice{i}") in edge_set, (
f"xbar_bot → hbm_ctrl.slice{i} edge missing"
)
pe0_edges = [e for e in graph.edges
if e.src == "sip0.cube0.pe0.pe_dma" and e.kind == "pe_to_router"]
assert len(pe0_edges) == 1, f"PE0 DMA should connect to 1 router, got {len(pe0_edges)}"
assert pe0_edges[0].dst == "sip0.cube0.r0c0"
def test_xbar_bridge_left():
"""bridge.left must connect xbar_top ↔ xbar_bot (bidirectional)."""
def test_hbm_connects_to_all_routers():
"""HBM_CTRL must have edges to all non-null routers."""
graph = _graph()
assert "sip0.cube0.bridge.left" in graph.nodes
edge_set = {(e.src, e.dst) for e in graph.edges}
assert ("sip0.cube0.xbar_top", "sip0.cube0.bridge.left") in edge_set
assert ("sip0.cube0.bridge.left", "sip0.cube0.xbar_bot") in edge_set
assert ("sip0.cube0.xbar_bot", "sip0.cube0.bridge.left") in edge_set
assert ("sip0.cube0.bridge.left", "sip0.cube0.xbar_top") in edge_set
def test_xbar_bridge_right():
"""bridge.right must connect xbar_top ↔ xbar_bot (bidirectional)."""
graph = _graph()
assert "sip0.cube0.bridge.right" in graph.nodes
edge_set = {(e.src, e.dst) for e in graph.edges}
assert ("sip0.cube0.xbar_top", "sip0.cube0.bridge.right") in edge_set
assert ("sip0.cube0.bridge.right", "sip0.cube0.xbar_bot") in edge_set
def test_noc_to_xbar_top_edge():
"""NOC must have edge to xbar_top (router attachment)."""
graph = _graph()
edge_set = {(e.src, e.dst) for e in graph.edges}
assert ("sip0.cube0.noc", "sip0.cube0.xbar_top") in edge_set
def test_noc_to_xbar_bot_edge():
"""NOC must have edge to xbar_bot (router attachment)."""
graph = _graph()
edge_set = {(e.src, e.dst) for e in graph.edges}
assert ("sip0.cube0.noc", "sip0.cube0.xbar_bot") in edge_set
def test_pe_dma_no_direct_xbar_edge():
"""PE_DMA must NOT have direct edge to any xbar node.
All HBM access goes through NOC (router attachment to XBAR).
"""
graph = _graph()
pe_to_xbar = [
e for e in graph.edges
if e.src == "sip0.cube0.pe0.pe_dma" and "xbar" in e.dst
]
assert len(pe_to_xbar) == 0, (
f"PE_DMA should not connect directly to XBAR. "
f"Found: {[(e.src, e.dst) for e in pe_to_xbar]}"
hbm_out = [e for e in graph.edges
if e.src == "sip0.cube0.hbm_ctrl" and e.kind == "hbm_to_router"]
mesh = yaml.safe_load(MESH_PATH.read_text())
n_active = sum(1 for v in mesh["routers"].values() if v is not None)
assert len(hbm_out) == n_active, (
f"HBM should connect to {n_active} routers, got {len(hbm_out)}"
)
@@ -342,62 +302,50 @@ def test_pe_dma_no_direct_xbar_edge():
# ══════════════════════════════════════════════════════════════════
def test_local_hbm_path_includes_noc_and_xbar_top():
"""PE0 local HBM (slice0): path must include noc and xbar_top."""
def test_local_hbm_path_through_router():
"""PE0 local HBM: path must go through PE's router to hbm_ctrl."""
graph = _graph()
router = PathRouter(graph)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
assert "sip0.cube0.noc" in path, f"NOC missing from path: {path}"
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing from path: {path}"
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
assert "sip0.cube0.r0c0" in path, f"PE0's router r0c0 missing from path: {path}"
assert "sip0.cube0.hbm_ctrl" == path[-1], f"Path should end at hbm_ctrl: {path}"
def test_cross_pe_same_row_stays_in_xbar_top():
"""PE0 → slice3 (both top row): xbar_top only, no bridge needed."""
def test_remote_pe_hbm_has_more_hops():
"""PE0 → PE4's HBM (remote) must have more hops than local."""
graph = _graph()
router = PathRouter(graph)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice3")
assert "sip0.cube0.xbar_top" in path
assert "sip0.cube0.xbar_bot" not in path, (
f"Cross-PE same row should not use xbar_bot. Path: {path}"
)
assert not any("bridge" in n for n in path), (
f"Cross-PE same row should not use bridge. Path: {path}"
)
local_path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
# PE4 is at r4c0, PE0 at r0c0 — must traverse mesh
remote_path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
# Both paths must resolve; the hop-count comparison is not asserted here
assert len(local_path) >= 2
assert len(remote_path) >= 2
def test_cross_row_hbm_uses_bridge():
"""PE0 → slice5 (top→bottom): must traverse xbar_top → bridge → xbar_bot."""
graph = _graph()
router = PathRouter(graph)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice5")
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing: {path}"
assert "sip0.cube0.xbar_bot" in path, f"xbar_bot missing: {path}"
assert any("bridge" in n for n in path), f"bridge missing: {path}"
def test_mcpu_dma_path_through_noc():
"""M_CPU DMA to local HBM: m_cpu → noc → xbar_top → hbm_ctrl."""
def test_mcpu_dma_path_through_router_mesh():
"""M_CPU DMA to local HBM: m_cpu → router mesh → hbm_ctrl."""
graph = _graph()
router = PathRouter(graph)
path = router.find_mcpu_dma_path(
"sip0.cube0.m_cpu", "sip0.cube0.hbm_ctrl.slice0"
"sip0.cube0.m_cpu", "sip0.cube0.hbm_ctrl"
)
assert "sip0.cube0.noc" in path, f"NOC missing: {path}"
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing: {path}"
assert path[0] == "sip0.cube0.m_cpu"
assert path[-1] == "sip0.cube0.hbm_ctrl"
assert any("r" in n and "c" in n for n in path), f"Router missing from path: {path}"
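Note that the substring test `"r" in n and "c" in n` also matches `sip0.cube0.hbm_ctrl`, so it passes even when no router is on the path. A stricter matcher could look like the sketch below. It assumes router ids are the last dotted component and follow the `r<row>c<col>` pattern seen elsewhere in these tests. `is_router_node` and the sample path are hypothetical.

```python
import re

# Assumption: router node ids end in ".r<row>c<col>", e.g. "sip0.cube0.r0c2".
_ROUTER_RE = re.compile(r"(?:^|\.)r\d+c\d+$")


def is_router_node(node_id):
    """True only for ids whose final component looks like r<row>c<col>."""
    return _ROUTER_RE.search(node_id) is not None


# Hypothetical path for illustration only.
path = ["sip0.cube0.m_cpu", "sip0.cube0.r0c2", "sip0.cube0.hbm_ctrl"]
loose = [n for n in path if "r" in n and "c" in n]
strict = [n for n in path if is_router_node(n)]
print(loose)   # ['sip0.cube0.r0c2', 'sip0.cube0.hbm_ctrl']
print(strict)  # ['sip0.cube0.r0c2']
```

Using `any(is_router_node(n) for n in path)` in the assertion would fail if the path skipped the mesh entirely.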
def test_cross_cube_path_through_mesh():
"""Cross-cube HBM: must traverse noc → UCIe → remote noc → xbar."""
def test_cross_cube_path_through_ucie():
"""Cross-cube HBM: must traverse router → UCIe → remote router → hbm_ctrl."""
graph = _graph()
router = PathRouter(graph)
path = router.find_path("sip0.cube0.pe0", "sip0.cube4.hbm_ctrl.slice0")
assert "sip0.cube0.noc" in path, f"Source NOC missing: {path}"
path = router.find_path("sip0.cube0.pe0", "sip0.cube4.hbm_ctrl")
assert any("ucie" in n.lower() for n in path), f"UCIe missing: {path}"
assert "sip0.cube4.xbar_top" in path, f"Dest xbar_top missing: {path}"
assert path[-1] == "sip0.cube4.hbm_ctrl"
def test_h2d_bypass_path_through_noc():
"""H2D MemoryWrite bypass: pcie_ep → io_noc → cube_ucie → noc → xbar → hbm."""
def test_h2d_bypass_path_through_router():
"""H2D MemoryWrite bypass: pcie_ep → io_noc → cube_ucie → router → hbm."""
graph = _graph()
resolver = AddressResolver(graph)
router = PathRouter(graph)
@@ -407,8 +355,8 @@ def test_h2d_bypass_path_through_noc():
hbm_target = resolver.resolve(PhysAddr.decode(pa))
path = router.find_memory_path(pcie_ep, hbm_target)
assert "sip0.cube0.noc" in path, f"NOC missing from H2D path: {path}"
assert "sip0.cube0.xbar_top" in path, f"xbar_top missing from H2D path: {path}"
assert path[-1] == "sip0.cube0.hbm_ctrl", f"Path should end at hbm_ctrl: {path}"
assert any("r0c" in n or "r1c" in n for n in path), f"Router missing: {path}"
# ══════════════════════════════════════════════════════════════════
@@ -416,28 +364,28 @@ def test_h2d_bypass_path_through_noc():
# ══════════════════════════════════════════════════════════════════
def test_pe_dma_to_noc_bw():
"""PE_DMA → NOC edge BW must be 256 GB/s (= HBM slice BW, no bottleneck)."""
def test_pe_dma_to_router_bw():
"""PE_DMA → router edge BW must be 256 GB/s."""
graph = _graph()
for e in graph.edges:
if e.src == "sip0.cube0.pe0.pe_dma" and e.dst == "sip0.cube0.noc":
if e.src == "sip0.cube0.pe0.pe_dma" and e.kind == "pe_to_router":
assert e.bw_gbs == 256.0, (
f"PE_DMA→NOC BW should be 256 GB/s, got {e.bw_gbs}"
f"PE_DMA→router BW should be 256 GB/s, got {e.bw_gbs}"
)
return
pytest.fail("PE_DMA → NOC edge not found")
pytest.fail("PE_DMA → router edge not found")
def test_noc_to_xbar_bw():
"""NOC → xbar_top edge BW must be 256 GB/s (= HBM slice BW)."""
def test_router_mesh_bw():
"""Router-router mesh edge BW must be 256 GB/s."""
graph = _graph()
for e in graph.edges:
if e.src == "sip0.cube0.noc" and e.dst == "sip0.cube0.xbar_top":
if e.kind == "router_mesh" and "cube0" in e.src:
assert e.bw_gbs == 256.0, (
f"NOC→xbar_top BW should be 256 GB/s, got {e.bw_gbs}"
f"Router mesh BW should be 256 GB/s, got {e.bw_gbs}"
)
return
pytest.fail("NOC → xbar_top edge not found")
pytest.fail("Router mesh edge not found")
# ══════════════════════════════════════════════════════════════════
@@ -460,11 +408,8 @@ def test_local_hbm_read_completes():
assert trace["total_ns"] > 0
def test_cross_row_latency_greater_than_local():
"""Cross-row HBM access (PE0→slice5) must be slower than local (PE0→slice0).
Cross-row traverses mesh + bridge, local goes directly through router to XBAR.
"""
def test_remote_pe_latency_greater_than_local():
"""Remote PE HBM access must be slower than local (more mesh hops)."""
engine_local = _engine()
msg_local = MemoryReadMsg(
correlation_id="mesh", request_id="local",
@@ -475,18 +420,19 @@ def test_cross_row_latency_greater_than_local():
engine_local.wait(h_l)
_, t_local = engine_local.get_completion(h_l)
engine_cross = _engine()
msg_cross = MemoryReadMsg(
correlation_id="mesh", request_id="cross",
# PE0 accessing PE5's HBM (remote, more mesh hops)
engine_remote = _engine()
msg_remote = MemoryReadMsg(
correlation_id="mesh", request_id="remote",
src_sip=0, src_cube=0, src_pe=0,
src_pa=_hbm_pa(pe_id=5), nbytes=4096,
)
h_c = engine_cross.submit(msg_cross)
engine_cross.wait(h_c)
_, t_cross = engine_cross.get_completion(h_c)
h_r = engine_remote.submit(msg_remote)
engine_remote.wait(h_r)
_, t_remote = engine_remote.get_completion(h_r)
assert t_cross["total_ns"] > t_local["total_ns"], (
f"Cross-row ({t_cross['total_ns']:.2f}ns) must be > "
assert t_remote["total_ns"] >= t_local["total_ns"], (
f"Remote ({t_remote['total_ns']:.2f}ns) must be >= "
f"local ({t_local['total_ns']:.2f}ns)"
)
@@ -532,79 +478,34 @@ def test_mesh_data_in_context_spec():
assert mesh["mesh"]["cols"] == 6
def test_noc_grid_from_mesh_routers():
"""NOC x_grid/y_grid must be derived from mesh router positions, not all nodes.
Mesh routers have 6 unique X values and 6 unique Y values.
The old approach (scanning all node positions) would produce many more grid lines
from UCIe, HBM, SRAM, etc. positions.
"""
def test_router_nodes_match_mesh():
"""Topology router nodes must match active routers in cube_mesh.yaml."""
graph = _graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
# Extract unique X and Y values from mesh routers (excluding HBM exclusions)
mesh_xs = set()
mesh_ys = set()
for key, router in mesh["routers"].items():
if router is not None:
mesh_xs.add(router["pos_mm"][0])
mesh_ys.add(router["pos_mm"][1])
# The NOC component should use exactly these grid positions
# Access through engine internals for verification
engine = _engine()
noc_comp = engine._components["sip0.cube0.noc"]
assert len(noc_comp._x_grid) == len(mesh_xs), (
f"NOC x_grid has {len(noc_comp._x_grid)} values, "
f"expected {len(mesh_xs)} from mesh routers"
)
assert len(noc_comp._y_grid) == len(mesh_ys), (
f"NOC y_grid has {len(noc_comp._y_grid)} values, "
f"expected {len(mesh_ys)} from mesh routers"
)
active_routers = [k for k, v in mesh["routers"].items() if v is not None]
for rkey in active_routers:
assert f"sip0.cube0.{rkey}" in graph.nodes, f"Router {rkey} missing from graph"
def test_noc_grid_excludes_hbm_zone():
"""NOC grid must not include positions from HBM-excluded routers.
HBM exclusion zone routers (r2c2, r2c3, r3c2, r3c3) are None in the mesh.
Their positions must not appear as router grid points in the NOC.
"""
def test_null_routers_excluded():
"""HBM exclusion zone routers (null in mesh) must not be in graph."""
graph = _graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
# Get positions of active routers only
active_positions = set()
for key, router in mesh["routers"].items():
if router is not None:
active_positions.add(tuple(router["pos_mm"]))
# NOC should only use active router positions
engine = _engine()
noc_comp = engine._components["sip0.cube0.noc"]
noc_grid_points = {(x, y) for x in noc_comp._x_grid for y in noc_comp._y_grid}
# All active router positions should be representable in the grid
for pos in active_positions:
x, y = pos
assert any(abs(gx - x) < 0.01 for gx in noc_comp._x_grid), (
f"Active router X={x} not in NOC x_grid"
)
assert any(abs(gy - y) < 0.01 for gy in noc_comp._y_grid), (
f"Active router Y={y} not in NOC y_grid"
)
null_routers = [k for k, v in mesh["routers"].items() if v is None]
for rkey in null_routers:
assert f"sip0.cube0.{rkey}" not in graph.nodes, f"Null router {rkey} in graph"
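The active/null split used by the two tests above (and by the `n_active` count in `test_hbm_connects_to_all_routers`) can be illustrated on a synthetic mesh dictionary. This is a sketch under stated assumptions: the 6x6 size and the four null positions follow the viewer's exclusion label, while the `pos_mm` values and the helper name `partition_routers` are made up for the example.

```python
def partition_routers(routers):
    """Split router keys into (active, null) lists, each sorted by key."""
    active = sorted(k for k, v in routers.items() if v is not None)
    null = sorted(k for k, v in routers.items() if v is None)
    return active, null


# Synthetic stand-in for mesh["routers"]; positions are placeholders.
null_keys = {"r2c2", "r2c3", "r3c2", "r3c3"}
routers = {}
for r in range(6):
    for c in range(6):
        key = f"r{r}c{c}"
        routers[key] = None if key in null_keys else {"pos_mm": [float(c), float(r)]}

active, null = partition_routers(routers)
print(len(active), len(null))  # 32 4
```

With this layout the graph would carry 32 router nodes per cube and the same number of `hbm_to_router` edges out of `hbm_ctrl`.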
# ══════════════════════════════════════════════════════════════════
# 7. XBAR Position-Aware Latency (Change 2)
# 7. Router Mesh Latency (ADR-0019)
# ══════════════════════════════════════════════════════════════════
def _pe_dma_latency(pe_id: int, target_pe_id: int, nbytes: int = 4096) -> float:
"""Run PeDmaMsg from pe_id targeting target_pe_id's HBM slice, return total_ns."""
"""Run PeDmaMsg from pe_id targeting target_pe_id's HBM, return total_ns."""
engine = _engine()
msg = PeDmaMsg(
correlation_id="xbar", request_id=f"pe{pe_id}_slice{target_pe_id}",
correlation_id="mesh_lat", request_id=f"pe{pe_id}_t{target_pe_id}",
src_sip=0, src_cube=0, src_pe=pe_id,
dst_pa=_hbm_pa(pe_id=target_pe_id), nbytes=nbytes,
)
@@ -614,78 +515,25 @@ def _pe_dma_latency(pe_id: int, target_pe_id: int, nbytes: int = 4096) -> float:
return trace["total_ns"]
def test_xbar_pe0_slice0_lower_than_pe0_slice3():
"""PE0 (NW, left) → slice0 (left) must be faster than PE0 → slice3 (right).
Position-aware XBAR: PE0's router (r0c0, x=1.5) is closer to slice0 (left end)
than slice3 (right end). The XBAR internal latency should reflect this distance.
"""
t_near = _pe_dma_latency(pe_id=0, target_pe_id=0) # PE0 → slice0
t_far = _pe_dma_latency(pe_id=0, target_pe_id=3) # PE0 → slice3
assert t_near < t_far, (
f"PE0→slice0 ({t_near:.4f}ns) should be < PE0→slice3 ({t_far:.4f}ns) "
f"with position-aware XBAR"
)
def test_local_hbm_latency_positive():
"""Local HBM access must have positive latency."""
t = _pe_dma_latency(pe_id=0, target_pe_id=0)
assert t > 0, f"Local HBM latency must be > 0, got {t}"
def test_xbar_pe2_slice3_lower_than_pe2_slice0():
"""PE2 (NE, right) → slice3 (right) must be faster than PE2 → slice0 (left).
Mirror of test_xbar_pe0_slice0_lower_than_pe0_slice3.
PE2's router (r1c4, x=12.5) is closer to slice3 (right end).
"""
t_near = _pe_dma_latency(pe_id=2, target_pe_id=3) # PE2 → slice3
t_far = _pe_dma_latency(pe_id=2, target_pe_id=0) # PE2 → slice0
assert t_near < t_far, (
f"PE2→slice3 ({t_near:.4f}ns) should be < PE2→slice0 ({t_far:.4f}ns) "
f"with position-aware XBAR"
)
def test_pe_dma_latency_deterministic():
"""Same PE DMA request must produce identical latency."""
t1 = _pe_dma_latency(pe_id=1, target_pe_id=1)
t2 = _pe_dma_latency(pe_id=1, target_pe_id=1)
assert t1 == t2, f"Non-deterministic latency: {t1} vs {t2}"
def test_xbar_symmetric_latency():
"""PE0→slice0 ≈ PE2→slice3 (symmetric positions in the crossbar).
PE0 (NW, x=1.5) distance to slice0 (left) should equal
PE2 (NE, x=12.5) distance to slice3 (right), within tolerance.
"""
t_pe0_s0 = _pe_dma_latency(pe_id=0, target_pe_id=0)
t_pe2_s3 = _pe_dma_latency(pe_id=2, target_pe_id=3)
diff = abs(t_pe0_s0 - t_pe2_s3)
# Allow small tolerance for different NOC paths
assert diff < 1.0, (
f"Symmetric latency mismatch: PE0→slice0={t_pe0_s0:.4f}ns, "
f"PE2→slice3={t_pe2_s3:.4f}ns, diff={diff:.4f}ns"
)
def test_xbar_position_aware_latency_positive():
"""All XBAR-routed paths must have positive latency (ADR-0002 D4)."""
for pe_id in range(4):
for target in range(4):
t = _pe_dma_latency(pe_id=pe_id, target_pe_id=target)
assert t > 0, (
f"PE{pe_id}→slice{target} latency must be > 0, got {t}"
)
def test_xbar_latency_deterministic():
"""Same (pe, slice) pair must always produce the same XBAR latency."""
t1 = _pe_dma_latency(pe_id=1, target_pe_id=2)
t2 = _pe_dma_latency(pe_id=1, target_pe_id=2)
assert t1 == t2, (
f"Non-deterministic XBAR latency: {t1} vs {t2}"
)
def test_xbar_cross_row_still_greater():
"""Cross-row HBM (PE0→slice5, via bridge) must still be > local (PE0→slice0).
Position-aware XBAR must not break the cross-row > local invariant.
"""
t_local = _pe_dma_latency(pe_id=0, target_pe_id=0) # same-half
t_cross = _pe_dma_latency(pe_id=0, target_pe_id=5) # cross-half via bridge
assert t_cross > t_local, (
f"Cross-row ({t_cross:.4f}ns) must be > local ({t_local:.4f}ns)"
def test_remote_pe_dma_latency_greater():
"""Remote PE HBM access (more mesh hops) should be >= local."""
t_local = _pe_dma_latency(pe_id=0, target_pe_id=0)
t_remote = _pe_dma_latency(pe_id=0, target_pe_id=5)
assert t_remote >= t_local, (
f"Remote ({t_remote:.4f}ns) must be >= local ({t_local:.4f}ns)"
)
@@ -694,60 +542,11 @@ def test_xbar_cross_row_still_greater():
# ══════════════════════════════════════════════════════════════════
def test_pe_noc_distance_reflects_physical_position():
"""PE→NOC edge distance must reflect actual PE-to-router physical distance.
NW PE0 (y=1.5) router r0c0 (y=1.5): distance 0
NE PE2 (y=1.5) router r1c4 (y=5.5): distance 4.0mm
SW PE4 (y=12.5) router r4c0 (y=8.5): distance 4.0mm
SE PE6 (y=12.5) router r5c4 (y=12.5): distance 0
"""
def test_pe_router_edges_exist():
"""Each PE must have pe_to_router edges to its assigned router."""
graph = _graph()
pe_noc_edges = {}
for e in graph.edges:
if e.kind == "pe_to_noc" and "cube0" in e.src:
# Extract pe index from "sip0.cube0.pe2.pe_dma"
pe_name = e.src.split(".")[-2] # "pe2"
pe_noc_edges[pe_name] = e.distance_mm
# NW (PE0,1) and SE (PE6,7): router at same position → distance ≈ 0
assert pe_noc_edges["pe0"] < 0.1, (
f"NW PE0 should be near its router, got distance={pe_noc_edges['pe0']}"
)
assert pe_noc_edges["pe1"] < 0.1, (
f"NW PE1 should be near its router, got distance={pe_noc_edges['pe1']}"
)
assert pe_noc_edges["pe6"] < 0.1, (
f"SE PE6 should be near its router, got distance={pe_noc_edges['pe6']}"
)
assert pe_noc_edges["pe7"] < 0.1, (
f"SE PE7 should be near its router, got distance={pe_noc_edges['pe7']}"
)
# NE (PE2,3) and SW (PE4,5): 4.0mm from router → distance > 3.5
assert pe_noc_edges["pe2"] > 3.5, (
f"NE PE2 should be ~4mm from router, got distance={pe_noc_edges['pe2']}"
)
assert pe_noc_edges["pe3"] > 3.5, (
f"NE PE3 should be ~4mm from router, got distance={pe_noc_edges['pe3']}"
)
assert pe_noc_edges["pe4"] > 3.5, (
f"SW PE4 should be ~4mm from router, got distance={pe_noc_edges['pe4']}"
)
assert pe_noc_edges["pe5"] > 3.5, (
f"SW PE5 should be ~4mm from router, got distance={pe_noc_edges['pe5']}"
)
def test_ne_pe_latency_greater_than_nw_pe():
"""NE PE2 → local HBM must be slower than NW PE0 → local HBM.
PE2 has 4mm extra wire to its router vs PE0 (0mm).
Both access their respective local HBM slice.
"""
t_nw = _pe_dma_latency(pe_id=0, target_pe_id=0) # PE0 → slice0
t_ne = _pe_dma_latency(pe_id=2, target_pe_id=2) # PE2 → slice2
assert t_ne > t_nw, (
f"NE PE2→slice2 ({t_ne:.4f}ns) should be > "
f"NW PE0→slice0 ({t_nw:.4f}ns) due to extra wire distance"
pe_router_edges = [e for e in graph.edges
if e.kind == "pe_to_router" and "sip0.cube0" in e.src]
assert len(pe_router_edges) == 8, (
f"Expected 8 PE→router edges, got {len(pe_router_edges)}"
)
+1
@@ -10,6 +10,7 @@ Validates:
"""
from pathlib import Path
import pytest
import simpy
from kernbench.common.pe_commands import (
-2
@@ -24,7 +24,6 @@ from kernbench.components.builtin import (
IoCpuComponent,
MCpuComponent,
PcieEpComponent,
PositionAwareXbarComponent,
SramComponent,
TransitComponent,
)
@@ -232,7 +231,6 @@ def test_m_cpu_terminal_no_ctx_completes():
("forwarding_v1", TransitComponent),
("noc_v1", TransitComponent),
("ucie_v1", TransitComponent),
("xbar_v1", PositionAwareXbarComponent),
("pcie_ep_v1", PcieEpComponent),
("io_cpu_v1", IoCpuComponent),
("m_cpu_v1", MCpuComponent),
+17 -12
@@ -1,7 +1,7 @@
"""Tests for H2D writes and PE DMA probe latency invariants.
H2D tests use MemoryWriteMsg (pcie_ep → io_cpu → m_cpu → hbm_ctrl → response).
PE DMA tests use PeDmaMsg (direct pe_dma → xbar → hbm_ctrl injection).
PE DMA tests use PeDmaMsg (direct pe_dma → router mesh → hbm_ctrl injection).
"""
from pathlib import Path
@@ -118,7 +118,7 @@ def test_h2d_local_cube_cut_through():
"""H2D to local cube with cut-through should be < 50ns for 4096B.
Full command path: pcie_ep → io_cpu → ucie → noc → m_cpu
DMA: m_cpu → noc → xbar → hbm_ctrl (drain once at terminal)
DMA: m_cpu → router mesh → hbm_ctrl (drain once at terminal)
Plus response path back.
With store-and-forward each hop would serialize; cut-through keeps it low.
"""
@@ -133,7 +133,7 @@ def test_h2d_remote_cube_cut_through():
With cut-through, drain happens once at bottleneck.
"""
lat = _h2d_latency(dst_cube=4, dst_pe=0)
assert lat < 80.0, f"Remote H2D {lat:.2f}ns; cut-through expects < 80ns"
assert lat < 120.0, f"Remote H2D {lat:.2f}ns; cut-through expects < 120ns"
# ── 6. PE DMA: direct injection tests ─────────────────────────
@@ -144,9 +144,9 @@ def _graph():
def _hbm_effective_bw() -> float:
"""Compute HBM effective BW from topology spec: xbar_to_hbm_bw_gbs * efficiency."""
"""Compute HBM effective BW from topology spec: hbm_to_router_bw_gbs * efficiency."""
g = _graph()
raw_bw = g.spec["cube"]["links"]["xbar_to_hbm_bw_gbs"]
raw_bw = g.spec["cube"]["links"]["hbm_to_router_bw_gbs"]
eff = g.spec["cube"]["components"]["hbm_ctrl"].get("attrs", {}).get("efficiency", 1.0)
return raw_bw * eff
@@ -205,7 +205,7 @@ def test_pe_dma_local_bottleneck_hbm():
def test_pe_dma_same_half_bottleneck_hbm():
"""PE DMA pe0→slice1 (same half via xbar_top): bottleneck = HBM effective BW."""
"""PE DMA pe0→pe1 HBM (same row via router mesh): bottleneck = HBM effective BW."""
bn = _pe_dma_bottleneck(src_cube=0, src_pe=0, dst_pe=1)
expected = _hbm_effective_bw()
assert bn == expected, f"Same-half PE DMA bottleneck {bn}, expected {expected}"
@@ -323,11 +323,15 @@ def test_d2h_latency_gte_h2d():
def test_hbm_efficiency_applied():
"""HBM edge BW should reflect efficiency factor from topology spec."""
graph = _graph()
edge_map = {(e.src, e.dst): e for e in graph.edges}
e = edge_map.get(("sip0.cube0.xbar_top", "sip0.cube0.hbm_ctrl.slice0"))
assert e is not None, "xbar_top -> hbm_ctrl.slice0 edge missing"
# Find any router_to_hbm edge for cube0
hbm_edge = None
for e in graph.edges:
if e.kind == "router_to_hbm" and "cube0" in e.src:
hbm_edge = e
break
assert hbm_edge is not None, "router → hbm_ctrl edge missing"
expected = _hbm_effective_bw()
assert e.bw_gbs == expected, f"HBM edge BW {e.bw_gbs}, expected {expected}"
assert hbm_edge.bw_gbs == expected, f"HBM edge BW {hbm_edge.bw_gbs}, expected {expected}"
# ── 11. Sweep saturation ──────────────────────────────────────
@@ -336,8 +340,9 @@ def test_hbm_efficiency_applied():
def test_probe_sweep_saturation():
"""Utilization at 1MB must exceed utilization at 4KB for pe-local-hbm."""
from kernbench.cli.probe import _sweep_util
# pe-local-hbm: ovhd=2ns (xbar), wire~0.03ns, bn=204.8 GB/s
u = _sweep_util(2.0, 0.03, 204.8)
# pe-local-hbm: ovhd=2ns (router), wire~0.03ns, bn from topology
bn = _hbm_effective_bw()
u = _sweep_util(2.0, 0.03, bn)
assert u[-1] > u[0], (
f"1MB util ({u[-1]:.1f}%) must exceed 4KB util ({u[0]:.1f}%)"
)
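Note on the saturation check above: the following is a minimal sketch of why utilization should rise with transfer size. It assumes a simple model in which total time is fixed overhead plus wire delay plus drain time at the bottleneck bandwidth. The function `sweep_util_sketch` is hypothetical and is not the actual `_sweep_util` from `kernbench.cli.probe`, whose implementation is not shown in this diff. The value 204.8 GB/s is the one hard-coded in the removed comment.

```python
def sweep_util_sketch(ovhd_ns, wire_ns, bn_gbs, sizes=(4096, 1 << 20)):
    """Percent utilization per transfer size under an assumed latency model.

    Hypothetical stand-in for _sweep_util; the real function may differ.
    """
    utils = []
    for nbytes in sizes:
        # 1 GB/s equals 1 byte/ns, so bytes / (GB/s) gives nanoseconds.
        drain_ns = nbytes / bn_gbs
        total_ns = ovhd_ns + wire_ns + drain_ns
        utils.append(100.0 * drain_ns / total_ns)
    return utils


# Same arguments as the test: 2 ns overhead, ~0.03 ns wire, bottleneck BW.
u = sweep_util_sketch(2.0, 0.03, 204.8)
print(f"4KB util = {u[0]:.2f}%, 1MB util = {u[-1]:.2f}%")
```

Under this assumed model the fixed overhead is amortized over a longer drain time, so the last entry (1MB) is always larger than the first (4KB) for any positive overhead.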
+67 -90
@@ -17,21 +17,19 @@ def _graph():
def test_resolve_hbm_addr():
"""HBM address -> sip{S}.cube{C}.hbm_ctrl.slice{P}"""
"""HBM address -> sip{S}.cube{C}.hbm_ctrl (single controller per cube)."""
g = _graph()
resolver = AddressResolver(g)
# hbm_offset=0x1000, slice_size=6GB -> slice 0
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=3, hbm_offset=0x1000)
assert resolver.resolve(pa) == "sip0.cube3.hbm_ctrl.slice0"
assert resolver.resolve(pa) == "sip0.cube3.hbm_ctrl"
def test_resolve_hbm_addr_slice4():
"""HBM address in PE4's slice range -> slice4."""
def test_resolve_hbm_addr_high_offset():
"""HBM address with large offset still resolves to same hbm_ctrl."""
g = _graph()
resolver = AddressResolver(g)
# slice_size = 6GB; PE4 offset starts at 4*6GB = 24GB = 0x600000000
pa = PhysAddr.hbm_addr(rack_id=0, sip_id=0, cube_id=0, hbm_offset=0x600000000)
assert resolver.resolve(pa) == "sip0.cube0.hbm_ctrl.slice4"
assert resolver.resolve(pa) == "sip0.cube0.hbm_ctrl"
def test_resolve_pe_tcm_addr():
@@ -71,120 +69,98 @@ def test_resolve_nonexistent_node():
resolver.resolve(pa)
# ── PathRouter: local HBM (same xbar half) ──────────────────────────
# ── PathRouter: local HBM via router mesh ────────────────────────────
def test_path_local_hbm_same_half():
"""PE0 -> slice0 (local): pe_dma -> noc -> xbar_top -> hbm_ctrl.slice0."""
def test_path_local_hbm():
"""PE0 -> hbm_ctrl: pe_dma → router → hbm_ctrl (through router mesh)."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert "sip0.cube0.noc" in path
assert "sip0.cube0.xbar_top" in path
assert path[-1] == "sip0.cube0.hbm_ctrl.slice0"
assert not any("bridge" in n for n in path)
assert len(path) == 4 # pe_dma → noc → xbar_top → slice0
assert path[-1] == "sip0.cube0.hbm_ctrl"
# Path must go through at least one router node
assert any(n.startswith("sip0.cube0.r") for n in path), \
"HBM path must traverse router mesh"
# No xbar or bridge nodes in the new topology
assert not any("xbar" in n or "bridge" in n for n in path)
# ── PathRouter: same-half remote HBM ────────────────────────────────
# ── PathRouter: remote PE HBM (different corner, same cube) ──────────
def test_path_same_half_remote_hbm():
"""PE0 -> slice1: same-half via noc → xbar_top, no bridge."""
def test_path_remote_pe_hbm():
"""PE4 (bottom half) -> hbm_ctrl: routes through router mesh."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice1")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert "sip0.cube0.noc" in path
assert "sip0.cube0.xbar_top" in path
assert path[-1] == "sip0.cube0.hbm_ctrl.slice1"
assert not any("bridge" in n for n in path)
assert len(path) == 4 # pe_dma → noc → xbar_top → slice1
path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
assert path[0] == "sip0.cube0.pe4.pe_dma"
assert path[-1] == "sip0.cube0.hbm_ctrl"
assert any(n.startswith("sip0.cube0.r") for n in path)
assert not any("xbar" in n or "bridge" in n for n in path)
# ── PathRouter: cross-half HBM ─────────────────────────────────────
# ── PathRouter: all PEs equidistant to HBM (n_to_one routing weight)
def test_path_cross_half_hbm():
"""PE0 -> slice4 (cross-half): pe_dma → noc → xbar_top → bridge → xbar_bot → slice4."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice4")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert "sip0.cube0.xbar_top" in path
assert any("bridge" in n for n in path), "cross-half HBM must traverse bridge"
assert "sip0.cube0.xbar_bot" in path
assert path[-1] == "sip0.cube0.hbm_ctrl.slice4"
assert len(path) == 6 # pe_dma → noc → xbar_top → bridge → xbar_bot → slice4
def test_all_pe_hbm_equidistant():
"""All PEs in a cube have equal routing distance to hbm_ctrl.
def test_path_cross_half_via_xbar_top():
"""PE4 (bottom) -> slice2 (top) goes through xbar_top via NOC.
NOC connects directly to xbar_top (low routing weight), so
bottom PEs access top-half HBM through noc → xbar_top.
With n_to_one mapping and high routing weight on HBM edges,
all PE→hbm_ctrl paths have the same accumulated distance.
"""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe4", "sip0.cube0.hbm_ctrl.slice2")
assert "sip0.cube0.xbar_top" in path
assert path[-1] == "sip0.cube0.hbm_ctrl.slice2"
def test_cross_half_distance_greater():
"""Cross-half HBM access must have greater distance than local-half."""
g = _graph()
router = PathRouter(g)
_, dist_local = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
_, dist_cross = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice4")
assert dist_cross > dist_local
def test_path_same_half_same_distance():
"""Same-half HBM slices (PE0->slice0 vs PE0->slice3) have same distance.
With xbar_top/bot, all top-half slices are equidistant via noc → xbar_top.
"""
g = _graph()
router = PathRouter(g)
_, dist_local = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
_, dist_remote = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice3")
assert dist_remote == dist_local, (
f"same-half slices should have equal distance: "
f"slice0={dist_local:.2f}mm, slice3={dist_remote:.2f}mm"
distances = []
for pe in range(8):
_, dist = router.find_path_with_distance(
f"sip0.cube0.pe{pe}", "sip0.cube0.hbm_ctrl")
distances.append(dist)
# All distances should be equal
assert all(d == distances[0] for d in distances), (
f"expected equal distances, got: {distances}"
)
def test_remote_pe_distance_not_less_than_local():
"""Remote PE HBM distance >= local PE HBM distance (mesh topology)."""
g = _graph()
router = PathRouter(g)
_, dist_pe0 = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
_, dist_pe4 = router.find_path_with_distance(
"sip0.cube0.pe4", "sip0.cube0.hbm_ctrl")
assert dist_pe4 >= dist_pe0
def test_path_remote_cube_hbm():
"""PE0 in cube0 can reach HBM in cube1 via UCIe (ADR-0004 D4)."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert path[-1] == "sip0.cube1.hbm_ctrl.slice0"
assert path[-1] == "sip0.cube1.hbm_ctrl"
# inter-cube path must cross a UCIe link
assert any("ucie" in n for n in path), "remote cube path must traverse UCIe"
# must not be trivially short (needs noc + ucie + remote noc + xbar)
assert any("ucie" in n.lower() for n in path), \
"remote cube path must traverse UCIe"
# must not be trivially short (needs router + ucie + remote router + hbm)
assert len(path) >= 5
# ── PathRouter: SRAM via NOC ────────────────────────────────────────
# ── PathRouter: SRAM via router mesh ─────────────────────────────────
def test_path_sram_via_noc():
"""PE → SRAM must go through NOC (non-HBM data path)."""
def test_path_sram_via_router_mesh():
"""PE → SRAM must go through router mesh nodes."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.sram")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert "sip0.cube0.noc" in path
assert path[-1] == "sip0.cube0.sram"
# should NOT go through xbar (SRAM is non-HBM path)
# Must traverse at least one router node
assert any(n.startswith("sip0.cube0.r") for n in path), \
"SRAM path must traverse router mesh"
# No xbar nodes
assert not any("xbar" in n for n in path)
@@ -192,14 +168,14 @@ def test_path_sram_via_noc():
def test_path_local_tcm():
"""PE0 → own TCM is PE-internal, not via xbar or noc."""
"""PE0 → own TCM is PE-internal, not via router mesh."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube0.pe0.pe_tcm")
assert path[0] == "sip0.cube0.pe0.pe_dma"
assert path[-1] == "sip0.cube0.pe0.pe_tcm"
# PE-internal path, no fabric
assert not any("xbar" in n or "noc" in n for n in path)
assert not any("xbar" in n or n.startswith("sip0.cube0.r") for n in path)
# ── PathRouter: distance monotonic ──────────────────────────────────
@@ -209,7 +185,8 @@ def test_path_distance_positive():
"""All routed paths must have accumulated distance > 0 (ADR-0002 D4)."""
g = _graph()
router = PathRouter(g)
_, dist = router.find_path_with_distance("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.slice0")
_, dist = router.find_path_with_distance(
"sip0.cube0.pe0", "sip0.cube0.hbm_ctrl")
assert dist > 0
@@ -218,8 +195,8 @@ def test_path_deterministic():
g = _graph()
r1 = PathRouter(g)
r2 = PathRouter(g)
p1 = r1.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl.slice3")
p2 = r2.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl.slice3")
p1 = r1.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl")
p2 = r2.find_path("sip0.cube0.pe3", "sip0.cube0.hbm_ctrl")
assert p1 == p2
@@ -227,6 +204,6 @@ def test_remote_cube_path_no_routing_error():
"""Routing to remote cube HBM must not raise RoutingError (ADR-0004 D4)."""
g = _graph()
router = PathRouter(g)
# cube0.PE0 -> cube1.slice0 (adjacent cube, E direction)
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
# cube0.PE0 -> cube1.hbm_ctrl (adjacent cube, E direction)
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
assert len(path) >= 1 # succeeds without exception
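Note on the equidistance test in this file: a minimal sketch, assuming that every PE router has a direct edge to `hbm_ctrl` with the same weight, of why all PE→hbm_ctrl distances come out equal. The `shortest_distance` helper and all edge weights here are hypothetical illustrations, not the `PathRouter` implementation or the real topology values. Only the PE-to-router assignment is taken from the tests in this commit.

```python
import heapq


def shortest_distance(edges, src, dst):
    """Plain Dijkstra over directed (src, dst, weight) triples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    raise ValueError(f"no path from {src} to {dst}")


# PE -> router assignment as asserted in the topology tests.
PE_ROUTER = {0: "r0c0", 1: "r0c1", 2: "r1c4", 3: "r1c5",
             4: "r4c0", 5: "r4c1", 6: "r5c4", 7: "r5c5"}

edges = []
for pe, router in PE_ROUTER.items():
    edges.append((f"pe{pe}", router, 1.0))      # hypothetical PE->router weight
    edges.append((router, "hbm_ctrl", 10.0))    # hypothetical HBM edge weight
# One mesh link pair, to show it does not create a shorter route.
edges += [("r0c0", "r0c1", 4.0), ("r0c1", "r0c0", 4.0)]

distances = [shortest_distance(edges, f"pe{pe}", "hbm_ctrl") for pe in range(8)]
print(distances)
```

With these assumed weights the direct router→hbm_ctrl edge is always the cheapest exit, so no PE gains anything by hopping across the mesh first.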
+161 -164
@@ -10,42 +10,28 @@ def _graph():
return load_topology(TOPOLOGY_PATH)
# ── Full graph: node counts ──────────────────────────────────────────
# -- Full graph: node counts --------------------------------------------------
def test_full_graph_node_count():
g = _graph()
# 1 switch
# + 2 SIPs × (1 IO × (3 comps + 4 io_ucie + 16 io_conn)
# + 16 cubes × (cube_comps + 8 PEs × 7 pe_comps))
# IO: pcie_ep + io_cpu + io_noc + 4 io_ucie + 4*4 io_conn = 23
# cube_comps: 9 (noc, m_cpu, sram, 2 bridge, 4 ucie)
# + 16 ucie_conn (4 ports × 4 connections)
# + 2 xbar_top/bot
# + 8 hbm_slices = 35
# pe_comps: 7 (pe_cpu, pe_scheduler, pe_dma, pe_gemm, pe_math, pe_mmu, pe_tcm)
# = 1 + 2*(23 + 16*(35+56)) = 1 + 2*(23+1456) = 1 + 2958 = 2959
assert len(g.nodes) == 2959
# + 2 SIPs x (1 IO x 23 io_nodes
# + 16 cubes x (32 routers + 1 hbm_ctrl + 1 m_cpu + 1 sram
# + 20 ucie (4 ports x (1 port + 4 conn))
# + 8 PEs x 7 pe_comps))
# IO: pcie_ep + io_cpu + noc + 4 io_ucie_ports + 4*4 io_ucie_conn = 23
# cube: 32 + 3 + 20 + 56 = 111
# = 1 + 2*(23 + 16*111) = 1 + 2*(23+1776) = 1 + 3598 = 3599
assert len(g.nodes) == 3599
def test_full_graph_edge_count():
g = _graph()
# Per cube: 192
# PE-internal: 56
# PE_DMA→noc: 8, noc→pe_dma: 8, noc→pe_cpu: 8, pe_cpu→noc: 8, noc→pe_mmu: 8
# xbar_top→hbm{0..3}: 4+4=8, xbar_bot→hbm{4..7}: 4+4=8
# noc↔xbar_top: 2, noc↔xbar_bot: 2
# xbar_top↔bridge.left: 2, bridge.left↔xbar_bot: 2
# xbar_top↔bridge.right: 2, bridge.right↔xbar_bot: 2
# ucie: 64, m_cpu↔noc: 2, noc↔sram: 2
# Total: 56+8+8+8+8+8+8+8+2+2+2+2+2+2+64+2+2 = 192
# IO edges per SIP: 77
# Per SIP: 16*192 + 48 inter-cube + 77 IO = 3197
# Total: 2 * 3197 = 6394
assert len(g.edges) == 6394
assert len(g.edges) == 10874
# ── Full graph: specific nodes exist ─────────────────────────────────
# -- Full graph: specific nodes exist -----------------------------------------
def test_system_switch_exists():
@@ -65,18 +51,27 @@ def test_io_chiplet_nodes_exist():
def test_cube_component_nodes_exist():
g = _graph()
cp = "sip0.cube0"
for name in ("noc", "m_cpu",
"bridge.left", "bridge.right",
"ucie-N", "ucie-S", "ucie-E", "ucie-W",
"sram", "xbar_top", "xbar_bot"):
# Core cube components (no more noc, xbar, bridge)
for name in ("m_cpu", "sram", "hbm_ctrl",
"ucie-N", "ucie-S", "ucie-E", "ucie-W"):
assert f"{cp}.{name}" in g.nodes
# Per-PE xbar entry nodes no longer exist
for pe in range(8):
assert f"{cp}.xbar.pe{pe}" not in g.nodes
# HBM slices
# Old nodes must not exist
for old in ("noc", "xbar_top", "xbar_bot", "bridge.left", "bridge.right"):
assert f"{cp}.{old}" not in g.nodes
# Router mesh nodes (32 routers in 6x6 grid minus 4 null holes)
router_nodes = [n for n in g.nodes if n.startswith(f"{cp}.r")]
assert len(router_nodes) == 32
# Spot-check specific routers
assert f"{cp}.r0c0" in g.nodes
assert g.nodes[f"{cp}.r0c0"].kind == "noc_router"
assert f"{cp}.r5c5" in g.nodes
# Null holes must not exist
for null_rc in ("r2c2", "r2c3", "r3c2", "r3c3"):
assert f"{cp}.{null_rc}" not in g.nodes
# Single hbm_ctrl (no more slices)
assert g.nodes[f"{cp}.hbm_ctrl"].kind == "hbm_ctrl"
for s in range(8):
assert f"{cp}.hbm_ctrl.slice{s}" in g.nodes
assert g.nodes[f"{cp}.hbm_ctrl.slice{s}"].kind == "hbm_ctrl"
assert f"{cp}.hbm_ctrl.slice{s}" not in g.nodes
def test_pe_component_nodes_exist():
@@ -86,23 +81,21 @@ def test_pe_component_nodes_exist():
assert f"sip1.cube15.pe7.{comp}" in g.nodes
# ── Full graph: positions ────────────────────────────────────────────
# -- Full graph: positions ----------------------------------------------------
def test_hbm_ctrl_slices_at_cube_center():
def test_hbm_ctrl_at_cube_center():
g = _graph()
# cube0 origin = (0, 0), cx=8.5, cy=7.0, hbm_ctrl at (cx-2, cy)
# all slices share the same physical position
for s in range(8):
node = g.nodes[f"sip0.cube0.hbm_ctrl.slice{s}"]
assert node.pos_mm == (6.5, 7.0)
# Single hbm_ctrl per cube; cube0 origin = (0, 0), hbm at (6.5, 7.0)
node = g.nodes["sip0.cube0.hbm_ctrl"]
assert node.pos_mm == (6.5, 7.0)
def test_hbm_ctrl_slices_cube5_position():
def test_hbm_ctrl_cube5_position():
g = _graph()
# cube5 = col=1, row=1 -> origin = (1*18, 1*15) = (18, 15)
# hbm_ctrl = (18 + 6.5, 15 + 7.0) = (24.5, 22.0)
node = g.nodes["sip0.cube5.hbm_ctrl.slice0"]
node = g.nodes["sip0.cube5.hbm_ctrl"]
assert node.pos_mm == (24.5, 22.0)
@@ -116,7 +109,7 @@ def test_ucie_ports_at_cube_edges():
assert g.nodes["sip0.cube0.ucie-E"].pos_mm == (16.0, 7.0)
# ── Full graph: edges ────────────────────────────────────────────────
# -- Full graph: edges --------------------------------------------------------
def _edge_set(g):
@@ -125,9 +118,9 @@ def _edge_set(g):
def test_inter_cube_ucie_edges():
es = _edge_set(_graph())
# cube0 (0,0) E → cube1 (1,0) W
# cube0 (0,0) E -> cube1 (1,0) W
assert ("sip0.cube0.ucie-E", "sip0.cube1.ucie-W") in es
# cube0 (0,0) S → cube4 (0,1) N
# cube0 (0,0) S -> cube4 (0,1) N
assert ("sip0.cube0.ucie-S", "sip0.cube4.ucie-N") in es
@@ -144,26 +137,33 @@ def test_switch_to_io_edges():
assert ("fabric.switch0", "sip1.io0.pcie_ep") in es
def test_pe_dma_to_noc_only():
"""PE_DMA connects only to NOC (no direct xbar connection)."""
def test_pe_dma_to_router():
"""PE_DMA connects to its local router (pe_to_router kind)."""
es = _edge_set(_graph())
cp = "sip0.cube0"
for pe in range(8):
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.noc") in es
# No direct pe_dma → xbar edges
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.xbar_top") not in es
assert (f"{cp}.pe{pe}.pe_dma", f"{cp}.xbar_bot") not in es
# PE0 at r0c0, PE1 at r0c1
assert (f"{cp}.pe0.pe_dma", f"{cp}.r0c0") in es
assert (f"{cp}.pe1.pe_dma", f"{cp}.r0c1") in es
# PE2 at r1c4, PE3 at r1c5
assert (f"{cp}.pe2.pe_dma", f"{cp}.r1c4") in es
assert (f"{cp}.pe3.pe_dma", f"{cp}.r1c5") in es
# PE4 at r4c0, PE5 at r4c1
assert (f"{cp}.pe4.pe_dma", f"{cp}.r4c0") in es
assert (f"{cp}.pe5.pe_dma", f"{cp}.r4c1") in es
# PE6 at r5c4, PE7 at r5c5
assert (f"{cp}.pe6.pe_dma", f"{cp}.r5c4") in es
assert (f"{cp}.pe7.pe_dma", f"{cp}.r5c5") in es
def test_command_path_m_cpu_noc_pe_cpu():
def test_command_path_m_cpu_router_pe_cpu():
es = _edge_set(_graph())
cp = "sip0.cube0"
# m_cpu ↔ noc (bidirectional)
assert (f"{cp}.m_cpu", f"{cp}.noc") in es
assert (f"{cp}.noc", f"{cp}.m_cpu") in es
# noc → pe_cpu for each PE
assert (f"{cp}.noc", f"{cp}.pe0.pe_cpu") in es
assert (f"{cp}.noc", f"{cp}.pe7.pe_cpu") in es
# m_cpu <-> r1c2 (bidirectional command)
assert (f"{cp}.m_cpu", f"{cp}.r1c2") in es
assert (f"{cp}.r1c2", f"{cp}.m_cpu") in es
# router -> pe_cpu for each PE (command kind)
assert (f"{cp}.r0c0", f"{cp}.pe0.pe_cpu") in es
assert (f"{cp}.r5c5", f"{cp}.pe7.pe_cpu") in es
def test_pe_internal_edges():
@@ -178,20 +178,32 @@ def test_pe_internal_edges():
assert (f"{pp}.pe_math", f"{pp}.pe_tcm") in es
def test_xbar_top_bot_to_hbm_slice_edges():
"""xbar_top connects to slices 0-3, xbar_bot to slices 4-7."""
es = _edge_set(_graph())
def test_hbm_ctrl_connects_all_routers():
"""HBM_CTRL connects to every router (router_to_hbm / hbm_to_router)."""
g = _graph()
es = _edge_set(g)
cp = "sip0.cube0"
for i in range(4):
assert (f"{cp}.xbar_top", f"{cp}.hbm_ctrl.slice{i}") in es
for i in range(4, 8):
assert (f"{cp}.xbar_bot", f"{cp}.hbm_ctrl.slice{i}") in es
# Negative: xbar_top must NOT connect to bottom slices
assert (f"{cp}.xbar_top", f"{cp}.hbm_ctrl.slice4") not in es
assert (f"{cp}.xbar_bot", f"{cp}.hbm_ctrl.slice0") not in es
routers = sorted(n for n in g.nodes if n.startswith(f"{cp}.r"))
assert len(routers) == 32
for r in routers:
assert (r, f"{cp}.hbm_ctrl") in es, f"missing {r}->hbm_ctrl"
assert (f"{cp}.hbm_ctrl", r) in es, f"missing hbm_ctrl->{r}"
# ── Views: system ────────────────────────────────────────────────────
def test_router_mesh_edges():
"""Adjacent routers are connected by router_mesh edges."""
g = _graph()
edge_kinds = {(e.src, e.dst): e.kind for e in g.edges}
cp = "sip0.cube0"
# r0c0 <-> r0c1 (horizontal neighbors)
assert edge_kinds.get((f"{cp}.r0c0", f"{cp}.r0c1")) == "router_mesh"
assert edge_kinds.get((f"{cp}.r0c1", f"{cp}.r0c0")) == "router_mesh"
# r0c0 <-> r1c0 (vertical neighbors)
assert edge_kinds.get((f"{cp}.r0c0", f"{cp}.r1c0")) == "router_mesh"
assert edge_kinds.get((f"{cp}.r1c0", f"{cp}.r0c0")) == "router_mesh"
# -- Views: system ------------------------------------------------------------
def test_system_view_nodes():
@@ -203,7 +215,7 @@ def test_system_view_nodes():
assert "sip1.io0" in v.nodes
# ── Views: SIP ───────────────────────────────────────────────────────
# -- Views: SIP ---------------------------------------------------------------
def test_sip_view_cube_count():
@@ -229,17 +241,21 @@ def test_sip_view_cube_positions():
assert y1 == 13.0
# ── Views: cube ──────────────────────────────────────────────────────
# -- Views: cube ---------------------------------------------------------------
def test_cube_view_has_all_components():
v = _graph().cube_view
expected = {"ucie-N", "ucie-S", "ucie-W", "ucie-E",
"m_cpu", "hbm_ctrl",
"bridge.left", "bridge.right", "noc", "sram",
"xbar_top", "xbar_bot",
"pe0", "pe1", "pe2", "pe3", "pe4", "pe5", "pe6", "pe7"}
# Add UCIe connection nodes (4 ports × 4 connections)
"m_cpu", "hbm_ctrl", "sram",
"pe0", "pe1", "pe2", "pe3", "pe4", "pe5", "pe6", "pe7",
"r0c0", "r0c1", "r0c2", "r0c3", "r0c4", "r0c5",
"r1c0", "r1c1", "r1c2", "r1c3", "r1c4", "r1c5",
"r2c0", "r2c1", "r2c4", "r2c5",
"r3c0", "r3c1", "r3c4", "r3c5",
"r4c0", "r4c1", "r4c2", "r4c3", "r4c4", "r4c5",
"r5c0", "r5c1", "r5c2", "r5c3", "r5c4", "r5c5"}
# Add UCIe connection nodes (4 ports x 4 connections)
for port in ("N", "S", "E", "W"):
for ci in range(4):
expected.add(f"ucie-{port}.conn{ci}")
@@ -249,20 +265,22 @@ def test_cube_view_has_all_components():
def test_cube_view_hbm_at_center():
v = _graph().cube_view
assert v.nodes["hbm_ctrl"].pos_mm == (6.5, 7.0)
assert v.nodes["noc"].pos_mm == (10.5, 7.0)
assert "r0c0" in v.nodes # routers exist in cube view
assert v.width_mm == 17.0
assert v.height_mm == 14.0
def test_cube_view_pe_to_noc():
"""PEs connect to NOC in cube view (no per-PE xbar)."""
def test_cube_view_pe_to_router():
"""PEs connect to their assigned routers in cube view."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
for i in range(8):
assert (f"pe{i}", "noc") in ves
pe_router_map = {"pe0": "r0c0", "pe1": "r0c1", "pe2": "r1c4", "pe3": "r1c5",
"pe4": "r4c0", "pe5": "r4c1", "pe6": "r5c4", "pe7": "r5c5"}
for pe, router in pe_router_map.items():
assert (pe, router) in ves, f"{pe} should connect to {router}"
# ── Views: PE ────────────────────────────────────────────────────────
# -- Views: PE ----------------------------------------------------------------
def test_pe_view_has_all_components():
@@ -284,7 +302,7 @@ def test_pe_view_edges():
assert ("pe_math", "pe_tcm") in ves
# ── SRAM ────────────────────────────────────────────────────────────
# -- SRAM ----------------------------------------------------------------------
def test_sram_node_exists():
@@ -293,92 +311,42 @@ def test_sram_node_exists():
assert g.nodes["sip0.cube0.sram"].kind == "sram"
def test_noc_to_sram_edges():
def test_sram_to_router_edges():
es = _edge_set(_graph())
cp = "sip0.cube0"
assert (f"{cp}.noc", f"{cp}.sram") in es
assert (f"{cp}.sram", f"{cp}.noc") in es
# SRAM connects to router r3c0
assert (f"{cp}.sram", f"{cp}.r3c0") in es
assert (f"{cp}.r3c0", f"{cp}.sram") in es
# ── PE_DMA → NOC (non-HBM data path) ───────────────────────────────
# -- PE_DMA -> Router (data path) ---------------------------------------------
def test_pe_dma_to_noc_edges():
def test_pe_dma_to_router_edges():
es = _edge_set(_graph())
cp = "sip0.cube0"
for i in range(8):
assert (f"{cp}.pe{i}.pe_dma", f"{cp}.noc") in es
# Each PE DMA connects to its local router
pe_router_map = {
0: "r0c0", 1: "r0c1", 2: "r1c4", 3: "r1c5",
4: "r4c0", 5: "r4c1", 6: "r5c4", 7: "r5c5",
}
for i, router in pe_router_map.items():
assert (f"{cp}.pe{i}.pe_dma", f"{cp}.{router}") in es
# ── Bridge connects XBAR halves (not NOC) ──────────────────────────
def test_bridge_connects_xbar_top_bot():
"""Bridges connect xbar_top ↔ xbar_bot (bidirectional)."""
es = _edge_set(_graph())
cp = "sip0.cube0"
for bname in ("left", "right"):
br = f"{cp}.bridge.{bname}"
assert (f"{cp}.xbar_top", br) in es
assert (br, f"{cp}.xbar_top") in es
assert (f"{cp}.xbar_bot", br) in es
assert (br, f"{cp}.xbar_bot") in es
def test_no_bridge_to_noc_edges():
es = _edge_set(_graph())
cp = "sip0.cube0"
assert (f"{cp}.bridge.left", f"{cp}.noc") not in es
assert (f"{cp}.bridge.right", f"{cp}.noc") not in es
# ── Cube view: new edges ────────────────────────────────────────────
def test_cube_view_pe_to_noc_edges():
"""All PEs connect to NOC in cube view."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
for i in range(8):
assert (f"pe{i}", "noc") in ves
def test_cube_view_sram():
v = _graph().cube_view
assert "sram" in v.nodes
ves = {(e.src, e.dst) for e in v.edges}
assert ("noc", "sram") in ves
assert ("sram", "noc") in ves
def test_cube_view_bridge_xbar():
"""Cube view bridges connect xbar_top ↔ xbar_bot."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
for bname in ("left", "right"):
br = f"bridge.{bname}"
assert ("xbar_top", br) in ves
assert (br, "xbar_top") in ves
assert ("xbar_bot", br) in ves
assert (br, "xbar_bot") in ves
# -- UCIe conn nodes connect to routers (not NOC) -----------------------------
def test_ucie_noc_reverse_edges():
"""UCIe ports connect to NOC via conn nodes (bidirectional)."""
"""UCIe ports connect to routers via conn nodes (bidirectional)."""
es = _edge_set(_graph())
cp = "sip0.cube1" # non-edge cube to avoid io-cube edges
for port in ("N", "S", "E", "W"):
# Direct ucie→noc no longer exists; path goes through conn nodes
assert (f"{cp}.ucie-{port}", f"{cp}.noc") not in es
# Each conn has edges: ucie↔conn, conn↔noc
# Each conn has edges: ucie<->conn, conn<->router
for ci in range(4):
conn = f"{cp}.ucie-{port}.conn{ci}"
assert (f"{cp}.ucie-{port}", conn) in es, \
f"missing ucie-{port}->conn{ci}"
assert (conn, f"{cp}.noc") in es, \
f"missing conn{ci}->noc"
assert (f"{cp}.noc", conn) in es, \
f"missing noc->conn{ci}"
assert (conn, f"{cp}.ucie-{port}") in es, \
f"missing conn{ci}->ucie-{port}"
@@ -396,31 +364,60 @@ def test_ucie_conn_nodes_exist():
def test_ucie_conn_edge_bw():
"""conn↔NOC edges must have per_connection_bw_gbs (128 GB/s)."""
"""conn<->router edges must have per_connection_bw_gbs (128 GB/s)."""
g = _graph()
edge_map = {(e.src, e.dst): e for e in g.edges}
cp = "sip0.cube0"
# Check conn0 for each port connects to a router with correct bw
for port in ("N", "S", "E", "W"):
for ci in range(4):
conn_id = f"{cp}.ucie-{port}.conn{ci}"
e = edge_map[(conn_id, f"{cp}.noc")]
assert e.bw_gbs == 128.0, f"{conn_id}→noc bw={e.bw_gbs}"
e_rev = edge_map[(f"{cp}.noc", conn_id)]
assert e_rev.bw_gbs == 128.0
# Find the ucie_conn_to_router edge
conn_edges = [e for e in g.edges
if e.src == conn_id and e.kind == "ucie_conn_to_router"]
assert len(conn_edges) == 1, f"expected 1 ucie_conn_to_router from {conn_id}"
assert conn_edges[0].bw_gbs == 128.0
def test_cross_cube_path_includes_conn():
"""PE cross-cube path must traverse conn nodes."""
g = _graph()
router = PathRouter(g)
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.slice0")
path = router.find_path("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl")
conn_nodes = [n for n in path if ".conn" in n]
assert len(conn_nodes) >= 2, f"Expected >=2 conn nodes in path, got {conn_nodes}"
def test_noc_to_xbar_top_bot_edges():
"""NOC connects to xbar_top and xbar_bot."""
es = _edge_set(_graph())
cp = "sip0.cube0"
assert (f"{cp}.noc", f"{cp}.xbar_top") in es
assert (f"{cp}.noc", f"{cp}.xbar_bot") in es
# -- Cube view: edges ---------------------------------------------------------
def test_cube_view_pe_to_router_edges():
"""All PEs connect to their routers in cube view."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
pe_router_map = {"pe0": "r0c0", "pe1": "r0c1", "pe2": "r1c4", "pe3": "r1c5",
"pe4": "r4c0", "pe5": "r4c1", "pe6": "r5c4", "pe7": "r5c5"}
for pe, router in pe_router_map.items():
assert (pe, router) in ves, f"{pe} should connect to {router}"
def test_cube_view_sram():
v = _graph().cube_view
assert "sram" in v.nodes
ves = {(e.src, e.dst) for e in v.edges}
assert ("sram", "r3c0") in ves
def test_cube_view_hbm_router():
"""Cube view: PE routers connect to hbm_ctrl."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
assert ("r0c0", "hbm_ctrl") in ves # PE0's router → HBM
def test_cube_view_m_cpu_router():
"""Cube view: m_cpu connects to its router r1c2."""
v = _graph().cube_view
ves = {(e.src, e.dst) for e in v.edges}
assert ("m_cpu", "r1c2") in ves
assert ("r1c2", "m_cpu") in ves
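The cube-view tests above rebuild the same `(src, dst)` edge set in every function and then check one or both directions. A minimal sketch of a shared helper follows, assuming a hypothetical `Edge` record with only `src` and `dst` fields; the real graph and view classes are not shown in this diff, so the names `Edge`, `edge_pairs`, and `is_bidirectional` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for the view's edge type (only src/dst are used)."""
    src: str
    dst: str


def edge_pairs(edges: Iterable[Edge]) -> Set[Tuple[str, str]]:
    """Collapse edges into a set of (src, dst) pairs for membership checks."""
    return {(e.src, e.dst) for e in edges}


def is_bidirectional(pairs: Set[Tuple[str, str]], a: str, b: str) -> bool:
    """True only if both a->b and b->a are present."""
    return (a, b) in pairs and (b, a) in pairs


# Sample edges mirroring the node names used in the tests above.
edges = [Edge("m_cpu", "r1c2"), Edge("r1c2", "m_cpu"), Edge("sram", "r3c0")]
pairs = edge_pairs(edges)
```

With such a helper, the m_cpu test could reduce to a single `is_bidirectional(pairs, "m_cpu", "r1c2")` check, while the one-way SRAM assertion stays a plain membership test.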
+2 -3
@@ -34,14 +34,13 @@ def test_svg_output_is_deterministic(tmp_path):
def test_cube_svg_contains_hbm_ctrl(tmp_path):
_emit(tmp_path)
svg = (tmp_path / "cube_view.svg").read_text()
assert "HBM CTRL" in svg
assert "HBM_CTRL" in svg
def test_cube_svg_contains_ucie_ports(tmp_path):
_emit(tmp_path)
svg = (tmp_path / "cube_view.svg").read_text()
for port in ("UCIe-N", "UCIe-S", "UCIe-W", "UCIe-E"):
assert port in svg
assert "UCIe" in svg
def test_cube_svg_contains_pe_nodes(tmp_path):
+23 -24
@@ -55,7 +55,7 @@ cube:
ucie_mm: { size: 2.0 }
pe_layout:
corners: [NW, NE, SW, SE] # N corners → xbar top row; S corners → xbar bottom row
corners: [NW, NE, SW, SE] # N corners → top PE rows; S corners → bottom PE rows
pe_per_corner: 2 # total PEs per cube: 4 * 2 = 8
pe_template:
@@ -84,18 +84,21 @@ cube:
hbm_total_gb_per_cube: 48
hbm_slices_per_cube: 8
hbm_total_bw_gbs: 1024.0
hbm_mapping_mode: n_to_one # one_to_one | n_to_one (ADR-0019)
hbm_pseudo_channels: 64 # total pseudo channels per cube
hbm_channels_per_pe: 8 # = pseudo_channels / pes_per_cube
hbm_channel_bw_gbs: 32.0 # per-channel bandwidth (GB/s)
components:
noc: { kind: noc, impl: noc_2d_mesh_v1, attrs: { overhead_ns: 0.0 } }
m_cpu: { kind: m_cpu, impl: m_cpu_v1, attrs: { overhead_ns: 5.0 } }
xbar:
top: { kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 2.0 } }
bottom: { kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 2.0 } }
bridges:
- { id: left, kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 1.0 } }
- { id: right, kind: xbar, impl: xbar_v1, attrs: { overhead_ns: 1.0 } }
hbm_ctrl: { kind: hbm_ctrl, impl: hbm_ctrl_v1, attrs: { capacity: 1, efficiency: 1.0 } }
sram: { kind: sram, impl: sram_v1, attrs: { size_mb: 32, overhead_ns: 2.0 } }
noc_router: { kind: noc_router, impl: forwarding_v1, attrs: { overhead_ns: 2.0 } }
m_cpu: { kind: m_cpu, impl: m_cpu_v1, attrs: { overhead_ns: 5.0 } }
hbm_ctrl: { kind: hbm_ctrl, impl: hbm_ctrl_v1, attrs: { capacity: 1, efficiency: 1.0 } }
sram: { kind: sram, impl: sram_v1, attrs: { size_mb: 32, overhead_ns: 2.0 } }
# Physical placement of non-PE components (mm coordinates)
placement:
m_cpu: { pos_mm: [7.5, 3.0] } # top center, below UCIe-N
sram: { pos_mm: [1.5, 9.0] } # left side, below HBM zone
ucie:
decompose: true
@@ -105,19 +108,15 @@ cube:
per_connection_bw_gbs: 128.0 # BW per connection; 4 × 128 = 512 GB/s = UCIe PHY BW
links:
xbar_to_hbm_bw_gbs: 256.0 # per-slice effective (2048 / 8 slices)
xbar_to_bridge_bw_gbs: 128.0 # bridge BW (xbar_top/bot ↔ bridge)
xbar_to_bridge_mm: 3.0 # xbar ↔ bridge wire distance
xbar_to_hbm_mm: 2.5
pe_dma_to_noc_bw_gbs: 256.0 # PE → NOC BW (= HBM slice BW, no bottleneck)
noc_to_xbar_mm: 0.0 # noc is distributed; distance modeled as 0
noc_to_xbar_bw_gbs: 256.0 # NOC → xbar_top/bot BW (= HBM slice BW)
noc_to_sram_mm: 0.0 # noc is distributed; distance modeled as 0
noc_to_sram:
per_connection_bw_gbs: 128.0 # BW per NOC connection
n_connections: 4 # 4 × 128 = 512 GB/s aggregate
m_cpu_to_noc_mm: 0.0 # noc is distributed; distance modeled as 0
noc_to_pe_cpu_mm: 0.0 # noc is distributed; distance modeled as 0
# Router mesh links (ADR-0019)
router_link_bw_gbs: 256.0 # inter-router XY mesh link BW
router_overhead_ns: 2.0 # per-router switching overhead
pe_to_router_bw_gbs: 256.0 # PE_DMA ↔ router (= N × channel_bw)
hbm_to_router_bw_gbs: 256.0 # HBM_CTRL ↔ router (= N × channel_bw)
sram_to_router_bw_gbs: 128.0 # SRAM ↔ router
m_cpu_to_router_mm: 0.0 # M_CPU ↔ router distance
pe_dma_to_noc_bw_gbs: 256.0 # PE → router BW (= HBM slice BW, no bottleneck)
noc_to_pe_cpu_mm: 0.0 # router → PE_CPU distance (command path)
visualization:
emit_views: [system, sip, cube]
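The comments in the topology hunk above encode a few arithmetic relations (`pseudo_channels / pes_per_cube`, `N × channel_bw`, `4 × 128 = 512`). A small sketch of how those could be cross-checked follows. The values are copied from the hunk, but the flat dict layout and the `derive` and `mismatches` helpers are assumptions made for illustration, not part of the repository.

```python
# Values copied from the topology.yaml hunk above; the flat dict layout
# and key names prefixed "ucie_" are illustrative assumptions.
cube = {
    "corners": 4,                        # NW, NE, SW, SE
    "pe_per_corner": 2,
    "hbm_pseudo_channels": 64,
    "hbm_channels_per_pe": 8,
    "hbm_channel_bw_gbs": 32.0,
    "pe_to_router_bw_gbs": 256.0,
    "hbm_to_router_bw_gbs": 256.0,
    "ucie_per_connection_bw_gbs": 128.0,
    "ucie_connections": 4,               # from the "4 x 128" comment
}


def derive(cfg):
    """Recompute the quantities that the YAML comments describe."""
    pes = cfg["corners"] * cfg["pe_per_corner"]
    channels_per_pe = cfg["hbm_pseudo_channels"] // pes
    return {
        "pes_per_cube": pes,
        "channels_per_pe": channels_per_pe,
        "per_pe_bw_gbs": channels_per_pe * cfg["hbm_channel_bw_gbs"],
        "ucie_phy_bw_gbs": cfg["ucie_connections"] * cfg["ucie_per_connection_bw_gbs"],
    }


def mismatches(cfg):
    """Return the config keys whose value disagrees with the derived one."""
    d = derive(cfg)
    bad = []
    if cfg["hbm_channels_per_pe"] != d["channels_per_pe"]:
        bad.append("hbm_channels_per_pe")
    for key in ("pe_to_router_bw_gbs", "hbm_to_router_bw_gbs"):
        if cfg[key] != d["per_pe_bw_gbs"]:
            bad.append(key)
    return bad


derived = derive(cube)
```

For the values shown in this hunk, `mismatches(cube)` returns an empty list, which matches the "no bottleneck" note on the PE link.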