commit - release 1
@@ -0,0 +1,108 @@
# ADR-0001: PhysAddr Layout & Address Decoding Contract

## Status

Accepted

## Date

2026-02-27

## Context

The KernBench Graph Latency Simulator must route requests deterministically and compute end-to-end latency strictly by graph traversal.

To model local vs remote traffic (same/different SIP, same/different CUBE, optional PE-group), requests need a stable, parsable address/location scheme that:

- can be decoded into routing domains (SIP/CUBE/HBM/PE-resource, etc.)
- remains topology-agnostic (no hardcoded counts)
- supports swappable policies and DI-first components without leaking topology assumptions into node implementations

## Decision

We define a **PhysAddr value object** and an **address decoding contract** that converts an integer address into routing domains.

### D1. PhysAddr is an immutable value object

- PhysAddr is immutable and compared purely by value.
- Every allocator returns a **fully specified PhysAddr** (not partial metadata).
- Interpreting a PhysAddr must not require any global state.

### D2. PhysAddr fields (logical contract)

PhysAddr must be able to represent at least:

- `rack_id` (optional, reserved for scale-out)
- `sip_id` (device / SIP domain)
- `sip_seg` (SIP-level segment/window selection, e.g., cube window)
- `local_offset` (offset within the chosen segment/window)

Decoded/derived fields may optionally include:

- `cube_id`
- `kind` (e.g., HBM vs PE-resource vs raw)
- `unit_type` / `pe_id` (if PE-level addressing is modeled)

**Important:** The exact bit allocation may evolve, but the *semantic fields above* must remain decodable without hidden assumptions.
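
D1 and D2 map naturally onto a frozen Python dataclass. Here is a minimal sketch. The field names come from D2. The non-negativity check is an assumption added for illustration, not part of the contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class PhysAddr:
    """Immutable, fully specified physical address (ADR-0001 D1/D2).

    frozen=True makes instances immutable and hashable; order=True compares
    instances field by field (rack_id, sip_id, sip_seg, local_offset).
    """

    rack_id: int
    sip_id: int
    sip_seg: int
    local_offset: int

    def __post_init__(self) -> None:
        # Assumption for this sketch: every field is a non-negative int.
        for name in ("rack_id", "sip_id", "sip_seg", "local_offset"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative int, got {value!r}")
```

Equality and hashing depend only on the four fields. Two independently built addresses for the same location are therefore interchangeable (for example, as dictionary keys), and no registry or global state is consulted.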

### D3. Decoding is deterministic and policy-compatible

- Decoding must deterministically map an integer address to:
  - destination SIP domain (`sip_id`)
  - destination sub-domain (`cube_id`, if applicable)
  - destination target kind (HBM/PE-resource/other)
- Decoding must not depend on runtime topology sizes. It may depend on **explicit topology parameters** provided through configuration (e.g., segment size, slice size), and those parameters must live in the topology/config layer rather than being scattered across individual components.

### D4. Topology-derived constants live in the topology layer

Constants such as segment sizes (e.g., HBM slice size / window size) are derived from topology configuration (YAML/JSON/dict) and are provided to the decoder via DI/config.
They must not be hardcoded in node implementations.
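
D3 and D4 can be satisfied by making decoding a pure function of the address and an injected configuration object. The sketch below assumes a hypothetical flat layout `[rack_id | sip_id | sip_seg | local_offset]`, with field widths taken from configuration. `DecoderConfig`, `DecodedDomains`, `decode`, and the `kind` strings are all illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class DecoderConfig:
    """Topology-derived decoding parameters (hypothetical layout).

    Loaded from topology configuration and injected into the decoder (D4);
    no node implementation hardcodes these values.
    """

    offset_bits: int                # log2(segment/window size)
    seg_bits: int                   # width of the sip_seg field
    sip_bits: int                   # width of sip_id; rack_id takes the remaining high bits
    seg_to_cube: Mapping[int, int]  # sip_seg -> cube_id, where applicable
    seg_kind: Mapping[int, str]     # sip_seg -> "hbm" / "pe_res"; anything else is "raw"


@dataclass(frozen=True)
class DecodedDomains:
    rack_id: int
    sip_id: int
    sip_seg: int
    local_offset: int
    cube_id: Optional[int]
    kind: str


def _take(value: int, bits: int) -> Tuple[int, int]:
    """Split off the low `bits` bits: returns (field, remaining high bits)."""
    return value & ((1 << bits) - 1), value >> bits


def decode(addr: int, cfg: DecoderConfig) -> DecodedDomains:
    """Pure function of (addr, cfg): identical inputs give identical domains (D3)."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    local_offset, rest = _take(addr, cfg.offset_bits)
    sip_seg, rest = _take(rest, cfg.seg_bits)
    sip_id, rack_id = _take(rest, cfg.sip_bits)
    return DecodedDomains(
        rack_id=rack_id,
        sip_id=sip_id,
        sip_seg=sip_seg,
        local_offset=local_offset,
        cube_id=cfg.seg_to_cube.get(sip_seg),
        kind=cfg.seg_kind.get(sip_seg, "raw"),
    )
```

`decode` reads nothing but its arguments, so changing the layout means injecting a different `DecoderConfig` (or a different decoder callable) rather than editing routers or memory controllers.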

### D5. Routing consumes decoded domains, not raw bits

Routing policy uses decoded domains:

- `src` location (sip/cube/pe or node_id)
- `dst` domains derived from PhysAddr decoding
- `size_bytes` for size-aware link latency

Only the decoding module may inspect raw bit-fields; routing must not inspect them directly.
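
Under D5, routing classifies a request purely from decoded domains. A minimal sketch follows. `Domains`, `RouteRequest`, `Locality`, and `classify` are hypothetical names. The rule that a missing `cube_id` on the same SIP counts as "same SIP, different CUBE" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Locality(Enum):
    SAME_CUBE = "same_cube"
    SAME_SIP = "same_sip"      # same SIP, different (or unspecified) CUBE
    REMOTE_SIP = "remote_sip"


@dataclass(frozen=True)
class Domains:
    """Already-decoded location; routing never sees raw address bits."""
    sip_id: int
    cube_id: Optional[int] = None


@dataclass(frozen=True)
class RouteRequest:
    src: Domains
    dst: Domains
    size_bytes: int  # used for size-aware link latency; ignored by classify()


def classify(req: RouteRequest) -> Locality:
    if req.src.sip_id != req.dst.sip_id:
        return Locality.REMOTE_SIP
    if req.dst.cube_id is not None and req.dst.cube_id == req.src.cube_id:
        return Locality.SAME_CUBE
    return Locality.SAME_SIP
```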

## Alternatives Considered

1) **Use raw integers everywhere, decode ad-hoc in routing**

   - Rejected: leads to duplicated logic, inconsistent routing, and hidden assumptions embedded in multiple components.

2) **Hardcode topology sizes (SIP/CUBE/PE counts) into decoding**

   - Rejected: violates SPEC (R3) and breaks swappability and configuration-driven topologies.

3) **Put decoding inside memory controllers or routers**

   - Rejected: leaks policy into components and undermines DI-first, swappable implementations (SPEC R4).

## Consequences

### Positive

- Deterministic routing domains enable clear test invariants for local vs remote paths (SPEC R1, R5).
- Keeps topology variability (SPEC R3) while preserving consistent semantics.
- DI-first: the decoder can be swapped or extended without changing components or tests (SPEC R4).

### Tradeoffs / Costs

- Requires explicit configuration for any topology-derived sizes.
- Introduces a single “blessed” decoding module that must remain stable and well-tested.

## Implementation Notes (Non-normative)

- Recommended module boundary:
  - `src/kernbench/policy/address/phyaddr.py`

- Tests should cover:
  - deterministic decoding
  - local vs remote classification from decoded fields
  - invariants: “allocator returns full PhysAddr”, “decoding requires no global state”

## Links

- SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first), R5 (multi-domain comm)
@@ -0,0 +1,103 @@
# ADR-0002: Routing Distance, Ordering & Bypass Rules

## Status
Accepted

## Date
2026-02-27

## Context
The KernBench Graph Latency Simulator must compare kernel execution time
across different architectures and topologies by computing end-to-end
latency from graph traversal.

To support meaningful comparison:
- routing must be deterministic
- latency must reflect actual interconnect structure
- local vs remote traffic must be distinguishable
- “bypass” optimizations must not undermine debuggability or correctness

The simulator also aims to avoid software-managed metadata and hidden
shortcuts that obscure control paths.

## Decision

### D1. Distance is accumulated latency, not hop count
- Routing “distance” is defined as the **sum of per-node and per-link latency**.
- Hop count alone must not be used for ordering or path selection.
- Size-aware serialization latency (bytes / BW) contributes to distance.
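
The D1 definition can be written out as a short function. Here is a minimal sketch, with several assumptions. Latencies are in ns and bandwidth in bytes/ns. Serialization (`size_bytes / BW`) is charged on every link, which corresponds to a store-and-forward reading of "per-link latency". `Link` and `path_distance_ns` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    latency_ns: float       # fixed per-link latency
    bw_bytes_per_ns: float  # bandwidth used for size-aware serialization


def path_distance_ns(
    node_latency_ns: Sequence[float],
    links: Sequence[Link],
    size_bytes: int,
) -> float:
    """Distance = sum of node latencies + sum of (link latency + size / BW).

    A path with N nodes has N-1 links.
    """
    if len(links) != len(node_latency_ns) - 1:
        raise ValueError("a path with N nodes has N-1 links")
    total = sum(node_latency_ns)
    for link in links:
        total += link.latency_ns + size_bytes / link.bw_bytes_per_ns
    return total
```

Two paths with the same hop count can differ widely in distance once link bandwidth differs. This is why D1 rules out hop count as the ordering key.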

### D2. Routing order is derived from graph traversal
- The chosen route is the path with minimum accumulated latency
  given the constructed graph and routing policy.
- Deterministic ordering must be guaranteed for identical inputs
  (topology + policy + request).
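
D2 does not name a search algorithm. One minimal sketch is Dijkstra's algorithm with a total order on heap entries, which makes tie-breaking deterministic. It assumes that edge costs already fold in node, link, and serialization latency, and that they are non-negative. `min_latency_route` and the adjacency-dict graph format are hypothetical.

```python
import heapq
from typing import Dict, List, Mapping, Sequence, Tuple

Graph = Mapping[str, Sequence[Tuple[str, float]]]  # node -> [(neighbor, edge_cost)]


def min_latency_route(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra over accumulated latency with deterministic tie-breaking.

    Heap entries are (distance, path). When two entries have equal distance,
    the path tuples are compared lexicographically, so the result does not
    depend on the insertion order of nodes or edges.
    """
    heap: List[Tuple[float, Tuple[str, ...]]] = [(0.0, (src,))]
    settled: Dict[str, float] = {}
    while heap:
        dist, path = heapq.heappop(heap)
        node = path[-1]
        if node in settled:
            continue
        settled[node] = dist
        if node == dst:
            return dist, list(path)
        for nbr, cost in graph.get(node, ()):
            if nbr not in settled:
                heapq.heappush(heap, (dist + cost, path + (nbr,)))
    # Per D5, missing connectivity is a topology construction error.
    raise ValueError(f"no path from {src!r} to {dst!r}")
```

In the direct-versus-two-hop case (`a -> d` costing 5.0 vs `a -> b -> d` costing 2.0), the longer route in hops wins because its accumulated latency is lower.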

### D3. Bypass is explicit and graph-represented
- Any bypass must be:
  - explicitly represented as a graph path, and
  - subject to latency accumulation like any other path.
- Example: PE_DMA has dual egress, one to XBAR (HBM path) and one to NOC (non-HBM path).
  Both are explicit graph edges. Local-cube HBM access via XBAR is therefore not a bypass
  of the NOC: the two edges are distinct data paths serving different memory domains.
- Implicit or “magic” bypass paths are disallowed.

### D4. No zero-latency end-to-end paths

- Every routed request must incur **end-to-end** latency > 0.
- Individual fabric segments (e.g., NOC hops) MAY have `distance_mm = 0`
  when the fabric is distributed and physical distance is not meaningful at that granularity.
  This is allowed because other components on the same path (e.g., PE_DMA, SRAM,
  UCIe endpoints) contribute non-zero latency, so the end-to-end invariant still holds.
- Fully zero-latency end-to-end paths are disallowed, except in explicit
  test-only stubs clearly marked as such.

### D5. Policy vs topology responsibility split
- Topology builder:
  - defines nodes and links and their latency/BW parameters
- Routing policy:
  - selects among available graph paths based on decoded domains
- Routing policy must not assume links that the topology does not define; missing connectivity
  is a topology construction error.

### D6. No software-managed routing metadata
- Routing decisions must not rely on per-request software-managed metadata
  that tracks distance, hop count, or ordering outside the graph model.
- All distance/order computation is derived from traversal itself.

## Alternatives Considered

1) **Hop-count based routing**
   - Rejected: ignores heterogeneous latency/BW and misrepresents
     architectural differences.

2) **Implicit local shortcuts**
   - Rejected: breaks debuggability and violates traversal-based latency.

3) **Software-managed distance metadata**
   - Rejected: increases control overhead and obscures routing semantics.

## Consequences

### Positive
- Clear, debuggable hop-by-hop traces (SPEC R2, R4).
- Architecture comparisons reflect real interconnect structure.
- Routing behavior is reproducible and deterministic.

### Tradeoffs / Costs
- Graph construction must be correct and complete.
- Bypass modeling requires explicit graph representation,
  which slightly increases topology description complexity.

## Implementation Notes (Non-normative)
- Recommended responsibilities:
  - Graph builder: ensure all required paths exist.
  - Router: select next hop based on decoded domains and policy.
- Tests should assert:
  - non-zero end-to-end latency
  - deterministic routing for identical inputs
  - bypass paths appear explicitly in emitted traces

## Links
- SPEC.md: R1 (routing), R2 (latency), R3 (topology), R5 (multi-domain comm)
- ADR-0001: PhysAddr layout & decoding contract
@@ -0,0 +1,64 @@
# ADR-0003: Target System Hierarchy & Modeling Scope

## Status

Accepted

## Context

We need a system-level simulator to evaluate LLM kernel performance on our AI Accelerator platform.
The platform is organized as a compute tray containing multiple identical SIPs connected via PCIe or UAL
through switching fabrics, with a host CPU issuing commands/kernels.

## Decision

We model the system hierarchy explicitly:

### D1. Tray-level

- A compute tray contains:
  - Host CPU (issues requests / coordinates runtime & data placement)
  - Multiple identical SIPs (accelerators)
  - Interconnect fabric between SIPs (PCIe and/or UAL via switches)

### D2. SIP-level

- A SIP is a multi-die package composed of:
  - Multiple CUBEs (HBM die + compute PEs + UCIe)
  - One or more IO chiplets (host/SIP interfaces)
- IO chiplets:
  - provide PCIe-EP, IO_CPU, and optionally UAL-EP
  - can be multiple per SIP
  - are placed only on the SIP shoreline (top/bottom/left/right); each shoreline may host 1–2 IO chiplets

### D3. CUBE-level

- A CUBE contains:
  - HBM + memory controller (HBM_CTRL)
  - XBAR (top/bottom): HBM pseudo-channel crossbar and each PE's dedicated path to HBM
  - Bridge (left/right): connects XBAR.top ↔ XBAR.bottom for cross-half HBM access
  - NOC: distributed on-die fabric spanning the entire cube (physical distance modeled as 0, i.e., `distance_mm = 0`);
    carries non-HBM traffic including inter-cube (UCIe), command (M_CPU↔PE_CPU), and shared SRAM access
  - Shared SRAM: cube-level shared memory accessible by all PEs via NOC
  - Management/control CPU (M_CPU): coordinates PE command distribution and completion aggregation
  - Multiple PEs
  - Up to 4 UCIe endpoints (N/E/W/S) for CUBE↔CUBE and CUBE↔IO connectivity

### D4. PE-level

- A PE can execute one kernel instance
- A PE contains internal control logic and accelerators (modeled at PE-view granularity):
  - PE_CPU, command handler, PE_TCM, DMA/GEMM/MATH engines, internal queues

## Consequences

- The simulator supports abstraction by “views”:
  - SIP view hides PE internals
  - CUBE view treats each PE as a single block
  - PE view expands PE internals
- Topology remains parameterized; sizes/counts/links come from configuration.

## Links

- SPEC R3/R5
- ADR-0005 (diagram views)
@@ -0,0 +1,64 @@
# ADR-0004: Memory Semantics & Local-HBM Bandwidth Guarantee

## Status

Accepted

## Context

Accurately modeling PE↔HBM behavior is essential for kernel latency estimation.
Each PE has a “local HBM” region, and accesses to it must receive full HBM bandwidth, independent of intervening on-die fabric bandwidth.

## Decision

### D1. Local HBM definition

- Each PE is assigned a logically defined “local HBM” region.
- Local HBM corresponds to the pseudo-channel subset directly attached to that PE’s DMA path
  via the XBAR (top or bottom, depending on PE corner placement).
- The path is: PE_DMA → XBAR.top/bottom → HBM_CTRL.
- The mapping (HBM pseudo-channels → PE local regions) is derived from topology configuration.

### D2. Local HBM bandwidth guarantee contract

- Accesses from a PE to its local HBM MUST receive full HBM read/write bandwidth,
  independent of intervening fabric bandwidth limits.
- This guarantee is modeled by:
  - a dedicated logical path and/or service model that enforces HBM BW at the PE-local-HBM interaction point,
  - while still incurring non-zero latency along explicitly modeled components.
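
A service model for the D2 contract can be very small. This sketch makes several assumptions. Bandwidth is in bytes/ns. Non-local transfers are limited by the slowest explicitly modeled link (or by HBM itself). Serialization is charged once at that effective bandwidth; other service models are equally possible. `effective_bw` and `transfer_time_ns` are hypothetical names.

```python
from typing import Sequence


def effective_bw(is_local_hbm: bool, hbm_bw: float, path_link_bws: Sequence[float]) -> float:
    """Bandwidth seen by a PE<->HBM transfer.

    Local HBM (D2): full HBM bandwidth, independent of fabric link BW.
    Otherwise: bounded by HBM and by the slowest modeled link on the path.
    """
    if is_local_hbm:
        return hbm_bw
    return min([hbm_bw, *path_link_bws])


def transfer_time_ns(size_bytes: int, path_latency_ns: float, bw: float) -> float:
    # Latency is still accumulated along the explicit path and must be > 0.
    if path_latency_ns <= 0:
        raise ValueError("modeled paths must have non-zero latency")
    return path_latency_ns + size_bytes / bw
```

This gives the local-HBM check from the Verification Notes directly: changing fabric bandwidth leaves `effective_bw(True, ...)` unchanged.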

### D3. Cross-half HBM semantics

- A PE connected to XBAR.bottom that accesses HBM pseudo-channels on the XBAR.top half
  (or vice versa) traverses a bridge:
  - PE_DMA → XBAR.bottom → bridge → XBAR.top → HBM_CTRL
- Bridge bandwidth may limit cross-half HBM access relative to local-half access.

### D4. Non-local HBM semantics (inter-cube / inter-SIP)

- Accesses from a PE to HBM in a different cube or SIP MAY be limited by:
  - NOC bandwidth within the cube,
  - inter-cube UCIe links,
  - inter-SIP fabric (PCIe/UAL).
- These paths MUST be explicit and traceable.

### D5. Shared SRAM semantics

- Each CUBE contains a shared SRAM accessible by all PEs in that CUBE.
- Access path: PE_DMA → NOC → shared SRAM.
- Shared SRAM bandwidth is limited by the NOC↔SRAM link bandwidth.
- Shared SRAM is not part of the HBM address space; it is a separate memory domain.
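
The intra-cube paths in D1, D3, and D5 can be expressed as explicit component lists, so every hop appears in a trace. The sketch below covers only those three cases; inter-cube and inter-SIP paths (D4) are left out. The node-name strings (`BRIDGE`, `SHARED_SRAM`, etc.) and `intra_cube_path` are hypothetical spellings.

```python
from typing import List, Literal

Half = Literal["top", "bottom"]


def intra_cube_path(pe_half: Half, target: str, target_half: Half = "top") -> List[str]:
    """Explicit component path for a PE access inside its own CUBE.

    pe_half: XBAR half the PE's DMA attaches to (from corner placement).
    target: "hbm" or "shared_sram".
    target_half: XBAR half owning the addressed HBM pseudo-channels.
    """
    if target == "shared_sram":
        # D5: shared SRAM is reached over the NOC, never through the XBAR.
        return ["PE_DMA", "NOC", "SHARED_SRAM"]
    if target != "hbm":
        raise ValueError(f"unsupported target {target!r}")
    if pe_half == target_half:
        # D1: same-half access, PE_DMA -> XBAR.<half> -> HBM_CTRL.
        return ["PE_DMA", f"XBAR.{pe_half}", "HBM_CTRL"]
    # D3: cross-half access traverses a bridge between the two XBAR halves.
    return ["PE_DMA", f"XBAR.{pe_half}", "BRIDGE", f"XBAR.{target_half}", "HBM_CTRL"]
```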

## Verification Notes

Tests should cover:

- local-HBM case: BW matches HBM BW regardless of fabric BW parameter
- cross-half HBM case: latency includes bridge traversal
- non-local cases (inter-cube/inter-SIP): BW/latency respond to fabric/link parameters
- shared SRAM case: access goes via NOC, with BW limited by the NOC↔SRAM link

## Links

- SPEC R2/R5
- ADR-0002 (distance/order & explicit bypass)
@@ -0,0 +1,186 @@
# ADR-0005: Diagram Views & Distance-Aware Layout Rules

## Status

Accepted

## Context

We require verifiable and inspectable system modeling for a large-scale,
parameterized AI Accelerator system.

Humans must be able to:

- visually inspect the modeled topology,
- reason about communication structure and relative distance,
- do so at multiple abstraction levels without being overwhelmed by detail.

The simulator models distance (accumulated latency) as a first-class concept.
Diagrams must reflect this distance by default.

---

## Global Defaults

- All diagrams MUST be **distance-aware by default**.
- All diagrams MUST render **representative views** of the architecture.
- Instance indices (e.g., sip0, cube2, pe3) MUST NOT be required for diagram generation.
- Instance indices MAY be used ONLY:
  - to define a distance anchor in asymmetric or debugging scenarios, or
  - when explicitly requested.

---

## Representative Rendering Rule

- All CUBEs share the same internal structure.
- All PEs share the same internal structure.

Therefore:

- SIP-level diagrams render representative CUBEs and IO chiplets.
- CUBE-level diagrams render representative PEs as opaque blocks.
- PE-level diagrams render a representative PE with fully expanded internals.

Diagrams MUST NOT depend on specific SIP, CUBE, or PE indices
unless explicitly requested.

---

## Diagram Views

### View A — SIP-Level Diagram

**Purpose**
Explain system-scale structure and connectivity.

**Visible elements**

- SIP boundaries (optional)
- CUBEs (opaque blocks)
- IO chiplets (opaque blocks)
- Optional UCIe stubs only if needed to clarify connectivity

**Hidden elements**

- PE internals
- CUBE internal fabric
- IO chiplet internals

**Visible links**

- Host ↔ IO chiplets (PCIe)
- SIP ↔ SIP (PCIe / UAL via switches)
- IO ↔ CUBE (on-package links)

---

### View B — CUBE-Level Diagram

**Purpose**
Explain cube-internal structure and data/control flow.

**Visible elements**

- XBAR (top/bottom): HBM pseudo-channel crossbar
- Bridge (left/right): cross-half HBM connectors between XBAR.top and XBAR.bottom
- NOC: distributed on-die fabric for non-HBM traffic
- HBM subsystem (HBM_CTRL)
- Shared SRAM: cube-level shared memory
- Management CPU (M_CPU)
- PEs as opaque blocks (PE[0..N−1])
- UCIe endpoints (N/E/W/S) as ports

**Hidden elements**

- PE internals

**Visible links**

- PE → XBAR (HBM data path, top or bottom by corner placement)
- PE → NOC (non-HBM data path)
- XBAR ↔ bridge ↔ XBAR (cross-half HBM access)
- XBAR → HBM_CTRL
- NOC ↔ UCIe endpoints
- NOC ↔ shared SRAM
- M_CPU ↔ NOC (command path)
- NOC → PE_CPU (command delivery, collapsed into PE block)

---

### View C — PE-Level Diagram

**Purpose**
Explain internal PE behavior and execution structure.

**Visible elements**

- PE_CPU
- Command handler / scheduler
- PE_TCM (local SRAM)
- HW accelerators (DMA, GEMM, MATH, etc.)
- Local HBM interface
- Optional IPCQ / messaging endpoints

**Visible links**

- Control paths (CPU → scheduler → engines)
- Data paths (engines ↔ TCM, DMA ↔ local HBM)
- External fabric ports as abstract ports only

---

## Distance-Aware Layout (Default)

### Distance definition

- Distance is defined as **accumulated latency**, consistent with ADR-0002.
- Distance is computed from a single anchor node.

### Default anchor selection

- SIP view: IO chiplet (or Host CPU if present)
- CUBE view: a representative PE
- PE view: PE_CPU or Command Handler

Anchors are **implicit defaults**; callers MUST NOT be required to specify them.

### Layout rules

- Diagrams MUST be laid out in layers based on distance buckets.
- Layout direction MUST be consistent within a view type
  (preferred: left-to-right).
- Nodes with equal distance MUST have stable ordering
  (by role or identifier, deterministically).

Cycles MAY be rendered using dashed or curved edges for readability,
without affecting distance semantics.
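
The layout rules can be sketched as bucketing nodes by distance from the anchor and then sorting within each layer by `(role, id)`. This is a minimal version under two assumptions. The bucket width is a hypothetical layout parameter. Empty buckets are dropped, so the layer index is a rank rather than an absolute distance. `layout_layers` is an illustrative name.

```python
import math
from typing import Dict, List, Mapping


def layout_layers(
    distance: Mapping[str, float],
    role: Mapping[str, str],
    bucket_ns: float,
) -> List[List[str]]:
    """Group nodes into left-to-right layers by distance bucket.

    distance: accumulated latency from the view's anchor node.
    role: node -> role name, the primary tie-break inside a layer.
    bucket_ns: bucket width (hypothetical layout parameter).
    """
    if bucket_ns <= 0:
        raise ValueError("bucket width must be positive")
    layers: Dict[int, List[str]] = {}
    for node, d in distance.items():
        layers.setdefault(math.floor(d / bucket_ns), []).append(node)
    # Layers ordered by bucket index; nodes ordered by (role, identifier).
    return [
        sorted(nodes, key=lambda n: (role.get(n, ""), n))
        for _, nodes in sorted(layers.items())
    ]
```

Because both the layers and their members are sorted, the result does not depend on the iteration order of `distance`.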

---

## Generation Contract (for Tools / Claude Code)

When generating diagrams:

- Assume distance-aware layout by default.
- Assume representative rendering by default.
- Do NOT ask for SIP/CUBE/PE indices unless required.
- Do NOT expand hidden abstraction levels.
- Prefer architectural clarity over micro-hop fidelity.

---

## Consequences

- Diagrams are stable across topology scaling.
- Changes in distance or routing policy are reflected visually.
- Diagrams serve as verifiable artifacts derived from the simulator model,
  not as hand-maintained documentation.

---

## Links

- SPEC Section 4 (Output, Debuggability, and Diagrams)
- ADR-0002 (Routing distance semantics)
- ADR-0006 (Topology compilation & automatic diagram generation)
@@ -0,0 +1,130 @@
# ADR-0006: Topology Compilation, Distance Extraction, and Automatic Diagram Generation

## Status

Accepted

## Context

The simulator compiles topology configuration (e.g., topology.yaml) into an explicit model graph,
and computes routing and accumulated latency (distance).
Diagrams should be generated from these authoritative artifacts to ensure consistency and avoid
hand-maintained topology drawings.

Additionally, for usability, diagrams should be emitted automatically into a stable location
so that developers can preview them immediately in the repository.

---

## Decision

### D1. Topology compilation is the single source of truth

- topology.yaml (or equivalent config) is compiled into:
  - an explicit system graph,
  - node/link attributes,
  - routing policies.

This compiled graph is the authoritative representation of the system.

### D2. Distance extraction during compilation

- During or immediately after topology compilation, the simulator MUST compute distance metadata
  (accumulated latency) consistent with ADR-0002.
- Distance metadata MUST be sufficient to support distance-aware diagram layout as defined in ADR-0005.
- Distributed fabric segments (e.g., NOC) MAY have `distance_mm = 0` per ADR-0002 D4;
  layout placement for such nodes uses explicit position metadata rather than distance buckets.

### D3. Diagram generation is a derived artifact

- Diagrams MUST be generated from:
  - the compiled topology graph,
  - extracted distance metadata,
  - view/layout rules defined in ADR-0005.
- Diagram generation MUST NOT require additional hand-written topology descriptions.

### D4. Automatic diagram emission to the repository

- As part of topology compilation, the implementation MUST produce the following diagrams by default:
  - SIP-level diagram (representative, distance-aware)
  - CUBE-level diagram (representative, distance-aware)
  - PE-level diagram (representative, distance-aware)
- The default output directory is:
  - `docs/diagrams/`
- The generator MUST overwrite/update files only when the compiled topology (or the diagram rules) changes.
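
Since D6 makes output deterministic, the "update only on change" rule can be approximated by comparing freshly generated text with what is already on disk. Here is a minimal sketch assuming that approach. `emit_if_changed` is a hypothetical helper; only `pathlib` standard-library calls are used.

```python
from pathlib import Path


def emit_if_changed(path: Path, content: str) -> bool:
    """Write `content` to `path` only if it differs from the file on disk.

    Returns True when the file was (re)written. Unchanged files keep their
    mtime, so repository diffs stay quiet when nothing changed.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.write_text(content, encoding="utf-8")
    return True
```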

### D5. View-specific projection and layout

For each view (SIP / CUBE / PE):

- The generator MUST project the compiled graph into a reduced view graph:
  - hide/collapse nodes according to ADR-0005,
  - preserve connectivity semantics relevant to that view,
  - compute distance buckets and assign layout layers deterministically.
- CUBE-level projection MUST include:
  - XBAR (top/bottom), bridge (left/right), NOC, HBM_CTRL, shared SRAM, M_CPU, UCIe ports,
    and PEs as opaque blocks.
  - Distinct edge kinds for HBM path (PE→XBAR) vs non-HBM path (PE→NOC).
- Default anchors are implicit (ADR-0005) and MUST NOT require instance indices.

### D6. Output formats and determinism

- The generator MUST output at least one of:
  - Mermaid (Markdown-native)
  - Graphviz DOT (rank-based control)
  - SVG (mm-accurate layout, no external dependencies)
- SVG is preferred when mm-accurate position metadata is available from the compiled topology.
- Output MUST be deterministic:
  - same topology + same rules → identical diagram text
- File naming MUST be deterministic and stable (see "Output Conventions").
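
For Mermaid output, D6 determinism comes down to emitting nodes and edges in a sorted order. Here is a minimal sketch. `to_mermaid` is a hypothetical function; edges are `(src, dst, kind)` triples, where `kind` becomes the edge label (for example `hbm` for PE→XBAR and `noc` for PE→NOC, per D5). The `flowchart LR`, `id["label"]` and `-->|label|` forms are standard Mermaid flowchart syntax.

```python
from typing import Iterable, Tuple


def to_mermaid(nodes: Iterable[str], edges: Iterable[Tuple[str, str, str]]) -> str:
    """Render a view graph as Mermaid text; identical input -> identical text."""

    def ident(name: str) -> str:
        # Keep node ids simple (XBAR.top -> XBAR_top); the label keeps the name.
        return name.replace(".", "_")

    lines = ["flowchart LR"]  # left-to-right, per ADR-0005 layout rules
    for n in sorted(set(nodes)):
        lines.append(f'    {ident(n)}["{n}"]')
    for src, dst, kind in sorted(set(edges)):
        lines.append(f"    {ident(src)} -->|{kind}| {ident(dst)}")
    return "\n".join(lines) + "\n"
```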

### D7. Performance and caching

- Diagram generation MAY be lazy and/or cached, as long as the outputs in `docs/diagrams/`
  remain consistent with the compiled topology.
- The implementation SHOULD use a cache key based on:
  - topology content hash,
  - routing policy version,
  - diagram rules version,
  - view type (SIP/CUBE/PE).
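
Here is one way to build the D7 cache key from the four listed inputs, using only `hashlib` and `json` from the standard library. Two choices are assumptions. The topology is hashed as raw text, so whitespace or comment edits also change the key; hashing a canonicalized form would avoid that. The version identifiers are plain strings. `diagram_cache_key` is a hypothetical name.

```python
import hashlib
import json


def diagram_cache_key(
    topology_text: str,
    routing_policy_version: str,
    diagram_rules_version: str,
    view: str,
) -> str:
    """SHA-256 cache key over the D7 inputs.

    The inputs are JSON-encoded as a list before hashing, so field boundaries
    are unambiguous ("ab" + "c" cannot collide with "a" + "bc").
    """
    if view not in ("SIP", "CUBE", "PE"):
        raise ValueError(f"unknown view {view!r}")
    topology_hash = hashlib.sha256(topology_text.encode("utf-8")).hexdigest()
    payload = json.dumps([topology_hash, routing_policy_version, diagram_rules_version, view])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```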

---

## Output Conventions

### Directory

- `docs/diagrams/` is the canonical output directory for generated diagrams.

### File names (recommended, deterministic)

- `system_view.svg` / `system_view.mmd` / `system_view.dot`
- `sip_view.svg` / `sip_view.mmd` / `sip_view.dot`
- `cube_view.svg` / `cube_view.mmd` / `cube_view.dot`
- `pe_view.svg` / `pe_view.mmd` / `pe_view.dot`

Optionally, for multi-topology workflows:

- `sip_view__{topology_id}.svg`
- `cube_view__{topology_id}.svg`
- `pe_view__{topology_id}.svg`

### Repository policy

- Generated diagram files MAY be committed to the repository to enable diff-based review.
- If committed, they MUST be reproducible from topology compilation.

---

## Consequences

- Diagrams are always consistent with simulator behavior.
- Architectural changes automatically propagate to visualizations.
- Diagram diffs become meaningful indicators of architectural change.

---

## Links

- SPEC Section 4 (Output, Debuggability, and Diagrams)
- ADR-0002 (Distance semantics)
- ADR-0005 (Diagram views and layout rules)
@@ -0,0 +1,89 @@
# ADR-0007: Runtime API and Simulation Engine Boundaries

## Status

Accepted

## Context

The simulator consists of multiple layers with distinct responsibilities:

- a host-facing API layer used by benchmarks and user code,
- a discrete-event simulation engine that executes requests,
- device components that model hardware behavior.

Without strict boundaries, orchestration logic can leak into components,
or simulation internals can become entangled with user-facing APIs.

This ADR defines clear responsibility boundaries between:

- runtime API,
- simulation engine (sim_engine),
- hardware components.

---

## Decision

### D1. Runtime API is host-facing orchestration only

The runtime API represents host/driver-level behavior and MUST:

- expose high-level operations (tensor deployment, kernel launch),
- submit requests only to endpoint components (e.g., IO_CPU),
- await completion via futures/handles,
- own and persist host-side metadata (tensor allocation maps, kernel bindings).

The runtime API MUST NOT:

- hardcode hop-by-hop routing or fan-out,
- directly invoke internal components (M_CPU, PE_CPU, engines),
- embed topology- or routing-specific assumptions.

---

### D2. Simulation engine executes and schedules requests

The simulation engine (sim_engine) MUST:

- inject requests into the compiled topology graph,
- schedule and execute events using a discrete-event model,
- manage correlation IDs and completion tracking,
- decompose operations into low-level requests when required
  (e.g., MemoryWrite events).

The simulation engine MUST NOT:

- define tensor semantics,
- define kernel execution policies,
- expose internal graph details to the runtime API.

---

### D3. Components own fan-out and aggregation

Device-side components MUST:

- fan out requests to downstream domains
  (IO_CPU → M_CPU → PE_CPU → schedulers/engines),
- aggregate completion and failure signals,
- propagate results deterministically upstream.

Neither the runtime API nor the simulation engine may orchestrate
component-level fan-out explicitly.
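
D3 can be illustrated with a deliberately synchronous simplification; a discrete-event engine would schedule each hop as an event instead. In this sketch each component fans out to its own children in a fixed order and aggregates their completions. The caller only ever talks to the root (IO_CPU). `Component`, `Completion`, and the `fails` fault-injection flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Completion:
    ok: bool
    sources: List[str]  # leaf components that reported, in deterministic order


@dataclass
class Component:
    """Device-side component that owns its own fan-out (D3).

    e.g. IO_CPU -> M_CPU -> PE_CPU; leaves stand in for schedulers/engines.
    """
    name: str
    children: List["Component"] = field(default_factory=list)
    fails: bool = False  # hypothetical fault injection for leaf components

    def handle(self, request: str) -> Completion:
        if not self.children:
            return Completion(ok=not self.fails, sources=[self.name])
        # Fan out in name order so aggregation does not depend on wiring order.
        results = [c.handle(request) for c in sorted(self.children, key=lambda c: c.name)]
        return Completion(
            ok=all(r.ok for r in results),
            sources=[s for r in results for s in r.sources],
        )
```

A single failing PE surfaces as a failed completion at IO_CPU, without the runtime API ever addressing M_CPU or PE_CPU directly.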

---

## Consequences

- Runtime APIs remain stable as topology and routing evolve.
- Simulation internals can change without affecting user-facing code.
- Component implementations remain swappable via DI.

---

## Links

- SPEC R4, R7, R8
- ADR-0008 (Tensor deployment)
- ADR-0009 (Kernel execution)
@@ -0,0 +1,100 @@
# ADR-0008: Tensor Deployment and Allocation (Host Allocator, PA-first)

## Status

Accepted

## Context

Benchmarks require PyTorch-like tensor semantics:

- tensor creation (empty, fill),
- deployment to accelerator devices (`tensor.to()`).

In a real system, host software manages allocation/mapping and installs
mappings for DMA/MMU. For Phase 0 we simplify (ADR-0011):

- device memory operations use PA only,
- VA/MMU/IOMMU is not modeled.

To keep the host↔device interface minimal, we avoid a separate
AllocateTensorMeta message. Instead, host allocation produces a PA shard map
that is used directly by MemoryWrite/Read and KernelLaunch.

---

## Decision

### D1. Tensor is a host-owned handle with PA shard mapping

A Tensor object is a host-owned handle that encapsulates:

- shape and dtype,
- initialization intent,
- device placement and allocation metadata as a PA shard map.

After deployment, the Tensor handle MUST contain:

- a list of shards, each with `(sip, cube, pe, pa, nbytes, offset_bytes)`.

This PA shard mapping is the single source of truth for kernel argument binding.
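
A host-side Tensor handle for D1 might look like the sketch below. The shard field names come from D1; `Tensor`, `Shard`, the `init` strings, and `check_coverage` are hypothetical. The coverage check only makes sense for split placement, because replicated shards intentionally overlap.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Shard:
    """One PA-addressed slice of a tensor (fields from ADR-0008 D1)."""
    sip: int
    cube: int
    pe: int
    pa: int
    nbytes: int
    offset_bytes: int  # byte offset of this shard within the logical tensor


@dataclass
class Tensor:
    shape: Tuple[int, ...]
    dtype: str
    init: str = "empty"  # initialization intent, e.g. "empty" or "fill"
    shards: List[Shard] = field(default_factory=list)

    @property
    def is_deployed(self) -> bool:
        return bool(self.shards)

    def check_coverage(self, nbytes_total: int) -> None:
        """Split placement only: shards must tile [0, nbytes_total) exactly once."""
        pos = 0
        for s in sorted(self.shards, key=lambda sh: sh.offset_bytes):
            if s.offset_bytes != pos:
                raise ValueError(f"gap or overlap at byte {pos}")
            pos += s.nbytes
        if pos != nbytes_total:
            raise ValueError("shards do not cover the tensor")
```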

---

### D2. Deployment uses a host allocator (Phase 0)

In Phase 0, tensor deployment produces PA shard mappings via a host allocator:

- placement (split/replicate/hybrid) is decided by a DP policy,
- allocation assigns PA ranges at the PE level and returns shard mappings,
- the Tensor handle stores the resulting shard list deterministically.

No separate host-visible device allocation RPC is required in Phase 0.
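
Here is a minimal sketch of the D2 flow for the split case only. It assumes the DP policy has already chosen an ordered list of `(sip, cube, pe)` targets. A per-PE bump allocator then hands out aligned PA ranges, and the last shard absorbs any remainder. `BumpAllocator`, `split_placement`, the base addresses, and the 64-byte alignment are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Target = Tuple[int, int, int]  # (sip, cube, pe)
ShardTuple = Tuple[int, int, int, int, int, int]  # (sip, cube, pe, pa, nbytes, offset_bytes)


class BumpAllocator:
    """Hypothetical per-PE bump allocator handing out PA ranges deterministically."""

    def __init__(self, base_pa: Dict[Target, int], align: int = 64) -> None:
        self._next = dict(base_pa)
        self._align = align

    def alloc(self, target: Target, nbytes: int) -> int:
        pa = self._next[target]
        # Round the next free address up to the alignment boundary (ceil division).
        self._next[target] = -(-(pa + nbytes) // self._align) * self._align
        return pa


def split_placement(alloc: BumpAllocator, targets: Sequence[Target], nbytes: int) -> List[ShardTuple]:
    """Split `nbytes` across `targets` in order; the last shard takes the remainder."""
    if not targets:
        raise ValueError("need at least one target")
    per = nbytes // len(targets)
    shards: List[ShardTuple] = []
    for i, (sip, cube, pe) in enumerate(targets):
        size = per if i < len(targets) - 1 else nbytes - per * (len(targets) - 1)
        shards.append((sip, cube, pe, alloc.alloc((sip, cube, pe), size), size, i * per))
    return shards
```

With the same targets and allocator state, the same shard list comes back every time. This satisfies the "stores the resulting shard list deterministically" rule without any device round trip.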

---

### D3. Data initialization and transfer use MemoryWrite/Read only

Any data initialization or transfer implied by a tensor (e.g., fill, copy)
MUST be represented using Host ↔ IO_CPU messages only:

- MemoryWrite
- MemoryRead

Rules:

- MemoryWrite/Read MUST reference PA + (sip, cube, pe) tags (ADR-0012).
- Allocation metadata MUST NOT be sent as a separate allocation message.
- Bulk tensor data MUST NOT be embedded in Phase 0 messages.

The simulation engine schedules MemoryWrite/Read through the graph so that
latency is computed by explicit traversal.
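
The D3 rules translate into one small message per shard that carries only routing tags, a PA, and a size. The sketch below uses hypothetical field names; ADR-0012 defines the actual Host↔IO_CPU schema, which is not reproduced here. Sorting the messages is an assumption added so the engine receives them in a stable order.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MemoryWrite:
    """Host -> IO_CPU message (hypothetical fields; see ADR-0012 for the schema).

    Carries the (sip, cube, pe) tags, a PA, and a size only: no allocation
    metadata and no bulk tensor payload (Phase 0 rules).
    """
    sip: int
    cube: int
    pe: int
    pa: int
    nbytes: int


def writes_for_shards(shards: Iterable[Tuple[int, int, int, int, int, int]]) -> List[MemoryWrite]:
    """One MemoryWrite per (sip, cube, pe, pa, nbytes, offset_bytes) shard, stably ordered."""
    msgs = [MemoryWrite(sip, cube, pe, pa, nbytes) for sip, cube, pe, pa, nbytes, _off in shards]
    return sorted(msgs, key=lambda m: (m.sip, m.cube, m.pe, m.pa))
```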
|
||||
|
||||
---
|
||||
|
||||
### D4. Extension path (non-breaking)
|
||||
|
||||
Future ADRs MAY introduce optional VA/MMU/IOMMU modeling by adding:
|
||||
|
||||
- virtual addressing in tensor handles,
|
||||
- mapping install steps,
|
||||
- translation latency/page granularity.
|
||||
|
||||
The Phase 0 PA shard map remains a valid fast-path configuration.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
- Host↔IO_CPU contract remains minimal (MemoryRead/Write + KernelLaunch).
|
||||
- KernelLaunch can pass per-PE data placement explicitly via shard tags.
|
||||
- Early implementation stays simple and testable.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- ADR-0011 (PA-first)
|
||||
- ADR-0012 (Host↔IO_CPU schema)
|
||||
- ADR-0007 (runtime_api vs sim_engine boundaries)
|
||||
- ADR-0009 (Kernel execution)
|
||||
@@ -0,0 +1,74 @@
|
||||
# ADR-0009: Kernel Execution Messaging and Completion Semantics
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Kernel execution is initiated by the host and proceeds through
|
||||
device control components:
|
||||
|
||||
Host → IO_CPU → M_CPU → PE_CPU → schedulers → engines
|
||||
|
||||
Completion propagates in reverse order.
|
||||
|
||||
To keep benchmarks simple and topology-agnostic,
|
||||
kernel execution must be endpoint-driven with deterministic aggregation.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Kernel launch is an endpoint request
|
||||
|
||||
A kernel launch is initiated by submitting a single KernelLaunch request
|
||||
to the IO_CPU endpoint.
|
||||
|
||||
The runtime API MUST:
|
||||
|
||||
- construct the kernel launch request,
|
||||
- submit it to IO_CPU,
|
||||
- await a single completion result.
|
||||
|
||||
The runtime API MUST NOT orchestrate internal fan-out.
|
||||
|
||||
---
|
||||
|
||||
### D2. Tensor arguments are passed by metadata
|
||||
|
||||
KernelLaunch requests MUST reference tensor arguments via:
|
||||
|
||||
- host-owned tensor handles, or
|
||||
- resolved device address maps derived from those handles.
|
||||
|
||||
Bulk tensor data MUST NOT be embedded in kernel launch messages.
|
||||
|
||||
---
|
||||
|
||||
### D3. Fan-out and aggregation are component responsibilities
|
||||
|
||||
- IO_CPU fans out work to M_CPUs.
|
||||
- M_CPU fans out work to PE_CPUs.
|
||||
- PE_CPU manages kernel execution and engine dispatch.
|
||||
|
||||
Completion semantics:
|
||||
|
||||
- M_CPU completes when all targeted PEs complete or a failure policy triggers.
|
||||
- IO_CPU completes when all targeted CUBEs complete or a failure policy triggers.
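
The aggregation rule can be sketched as a pure function over child completions. This is an illustration, not the component implementation. The `fail_fast` and `collect_all` names are borrowed from the ADR-0012 `failure_policy` field. Two choices are assumptions made here to keep the result deterministic:

- ties in simulation time are broken by child id;
- the parent reports the earliest failure's `error_code`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Completion:
    ok: bool
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def aggregate(events, policy="fail_fast"):
    """Combine child completions into one parent completion.

    events: iterable of (sim_time, child_id, Completion). Sorting by
    (sim_time, child_id) makes the result independent of arrival order.
    Returns (completion_time, Completion).
    """
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    if not ordered:
        raise ValueError("no targeted children")
    failures = [e for e in ordered if not e[2].ok]
    if not failures:
        return ordered[-1][0], Completion(ok=True)  # done when the last child is
    first_t, first_child, first = failures[0]
    if policy == "fail_fast":  # parent completes at the first failure
        return first_t, Completion(False, first.error_code, f"child {first_child} failed")
    if policy == "collect_all":  # parent waits for every child
        failed = ", ".join(str(c) for _, c, _ in failures)
        return ordered[-1][0], Completion(False, first.error_code, f"failed children: {failed}")
    raise ValueError(f"unknown policy {policy!r}")
```

The same function applies at both levels. At M_CPU the children are PEs; at IO_CPU they are CUBEs.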
|
||||
|
||||
---
|
||||
|
||||
### D4. Completion and failure propagation
|
||||
|
||||
- All messages MUST carry correlation identifiers.
|
||||
- Completion and failure MUST propagate deterministically to the host.
|
||||
- The simulation engine provides futures/handles to observe completion.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- SPEC R1, R2, R7, R8
|
||||
- ADR-0007 (Runtime API boundaries)
|
||||
- ADR-0008 (Tensor deployment)
|
||||
@@ -0,0 +1,62 @@
|
||||
# ADR-0010: CLI Device Selection and Multi-Device Execution Semantics
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Benchmarks represent device-agnostic workloads that operate on a single device.
|
||||
Users may want to run a benchmark:
|
||||
|
||||
- on a specific device, or
|
||||
- across all devices in the system.
|
||||
|
||||
Device enumeration must not leak into benchmarks or runtime APIs.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Benchmarks are single-device by design
|
||||
|
||||
- A benchmark MUST define behavior for a single device only.
|
||||
- A benchmark MUST accept a device identifier as input.
|
||||
- Benchmarks MUST NOT enumerate or loop over multiple devices.
|
||||
|
||||
---
|
||||
|
||||
### D2. CLI controls device selection
|
||||
|
||||
The `kernbench run` command supports an optional `--device` argument:
|
||||
|
||||
- If `--device <id>` is specified:
|
||||
- the benchmark executes once for the specified device.
|
||||
|
||||
- If `--device` is omitted:
|
||||
- the benchmark executes once per SIP discovered in the topology (one single-device execution per SIP; see D3).
|
||||
|
||||
---
|
||||
|
||||
### D3. Multi-device execution is logically parallel
|
||||
|
||||
When running on multiple devices:
|
||||
|
||||
- benchmark executions are submitted to a single simulation engine instance,
|
||||
- executions are logically parallel in simulation time,
|
||||
- inter-device contention is naturally modeled.
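
Read together with D1 and D2, this implies that only the CLI layer enumerates devices. A minimal sketch of that selection logic follows. It assumes device ids use the `sip:N` form shown in ADR-0012 and that the CLI receives the set of SIP ids from the compiled topology. `select_devices`, `submit_all`, and `submit_one` are hypothetical names.

```python
import argparse


def select_devices(device_arg, discovered_sips):
    """Return the device ids on which to run a single-device benchmark.

    --device given  -> exactly that device, validated against the topology.
    --device absent -> every SIP discovered in the topology, in sorted order
                       so that submission order is deterministic.
    """
    if device_arg is not None:
        if device_arg not in discovered_sips:
            raise SystemExit(f"unknown device {device_arg!r}")
        return [device_arg]
    return sorted(discovered_sips)


def submit_all(devices, submit_one):
    """Submit one execution per device before awaiting any of them, so all
    executions share one engine and overlap in simulation time."""
    return [submit_one(device) for device in devices]


parser = argparse.ArgumentParser(prog="kernbench run")
parser.add_argument("--device", default=None, help="device id, e.g. sip:0")
```

Submitting every execution before awaiting any of them is what makes the runs logically parallel. If each run were awaited in turn, inter-device contention would never appear.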
|
||||
|
||||
---
|
||||
|
||||
### D4. Runtime API and simulation engine remain device-scoped
|
||||
|
||||
- Runtime API calls operate on one device per invocation.
|
||||
- The simulation engine schedules all requests deterministically.
|
||||
- Neither layer enumerates devices.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- SPEC R7, R8
|
||||
- ADR-0007 (Runtime API boundaries)
|
||||
@@ -0,0 +1,65 @@
|
||||
# ADR-0011: Memory Addressing Simplification (PA-first)
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
A realistic system uses host-side virtual addressing and an MMU/IOMMU-style
|
||||
translation path for DMA: host allocates physical memory at PE level, maps it
|
||||
into a virtual address space, installs mappings, and DMA requests use virtual
|
||||
addresses that are translated to physical addresses.
|
||||
|
||||
For early development, we want a minimal, deterministic model that enables:
|
||||
|
||||
- correct routing and latency accounting through the graph,
|
||||
- stable tensor deployment and kernel execution semantics,
|
||||
- future extension toward VA/MMU without rewriting workflows.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Phase 0 model is PA-only
|
||||
|
||||
The simulator uses a PA-first model:
|
||||
|
||||
- All device memory accesses (MemoryRead/MemoryWrite) operate on device physical
|
||||
addresses (PA) plus size.
|
||||
- Tensor handles store PA-based shard mappings after deployment.
|
||||
- KernelLaunch passes tensor arguments as PA-based mappings (or references to them).
|
||||
- MMU/IOMMU concepts (virtual address spaces, page tables, translation latency)
|
||||
are NOT modeled in Phase 0.
|
||||
|
||||
### D2. Allocation produces PA mappings
|
||||
|
||||
Device allocation selects PE-local memory regions and returns PA mappings
|
||||
sufficient to execute kernels and issue DMA requests.
|
||||
|
||||
### D3. Extension path (non-breaking)
|
||||
|
||||
A future ADR MAY introduce an optional VA/MMU layer by:
|
||||
|
||||
- introducing virtual addresses in tensor handles,
|
||||
- adding a mapping-install step,
|
||||
- modeling translation latency and page granularity.
|
||||
|
||||
The Phase 0 PA model remains a valid fast-path configuration.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
- Early implementation stays simple and testable.
|
||||
- All latency remains explicit via graph traversal, not hidden translation.
|
||||
- Future VA/MMU modeling can be added without breaking existing benchmarks.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- ADR-0007 (runtime_api vs sim_engine boundaries)
|
||||
- ADR-0008 (tensor deployment)
|
||||
- ADR-0009 (kernel execution)
|
||||
- SPEC R2 (latency by traversal)
|
||||
@@ -0,0 +1,232 @@
|
||||
# ADR-0012: Host ↔ IO_CPU Message Schema (PA-first, PE-tagged)
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Phase 0 uses a PA-first memory model (ADR-0011):
|
||||
|
||||
- memory operations use device physical addresses (PA) only,
|
||||
- VA/MMU/IOMMU is not modeled.
|
||||
|
||||
The host-facing runtime API interacts with the device via the IO_CPU endpoint.
|
||||
We define stable, minimal message schemas for Host ↔ IO_CPU so that:
|
||||
|
||||
- benchmarks remain stable,
|
||||
- IO_CPU-internal fan-out/aggregation can evolve independently,
|
||||
- completion and failure propagation is deterministic.
|
||||
|
||||
We also require PE tagging (Option A): each shard explicitly carries (sip,cube,pe)
|
||||
so IO_CPU can deterministically route/fan-out without relying on PA decoding.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Contract scope
|
||||
|
||||
This schema is the stable contract ONLY for Host ↔ IO_CPU.
|
||||
|
||||
Messages beyond IO_CPU (to M_CPU, PE_CPU, schedulers, engines) are component-internal
|
||||
and are NOT part of this host contract in Phase 0.
|
||||
|
||||
---
|
||||
|
||||
### D2. Required message set
|
||||
|
||||
The runtime API MUST use only these message types for Host ↔ IO_CPU:
|
||||
|
||||
- MemoryWrite
|
||||
- MemoryRead
|
||||
- KernelLaunch
|
||||
|
||||
All operations required by benchmarks (tensor init/copy, kernel run) MUST be expressible
|
||||
with these messages.
|
||||
|
||||
---
|
||||
|
||||
### D3. Common envelope (mandatory for all requests)
|
||||
|
||||
All Host ↔ IO_CPU requests MUST include:
|
||||
|
||||
- `msg_type: str`
|
||||
- `correlation_id: str`
|
||||
- generated by the host
|
||||
- used to match responses deterministically
|
||||
- `request_id: str`
|
||||
- unique within a correlation_id
|
||||
- `target_device: str`
|
||||
- device identifier (e.g., "sip:0")
|
||||
- `timestamp_tag: str | None` (optional)
|
||||
- debug tag only; MUST NOT affect determinism
|
||||
|
||||
All Host ↔ IO_CPU responses MUST include:
|
||||
|
||||
- `correlation_id: str`
|
||||
- `request_id: str`
|
||||
- `completion: Completion`
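
A minimal sketch of host-side response matching follows, assuming responses are looked up by the `(correlation_id, request_id)` pair. `Envelope` and `ResponseMatcher` are hypothetical names. `timestamp_tag` is carried but never consulted, in line with the rule that it must not affect determinism. The sketch also assumes the host generates ids from a deterministic counter rather than, for example, random UUIDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    msg_type: str
    correlation_id: str
    request_id: str
    target_device: str
    timestamp_tag: Optional[str] = None  # debug only; never used for matching


class ResponseMatcher:
    """Match responses to outstanding requests by (correlation_id, request_id)."""

    def __init__(self):
        self._pending = {}

    def register(self, env: Envelope) -> None:
        key = (env.correlation_id, env.request_id)
        if key in self._pending:
            raise ValueError(f"duplicate request_id within correlation: {key}")
        self._pending[key] = env

    def match(self, correlation_id: str, request_id: str) -> Envelope:
        try:
            return self._pending.pop((correlation_id, request_id))
        except KeyError:
            raise KeyError(f"no outstanding request {(correlation_id, request_id)}") from None
```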
|
||||
|
||||
---
|
||||
|
||||
### D4. Completion schema (mandatory)
|
||||
|
||||
`Completion` MUST have:
|
||||
|
||||
- `ok: bool`
|
||||
- `error_code: str | None`
|
||||
- `error_message: str | None`
|
||||
|
||||
Rules:
|
||||
|
||||
- If `ok == true` then `error_code` and `error_message` MUST be null.
|
||||
- If `ok == false` then `error_code` MUST be non-null.
|
||||
- Completion semantics MUST be deterministic.
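
One way to enforce these rules is to validate them when a `Completion` is constructed, so that an invalid completion can never be produced. The sketch assumes a Python dataclass representation and is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Completion:
    ok: bool
    error_code: Optional[str] = None
    error_message: Optional[str] = None

    def __post_init__(self):
        # ok == true  -> both error fields must be null
        if self.ok and (self.error_code is not None or self.error_message is not None):
            raise ValueError("ok completion must have null error_code and error_message")
        # ok == false -> error_code is mandatory (error_message may still be null)
        if not self.ok and self.error_code is None:
            raise ValueError("failed completion requires error_code")
```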
|
||||
|
||||
---
|
||||
|
||||
### D5. MemoryWrite schema (PA-first, PE-tagged)
|
||||
|
||||
`MemoryWrite` represents a host-initiated write/initialize operation to device memory.
|
||||
|
||||
Mandatory fields:
|
||||
|
||||
- common envelope fields (D3)
|
||||
- destination placement tags (Option A):
|
||||
- `dst_sip: int`
|
||||
- `dst_cube: int`
|
||||
- `dst_pe: int`
|
||||
- `dst_pa: int`
|
||||
- destination physical address in the destination PE's address space
|
||||
- `nbytes: int`
|
||||
- `src_kind: "pattern" | "host_buffer_ref"`
|
||||
- Phase 0 MUST support "pattern"
|
||||
- `pattern: Pattern | None`
|
||||
- required if `src_kind == "pattern"`
|
||||
|
||||
`Pattern` (Phase 0 mandatory support):
|
||||
|
||||
- `pattern_kind: "zero" | "fill_u8" | "fill_u16" | "fill_u32" | "fill_fp16" | "fill_fp32"`
|
||||
- `value: number | None`
|
||||
- required for fill_*; ignored for zero
|
||||
|
||||
Optional fields:
|
||||
|
||||
- `dst_mem_kind: "HBM" | "TCM" | "AUTO"` (default "AUTO")
|
||||
- `debug_label: str | None`
|
||||
|
||||
Notes:
|
||||
|
||||
- This message MUST NOT embed bulk tensor data in Phase 0.
|
||||
- All latency MUST come from explicit graph traversal and modeled components.
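
Schema validation for MemoryWrite (see also ADR-0013 V1) could look roughly like the sketch below. It assumes messages arrive as plain dicts using the field names above; `validate_memory_write` is a hypothetical name. A field is treated as missing when it is absent or null, so a legitimate `dst_pa` of 0 still passes.

```python
ENVELOPE_FIELDS = ("msg_type", "correlation_id", "request_id", "target_device")
WRITE_FIELDS = ("dst_sip", "dst_cube", "dst_pe", "dst_pa", "nbytes", "src_kind")
PATTERN_KINDS = {"zero", "fill_u8", "fill_u16", "fill_u32", "fill_fp16", "fill_fp32"}


def validate_memory_write(msg: dict) -> None:
    """Raise ValueError if `msg` violates the MemoryWrite schema."""
    missing = [f for f in ENVELOPE_FIELDS + WRITE_FIELDS if msg.get(f) is None]
    if missing:
        raise ValueError(f"MemoryWrite missing mandatory fields: {missing}")
    if msg["msg_type"] != "MemoryWrite":
        raise ValueError(f"unexpected msg_type {msg['msg_type']!r}")
    if msg["src_kind"] == "pattern":
        pattern = msg.get("pattern")
        if pattern is None:
            raise ValueError("pattern is required when src_kind == 'pattern'")
        kind = pattern.get("pattern_kind")
        if kind not in PATTERN_KINDS:
            raise ValueError(f"unsupported pattern_kind {kind!r}")
        if kind != "zero" and pattern.get("value") is None:
            raise ValueError(f"{kind} requires a value")
    elif msg["src_kind"] != "host_buffer_ref":
        raise ValueError(f"unknown src_kind {msg['src_kind']!r}")
    if msg.get("dst_mem_kind", "AUTO") not in ("HBM", "TCM", "AUTO"):
        raise ValueError("dst_mem_kind must be HBM, TCM, or AUTO")
```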
|
||||
|
||||
---
|
||||
|
||||
### D6. MemoryRead schema (PA-first, PE-tagged)
|
||||
|
||||
`MemoryRead` represents a host-initiated read from device memory.
|
||||
|
||||
Mandatory fields:
|
||||
|
||||
- common envelope fields (D3)
|
||||
- source placement tags (Option A):
|
||||
- `src_sip: int`
|
||||
- `src_cube: int`
|
||||
- `src_pe: int`
|
||||
- `src_pa: int`
|
||||
- `nbytes: int`
|
||||
|
||||
Optional fields:
|
||||
|
||||
- `dst_kind: "host_sink" | "discard"` (default "host_sink")
|
||||
- `debug_label: str | None`
|
||||
|
||||
Response payload:
|
||||
|
||||
- actual bytes are NOT required in Phase 0 (latency/traces focus)
|
||||
- a future ADR MAY allow implementations to return lightweight stats or hashes
|
||||
|
||||
---
|
||||
|
||||
### D7. KernelLaunch schema (PA-first, PE-tagged shards)
|
||||
|
||||
`KernelLaunch` represents launching a kernel on a target device via IO_CPU.
|
||||
|
||||
Mandatory fields:
|
||||
|
||||
- common envelope fields (D3)
|
||||
- `kernel_ref: KernelRef`
|
||||
- `args: list[KernelArg]`
|
||||
|
||||
`KernelRef` MUST have:
|
||||
|
||||
- `name: str`
|
||||
- `kind: "deployed" | "builtin"`
|
||||
- `deploy_pa: int | None` — PA where kernel binary was deployed (required for "deployed")
|
||||
- `deploy_sip: int` — SIP where binary resides
|
||||
- `deploy_cube: int` — cube where binary resides
|
||||
- `deploy_pe: int` — PE where binary resides
|
||||
- `nbytes_code: int` — kernel binary size (for BW modeling)
|
||||
|
||||
Kernel binaries MUST be pre-deployed to device memory via MemoryWrite.
|
||||
KernelLaunch MUST NOT embed kernel source code or IR in the launch message.
|
||||
|
||||
`KernelArg` supports tensor args by PA mapping and scalars by value.
|
||||
|
||||
Tensor arg (mandatory):
|
||||
|
||||
- `arg_kind: "tensor"`
|
||||
- `tensor_pa_map: TensorPAMap`
|
||||
|
||||
`TensorPAMap` MUST have:
|
||||
|
||||
- `shards: list[TensorShard]`
|
||||
|
||||
`TensorShard` MUST have (Option A, enforced):
|
||||
|
||||
- `sip: int`
|
||||
- `cube: int`
|
||||
- `pe: int`
|
||||
- `pa: int`
|
||||
- `nbytes: int`
|
||||
- `offset_bytes: int`
|
||||
|
||||
Scalar arg (mandatory):
|
||||
|
||||
- `arg_kind: "scalar"`
|
||||
- `dtype: "i32" | "i64" | "fp16" | "fp32" | "bool"`
|
||||
- `value: number | bool`
|
||||
|
||||
Optional KernelLaunch fields:
|
||||
|
||||
- `grid: dict | None`
|
||||
- `meta: dict | None`
|
||||
- `failure_policy: "fail_fast" | "collect_all"` (default "fail_fast")
|
||||
- `debug_label: str | None`
|
||||
|
||||
Notes:
|
||||
|
||||
- KernelLaunch MUST NOT embed bulk tensor data.
|
||||
- KernelLaunch MUST be submitted only to the IO_CPU endpoint.
|
||||
- IO_CPU MUST fan-out work internally using the shard (sip,cube,pe) tags.
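
IO_CPU's use of the shard tags can be sketched as a grouping step that needs no PA decoding. The sketch makes these assumptions:

- messages are plain dicts using the field names above;
- shards are grouped first by `(sip, cube)` (the M_CPU target, per ADR-0009 D3) and then by `pe`;
- scalar args are skipped during grouping, on the assumption that they are passed unchanged to every target.

`plan_fanout` is a hypothetical name.

```python
from collections import defaultdict

SHARD_FIELDS = ("sip", "cube", "pe", "pa", "nbytes", "offset_bytes")


def plan_fanout(launch: dict) -> dict:
    """Group every tensor shard of a KernelLaunch by (sip, cube), then by pe.

    Returns {(sip, cube): {pe: [(arg_index, shard), ...]}} with keys in sorted
    order, so identical launches always fan out in the same order.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for arg_index, arg in enumerate(launch["args"]):
        if arg["arg_kind"] != "tensor":
            continue  # assumed: scalars are forwarded unchanged to every target
        for shard in arg["tensor_pa_map"]["shards"]:
            missing = [f for f in SHARD_FIELDS if shard.get(f) is None]
            if missing:
                raise ValueError(f"arg {arg_index}: shard missing {missing}")
            plan[(shard["sip"], shard["cube"])][shard["pe"]].append((arg_index, shard))
    return {key: {pe: plan[key][pe] for pe in sorted(plan[key])} for key in sorted(plan)}
```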
|
||||
|
||||
---
|
||||
|
||||
## Verification Notes
|
||||
|
||||
Tests SHOULD validate:
|
||||
|
||||
- schema validation rejects missing mandatory fields,
|
||||
- deterministic correlation/response matching,
|
||||
- MemoryWrite/Read/KernelLaunch produce explicit hop traces,
|
||||
- all routed requests incur latency > 0.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- ADR-0011 (PA-first memory addressing)
|
||||
- ADR-0007 (runtime_api vs sim_engine boundaries)
|
||||
- ADR-0009 (kernel execution fan-out/aggregation)
|
||||
- SPEC R2, R7, R8
|
||||
@@ -0,0 +1,139 @@
|
||||
# ADR-0013: Verification Strategy and Phase 1 Test Plan
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
KernBench is a system-level simulator whose correctness is defined by:
|
||||
|
||||
- adherence to SPEC-defined invariants,
|
||||
- determinism and debuggability,
|
||||
- explicit modeling of routing and latency.
|
||||
|
||||
Given the evolving implementation, we need a stable verification strategy
|
||||
that prevents architectural drift while allowing incremental development.
|
||||
|
||||
This ADR defines the Phase 1 verification plan and what constitutes
|
||||
"correct behavior" for early implementations.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Verification is contract-based
|
||||
|
||||
Verification MUST be derived from:
|
||||
|
||||
- SPEC requirements,
|
||||
- accepted ADRs.
|
||||
|
||||
Tests MUST validate architectural contracts, not incidental implementation details.
|
||||
|
||||
---
|
||||
|
||||
### D2. Phase 1 verification scope
|
||||
|
||||
Phase 1 verification focuses on:
|
||||
|
||||
- message contract validity (ADR-0012),
|
||||
- routing and fan-out semantics at the IO_CPU boundary (ADR-0009),
|
||||
- PA-first memory addressing and shard tagging (ADR-0011),
|
||||
- core latency and trace invariants (SPEC 0.1, R2).
|
||||
|
||||
Microarchitectural accuracy, bandwidth contention, and cycle-level behavior
|
||||
are explicitly out of scope in Phase 1.
|
||||
|
||||
---
|
||||
|
||||
### D3. Required Phase 1 verification cases
|
||||
|
||||
The following verification cases MUST be supported by the implementation:
|
||||
|
||||
#### V1. Message schema validation
|
||||
|
||||
- KernelLaunch requests missing `(sip, cube, pe)` in any tensor shard MUST be rejected.
|
||||
- MemoryWrite/MemoryRead requests missing destination/source placement tags MUST be rejected.
|
||||
- Completion results MUST follow the `ok / error_code / error_message` contract.
|
||||
|
||||
#### V2. IO_CPU fan-out and aggregation
|
||||
|
||||
Given:
|
||||
|
||||
- a topology with one SIP, one CUBE, and two PEs,
|
||||
- a KernelLaunch request containing two tensor shards targeting different PEs,
|
||||
|
||||
The system MUST:
|
||||
|
||||
- submit a single KernelLaunch to IO_CPU,
|
||||
- fan-out work internally to both PEs,
|
||||
- aggregate completion and return a single deterministic completion to the host.
|
||||
|
||||
#### V3. Latency and trace invariants
|
||||
|
||||
For any valid request:
|
||||
|
||||
- the hop-by-hop trace MUST be non-empty,
|
||||
- total latency MUST be greater than zero,
|
||||
- repeated runs with identical inputs MUST produce identical traces.
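
The V3 invariants can be checked by a small helper that does not depend on the request type. This is a sketch under stated assumptions. `run` is a hypothetical callable that builds a fresh simulator, executes one request, and returns `(trace, total_latency)`. `toy_run` is a stand-in with made-up hop delays, not real simulator output.

```python
def check_trace_invariants(run, request, repeats=2):
    """Assert the V3 invariants for one request and return (trace, latency)."""
    first_trace, first_latency = run(request)
    assert len(first_trace) > 0, "hop-by-hop trace must be non-empty"
    assert first_latency > 0, "total latency must be greater than zero"
    for _ in range(repeats - 1):
        trace, latency = run(request)  # each run must start from fresh state
        assert trace == first_trace, "identical inputs must give identical traces"
        assert latency == first_latency
    return first_trace, first_latency


def toy_run(request):
    """Illustrative stand-in for a simulator run (hop name, delay in ns)."""
    hops = [("IO_CPU", 5), ("M_CPU", 10), ("PE_CPU", 3)]
    return hops, sum(delay for _, delay in hops)
```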
|
||||
|
||||
#### V4. Topology independence and cross-domain coverage
|
||||
|
||||
Verification cases MUST pass for multiple topology shapes, including:
|
||||
|
||||
- minimal: (1 SIP, 1 CUBE, 1 PE)
|
||||
- multi-PE: (1 SIP, 1 CUBE, N PEs)
|
||||
- multi-CUBE within a SIP: (1 SIP, M CUBEs, ≥1 PE per CUBE)
|
||||
- multi-SIP tray: (K SIPs, ≥1 CUBE per SIP, ≥1 PE per CUBE)
|
||||
|
||||
For multi-CUBE and multi-SIP topologies, Phase 1 verification focuses on:
|
||||
|
||||
- explicit connectivity (required links exist),
|
||||
- deterministic routing and control-path traversal,
|
||||
- non-empty traces and latency > 0 for representative cross-domain requests
|
||||
(inter-CUBE and inter-SIP paths).
|
||||
|
||||
Tests MUST NOT hardcode topology sizes, node ids, or link counts.
|
||||
Instead, tests MUST derive expectations from the compiled topology metadata.
|
||||
---
|
||||
|
||||
### D4. Phase 1 artifacts
|
||||
|
||||
Phase 1 MAY include:
|
||||
|
||||
- verification-only test code,
|
||||
- topology fixtures,
|
||||
- trace inspection utilities.
|
||||
|
||||
Phase 1 MUST NOT require:
|
||||
|
||||
- production code changes solely to satisfy tests,
|
||||
- weakening or removing tests to allow progress.
|
||||
|
||||
---
|
||||
|
||||
### D5. Phase 2 enforcement
|
||||
|
||||
Phase 2 (Apply) MUST:
|
||||
|
||||
- run the Phase 1 verification cases,
|
||||
- roll back all changes if any verification fails,
|
||||
- preserve tests as authoritative contracts.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
- Architectural correctness is enforced early.
|
||||
- Tests serve as executable documentation of system behavior.
|
||||
- Implementation remains flexible without losing rigor.
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- SPEC 0.1, R2, R6
|
||||
- ADR-0011 (PA-first memory addressing)
|
||||
- ADR-0012 (Host ↔ IO_CPU message schema)
|
||||
- ADR-0009 (Kernel execution semantics)
|
||||
@@ -0,0 +1,364 @@
|
||||
# ADR-0014: PE Internal Execution Model (PE_CPU, PE_SCHEDULER, and Composite Commands)
|
||||
|
||||
## Status
|
||||
|
||||
Proposed
|
||||
|
||||
## Context
|
||||
|
||||
ADR-0003 (system hierarchy) and ADR-0009 (kernel execution semantics) reference PE internals but do not define:
|
||||
|
||||
- the dispatch model inside a PE,
|
||||
- the responsibilities of PE_SCHEDULER,
|
||||
- the PE_TCM-centric dataflow contract used by accelerator engines.
|
||||
|
||||
We need a deterministic and debuggable PE-internal execution contract that supports:
|
||||
|
||||
- simple single-engine commands
|
||||
- composite commands that build a tiled pipeline across DMA and accelerator engines
|
||||
|
||||
The simulator must produce deterministic traces and allow modeling of PE-internal pipelining without introducing nondeterministic engine scheduling.
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. PE internal component roles
|
||||
|
||||
Each PE contains the following logical components.
|
||||
|
||||
**PE_CPU**
|
||||
|
||||
- Executes the kernel instruction stream or kernel control logic.
|
||||
- Generates PE commands.
|
||||
- Submits commands to PE_SCHEDULER.
|
||||
- PE_CPU does NOT enqueue work directly into engine queues.
|
||||
|
||||
**PE_SCHEDULER**
|
||||
|
||||
- The sole dispatcher inside a PE.
|
||||
- Receives commands from PE_CPU.
|
||||
- Expands composite commands into sub-commands.
|
||||
- Tracks dependencies and command state.
|
||||
- Dispatches work to engine queues.
|
||||
- Manages tile scheduling for composite commands.
|
||||
|
||||
**PE_DMA**
|
||||
|
||||
- Handles memory transfers between PE_TCM and external memory domains.
|
||||
- PE_DMA has **dual egress** at the CUBE level:
|
||||
- **→ XBAR**: dedicated path to HBM (local and cross-half via bridge)
|
||||
- **→ NOC**: path to non-HBM destinations (shared SRAM, inter-cube UCIe, etc.)
|
||||
- Supported directions include:
|
||||
- HBM → PE_TCM (via XBAR)
|
||||
- PE_TCM → HBM (via XBAR)
|
||||
- PE_TCM → shared SRAM (via NOC)
|
||||
- PE_TCM → other memory domains (via NOC, if supported by topology)
|
||||
|
||||
**PE_GEMM**
|
||||
|
||||
- Matrix multiplication engine.
|
||||
- Reads activations from PE_TCM.
|
||||
- May stream weights directly from HBM.
|
||||
|
||||
**PE_MATH**
|
||||
|
||||
- Element-wise computation engine.
|
||||
- Reads and writes PE_TCM.
|
||||
|
||||
**PE_TCM**
|
||||
|
||||
- Local SRAM used as the staging memory for accelerator operations.
|
||||
|
||||
---
|
||||
|
||||
### D2. Command lifecycle and queues
|
||||
|
||||
PE_SCHEDULER maintains three logical structures.
|
||||
|
||||
**SubmissionQueue**
|
||||
|
||||
- Written by PE_CPU.
|
||||
- Contains incoming PE commands waiting to be processed.
|
||||
|
||||
**InflightTable**
|
||||
|
||||
- Owned and mutated only by PE_SCHEDULER.
|
||||
- Tracks:
|
||||
- expanded sub-commands
|
||||
- dependency state
|
||||
- engine assignment
|
||||
- completion status
|
||||
|
||||
**CompletionQueue**
|
||||
|
||||
- Written by PE_SCHEDULER.
|
||||
- Contains final completion records for commands.
|
||||
|
||||
**Single-writer rule**
|
||||
|
||||
- Only PE_SCHEDULER is allowed to mutate command completion state.
|
||||
- Engine components must report completion via explicit completion events/messages.
|
||||
|
||||
**Command completion**
|
||||
|
||||
A command becomes DONE when:
|
||||
|
||||
- all of its sub-commands have completed, and
|
||||
- PE_SCHEDULER has published a completion record to CompletionQueue.
|
||||
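
A minimal sketch of the three D2 structures and the single-writer rule follows. To keep it short, it assumes PE_CPU submits each command already paired with its sub-command ids; in the ADR, that expansion happens inside PE_SCHEDULER. It also models an engine completion event as a call to the hypothetical `on_engine_complete` method. `PEScheduler` and `InflightEntry` are likewise illustrative names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InflightEntry:
    cmd_id: int
    pending: set        # sub-command ids still outstanding
    done: bool = False


class PEScheduler:
    """Only this class mutates InflightTable and CompletionQueue."""

    def __init__(self):
        self.submission_queue = deque()  # SubmissionQueue: written by PE_CPU
        self.inflight = {}               # InflightTable: cmd_id -> InflightEntry
        self.completion_queue = []       # CompletionQueue: completed cmd_ids

    def accept(self):
        """Move the next submitted command into the InflightTable."""
        cmd_id, sub_ids = self.submission_queue.popleft()
        self.inflight[cmd_id] = InflightEntry(cmd_id, set(sub_ids))

    def on_engine_complete(self, cmd_id, sub_id):
        """Handle an engine completion event for one sub-command."""
        entry = self.inflight[cmd_id]
        entry.pending.discard(sub_id)
        if not entry.pending and not entry.done:
            entry.done = True
            self.completion_queue.append(cmd_id)  # publish the completion record
```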
|
||||
---
|
||||
|
||||
### D3. Dispatch modes
|
||||
|
||||
PE commands are divided into two categories.
|
||||
|
||||
#### D3.1 Simple command
|
||||
|
||||
A simple command expands to exactly one engine sub-command.
|
||||
|
||||
Examples include:
|
||||
|
||||
- DMA transfer
|
||||
- GEMM compute
|
||||
- MATH compute
|
||||
|
||||
Execution flow:
|
||||
|
||||
```
|
||||
PE_CPU → SubmissionQueue → PE_SCHEDULER → engine queue → engine execution → completion event → PE_SCHEDULER → CompletionQueue
|
||||
```
|
||||
|
||||
#### D3.2 Composite command (tiled pipeline)
|
||||
|
||||
Composite commands implement tiled pipelined execution across engines.
|
||||
|
||||
Each tile executes the following pipeline:
|
||||
|
||||
```
|
||||
Input DMA (READ)
|
||||
→ Compute (GEMM or MATH)
|
||||
→ Output DMA (WRITE)
|
||||
```
|
||||
|
||||
**Tiling rule**
|
||||
|
||||
If the DMA payload exceeds the hardware tile size, PE_SCHEDULER splits the transfer into tiles.
|
||||
Each tile is assigned a monotonically increasing `tile_id`.
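
The tiling rule can be sketched as follows, with `tile_bytes` standing in for the hardware tile size (an assumed parameter). `split_into_tiles` is a hypothetical helper.

```python
def split_into_tiles(nbytes, tile_bytes):
    """Split a DMA payload into (tile_id, offset, size) tuples.

    tile_id increases monotonically from 0 and only the last tile may be
    short; a payload that fits in one tile yields a single tile.
    """
    if tile_bytes <= 0:
        raise ValueError("tile_bytes must be positive")
    return [(tile_id, off, min(tile_bytes, nbytes - off))
            for tile_id, off in enumerate(range(0, nbytes, tile_bytes))]
```

For example, a 10-byte payload with 4-byte tiles yields tiles of 4, 4, and 2 bytes with `tile_id` 0, 1, 2.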
|
||||
|
||||
**Tile dependency rules**
|
||||
|
||||
For tile `t`:
|
||||
|
||||
- Compute must wait for input DMA: `DMA_READ(t) → COMPUTE(t)`
|
||||
- Output DMA must wait for compute: `COMPUTE(t) → DMA_WRITE(t)`
|
||||
- All dependencies are enforced by PE_SCHEDULER.
|
||||
|
||||
**Overlap policy (Phase 0 default)**
|
||||
|
||||
Operations for different tiles may overlap when engine resources permit.
|
||||
|
||||
Allowed overlaps:
|
||||
|
||||
```
|
||||
DMA_READ(t+1) ∥ COMPUTE(t)
|
||||
DMA_WRITE(t−1) ∥ COMPUTE(t)
|
||||
DMA_READ(t+1) ∥ DMA_WRITE(t)
|
||||
```
|
||||
|
||||
Disallowed overlaps:
|
||||
|
||||
```
|
||||
GEMM(t) ∥ GEMM(t′)
|
||||
MATH(t) ∥ MATH(t′)
|
||||
GEMM(t) ∥ MATH(t′)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### D4. Engine execution model (Phase 0 default)
|
||||
|
||||
Each engine behaves as a deterministic service resource.
|
||||
|
||||
**DMA engine**
|
||||
|
||||
PE_DMA contains two independent channels.
|
||||
|
||||
```
|
||||
DMA_READ capacity = 1
|
||||
DMA_WRITE capacity = 1
|
||||
```
|
||||
|
||||
Rules:
|
||||
|
||||
- DMA_READ and DMA_WRITE may execute concurrently.
|
||||
- Multiple READs cannot overlap.
|
||||
- Multiple WRITEs cannot overlap.
|
||||
|
||||
Example allowed:
|
||||
|
||||
```
|
||||
DMA_READ(t+1) ∥ DMA_WRITE(t)
|
||||
```
|
||||
|
||||
Example not allowed:
|
||||
|
||||
```
|
||||
DMA_READ(t) ∥ DMA_READ(t+1)
|
||||
DMA_WRITE(t) ∥ DMA_WRITE(t+1)
|
||||
```
|
||||
|
||||
**Compute engine**
|
||||
|
||||
Compute operations share a single compute resource.
|
||||
|
||||
```
|
||||
PE_ACCEL capacity = 1
|
||||
```
|
||||
|
||||
Both GEMM and MATH require this shared compute slot.
|
||||
|
||||
Consequences:
|
||||
|
||||
- GEMM ∥ GEMM not allowed
|
||||
- MATH ∥ MATH not allowed
|
||||
- GEMM ∥ MATH not allowed
|
||||
|
||||
Only one compute operation can run in a PE at a time.
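
Combined with the D3.2 tile dependencies, these resource rules determine the earliest timeline of a composite command. The sketch rests on three assumptions:

- each stage has a fixed duration per tile;
- every channel serves tiles in `tile_id` order;
- there are enough tile buffers that SchedulerReservedTCM capacity never stalls a DMA_READ.

A real PE_SCHEDULER would also throttle reads on buffer availability (D5.1). `pipeline_schedule` is a hypothetical helper.

```python
def pipeline_schedule(n_tiles, t_read, t_compute, t_write):
    """Earliest (start, end) of each stage for every tile of a composite command.

    Resources: one DMA_READ channel, one DMA_WRITE channel, and one shared
    compute slot (PE_ACCEL), each with capacity 1.
    """
    read_free = compute_free = write_free = 0  # time each resource becomes idle
    schedule = []
    for _ in range(n_tiles):
        r0 = read_free
        r1 = r0 + t_read
        c0 = max(r1, compute_free)  # DMA_READ(t) -> COMPUTE(t)
        c1 = c0 + t_compute
        w0 = max(c1, write_free)    # COMPUTE(t) -> DMA_WRITE(t)
        w1 = w0 + t_write
        read_free, compute_free, write_free = r1, c1, w1
        schedule.append({"read": (r0, r1), "compute": (c0, c1), "write": (w0, w1)})
    return schedule
```

With 3 tiles and stage times of 2, 3, and 1, the last DMA_WRITE ends at t=12, compared with 18 for fully serial execution. DMA_READ(1) overlaps COMPUTE(0), and DMA_WRITE(0) overlaps COMPUTE(1), while compute intervals never overlap.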
|
||||
|
||||
**Compute opcode restriction**
|
||||
|
||||
Each composite command contains exactly one compute opcode.
|
||||
|
||||
Examples:
|
||||
|
||||
```
|
||||
COMPOSITE_GEMM
|
||||
COMPOSITE_MATH
|
||||
```
|
||||
|
||||
Mixed compute pipelines such as `GEMM → MATH` are not supported in Phase 0.
|
||||
|
||||
**Engine completion signaling**
|
||||
|
||||
Every engine emits a completion event when a sub-command finishes.
|
||||
Completion events are delivered to PE_SCHEDULER.
|
||||
|
||||
---
|
||||
|
||||
### D5. Dataflow model
|
||||
|
||||
Compute operations use a TCM-centric dataflow model.
|
||||
|
||||
**Input path (HBM)**
|
||||
|
||||
```
|
||||
HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
|
||||
```
|
||||
|
||||
**Input path (shared SRAM)**
|
||||
|
||||
```
|
||||
Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
|
||||
```
|
||||
|
||||
**Compute stage**
|
||||
|
||||
Compute engines read input tensors from PE_TCM.
|
||||
|
||||
```
|
||||
PE_TCM → GEMM / MATH
|
||||
```
|
||||
|
||||
Weights for GEMM may optionally stream directly from HBM (via XBAR).
|
||||
|
||||
**Output path (HBM)**
|
||||
|
||||
Compute results are written to PE_TCM; PE_DMA then writes them to HBM.
|
||||
|
||||
```
|
||||
PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
|
||||
```
|
||||
|
||||
**Output path (shared SRAM)**
|
||||
|
||||
```
|
||||
PE_TCM → PE_DMA (DMA_WRITE) → NOC → Shared SRAM
|
||||
```
|
||||
|
||||
#### D5.1 PE_TCM partitioning and ownership boundary
|
||||
|
||||
The PE_TCM address space is partitioned into two logical regions.
|
||||
|
||||
**SchedulerReservedTCM**
|
||||
|
||||
- A staging region owned exclusively by PE_SCHEDULER.
|
||||
- This region is used for composite command tile buffers.
|
||||
- PE_SCHEDULER:
|
||||
- partitions this region into tile buffers
|
||||
- assigns buffers for DMA_READ, COMPUTE, and DMA_WRITE stages
|
||||
- guarantees input/output buffer separation
|
||||
- manages tile buffer lifetime
|
||||
|
||||
**AllocatableTCM**
|
||||
|
||||
- General-purpose region managed by PEMemAllocator.
|
||||
- Used by host or DP-visible allocations.
|
||||
|
||||
**Visibility rule (hard isolation)**
|
||||
|
||||
- PEMemAllocator must not see or allocate memory inside SchedulerReservedTCM.
|
||||
- SchedulerReservedTCM is excluded from allocator-managed ranges by construction.
|
||||
- This prevents DP or host allocations from interfering with scheduler staging buffers.
|
||||
|
||||
**Tile buffer rules**
|
||||
|
||||
Within SchedulerReservedTCM:
|
||||
|
||||
- input buffers and output buffers must not overlap
|
||||
- PE_SCHEDULER assigns tile buffers for DMA and compute stages
|
||||
- tile buffers remain valid until the corresponding DMA_WRITE completes
|
||||
- Buffer reuse is allowed only after the tile lifetime finishes.
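
A minimal sketch of how PE_SCHEDULER might carve SchedulerReservedTCM into tile buffers follows. It assumes fixed-size slots, with all input buffers placed before all output buffers so the two sets cannot overlap. It also assumes the lowest free slot is always handed out, for determinism. `TileBufferPool` and its methods are hypothetical.

```python
class TileBufferPool:
    """Split [base, base + size) into `slots` input buffers followed by
    `slots` output buffers. A slot is held from DMA_READ issue until the
    tile's DMA_WRITE completes."""

    def __init__(self, base, size, tile_bytes, slots):
        if 2 * slots * tile_bytes > size:
            raise ValueError("reserved region too small for requested slots")
        self.inputs = [base + i * tile_bytes for i in range(slots)]
        out_base = base + slots * tile_bytes
        self.outputs = [out_base + i * tile_bytes for i in range(slots)]
        self.free = list(range(slots))  # free slot indices, lowest first
        self.owner = {}                 # tile_id -> slot

    def acquire(self, tile_id):
        """Return (in_addr, out_addr) for tile_id, or None if all slots are busy."""
        if not self.free:
            return None
        slot = self.free.pop(0)
        self.owner[tile_id] = slot
        return self.inputs[slot], self.outputs[slot]

    def release_after_write(self, tile_id):
        """Call when DMA_WRITE(tile_id) completes; the slot becomes reusable."""
        self.free.append(self.owner.pop(tile_id))
        self.free.sort()
```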
|
||||
|
||||
---
|
||||
|
||||
### D6. Observability and trace contract
|
||||
|
||||
The simulator must emit deterministic trace events.
|
||||
|
||||
Required events include:
|
||||
|
||||
- `command_submitted`
|
||||
- `sub_command_dispatched`
|
||||
- `engine_start`
|
||||
- `engine_complete`
|
||||
- `tile_ready`
|
||||
- `command_complete`
|
||||
|
||||
Trace ordering must be deterministic for identical inputs.
|
||||
|
||||
---
|
||||
|
||||
### D7. Topology representation
|
||||
|
||||
PE internal components are declared in `cube.pe_template`.
|
||||
|
||||
The template is instantiated once per PE.
|
||||
|
||||
PE instances are derived from `cube.pe_layout`.
|
||||
|
||||
External connectivity such as:
|
||||
|
||||
- PE_DMA → XBAR (HBM data path)
|
||||
- PE_DMA → NOC (non-HBM data path: shared SRAM, inter-cube UCIe)
|
||||
- NOC → PE_CPU (command path from M_CPU)
|
||||
|
||||
is modeled at the CUBE level (see ADR-0003 D3).
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- SPEC R3, R4
|
||||
- ADR-0003 D4 (PE-level system hierarchy)
|
||||
- ADR-0005 View C (PE-level diagram)
|
||||
- ADR-0008 D2 (PA-level allocation at PE scope; PEMemAllocator is the per-PE allocator instance)
|
||||
- ADR-0009 D3 (kernel execution fan-out and PE_CPU dispatch)
|
||||
@@ -0,0 +1,178 @@
|
||||
# ADR-0015: Component Port/Wire Model and Fabric Routing
|
||||
|
||||
## Status
|
||||
|
||||
Proposed
|
||||
|
||||
## Context
|
||||
|
||||
ADR-0007 D2 assigns path-walking and low-level request decomposition to the simulation engine.
|
||||
In practice, the engine iterates the topology path and calls `run()` on each component
|
||||
sequentially — conflating routing policy with component behavior and preventing realistic
|
||||
hardware modeling (queues, contention, fan-out).
|
||||
|
||||
ADR-0007 D3 already states that components own fan-out and aggregation, but the current
|
||||
implementation does not enforce this for fabric traversal.
|
||||
|
||||
This ADR defines:
|
||||
|
||||
- how components communicate via typed port queues,
|
||||
- how propagation delay is modeled (wire processes),
|
||||
- the fabric path for Memory R/W through M_CPU.DMA,
|
||||
- the reduced role of the simulation engine,
|
||||
- M_CPU.DMA as an internal subcomponent of M_CPU.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Component port model
|
||||
|
||||
Each component has typed input/output ports modeled as SimPy Stores:
|
||||
|
||||
```
|
||||
in_ports: dict[str, simpy.Store] # keyed by source node_id
|
||||
out_ports: dict[str, simpy.Store] # keyed by destination node_id
|
||||
```
|
||||
|
||||
Ports are created at engine initialization based on graph edges.
|
||||
Each directed edge (src → dst) results in:
|
||||
|
||||
- `src.out_ports[dst]` — the sending end
|
||||
- `dst.in_ports[src]` — the receiving end
|
||||
|
||||
---
|
||||
|
||||
### D2. Wire process (propagation delay)
|
||||
|
||||
For each directed edge (src, dst) in the topology graph, a SimPy wire process
|
||||
models propagation delay:
|
||||
|
||||
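```python
def wire_process(env, out_port, in_port, delay_ns):
    # One delivery sub-process per message: several messages can be in flight,
    # so the wire adds pure propagation delay and never serializes traffic.
    def deliver(cmd):
        yield env.timeout(delay_ns)
        yield in_port.put(cmd)

    while True:
        cmd = yield out_port.get()
        env.process(deliver(cmd))
```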
|
||||
|
||||
Wire processes are started at engine initialization.
|
||||
BW constraints are enforced by the sending component's out_port capacity or token model,
|
||||
not by the wire process itself.
|
||||
|
||||
---

### D3. Engine role (reduced)

The simulation engine MUST:

- wire components at initialization (create port Stores, start wire processes),
- identify the entry component for each request type (PCIE_EP),
- put the request into the entry component's in_port,
- wait for a completion event.

The simulation engine MUST NOT:

- walk the topology path during request execution,
- call component `run()` methods directly,
- track per-hop latency or decompose fan-out.

This supersedes the "decompose operations into low-level requests" clause of ADR-0007 D2.
ADR-0007 D2 must be amended accordingly.

---

### D4. Unified fabric path for Memory R/W and Kernel Launch

Memory R/W and Kernel Launch both use the same fabric path to reach the target cube's M_CPU.
They differ only in what M_CPU does upon receiving the request.

**Forward path (IO_CPU → target M_CPU):**

```
IO_CPU
  → [transit cubes: ucie_out → wire → ucie_in → noc → ucie_out]   (zero or more)
  → target cube: ucie_in → noc → M_CPU
```

**At M_CPU (diverges by operation type):**

```
Memory R/W:     M_CPU → M_CPU.DMA → noc → hbm_ctrl
Kernel Launch:  M_CPU → PE[0..n]   (parallel fan-out)
```

**Completion path (reverse, same fabric):**

```
Memory R/W:     hbm_ctrl → noc → M_CPU.DMA → M_CPU
Kernel Launch:  PE[0..n] all complete → M_CPU   (aggregation)

M_CPU → [transit cubes: ucie → noc → ucie] → IO_CPU → runtime_api
```

---

### D5. M_CPU.DMA is an internal subcomponent of M_CPU

M_CPU.DMA is NOT a separate topology node.
It is an internal subcomponent owned by the M_CPU component implementation.

M_CPU.DMA:

- owns the DMA READ and DMA WRITE queues (capacity=1 each, per ADR-0014 D4),
- issues memory requests over the NOC to hbm_ctrl,
- receives completions from hbm_ctrl via the NOC,
- reports completion to M_CPU,
- is created and managed inside M_CPU's `__init__` and `run()`.

M_CPU.DMA does not appear as a node in the compiled topology graph.

---

### D6. Transit cube forwarding

A cube that is not the target of a memory or kernel request acts as a transit node.
Transit cubes forward requests without consuming them:

```
ucie_in (from upstream) → noc → ucie_out (to downstream)
```

The transit-versus-target decision is made entirely within the ucie_in component.
The noc and ucie_out components in a transit cube forward the packet without modification.

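The decision that D6 places in ucie_in reduces to comparing the request's target with the local cube. This is a minimal sketch that assumes the target SIP and cube have already been decoded from the request address (for example, from the PhysAddr fields described in ADR-0001). `Packet` and `ucie_in_decide` are hypothetical names. The dataclass is frozen to mirror the rule that transit hops forward the packet unmodified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical request header; frozen so forwarding cannot modify it."""
    target_sip: int
    target_cube: int
    kind: str  # e.g. "mem_rw" or "kernel_launch"


def ucie_in_decide(own_sip: int, own_cube: int, pkt: Packet) -> str:
    """Decide at ucie_in whether this cube consumes the packet or passes it through."""
    if (pkt.target_sip, pkt.target_cube) == (own_sip, own_cube):
        return "deliver"  # ucie_in -> noc -> M_CPU
    return "transit"      # ucie_in -> noc -> ucie_out, packet unchanged


pkt = Packet(target_sip=0, target_cube=2, kind="mem_rw")
decisions = [ucie_in_decide(0, cube, pkt) for cube in (0, 1, 2)]
```
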
---

### D7. `_formula_latency` is preserved as a lower-bound cross-check

The path-based formula latency function (`_formula_latency`) is preserved in the engine as a lower bound for correctness verification.

Invariant:

- Phase 0: `_formula_latency == component model total_ns`
- Phase 1+: `_formula_latency <= component model total_ns` (contention adds queueing delay)

This function is independent of the port/wire model and requires only the topology graph.
It is used for shard comparison in `_route_kernel` and as a regression guard.

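This is a minimal sketch of the invariant check. It assumes `_formula_latency` reduces to a sum of fixed per-edge delays along the path; the real function may also include per-node terms. `formula_latency_ns`, `check_invariant`, and the delay values are hypothetical. The tolerance guards against floating-point noise when Phase 0 compares for equality.

```python
def formula_latency_ns(path, edge_delay_ns):
    """Hypothetical stand-in for _formula_latency: sum of fixed per-edge delays."""
    return sum(edge_delay_ns[(a, b)] for a, b in zip(path, path[1:]))


def check_invariant(formula_ns, simulated_ns, phase, tol=1e-9):
    """Phase 0 requires equality; Phase 1+ only requires formula <= simulated."""
    if phase == 0:
        return abs(formula_ns - simulated_ns) <= tol
    return formula_ns <= simulated_ns + tol


path = ["io_cpu", "ucie_in", "noc", "m_cpu"]
delays = {("io_cpu", "ucie_in"): 2.0, ("ucie_in", "noc"): 1.5, ("noc", "m_cpu"): 3.0}
bound = formula_latency_ns(path, delays)  # 6.5
```
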
---

## Consequences

- Components model realistic hardware behavior (queues, contention, fan-out).
- Propagation delay is modeled explicitly per edge.
- The engine is decoupled from routing policy.
- Component implementations remain swappable via DI (ADR-0007 D3).
- ADR-0007 D2 must be amended to remove path-walking from engine responsibilities.
- ADR-0009 D3 should be updated to reference the unified fabric path (D4 above).

---

## Links

- ADR-0007 D2 (to be amended: engine path-walking clause)
- ADR-0009 D3 (kernel execution fan-out; fabric path to be referenced)
- ADR-0014 D4 (DMA engine capacity=1)
- ADR-0012 D1 (host ↔ IO_CPU message schema; M_CPU.DMA is component-internal)

@@ -0,0 +1,363 @@
# Practical DI Patterns: Learning Dependency Injection Through the kernbench Implementation

---

## Slide 1: What We'll Cover Today

**Question:** How should we design code so that it is easy to test and easy to swap out?

**Answer:** Dependency Injection (DI)

Today we learn not from theory but by reading **simulator code that actually runs**.

```
kernbench
└── A framework that simulates AI-accelerator hardware in Python
    - Dozens of hardware components (NOC, HBM, PE, CPU...)
    - Each component can be swapped at runtime
    - Any component can be replaced with a mock in tests
```

---

## Slide 2: What Happens Without DI

```python
# ❌ Code without DI
class IoCpuComponent:
    def run(self, env, nbytes):
        router = PathRouter()     # created directly: cannot be swapped
        hbm = HbmCtrlComponent()  # created directly: cannot be swapped
        yield env.timeout(10.0)
```

**Problems:**

- Every test drags in the real `PathRouter` and `HbmCtrl`
- Replacing a component with a mock requires **editing the source code**
- Using a different topology (a different routing strategy) means **editing it again**

> When a class creates its own dependencies, it is coupled to them.

---

## Slide 3: The Core Principle of DI

**Dependencies are created outside and passed in.**

```
┌──────────────────────────────────┐
│ Assembler                        │  ← decides who uses what
│ GraphEngine.__init__             │
└────────────────┬─────────────────┘
                 │ injects ctx
                 ▼
┌──────────────────────────────────┐
│ Component                        │  ← only needs to know how it behaves
│ IoCpuComponent                   │
│ self.ctx.router.find_path(...)   │  ← simply uses it
└──────────────────────────────────┘
```

**Three separate roles:**

1. **Interface**: what it can do (`ComponentBase`)
2. **Implementation**: how it does it (`IoCpuComponent`, `HbmCtrlComponent`, ...)
3. **Assembler**: what gets connected to what (`GraphEngine`)

---

## Slide 4: Pattern 1 (Constructor Injection)

> Receive dependencies through the constructor.

```python
# kernbench/components/base.py
from abc import ABC

import simpy


class ComponentBase(ABC):
    def __init__(self, node: Node, ctx: ComponentContext | None = None):
        self.node = node
        self.ctx = ctx  # dependencies injected from outside
        self.in_ports: dict[str, simpy.Store] = {}
        self.out_ports: dict[str, simpy.Store] = {}
```

```python
# Consumer side: the component does not create ctx itself
class IoCpuComponent(ComponentBase):
    def _dispatch(self, env, txn):
        path = self.ctx.router.find_node_path(...)  # ctx is already injected
        yield self.out_ports[next_hop].put(...)
```

**When to use it:**

- When the dependency does not change over the component's lifetime
- When the component cannot work without the dependency (a required dependency)

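To see what constructor injection buys in a test, here is a self-contained sketch in plain Python (no SimPy). It assumes a hypothetical `Router` protocol exposing the `find_node_path` method used above. To stay short, it injects the router directly rather than through `ComponentContext`. `FakeRouter`, `IoCpu`, and `next_hop` are illustrative names, not kernbench classes.

```python
from typing import Protocol


class Router(Protocol):
    """The interface a component relies on (PathRouter would be one implementation)."""

    def find_node_path(self, src: str, dst: str) -> list[str]: ...


class FakeRouter:
    """Test double: returns a canned path and records every call."""

    def __init__(self, path: list[str]):
        self.path = path
        self.calls: list[tuple[str, str]] = []

    def find_node_path(self, src: str, dst: str) -> list[str]:
        self.calls.append((src, dst))
        return self.path


class IoCpu:
    def __init__(self, node_id: str, router: Router):
        self.node_id = node_id
        self.router = router  # injected; never constructed here

    def next_hop(self, dst: str) -> str:
        path = self.router.find_node_path(self.node_id, dst)
        return path[1]  # assumes path[0] is this node


router = FakeRouter(["io_cpu", "noc", "m_cpu"])
io_cpu = IoCpu("io_cpu", router)
hop = io_cpu.next_hop("m_cpu")  # "noc"
```

The test never builds a real graph or router. It controls the path completely and can inspect exactly how the component used its dependency.
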
---

## Slide 5: The Context Object Pattern

> When dependencies multiply, bundle them into one object.

```python
# kernbench/components/context.py
from dataclasses import dataclass


@dataclass
class ComponentContext:
    router: PathRouter         # routing policy
    resolver: AddressResolver  # address resolution
    positions: dict[str, ...]  # physical placement
    ns_per_mm: float           # propagation-delay constant
    edge_map: dict[...]        # edge information
    spec: dict                 # topology spec
```

**Why bundle them into a Context?**

- With six separate constructor arguments, every new dependency changes every component's signature
- With a single Context, adding a new field leaves existing components untouched
- Each component **takes out only what it needs**

```python
class TwoDMeshNocComponent(ComponentBase):
    def _route(self, env, txn):
        src_pos = self.ctx.positions.get(prev_hop)  # uses positions only
        ns_per_mm = self.ctx.ns_per_mm              # uses the constant only
        # router, resolver, etc. are not touched
```

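Here is a runnable miniature of the same idea. It assumes a trimmed-down context with only two of the fields above, plus a hypothetical `trace_enabled` flag added later. `Context`, `MeshNoc`, and the Manhattan-distance delay are illustrative assumptions, not the kernbench implementation. Giving the new field a default is what keeps the existing construction site working unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    ns_per_mm: float
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)
    # Added later with a default: existing call sites and components keep working.
    trace_enabled: bool = False


class MeshNoc:
    def __init__(self, node_id: str, ctx: Context):
        self.node_id = node_id
        self.ctx = ctx

    def hop_delay_ns(self, src: str, dst: str) -> float:
        # Reads only positions and ns_per_mm; Manhattan distance as on a 2D mesh.
        (x0, y0), (x1, y1) = self.ctx.positions[src], self.ctx.positions[dst]
        return (abs(x1 - x0) + abs(y1 - y0)) * self.ctx.ns_per_mm


ctx = Context(ns_per_mm=0.5, positions={"r0": (0.0, 0.0), "r1": (3.0, 4.0)})
noc = MeshNoc("noc", ctx)
delay = noc.hop_delay_ns("r0", "r1")  # (3 + 4) * 0.5 = 3.5
```
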
---

## Slide 6: Pattern 2 (Registry + Factory)

> Swap implementations at runtime through a string-key → class mapping.

```python
# kernbench/components/base.py

class ComponentRegistry:
    _registry: dict[str, type[ComponentBase]] = {}

    @classmethod
    def register(cls, impl: str, component_cls: type[ComponentBase]):
        cls._registry[impl] = component_cls

    @classmethod
    def create(cls, node, overrides=None, ctx=None) -> ComponentBase:
        if overrides and node.impl in overrides:
            return overrides[node.impl](node, ctx)      # 1st: caller override
        if node.impl in cls._registry:
            return cls._registry[node.impl](node, ctx)  # 2nd: registered implementation
        return DefaultComponent(node, ctx)              # 3rd: default fallback
```

**Resolution order:**

```
overrides[impl]     ← injected for tests/experiments
      ↓ (if absent)
_registry[impl]     ← production implementation
      ↓ (if absent)
DefaultComponent    ← safe fallback
```

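To exercise the resolution order end to end, the factory above can be run with minimal stand-ins. The stand-ins for `Node`, `ComponentBase`, and `DefaultComponent`, and the `MeshNoc`/`SpyNoc` classes, are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    impl: str


class ComponentBase:
    def __init__(self, node, ctx=None):
        self.node, self.ctx = node, ctx


class DefaultComponent(ComponentBase):
    pass


class ComponentRegistry:
    _registry: dict[str, type[ComponentBase]] = {}

    @classmethod
    def register(cls, impl: str, component_cls: type[ComponentBase]) -> None:
        cls._registry[impl] = component_cls

    @classmethod
    def create(cls, node, overrides=None, ctx=None) -> ComponentBase:
        if overrides and node.impl in overrides:
            return overrides[node.impl](node, ctx)      # 1st: caller override
        if node.impl in cls._registry:
            return cls._registry[node.impl](node, ctx)  # 2nd: registered impl
        return DefaultComponent(node, ctx)              # 3rd: fallback


class MeshNoc(ComponentBase):
    pass


class SpyNoc(ComponentBase):
    pass


ComponentRegistry.register("noc_2d_mesh_v1", MeshNoc)
noc_node = Node("sip0.cube0.noc", "noc_2d_mesh_v1")
```

One design consequence is worth noting. `_registry` is a class-level dict, so registrations are process-wide and persist from one test to the next.
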
---

## Slide 7: How Registration Works

```python
# kernbench/components/impls/__init__.py

from kernbench.components.base import ComponentRegistry
from kernbench.components.impls.noc import TwoDMeshNocComponent
from kernbench.components.impls.io_cpu import IoCpuComponent
# ...

ComponentRegistry.register("noc_2d_mesh_v1", TwoDMeshNocComponent)
ComponentRegistry.register("io_cpu_v1", IoCpuComponent)
ComponentRegistry.register("hbm_ctrl_v1", HbmCtrlComponent)
# ...
```

**topology.yaml (configuration file)**

```yaml
nodes:
  - id: sip0.cube0.noc
    impl: noc_2d_mesh_v1   # ← this string is the Registry key
```

**Flow:**

```
YAML → impl string → Registry.create() → concrete component instance
```

Change only the impl string, and the behavior changes. No code edits are needed.

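The flow above can be sketched without YAML parsing. The input is the `nodes` list that `yaml.safe_load` (PyYAML) would return for the snippet above. `RingNoc`, `noc_ring_v1`, `REGISTRY`, and `build` are hypothetical.

```python
class MeshNoc:
    def __init__(self, node_id: str):
        self.node_id = node_id


class RingNoc(MeshNoc):
    """Hypothetical alternative NOC implementation."""


REGISTRY = {"noc_2d_mesh_v1": MeshNoc, "noc_ring_v1": RingNoc}


def build(nodes_cfg):
    """Turn parsed topology nodes into component instances via their impl keys."""
    components = {}
    for entry in nodes_cfg:
        impl = entry["impl"]
        if impl not in REGISTRY:
            raise KeyError(f"unknown impl {impl!r} for node {entry['id']}")
        components[entry["id"]] = REGISTRY[impl](entry["id"])
    return components


# The "nodes" list yaml.safe_load would produce for the topology.yaml snippet above.
cfg = [{"id": "sip0.cube0.noc", "impl": "noc_2d_mesh_v1"}]
before = build(cfg)
cfg[0]["impl"] = "noc_ring_v1"  # the only change: one string in the config
after = build(cfg)
```

Unlike `ComponentRegistry.create`, which falls back to `DefaultComponent`, this sketch fails fast on an unknown key. With fail-fast, a typo in the YAML surfaces immediately instead of silently producing a default component. Which behavior is preferable is a design choice.
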
---

## Slide 8: Pattern 3 (Override Injection for Tests)

> The caller swaps out only a specific impl.

```python
# tests/test_component_registry.py

class SpyXbar(ComponentBase):
    calls = 0

    def run(self, env, nbytes):
        SpyXbar.calls += 1
        yield env.timeout(0)


# In the test, replace only xbar_v1 with SpyXbar
engine = GraphEngine(
    graph,
    component_overrides={"xbar_v1": SpyXbar},  # ← the only addition
)

result = engine.run(msg)
assert SpyXbar.calls > 0  # verify the Xbar was actually invoked
```

**Key point:** the test code does **not modify** production code.

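One caveat in the test above: `calls` is a class attribute, so its count survives from one test to the next. As a result, `assert SpyXbar.calls > 0` can pass because of an earlier test. Resetting `SpyXbar.calls = 0` at the start of each test is one fix.

Another is sketched below. It assumes the override only needs to be a callable that accepts `(node, ctx)`, as `ComponentRegistry.create` calls it, and it builds a fresh spy class per test that records into a list the test owns. `make_spy` is a hypothetical helper, and `run` is simplified to a plain method without SimPy.

```python
def make_spy(record: list):
    """Build a fresh spy class bound to this test's own record list."""

    class SpyXbar:
        def __init__(self, node, ctx=None):
            self.node = node

        def run(self, nbytes):
            record.append((self.node, nbytes))  # per-test state, nothing global

    return SpyXbar


# Two independent "tests", each with its own record.
calls_a: list = []
SpyA = make_spy(calls_a)
SpyA("xbar0").run(64)
SpyA("xbar1").run(128)

calls_b: list = []
SpyB = make_spy(calls_b)
SpyB("xbar0").run(32)
```

In the slide's test, this would become `component_overrides={"xbar_v1": make_spy(calls)}` followed by `assert calls`.
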
---

## Slide 9: The Assembler (GraphEngine)

> The one place where components are created and connected.

```python
# kernbench/sim_engine/engine.py
import simpy


class GraphEngine:
    def __init__(self, graph, component_overrides=None):
        self._env = simpy.Environment()  # simulation clock shared by all ports

        # 1. Build the shared dependencies
        ctx = ComponentContext(
            router=PathRouter(graph),
            resolver=AddressResolver(graph),
            positions={nid: n.pos_mm for nid, n in graph.nodes.items()},
            ns_per_mm=...,
        )

        # 2. Create the components (DI: inject ctx)
        self._components = {
            node_id: ComponentRegistry.create(node, component_overrides, ctx)
            for node_id, node in graph.nodes.items()
        }

        # 3. Wire the ports
        for e in graph.edges:
            store = simpy.Store(self._env)
            self._components[e.src].out_ports[e.dst] = store
            self._components[e.dst].in_ports[e.src] = store
```

**Create → inject → connect**: these three steps happen in exactly one place.

---

## Slide 10: The Whole Structure at a Glance

```
topology.yaml
  │  impl: "noc_2d_mesh_v1"
  ▼
GraphEngine.__init__()                  ← assembler
  │
  ├── build ComponentContext            ← bundle of shared dependencies
  │     ├── PathRouter
  │     ├── AddressResolver
  │     └── positions, ns_per_mm, ...
  │
  ├── ComponentRegistry.create(node, overrides, ctx)
  │     ├── overrides["noc_2d_mesh_v1"]? → SpyNoc (test)
  │     ├── registry["noc_2d_mesh_v1"]?  → TwoDMeshNocComponent (production)
  │     └── fallback                     → DefaultComponent
  │
  └── port wiring: connect out_ports / in_ports

Component (TwoDMeshNocComponent)
  └── uses self.ctx.positions, self.ctx.ns_per_mm
      (does not touch the router or resolver: only what it needs)
```

---

## Slide 11: What We Gained

| Situation | Without DI | With DI |
|------|---------|---------|
| Swap the NOC algorithm | Edit the source code | Change the impl string in YAML |
| Verify Xbar behavior | Run the entire real hardware model | `overrides={"xbar_v1": SpyXbar}` |
| Add a new component | Edit existing code | `register("new_v1", NewComp)` |
| Add a context field | Edit every constructor | Add a field to `ComponentContext` |
| Test isolation | Not possible | Override only what you need |

---

## Slide 12: Checklist for Real Projects

**Questions to ask while designing:**

1. **What does this class instantiate directly (`new`)?**
   → Whatever it creates, it cannot swap. Check whether it could receive it through the constructor instead.

2. **Three or more dependencies?**
   → Bundle them into a Context Object.

3. **Can this class be run on its own in a test?**
   → If not, that is a sign DI is needed.

4. **Do you want to change behavior through configuration (YAML/config)?**
   → Use the Registry + string-key pattern.

5. **Who does the assembling?**
   → There should be exactly one assembler. Assembly logic must not live inside components.

---

## Slide 13: Anti-patterns to Avoid

```python
# ❌ Service locator (the component calls the registry itself)
class BadComponent(ComponentBase):
    def run(self, env, nbytes):
        router = ComponentRegistry.get("router")  # the component looks it up itself
        ...


# ❌ Direct reference to a global singleton
class BadComponent(ComponentBase):
    def run(self, env, nbytes):
        router = GlobalRouter.instance()  # cannot be swapped
        ...


# ❌ Creating dependencies inside the constructor
class BadComponent(ComponentBase):
    def __init__(self, node):
        self.router = PathRouter(node.graph)  # cannot be isolated in tests
```

**Common problem:** the component resolves its own dependencies, which increases coupling.

---

## Slide 14: Summary

> **DI = separating the creation of dependencies from their use**

```
Creation → Registry / Assembler (GraphEngine)
Use      → Component (IoCpuComponent, TwoDMeshNocComponent, ...)
```

**Three patterns learned from kernbench:**

1. **Constructor Injection**: required dependencies go through the constructor
2. **Context Object**: bundle dependencies into one dataclass
3. **Registry + Override**: pick implementations by string key and swap them in tests

**Result:** 141 tests, component swaps through a one-line YAML change, and mock injection without touching production code.

---

*Reference code: kernbench/src/kernbench/components/*

@@ -0,0 +1,26 @@
# Generated Diagrams

This directory contains diagrams generated from topology compilation.

## What these files are

- Derived artifacts generated from:
  - the compiled topology graph
  - distance (accumulated latency) metadata
  - view/layout rules (ADR-0005)

These files are meant for quick visual inspection and review.

## Default outputs

- SIP view: `sip_view.mmd` (and/or `sip_view.dot`)
- CUBE view: `cube_view.mmd` (and/or `cube_view.dot`)
- PE view: `pe_view.mmd` (and/or `pe_view.dot`)

## How to preview

- In VS Code:
  - for a `.md` file containing Mermaid blocks, use Markdown Preview (rendering Mermaid there may require a Mermaid extension); for a standalone `.mmd` file, use a Mermaid preview extension
  - for `.dot`, use a Graphviz preview extension, or render from the command line, e.g. `dot -Tpng sip_view.dot -o sip_view.png`

## Notes

- Diagrams are representative and distance-aware by default.
- Instance indices are not required unless debugging asymmetry.
- Outputs should be deterministic for the same topology and rules.

@@ -0,0 +1,156 @@
<svg xmlns="http://www.w3.org/2000/svg" width="556" height="472" viewBox="0 0 556 472">
<title>cube</title>
<rect width="556" height="472" fill="#f8fafc"/>
<text x="278" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">CUBE VIEW</text>
<rect x="40.0" y="40.0" width="476.0" height="392.0" rx="6" fill="none" stroke="#475569" stroke-width="2" stroke-dasharray="8,4"/>
<rect x="152.0" y="166.0" width="252.0" height="140.0" rx="4" fill="#d1fae5" stroke="#10b981" stroke-width="1.5" stroke-dasharray="6,3" opacity="0.5"/>
<text x="278.0" y="278.0" text-anchor="middle" font-family="monospace" font-size="11" fill="#047857" opacity="0.7">HBM</text>
<polyline points="82.0,82.0 82.0,95.0 82.0,95.0 82.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="82.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="82.0,82.0 82.0,144.0 334.0,144.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 82.0,144.0 82.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="166.0,82.0 166.0,95.0 166.0,95.0 166.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="166.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="166.0,82.0 166.0,154.0 334.0,154.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 166.0,144.0 166.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="390.0,82.0 390.0,95.0 390.0,95.0 390.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="390.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="390.0,82.0 390.0,164.0 334.0,164.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 390.0,144.0 390.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="474.0,82.0 474.0,95.0 474.0,95.0 474.0,138.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="474.0" y="92.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="474.0,82.0 474.0,174.0 334.0,174.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,144.0 474.0,144.0 474.0,82.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="82.0,390.0 82.0,347.0 82.0,347.0 82.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="82.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="82.0,390.0 82.0,338.0 334.0,338.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 82.0,298.0 82.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="166.0,390.0 166.0,347.0 166.0,347.0 166.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="166.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="166.0,390.0 166.0,348.0 334.0,348.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 166.0,298.0 166.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="390.0,390.0 390.0,347.0 390.0,347.0 390.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="390.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="390.0,390.0 390.0,358.0 334.0,358.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 390.0,298.0 390.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="474.0,390.0 474.0,347.0 474.0,347.0 474.0,334.0" fill="none" stroke="#f97316" stroke-width="1" opacity="0.8"/>
<text x="474.0" y="344.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">6.0mm 256GB/s</text>
<polyline points="474.0,390.0 474.0,368.0 334.0,368.0 334.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<polyline points="334.0,236.0 334.0,298.0 474.0,298.0 474.0,390.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="82.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="152.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="166.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="194.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="390.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="306.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="474.0,138.0 222.0,138.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="348.0" y="183.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="82.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="152.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="166.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="194.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="390.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="306.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<polyline points="474.0,334.0 222.0,334.0 222.0,236.0" fill="none" stroke="#10b981" stroke-width="1" opacity="0.8"/>
<text x="348.0" y="281.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.5mm 256GB/s</text>
<line x1="82.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="138.0" x2="82.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="138.0" x2="166.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="138.0" x2="474.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="474.0" y1="138.0" x2="390.0" y2="138.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="134.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="82.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="334.0" x2="82.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="124.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="166.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="334.0" x2="166.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="278.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">10.0mm 128GB/s</text>
<line x1="390.0" y1="334.0" x2="474.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<line x1="474.0" y1="334.0" x2="390.0" y2="334.0" stroke="#94a3b8" stroke-width="1" opacity="0.8"/>
<text x="432.0" y="330.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">2.0mm 128GB/s</text>
<polyline points="82.0,138.0 110.0,138.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="110.0,292.0 82.0,292.0 82.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="82.0,334.0 110.0,334.0 110.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="110.0,292.0 82.0,292.0 82.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="96.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="474.0,138.0 446.0,138.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="446.0,292.0 474.0,292.0 474.0,138.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="211.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="474.0,334.0 446.0,334.0 446.0,292.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="446.0,292.0 474.0,292.0 474.0,334.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.8"/>
<text x="460.0" y="309.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.0mm 512GB/s</text>
<polyline points="334.0,236.0 334.0,131.4 278.0,131.4 278.0,56.8" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,310.6 278.0,310.6 278.0,415.2" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,221.0 488.0,221.0 488.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,221.0 68.0,221.0 68.0,236.0" fill="none" stroke="#a78bfa" stroke-width="1" opacity="0.6"/>
<polyline points="446.0,194.0 446.0,200.0 334.0,200.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 334.0,200.0 446.0,200.0 446.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.6"/>
<polyline points="334.0,236.0 110.0,236.0 110.0,194.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
<polyline points="110.0,194.0 334.0,194.0 334.0,236.0" fill="none" stroke="#f59e0b" stroke-width="1" opacity="0.8"/>
<rect x="250.0" y="40.0" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="278.0" y="60.8" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-N</text>
<rect x="250.0" y="398.4" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="278.0" y="419.2" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-S</text>
<rect x="460.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="488.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-E</text>
<rect x="40.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="68.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">UCIe-W</text>
<rect x="306.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#a78bfa" stroke="#475569" stroke-width="1"/>
<text x="334.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">NOC</text>
<rect x="418.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
<text x="446.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">M CPU</text>
<rect x="194.0" y="219.2" width="56.0" height="33.6" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
<text x="222.0" y="240.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#ffffff">HBM CTRL</text>
<rect x="82.0" y="177.2" width="56.0" height="33.6" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
<text x="110.0" y="198.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">SRAM</text>
<rect x="82.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="110.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge LEFT</text>
<rect x="418.0" y="275.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="446.0" y="296.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">Bridge RIGHT</text>
<rect x="56.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE0</text>
<rect x="54.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE0</text>
<rect x="140.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE1</text>
<rect x="138.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE1</text>
<rect x="364.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE2</text>
<rect x="362.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE2</text>
<rect x="448.8" y="68.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE3</text>
<rect x="446.0" y="121.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="142.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE3</text>
<rect x="56.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE4</text>
<rect x="54.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="82.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE4</text>
<rect x="140.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE5</text>
<rect x="138.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="166.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE5</text>
<rect x="364.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE6</text>
<rect x="362.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE6</text>
<rect x="448.8" y="376.0" width="50.4" height="28.0" rx="4" fill="#94a3b8" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="394.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">PE7</text>
<rect x="446.0" y="317.2" width="56.0" height="33.6" rx="4" fill="#f97316" stroke="#475569" stroke-width="1"/>
<text x="474.0" y="338.0" text-anchor="middle" font-family="monospace" font-size="8" fill="#1e293b">XBAR PE7</text>
</svg>
@@ -0,0 +1,31 @@
<svg xmlns="http://www.w3.org/2000/svg" width="500" height="360" viewBox="0 0 500 360">
<title>pe</title>
<rect width="500" height="360" fill="#f8fafc"/>
<text x="250" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">PE VIEW</text>
<line x1="92.5" y1="180.0" x2="180.0" y2="180.0" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="136.2" y="176.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm</text>
<polyline points="180.0,180.0 180.0,92.5 285.0,92.5" fill="none" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="232.5" y="132.2" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm</text>
<line x1="180.0" y1="180.0" x2="285.0" y2="180.0" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="232.5" y="176.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm</text>
<polyline points="180.0,180.0 180.0,267.5 285.0,267.5" fill="none" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="232.5" y="219.8" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm</text>
<polyline points="285.0,92.5 390.0,92.5 390.0,180.0" fill="none" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="337.5" y="132.2" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm 512GB/s</text>
<line x1="285.0" y1="180.0" x2="390.0" y2="180.0" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="337.5" y="176.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm 512GB/s</text>
<polyline points="285.0,267.5 390.0,267.5 390.0,180.0" fill="none" stroke="#94a3b8" stroke-width="1.5" opacity="0.8"/>
<text x="337.5" y="219.8" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">0.5mm 512GB/s</text>
<rect x="48.8" y="155.5" width="87.5" height="49.0" rx="4" fill="#ef4444" stroke="#475569" stroke-width="1"/>
<text x="92.5" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE CPU</text>
<rect x="136.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#f59e0b" stroke="#475569" stroke-width="1"/>
<text x="180.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="9" fill="#1e293b">PE SCHEDULER</text>
<rect x="241.2" y="68.0" width="87.5" height="49.0" rx="4" fill="#3b82f6" stroke="#475569" stroke-width="1"/>
<text x="285.0" y="96.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE DMA</text>
<rect x="241.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#8b5cf6" stroke="#475569" stroke-width="1"/>
<text x="285.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE GEMM</text>
<rect x="241.2" y="243.0" width="87.5" height="49.0" rx="4" fill="#ec4899" stroke="#475569" stroke-width="1"/>
<text x="285.0" y="271.5" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE MATH</text>
<rect x="346.2" y="155.5" width="87.5" height="49.0" rx="4" fill="#10b981" stroke="#475569" stroke-width="1"/>
<text x="390.0" y="184.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#ffffff">PE TCM</text>
</svg>
@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" width="820" height="500" viewBox="0 0 820 500" font-family="monospace">
<rect width="820" height="500" fill="#f8fafc" rx="6"/>
<text x="410" y="32" text-anchor="middle" font-size="16" font-weight="bold" fill="#1e293b">Placement: column_wise</text>
<text x="410.0" y="54.0" text-anchor="middle" font-size="12" fill="#475569" font-weight="normal">Tensor (1024×512) fp16 → K axis split into 8 parts</text>
<text x="320.0" y="82.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">← K=512 →</text>
<text x="68.0" y="250.0" text-anchor="middle" font-size="11" fill="#475569" transform="rotate(-90 68.0 250.0)">↑ M=1024 ↓</text>
<rect x="80.0" y="90.0" width="60.0" height="320.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="110.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="110.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="140.0" y="90.0" width="60.0" height="320.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="170.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="170.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="200.0" y="90.0" width="60.0" height="320.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="246.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="230.0" y="262.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">(1024×64)</text>
<rect x="260.0" y="90.0" width="60.0" height="320.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="290.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="290.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="320.0" y="90.0" width="60.0" height="320.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="350.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="350.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="380.0" y="90.0" width="60.0" height="320.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="410.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="410.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="440.0" y="90.0" width="60.0" height="320.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="470.0" y="246.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="470.0" y="262.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">(1024×64)</text>
<rect x="500.0" y="90.0" width="60.0" height="320.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="530.0" y="246.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="530.0" y="262.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(1024×64)</text>
<rect x="80.0" y="90.0" width="480.0" height="320.0" fill="none" stroke="#1e293b" stroke-width="2" fill-opacity="1.0" rx="2"/>
<text x="110.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=0 B</text>
<text x="110.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="170.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=128 KB</text>
<text x="170.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="230.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=256 KB</text>
<text x="230.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="290.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=384 KB</text>
<text x="290.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="350.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=512 KB</text>
<text x="350.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=640 KB</text>
<text x="410.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="470.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=768 KB</text>
<text x="470.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="530.0" y="426.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">off=896 KB</text>
<text x="530.0" y="440.0" text-anchor="middle" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="670.0" y="100.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">PE Legend</text>
<rect x="620.0" y="106.0" width="16.0" height="16.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="118.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE0</text>
<rect x="620.0" y="128.0" width="16.0" height="16.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="140.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE1</text>
<rect x="620.0" y="150.0" width="16.0" height="16.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="162.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE2</text>
<rect x="620.0" y="172.0" width="16.0" height="16.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="184.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE3</text>
<rect x="620.0" y="194.0" width="16.0" height="16.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="206.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE4</text>
<rect x="620.0" y="216.0" width="16.0" height="16.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="228.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE5</text>
<rect x="620.0" y="238.0" width="16.0" height="16.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="250.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE6</text>
<rect x="620.0" y="260.0" width="16.0" height="16.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="642.0" y="272.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE7</text>
<rect x="620.0" y="320.0" width="167.0" height="120.0" fill="#e2e8f0" stroke="#94a3b8" stroke-width="1" fill-opacity="1.0" rx="2"/>
<text x="630.0" y="338.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Strategy: column_wise</text>
<text x="630.0" y="356.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Split axis: K</text>
<text x="630.0" y="374.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Shards: 8</text>
<text x="630.0" y="392.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Each: (1024, 64)</text>
<text x="630.0" y="410.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Each: 128 KB</text>
<text x="630.0" y="428.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Total: 1 MB</text>
</svg>
@@ -0,0 +1,47 @@
<svg xmlns="http://www.w3.org/2000/svg" width="820" height="500" viewBox="0 0 820 500" font-family="monospace">
<rect width="820" height="500" fill="#f8fafc" rx="6"/>
<text x="410" y="32" text-anchor="middle" font-size="16" font-weight="bold" fill="#1e293b">Placement: replicate</text>
<text x="410.0" y="54.0" text-anchor="middle" font-size="12" fill="#475569" font-weight="normal">Tensor (1024×512) fp16 → full copy to each PE</text>
<rect x="60.0" y="90.0" width="163.0" height="162.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="141.5" y="157.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE0</text>
<text x="141.5" y="177.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="141.5" y="193.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="141.5" y="207.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="239.0" y="90.0" width="163.0" height="162.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="320.5" y="157.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE1</text>
<text x="320.5" y="177.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="320.5" y="193.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="320.5" y="207.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="418.0" y="90.0" width="163.0" height="162.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="499.5" y="157.0" text-anchor="middle" font-size="14" fill="#000" font-weight="bold">PE2</text>
<text x="499.5" y="177.0" text-anchor="middle" font-size="11" fill="#000" font-weight="normal">(1024×512)</text>
<text x="499.5" y="193.0" text-anchor="middle" font-size="10" fill="#000" font-weight="normal">1 MB</text>
<text x="499.5" y="207.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">offset=0</text>
<rect x="597.0" y="90.0" width="163.0" height="162.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="678.5" y="157.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE3</text>
<text x="678.5" y="177.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="678.5" y="193.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="678.5" y="207.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="60.0" y="268.0" width="163.0" height="162.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="141.5" y="335.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE4</text>
<text x="141.5" y="355.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="141.5" y="371.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="141.5" y="385.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="239.0" y="268.0" width="163.0" height="162.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="320.5" y="335.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE5</text>
<text x="320.5" y="355.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="320.5" y="371.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="320.5" y="385.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="418.0" y="268.0" width="163.0" height="162.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="499.5" y="335.0" text-anchor="middle" font-size="14" fill="#000" font-weight="bold">PE6</text>
<text x="499.5" y="355.0" text-anchor="middle" font-size="11" fill="#000" font-weight="normal">(1024×512)</text>
<text x="499.5" y="371.0" text-anchor="middle" font-size="10" fill="#000" font-weight="normal">1 MB</text>
<text x="499.5" y="385.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">offset=0</text>
<rect x="597.0" y="268.0" width="163.0" height="162.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="678.5" y="335.0" text-anchor="middle" font-size="14" fill="#fff" font-weight="bold">PE7</text>
<text x="678.5" y="355.0" text-anchor="middle" font-size="11" fill="#fff" font-weight="normal">(1024×512)</text>
<text x="678.5" y="371.0" text-anchor="middle" font-size="10" fill="#fff" font-weight="normal">1 MB</text>
<text x="678.5" y="385.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">offset=0</text>
<rect x="60.0" y="450.0" width="496.0" height="30.0" fill="#e2e8f0" stroke="#94a3b8" stroke-width="1" fill-opacity="1.0" rx="2"/>
<text x="70.0" y="468.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Strategy: replicate | Shards: 8 | Each: 1 MB | Total mem: 8 MB</text>
</svg>
@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" width="820" height="560" viewBox="0 0 820 560" font-family="monospace">
<rect width="820" height="560" fill="#f8fafc" rx="6"/>
<text x="410" y="32" text-anchor="middle" font-size="16" font-weight="bold" fill="#1e293b">Placement: row_wise</text>
<text x="410.0" y="54.0" text-anchor="middle" font-size="12" fill="#475569" font-weight="normal">Tensor (1024×512) fp16 → M axis split into 8 parts</text>
<text x="240.0" y="82.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">← K=512 →</text>
<text x="68.0" y="290.0" text-anchor="middle" font-size="11" fill="#475569" transform="rotate(-90 68.0 290.0)">↑ M=1024 ↓</text>
<rect x="80.0" y="90.0" width="320.0" height="50.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="111.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="240.0" y="127.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="140.0" width="320.0" height="50.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="161.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="240.0" y="177.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="190.0" width="320.0" height="50.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="211.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="240.0" y="227.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">(128×512)</text>
<rect x="80.0" y="240.0" width="320.0" height="50.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="261.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="240.0" y="277.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="290.0" width="320.0" height="50.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="311.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="240.0" y="327.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="340.0" width="320.0" height="50.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="361.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="240.0" y="377.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="390.0" width="320.0" height="50.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="411.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="240.0" y="427.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">(128×512)</text>
<rect x="80.0" y="440.0" width="320.0" height="50.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="240.0" y="461.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="240.0" y="477.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">(128×512)</text>
<rect x="80.0" y="90.0" width="320.0" height="400.0" fill="none" stroke="#1e293b" stroke-width="2" fill-opacity="1.0" rx="2"/>
<text x="410.0" y="111.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=0 B</text>
<text x="410.0" y="125.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="161.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=128 KB</text>
<text x="410.0" y="175.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="211.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=256 KB</text>
<text x="410.0" y="225.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="261.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=384 KB</text>
<text x="410.0" y="275.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="311.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=512 KB</text>
<text x="410.0" y="325.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="361.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=640 KB</text>
<text x="410.0" y="375.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="411.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=768 KB</text>
<text x="410.0" y="425.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="410.0" y="461.0" text-anchor="start" font-size="9" fill="#475569" font-weight="normal">off=896 KB</text>
<text x="410.0" y="475.0" text-anchor="start" font-size="9" fill="#64748b" font-weight="normal">128 KB</text>
<text x="630.0" y="100.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">PE Legend</text>
<rect x="580.0" y="106.0" width="16.0" height="16.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="118.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE0</text>
<rect x="580.0" y="128.0" width="16.0" height="16.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="140.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE1</text>
<rect x="580.0" y="150.0" width="16.0" height="16.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="162.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE2</text>
<rect x="580.0" y="172.0" width="16.0" height="16.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="184.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE3</text>
<rect x="580.0" y="194.0" width="16.0" height="16.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="206.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE4</text>
<rect x="580.0" y="216.0" width="16.0" height="16.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="228.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE5</text>
<rect x="580.0" y="238.0" width="16.0" height="16.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="250.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE6</text>
<rect x="580.0" y="260.0" width="16.0" height="16.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="602.0" y="272.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE7</text>
<rect x="580.0" y="320.0" width="146.0" height="120.0" fill="#e2e8f0" stroke="#94a3b8" stroke-width="1" fill-opacity="1.0" rx="2"/>
<text x="590.0" y="338.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Strategy: row_wise</text>
<text x="590.0" y="356.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Split axis: M</text>
<text x="590.0" y="374.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Shards: 8</text>
<text x="590.0" y="392.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Each: (128, 512)</text>
<text x="590.0" y="410.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Each: 128 KB</text>
<text x="590.0" y="428.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Total: 1 MB</text>
</svg>
@@ -0,0 +1,116 @@
<svg xmlns="http://www.w3.org/2000/svg" width="820" height="620" viewBox="0 0 820 620" font-family="monospace">
<rect width="820" height="620" fill="#f8fafc" rx="6"/>
<text x="410" y="32" text-anchor="middle" font-size="16" font-weight="bold" fill="#1e293b">Placement: tiled_column_major</text>
<text x="410.0" y="54.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">Tensor (1024×512) fp16, tile=(256×128) → 4×4=16 tiles, column-major (K first)</text>
<text x="280.0" y="82.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">← K=512 →</text>
<text x="68.0" y="290.0" text-anchor="middle" font-size="11" fill="#475569" transform="rotate(-90 68.0 290.0)">↑ M=1024 ↓</text>
<rect x="80.0" y="90.0" width="100.0" height="100.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="130.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t0</text>
<rect x="180.0" y="90.0" width="100.0" height="100.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="230.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t1</text>
<rect x="280.0" y="90.0" width="100.0" height="100.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="136.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="330.0" y="152.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t2</text>
<rect x="380.0" y="90.0" width="100.0" height="100.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="430.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t3</text>
<rect x="80.0" y="190.0" width="100.0" height="100.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="130.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t4</text>
<rect x="180.0" y="190.0" width="100.0" height="100.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="230.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t5</text>
<rect x="280.0" y="190.0" width="100.0" height="100.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="236.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="330.0" y="252.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t6</text>
<rect x="380.0" y="190.0" width="100.0" height="100.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="430.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t7</text>
<rect x="80.0" y="290.0" width="100.0" height="100.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="336.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="130.0" y="352.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t8</text>
<rect x="180.0" y="290.0" width="100.0" height="100.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="336.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="230.0" y="352.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t9</text>
<rect x="280.0" y="290.0" width="100.0" height="100.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="336.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="330.0" y="352.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t10</text>
<rect x="380.0" y="290.0" width="100.0" height="100.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="336.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="430.0" y="352.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t11</text>
<rect x="80.0" y="390.0" width="100.0" height="100.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="130.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t12</text>
<rect x="180.0" y="390.0" width="100.0" height="100.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="230.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t13</text>
<rect x="280.0" y="390.0" width="100.0" height="100.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="436.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="330.0" y="452.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t14</text>
<rect x="380.0" y="390.0" width="100.0" height="100.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="430.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t15</text>
<rect x="80.0" y="90.0" width="400.0" height="400.0" fill="none" stroke="#1e293b" stroke-width="2" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=0..127</text>
<text x="230.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=128..255</text>
<text x="330.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=256..383</text>
<text x="430.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=384..511</text>
<text x="64.0" y="140.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=0..255</text>
<text x="64.0" y="240.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=256..511</text>
<text x="64.0" y="340.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=512..767</text>
<text x="64.0" y="440.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=768..1023</text>
<text x="590.0" y="90.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">PE Legend</text>
<rect x="540.0" y="96.0" width="16.0" height="16.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="108.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE0</text>
<rect x="540.0" y="118.0" width="16.0" height="16.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="130.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE1</text>
<rect x="540.0" y="140.0" width="16.0" height="16.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="152.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE2</text>
<rect x="540.0" y="162.0" width="16.0" height="16.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="174.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE3</text>
<rect x="540.0" y="184.0" width="16.0" height="16.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="196.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE4</text>
<rect x="540.0" y="206.0" width="16.0" height="16.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="218.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE5</text>
<rect x="540.0" y="228.0" width="16.0" height="16.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="240.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE6</text>
<rect x="540.0" y="250.0" width="16.0" height="16.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="262.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE7</text>
<text x="540.0" y="310.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">Tile Assignment Order</text>
<rect x="540.0" y="318.0" width="12.0" height="12.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="328.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 0 → PE0 (0,0) off=0 B</text>
<rect x="540.0" y="334.0" width="12.0" height="12.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="344.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 1 → PE1 (0,1) off=256 B</text>
<rect x="540.0" y="350.0" width="12.0" height="12.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="360.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 2 → PE2 (0,2) off=512 B</text>
<rect x="540.0" y="366.0" width="12.0" height="12.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="376.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 3 → PE3 (0,3) off=768 B</text>
<rect x="540.0" y="382.0" width="12.0" height="12.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="392.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 4 → PE4 (1,0) off=256 KB</text>
<rect x="540.0" y="398.0" width="12.0" height="12.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="408.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 5 → PE5 (1,1) off=256 KB</text>
<rect x="540.0" y="414.0" width="12.0" height="12.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="424.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 6 → PE6 (1,2) off=256 KB</text>
<rect x="540.0" y="430.0" width="12.0" height="12.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="440.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 7 → PE7 (1,3) off=256 KB</text>
<rect x="540.0" y="446.0" width="12.0" height="12.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="456.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 8 → PE0 (2,0) off=512 KB</text>
<rect x="540.0" y="462.0" width="12.0" height="12.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="472.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 9 → PE1 (2,1) off=512 KB</text>
<rect x="540.0" y="478.0" width="12.0" height="12.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="488.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t10 → PE2 (2,2) off=512 KB</text>
<rect x="540.0" y="494.0" width="12.0" height="12.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="504.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t11 → PE3 (2,3) off=512 KB</text>
<rect x="540.0" y="510.0" width="12.0" height="12.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="520.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t12 → PE4 (3,0) off=768 KB</text>
<rect x="540.0" y="526.0" width="12.0" height="12.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="536.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t13 → PE5 (3,1) off=768 KB</text>
<rect x="540.0" y="542.0" width="12.0" height="12.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="552.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t14 → PE6 (3,2) off=768 KB</text>
<rect x="540.0" y="558.0" width="12.0" height="12.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="568.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t15 → PE7 (3,3) off=768 KB</text>
<rect x="80.0" y="560.0" width="608.0" height="30.0" fill="#e2e8f0" stroke="#94a3b8" stroke-width="1" fill-opacity="1.0" rx="2"/>
<text x="90.0" y="578.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Strategy: tiled_column_major | Tile: (256×128)=64 KB | Tiles: 16 | Total: 1 MB</text>
</svg>
@@ -0,0 +1,116 @@
<svg xmlns="http://www.w3.org/2000/svg" width="820" height="620" viewBox="0 0 820 620" font-family="monospace">
<rect width="820" height="620" fill="#f8fafc" rx="6"/>
<text x="410" y="32" text-anchor="middle" font-size="16" font-weight="bold" fill="#1e293b">Placement: tiled_row_major</text>
<text x="410.0" y="54.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">Tensor (1024×512) fp16, tile=(256×128) → 4×4=16 tiles, row-major (M first)</text>
<text x="280.0" y="82.0" text-anchor="middle" font-size="11" fill="#475569" font-weight="normal">← K=512 →</text>
<text x="68.0" y="290.0" text-anchor="middle" font-size="11" fill="#475569" transform="rotate(-90 68.0 290.0)">↑ M=1024 ↓</text>
<rect x="80.0" y="90.0" width="100.0" height="100.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="130.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t0</text>
<rect x="80.0" y="190.0" width="100.0" height="100.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="130.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t1</text>
<rect x="80.0" y="290.0" width="100.0" height="100.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="336.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="130.0" y="352.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t2</text>
<rect x="80.0" y="390.0" width="100.0" height="100.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="130.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t3</text>
<rect x="180.0" y="90.0" width="100.0" height="100.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="230.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t4</text>
<rect x="180.0" y="190.0" width="100.0" height="100.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="230.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t5</text>
<rect x="180.0" y="290.0" width="100.0" height="100.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="336.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="230.0" y="352.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t6</text>
<rect x="180.0" y="390.0" width="100.0" height="100.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="230.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="230.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t7</text>
<rect x="280.0" y="90.0" width="100.0" height="100.0" fill="#3b82f6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE0</text>
<text x="330.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t8</text>
<rect x="280.0" y="190.0" width="100.0" height="100.0" fill="#10b981" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE1</text>
<text x="330.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t9</text>
<rect x="280.0" y="290.0" width="100.0" height="100.0" fill="#f59e0b" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="336.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE2</text>
<text x="330.0" y="352.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t10</text>
<rect x="280.0" y="390.0" width="100.0" height="100.0" fill="#ef4444" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="330.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE3</text>
<text x="330.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t11</text>
<rect x="380.0" y="90.0" width="100.0" height="100.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="136.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE4</text>
<text x="430.0" y="152.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t12</text>
<rect x="380.0" y="190.0" width="100.0" height="100.0" fill="#ec4899" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="236.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE5</text>
<text x="430.0" y="252.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t13</text>
<rect x="380.0" y="290.0" width="100.0" height="100.0" fill="#06b6d4" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="336.0" text-anchor="middle" font-size="12" fill="#000" font-weight="bold">PE6</text>
<text x="430.0" y="352.0" text-anchor="middle" font-size="9" fill="#000" font-weight="normal">t14</text>
<rect x="380.0" y="390.0" width="100.0" height="100.0" fill="#f97316" stroke="#334155" stroke-width="1.5" fill-opacity="1.0" rx="2"/>
<text x="430.0" y="436.0" text-anchor="middle" font-size="12" fill="#fff" font-weight="bold">PE7</text>
<text x="430.0" y="452.0" text-anchor="middle" font-size="9" fill="#fff" font-weight="normal">t15</text>
<rect x="80.0" y="90.0" width="400.0" height="400.0" fill="none" stroke="#1e293b" stroke-width="2" fill-opacity="1.0" rx="2"/>
<text x="130.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=0..127</text>
<text x="230.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=128..255</text>
<text x="330.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=256..383</text>
<text x="430.0" y="506.0" text-anchor="middle" font-size="9" fill="#475569" font-weight="normal">k=384..511</text>
<text x="64.0" y="140.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=0..255</text>
<text x="64.0" y="240.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=256..511</text>
<text x="64.0" y="340.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=512..767</text>
<text x="64.0" y="440.0" text-anchor="end" font-size="9" fill="#475569" font-weight="normal">m=768..1023</text>
<text x="590.0" y="90.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">PE Legend</text>
<rect x="540.0" y="96.0" width="16.0" height="16.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="108.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE0</text>
<rect x="540.0" y="118.0" width="16.0" height="16.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="130.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE1</text>
<rect x="540.0" y="140.0" width="16.0" height="16.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="152.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE2</text>
<rect x="540.0" y="162.0" width="16.0" height="16.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="174.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE3</text>
<rect x="540.0" y="184.0" width="16.0" height="16.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="196.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE4</text>
<rect x="540.0" y="206.0" width="16.0" height="16.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="218.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE5</text>
<rect x="540.0" y="228.0" width="16.0" height="16.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="240.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE6</text>
<rect x="540.0" y="250.0" width="16.0" height="16.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="562.0" y="262.0" text-anchor="start" font-size="11" fill="#1e293b" font-weight="normal">PE7</text>
<text x="540.0" y="310.0" text-anchor="middle" font-size="12" fill="#1e293b" font-weight="bold">Tile Assignment Order</text>
<rect x="540.0" y="318.0" width="12.0" height="12.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="328.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 0 → PE0 (0,0) off=0 B</text>
<rect x="540.0" y="334.0" width="12.0" height="12.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="344.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 1 → PE1 (1,0) off=256 KB</text>
<rect x="540.0" y="350.0" width="12.0" height="12.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="360.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 2 → PE2 (2,0) off=512 KB</text>
<rect x="540.0" y="366.0" width="12.0" height="12.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="376.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 3 → PE3 (3,0) off=768 KB</text>
<rect x="540.0" y="382.0" width="12.0" height="12.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="392.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 4 → PE4 (0,1) off=256 B</text>
<rect x="540.0" y="398.0" width="12.0" height="12.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="408.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 5 → PE5 (1,1) off=256 KB</text>
<rect x="540.0" y="414.0" width="12.0" height="12.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="424.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 6 → PE6 (2,1) off=512 KB</text>
<rect x="540.0" y="430.0" width="12.0" height="12.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="440.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 7 → PE7 (3,1) off=768 KB</text>
<rect x="540.0" y="446.0" width="12.0" height="12.0" fill="#3b82f6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="456.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 8 → PE0 (0,2) off=512 B</text>
<rect x="540.0" y="462.0" width="12.0" height="12.0" fill="#10b981" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="472.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t 9 → PE1 (1,2) off=256 KB</text>
<rect x="540.0" y="478.0" width="12.0" height="12.0" fill="#f59e0b" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="488.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t10 → PE2 (2,2) off=512 KB</text>
<rect x="540.0" y="494.0" width="12.0" height="12.0" fill="#ef4444" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="504.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t11 → PE3 (3,2) off=768 KB</text>
<rect x="540.0" y="510.0" width="12.0" height="12.0" fill="#8b5cf6" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="520.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t12 → PE4 (0,3) off=768 B</text>
<rect x="540.0" y="526.0" width="12.0" height="12.0" fill="#ec4899" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="536.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t13 → PE5 (1,3) off=256 KB</text>
<rect x="540.0" y="542.0" width="12.0" height="12.0" fill="#06b6d4" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="552.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t14 → PE6 (2,3) off=512 KB</text>
<rect x="540.0" y="558.0" width="12.0" height="12.0" fill="#f97316" stroke="#334155" stroke-width="1.0" fill-opacity="1.0" rx="2"/>
<text x="558.0" y="568.0" text-anchor="start" font-size="9" fill="#334155" font-weight="normal">t15 → PE7 (3,3) off=768 KB</text>
<rect x="80.0" y="560.0" width="587.0" height="30.0" fill="#e2e8f0" stroke="#94a3b8" stroke-width="1" fill-opacity="1.0" rx="2"/>
<text x="90.0" y="578.0" text-anchor="start" font-size="10" fill="#334155" font-weight="normal">Strategy: tiled_row_major | Tile: (256×128)=64 KB | Tiles: 16 | Total: 1 MB</text>
</svg>
@@ -0,0 +1,95 @@
<svg xmlns="http://www.w3.org/2000/svg" width="648" height="648" viewBox="0 0 648 648">
<title>sip</title>
<rect width="648" height="648" fill="#f8fafc"/>
<text x="324" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">SIP VIEW</text>
<line x1="108.0" y1="144.0" x2="252.0" y2="144.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="180.0" y="140.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="144.0" x2="108.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="108.0" y="200.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="144.0" x2="396.0" y2="144.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="324.0" y="140.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="144.0" x2="252.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="252.0" y="200.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="144.0" x2="540.0" y2="144.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="468.0" y="140.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="144.0" x2="396.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="396.0" y="200.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="540.0" y1="144.0" x2="540.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="540.0" y="200.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="264.0" x2="252.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="180.0" y="260.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="264.0" x2="108.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="108.0" y="320.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="264.0" x2="396.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="324.0" y="260.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="264.0" x2="252.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="252.0" y="320.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="264.0" x2="540.0" y2="264.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="468.0" y="260.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="264.0" x2="396.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="396.0" y="320.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="540.0" y1="264.0" x2="540.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="540.0" y="320.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="384.0" x2="252.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="180.0" y="380.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="384.0" x2="108.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="108.0" y="440.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="384.0" x2="396.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="324.0" y="380.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="252.0" y1="384.0" x2="252.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="252.0" y="440.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="384.0" x2="540.0" y2="384.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="468.0" y="380.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="396.0" y1="384.0" x2="396.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="396.0" y="440.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="540.0" y1="384.0" x2="540.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
<text x="540.0" y="440.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
<line x1="108.0" y1="504.0" x2="252.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
|
||||
<text x="180.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
|
||||
<line x1="252.0" y1="504.0" x2="396.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
|
||||
<text x="324.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
|
||||
<line x1="396.0" y1="504.0" x2="540.0" y2="504.0" stroke="#3b82f6" stroke-width="1" opacity="0.8"/>
|
||||
<text x="468.0" y="500.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">1.0mm 512GB/s</text>
|
||||
<polyline points="324.0,56.0 108.0,56.0 108.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||
<text x="216.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
||||
<polyline points="324.0,56.0 252.0,56.0 252.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||
<text x="288.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
||||
<polyline points="324.0,56.0 396.0,56.0 396.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||
<text x="360.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
||||
<polyline points="324.0,56.0 540.0,56.0 540.0,144.0" fill="none" stroke="#0ea5e9" stroke-width="1" opacity="0.8"/>
|
||||
<text x="432.0" y="96.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">3.5mm 512GB/s</text>
|
||||
<rect x="84.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="108.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,0)</text>
|
||||
<rect x="228.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="252.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (1,0)</text>
|
||||
<rect x="372.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="396.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (2,0)</text>
|
||||
<rect x="516.0" y="128.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="540.0" y="148.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (3,0)</text>
|
||||
<rect x="84.0" y="248.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="108.0" y="268.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,1)</text>
|
||||
<rect x="228.0" y="248.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="252.0" y="268.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (1,1)</text>
|
||||
<rect x="372.0" y="248.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="396.0" y="268.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (2,1)</text>
|
||||
<rect x="516.0" y="248.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="540.0" y="268.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (3,1)</text>
|
||||
<rect x="84.0" y="368.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="108.0" y="388.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,2)</text>
|
||||
<rect x="228.0" y="368.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="252.0" y="388.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (1,2)</text>
|
||||
<rect x="372.0" y="368.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="396.0" y="388.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (2,2)</text>
|
||||
<rect x="516.0" y="368.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="540.0" y="388.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (3,2)</text>
|
||||
<rect x="84.0" y="488.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="108.0" y="508.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (0,3)</text>
|
||||
<rect x="228.0" y="488.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="252.0" y="508.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (1,3)</text>
|
||||
<rect x="372.0" y="488.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="396.0" y="508.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (2,3)</text>
|
||||
<rect x="516.0" y="488.0" width="48.0" height="32.0" rx="4" fill="#cbd5e1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="540.0" y="508.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#1e293b">CUBE (3,3)</text>
|
||||
<rect x="308.0" y="50.0" width="32.0" height="12.0" rx="4" fill="#0ea5e9" stroke="#475569" stroke-width="1"/>
|
||||
<text x="324.0" y="60.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">IO io0</text>
|
||||
</svg>
|
||||
@@ -0,0 +1,19 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" width="768" height="396" viewBox="0 0 768 396">
|
||||
<title>system</title>
|
||||
<rect width="768" height="396" fill="#f8fafc"/>
|
||||
<text x="384" y="18" text-anchor="middle" font-family="monospace" font-size="14" font-weight="bold" fill="#1e293b">SYSTEM VIEW</text>
|
||||
<polyline points="384.0,60.0 182.0,60.0 182.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
||||
<text x="283.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
|
||||
<polyline points="384.0,60.0 586.0,60.0 586.0,120.0" fill="none" stroke="#6366f1" stroke-width="1" opacity="0.8"/>
|
||||
<text x="485.0" y="86.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#64748b">20.0mm 256GB/s</text>
|
||||
<rect x="374.0" y="57.0" width="20.0" height="6.0" rx="4" fill="#6366f1" stroke="#475569" stroke-width="1"/>
|
||||
<text x="384.0" y="64.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">Fabric Switch</text>
|
||||
<rect x="62.0" y="138.0" width="240.0" height="200.0" rx="4" fill="#e0e7ff" stroke="#475569" stroke-width="1"/>
|
||||
<text x="182.0" y="242.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">SIP 0</text>
|
||||
<rect x="174.0" y="117.0" width="16.0" height="6.0" rx="4" fill="#0ea5e9" stroke="#475569" stroke-width="1"/>
|
||||
<text x="182.0" y="124.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">IO io0</text>
|
||||
<rect x="466.0" y="138.0" width="240.0" height="200.0" rx="4" fill="#e0e7ff" stroke="#475569" stroke-width="1"/>
|
||||
<text x="586.0" y="242.0" text-anchor="middle" font-family="monospace" font-size="10" fill="#1e293b">SIP 1</text>
|
||||
<rect x="578.0" y="117.0" width="16.0" height="6.0" rx="4" fill="#0ea5e9" stroke="#475569" stroke-width="1"/>
|
||||
<text x="586.0" y="124.0" text-anchor="middle" font-family="monospace" font-size="7" fill="#ffffff">IO io0</text>
|
||||
</svg>
|
||||
@@ -0,0 +1,381 @@
|
||||
# Latency Model
|
||||
|
||||
## Overview
|
||||
|
||||
kernbench uses a discrete-event simulation (SimPy) to compute end-to-end latency.
|
||||
Every request flows through a graph of **components** connected by **wires**.
|
||||
The total latency reported is the **actual SimPy simulation time** (`env.now` delta),
|
||||
not a static formula, so contention and queueing are captured automatically.
|
||||
|
||||
```
total_ns (actual) = wire_prop + component_overhead + drain  +  queueing
                    └────────── deterministic ───────────┘     └── contention-dependent
```
|
||||
|
||||
## Three Deterministic Cost Components
|
||||
|
||||
### 1. Wire Propagation
|
||||
|
||||
```
|
||||
wire_ns = distance_mm × ns_per_mm (global: 0.01 = 10 ps/mm)
|
||||
```
|
||||
|
||||
Every edge in the topology graph has a `distance_mm`. A SimPy wire process
|
||||
delays each message by `wire_ns` before delivering it to the next component.
|
||||
For on-chip silicon this is ~10 ps/mm; the same constant applies everywhere
|
||||
since all links are on-die or interposer. Wire propagation is typically <1 ns
|
||||
and negligible compared to other costs.
|
||||
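The wire term is a single multiplication per edge. The sketch below assumes distances in millimetres and the global `ns_per_mm = 0.01` given above. The helper name `wire_ns` is hypothetical and is not the simulator's API.

```python
NS_PER_MM = 0.01  # global constant from the formula above (10 ps/mm)


def wire_ns(distance_mm: float, ns_per_mm: float = NS_PER_MM) -> float:
    """Propagation delay of one edge in ns (hypothetical helper)."""
    if distance_mm < 0:
        raise ValueError("distance_mm must be non-negative")
    return distance_mm * ns_per_mm


# The 2.5 mm xbar.pe0 -> hbm_ctrl.slice0 edge from the worked example:
print(round(wire_ns(2.5), 6))  # 0.025
```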
|
||||
### 2. Component Overhead (`overhead_ns`)
|
||||
|
||||
```
|
||||
component_ns = node.attrs["overhead_ns"]
|
||||
```
|
||||
|
||||
Each component on the path adds a fixed processing delay via `yield env.timeout(overhead_ns)`.
|
||||
This models arbitration, protocol processing, pipeline stages, etc.
|
||||
|
||||
| Component | overhead_ns | Meaning |
|-----------|-------------|---------|
| pcie_ep | 5.0 | PCIe protocol processing |
| io_cpu | 10.0 | Command decode / dispatch |
| m_cpu | 5.0 | DMA scheduling |
| fabric switch | 5.0 | Packet arbitration |
| xbar | 2.0 | Crossbar arbitration |
| xbar bridge | 1.0 | Bridge traversal between xbar halves |
| ucie | 1.0 | UCIe protocol overhead per port |
| noc (2D mesh) | 0.0 | Hop delay modeled internally via Manhattan distance |
| hbm_ctrl | 0.0 | Access time captured in drain_ns |
| pe_cpu | 2.0 | Command dispatch |
| pe_scheduler | 1.0 | PE-internal scheduling |
| pe_gemm/math | 0.0 | Placeholder; will use a FLOPs-based model |
|
||||
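Because each component adds a fixed delay, the overhead term for a path is just a sum over the table. The sketch below copies the table values into a dict. The underscore keys (`fabric_switch`, `xbar_bridge`, `pe_gemm`) and the helper `path_overhead_ns` are illustrative assumptions, not the simulator's identifiers. The usage line applies it to the forward H2D component list given in the H2D section of this document (one ucie hop, response path excluded).

```python
# overhead_ns values copied from the table above (keys are illustrative)
OVERHEAD_NS = {
    "pcie_ep": 5.0,
    "io_cpu": 10.0,
    "m_cpu": 5.0,
    "fabric_switch": 5.0,
    "xbar": 2.0,
    "xbar_bridge": 1.0,
    "ucie": 1.0,
    "noc": 0.0,
    "hbm_ctrl": 0.0,
    "pe_cpu": 2.0,
    "pe_scheduler": 1.0,
    "pe_gemm": 0.0,
}


def path_overhead_ns(path: list[str]) -> float:
    """Sum the fixed per-component overhead along a forward path."""
    return sum(OVERHEAD_NS[kind] for kind in path)


# Forward H2D path as listed in the H2D section (no response path):
h2d = ["pcie_ep", "io_cpu", "ucie", "noc", "m_cpu", "noc", "xbar", "hbm_ctrl"]
print(path_overhead_ns(h2d))  # 23.0
```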
|
||||
### 3. Drain (Serialization Delay)
|
||||
|
||||
```
|
||||
drain_ns = nbytes / bottleneck_bw_gbs
|
||||
```
|
||||
|
||||
**Wormhole (cut-through) model**: data flows through intermediate nodes as a
|
||||
pipeline. Serialization cost is paid **once** at the terminal node, not at
|
||||
every hop. The bottleneck is the minimum `bw_gbs` across all edges in the path.
|
||||
|
||||
Example: 4096 bytes through a path with bottleneck 128 GB/s → `4096 / 128 = 32.0 ns`.
|
||||
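Under the wormhole model, drain depends only on the slowest edge of the path, not on the hop count. The sketch below treats 1 GB/s as 1 byte/ns, which matches the `4096 / 128 = 32.0 ns` arithmetic above. The function name and the per-edge bandwidth list are hypothetical inputs.

```python
def drain_ns(nbytes: int, edge_bw_gbs: list[float]) -> float:
    """Wormhole drain: serialization is paid once, at the bottleneck edge.

    Uses 1 GB/s == 1 byte/ns, consistent with 4096 / 128 = 32.0 ns.
    """
    if not edge_bw_gbs:
        raise ValueError("path must contain at least one edge")
    bottleneck = min(edge_bw_gbs)  # slowest edge on the path
    return nbytes / bottleneck


# Hypothetical three-edge path whose slowest edge is 128 GB/s:
print(drain_ns(4096, [512.0, 128.0, 256.0]))  # 32.0
```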
|
||||
### Formula (Theoretical Lower Bound)
|
||||
|
||||
```
|
||||
formula_ns = Σ(wire_prop) + Σ(overhead_ns) + drain_ns
|
||||
```
|
||||
|
||||
This is the latency with **zero contention**—no other request competing for
|
||||
any resource. The engine provides `_formula_latency()` for verification.
|
||||
With no contention: `actual == formula`. With contention: `actual > formula`.
|
||||
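The zero-contention formula composes the three deterministic terms. This is a standalone sketch, not the engine's `_formula_latency()`, whose signature is not shown here. The inputs (edge lengths, per-component overheads, edge bandwidths) are assumptions about how a path could be described. The usage line reproduces the PE DMA read pe0 → slice0: one 2.5 mm edge, xbar.pe0 (2.0 ns) plus hbm_ctrl (0 ns), and 4096 B at 256 GB/s.

```python
def formula_ns(edge_mm, overheads_ns, nbytes, edge_bw_gbs, ns_per_mm=0.01):
    """Zero-contention lower bound: sum(wire) + sum(overhead) + drain."""
    wire = sum(d * ns_per_mm for d in edge_mm)  # propagation on every edge
    overhead = sum(overheads_ns)                # fixed per-component delays
    drain = nbytes / min(edge_bw_gbs)           # paid once at the bottleneck
    return wire + overhead + drain


print(round(formula_ns([2.5], [2.0, 0.0], 4096, [256.0]), 3))  # 18.025
```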
|
||||
### Diagram: PE DMA Read (pe0 → local slice0, 4096 bytes)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant D as pe_dma
|
||||
participant X as xbar.pe0
|
||||
participant H as hbm_ctrl.slice0
|
||||
|
||||
D->>X: txn (4096B)
|
||||
Note over X: overhead 2.0 ns
|
||||
X->>H: txn (wire 0.025 ns)
|
||||
Note over H: acquire Resource
|
||||
Note over H: overhead 0 ns
|
||||
Note over H: drain 4096/256 = 16.0 ns
|
||||
Note over H: release Resource
|
||||
H-->>D: done.succeed()
|
||||
|
||||
Note over D,H: total_ns = 18.09 ns<br/>formula = wire(0.025) + ovhd(2.0) + drain(16.0) = 18.025 ns<br/>actual ≈ formula (no contention)
|
||||
```
|
||||
|
||||
### Diagram: Two Requests — No Contention vs HOL Blocking
|
||||
|
||||
#### Case 1: Different slices (parallel, no contention)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant A as Requests A, B
|
||||
participant S0 as hbm_ctrl.slice0<br/>Resource(cap=1)
|
||||
participant S1 as hbm_ctrl.slice1<br/>Resource(cap=1)
|
||||
|
||||
Note over A,S1: t=2 ns — both requests arrive at their own slice
|
||||
A->>S0: A (4KB)
|
||||
A->>S1: B (4KB)
|
||||
Note over S0: acquire (immediate)
|
||||
Note over S1: acquire (immediate)
|
||||
Note over S0: drain 16.0 ns
|
||||
Note over S1: drain 16.0 ns
|
||||
Note over S0: t=18 release
|
||||
Note over S1: t=18 release
|
||||
|
||||
Note over A,S1: A actual = 18 ns, B actual = 18 ns<br/>No waiting — separate Resources
|
||||
```
|
||||
|
||||
#### Case 2: Same slice (HOL blocking)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant A as Request A (4KB)
|
||||
participant Q as hbm_ctrl.slice0<br/>Resource(cap=1)
|
||||
participant B as Request B (64B)
|
||||
|
||||
Note over A,B: t=0 — A arrives first
|
||||
A->>Q: acquire (immediate)
|
||||
Note over Q: drain A = 16.0 ns
|
||||
|
||||
Note over B,Q: t=5 — B arrives, yield req → BLOCKED
|
||||
B--xQ: waiting...
|
||||
|
||||
Note over Q: t=16 — A drain done, release
|
||||
Q->>B: B acquires resource
|
||||
Note over Q: drain B = 0.25 ns
|
||||
Note over Q: t=16.25 — B done, release
|
||||
|
||||
Note over A,B: A actual = 16.0 ns (== formula)<br/>B actual = 11.25 ns (formula 0.25 + queueing 11.0)<br/>HOL blocking: short request waits behind long drain
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## How SimPy Tracks Latency
|
||||
|
||||
### Measurement
|
||||
|
||||
```python
|
||||
start_ns = env.now
|
||||
yield txn_done # wait for the transaction to complete
|
||||
total_ns = env.now - start_ns # ← this is what probe reports
|
||||
```
|
||||
|
||||
`env.now` is SimPy's simulation clock. It advances only when the next scheduled event (typically a timeout) fires;
|
||||
a process that yields a timeout, or waits on a busy resource/store, resumes at that later time. The delta between start and done captures
|
||||
**everything**: wire delays, component overheads, drain, and any queueing.
|
||||
|
||||
### Component Pipeline
|
||||
|
||||
Each component is a SimPy process:
|
||||
|
||||
```
|
||||
_fan_in (per in_port) → _inbox (Store) → _worker → out_ports
|
||||
```
|
||||
|
||||
1. **`_fan_in`**: relays messages from each `in_port` into a shared `_inbox` Store.
|
||||
2. **`_worker`**: pulls from `_inbox`, spawns `_forward_txn` per message.
|
||||
3. **`_forward_txn`**: calls `run()` (overhead), then puts to `out_ports[next_hop]`.
|
||||
|
||||
The worker uses `env.process()` (pipeline model), so multiple messages can be
|
||||
in-flight through the same component concurrently. Contention happens when
|
||||
they compete for shared resources (e.g., `simpy.Resource` in hbm_ctrl).
|
||||
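The pipeline model affects when messages leave a component. The sketch below is plain Python, not SimPy. It compares exit times when one process is spawned per message (overheads overlap) against a hypothetical worker that handles one message at a time. The function `exit_times` and its `pipelined` flag are illustrative only.

```python
def exit_times(arrivals_ns, overhead_ns, pipelined=True):
    """Times at which messages leave a component after its fixed overhead.

    pipelined=True : one process per message, so overheads overlap.
    pipelined=False: hypothetical worker that serves one message at a time (FIFO).
    """
    exits = []
    busy_until = float("-inf")
    for t in sorted(arrivals_ns):
        if pipelined:
            exits.append(t + overhead_ns)
        else:
            start = max(t, busy_until)  # wait for the previous message
            busy_until = start + overhead_ns
            exits.append(busy_until)
    return exits


# Three messages reach an xbar (overhead 2.0 ns) at the same instant:
print(exit_times([0.0, 0.0, 0.0], 2.0))                   # [2.0, 2.0, 2.0]
print(exit_times([0.0, 0.0, 0.0], 2.0, pipelined=False))  # [2.0, 4.0, 6.0]
```

With the pipeline model, the fixed overhead never creates queueing by itself. Waiting appears only at explicitly shared resources such as the `simpy.Resource` in hbm_ctrl.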
|
||||
### Wire Process
|
||||
|
||||
```python
while True:
    msg = yield out_port.get()   # wait for sender
    yield env.timeout(prop_ns)   # propagation delay
    yield in_port.put(msg)       # deliver to receiver
```
|
||||
|
||||
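Each directed edge has its own wire process. A message that finds the wire idle is delayed by exactly
|
||||
`distance_mm × ns_per_mm`. Because the loop carries one message at a time, back-to-back messages on the same edge are serialized: each one also waits for the previous message's propagation to finish.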
|
||||
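The loop above can be replayed without SimPy to see how it behaves under back-to-back traffic. This sketch assumes `in_port` is an unbounded store, so the `put` completes immediately and the loop is free again at the previous delivery time. The function name is hypothetical.

```python
def wire_delivery_times(send_times_ns, prop_ns):
    """Delivery times for a wire loop that carries one message at a time.

    Mirrors get -> timeout(prop_ns) -> put, assuming the put never blocks.
    """
    deliveries = []
    free_at = float("-inf")
    for t in sorted(send_times_ns):
        start = max(t, free_at)   # get() returns once the loop is back at the top
        free_at = start + prop_ns
        deliveries.append(free_at)
    return deliveries


# Idle wire: each message is delayed by exactly prop_ns.
print(wire_delivery_times([0.0, 5.0], 0.025))
# Two messages sent together: the second is delivered one extra prop_ns later.
print(wire_delivery_times([0.0, 0.0], 0.025))
```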
|
||||
---
|
||||
|
||||
## Contention and Queueing
|
||||
|
||||
Queueing delay is **not a separate formula term**—it emerges from SimPy's
|
||||
event scheduling when multiple requests compete for the same resource.
|
||||
|
||||
### Where Contention Occurs
|
||||
|
||||
| Resource | SimPy Type | Capacity | Effect |
|----------|------------|----------|--------|
| hbm_ctrl (one instance per slice) | `simpy.Resource` | 1 | Serializes access to that HBM slice |
| m_cpu DMA read engine | `simpy.Resource` | 1 | Serializes DMA reads |
| m_cpu DMA write engine | `simpy.Resource` | 1 | Serializes DMA writes |
| pe_dma channels | `simpy.Resource` | configurable | Serializes PE DMA ops |
| component inbox | `simpy.Store` | unbounded | No backpressure (FIFO) |
|
||||
|
||||
### How Queueing Works
|
||||
|
||||
```python
# hbm_ctrl._worker
with self._resource.request() as req:
    yield req                              # ← BLOCKS if resource is occupied
    yield from self.run(env, txn.nbytes)   # overhead_ns
    yield env.timeout(drain_ns)            # drain_ns
```
|
||||
|
||||
If request A holds the resource and request B arrives:
|
||||
- B's `yield req` blocks until A releases the resource
|
||||
- B resumes only after A's remaining service time has elapsed on the shared simulation clock (`env.now`)
|
||||
- This "extra" time shows up in B's `total_ns` automatically
|
||||
|
||||
```
|
||||
No contention: actual_ns == formula_ns
|
||||
Contention: actual_ns > formula_ns
|
||||
queueing_delay = actual_ns - formula_ns
|
||||
```
|
||||
|
||||
### Head-of-Line (HOL) Blocking at hbm_ctrl
|
||||
|
||||
The `simpy.Resource` is held for the **entire** `with` block—both overhead and
|
||||
drain. The resource is NOT released between overhead and drain:
|
||||
|
||||
```python
with self._resource.request() as req:
    yield req                              # acquire (or wait)
    yield from self.run(env, txn.nbytes)   # overhead_ns ─┐
    yield env.timeout(drain_ns)            # drain_ns     │ resource held
# ← resource released here ──────────────────────────────┘
```
|
||||
|
||||
This means a short request arriving during a long request's drain must wait
|
||||
for the full remaining drain time—classic head-of-line blocking:
|
||||
|
||||
```
|
||||
Request A: 4 KB, drain = 16.0 ns (arrives at t=0)
|
||||
Request B: 64 B, drain = 0.25 ns (arrives at t=5)
|
||||
|
||||
Timeline:
|
||||
t=0.00 A acquires resource
|
||||
t=0.00 A: overhead (0 ns)
|
||||
t=0.00 A: drain starts (16.0 ns)
|
||||
t=5.00 B arrives → yield req → BLOCKED (A holds resource)
|
||||
t=16.00 A: drain done → resource released
|
||||
t=16.00 B acquires resource
|
||||
t=16.00 B: overhead (0 ns)
|
||||
t=16.25 B: drain done → resource released
|
||||
|
||||
B actual = 11.25 ns (waited 11.0 + own 0.25)
|
||||
B formula = 0.25 ns
|
||||
B queueing = 11.0 ns ← HOL blocking penalty
|
||||
```
|
||||
|
||||
**Why this is physically realistic**: An HBM channel processes one burst at a
|
||||
time. While data is being serialized onto the channel (drain), no other request
|
||||
can use that channel. The FIFO ordering (`simpy.Resource` default) reflects
|
||||
the simplest controller scheduling policy.
|
||||
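The HOL timeline above follows from a single-server FIFO queue. The sketch below is a plain-Python analytic model, not SimPy. It assumes requests are granted in arrival order and that overhead and drain are both paid while the resource is held, as in the `with` block above. The `Req` class and `fifo_single_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Req:
    name: str
    arrival_ns: float  # time the request reaches hbm_ctrl
    service_ns: float  # overhead_ns + drain_ns, both paid while holding the Resource


def fifo_single_server(reqs):
    """Completion, queueing, and actual latency per request at a capacity-1 FIFO resource."""
    out = {}
    free_at = float("-inf")
    for r in sorted(reqs, key=lambda r: r.arrival_ns):  # FIFO = arrival order
        start = max(r.arrival_ns, free_at)  # wait while the resource is held
        free_at = start + r.service_ns      # released after overhead + drain
        out[r.name] = {"done": free_at,
                       "queueing": start - r.arrival_ns,
                       "actual": free_at - r.arrival_ns}
    return out


# A: 4 KB at t=0 (drain 16.0 ns); B: 64 B at t=5 (drain 0.25 ns); hbm_ctrl overhead is 0.
res = fifo_single_server([Req("A", 0.0, 4096 / 256), Req("B", 5.0, 64 / 256)])
print(res["B"])  # {'done': 16.25, 'queueing': 11.0, 'actual': 11.25}
```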
|
||||
**Alternative: priority scheduling**: If needed, `simpy.PriorityResource` can
|
||||
prioritize shorter requests (Shortest Job First), but this is not currently
|
||||
used since FIFO matches typical HBM controller behavior.
|
||||
|
||||
---
|
||||
|
||||
## Worked Example: Two Concurrent PE DMA Reads
|
||||
|
||||
Setup: PE0 and PE1 in cube0 both read 4096 bytes from their local HBM slices
|
||||
(slice0 and slice1), submitted to the **same engine** at the same time.
|
||||
|
||||
### Paths
|
||||
|
||||
```
|
||||
DMA A: pe0.pe_dma → xbar.pe0 → hbm_ctrl.slice0
|
||||
DMA B: pe1.pe_dma → xbar.pe1 → hbm_ctrl.slice1
|
||||
```
|
||||
|
||||
### No Contention (different HBM slices)
|
||||
|
||||
Since slice0 and slice1 are **separate** hbm_ctrl instances, each with its own
|
||||
`simpy.Resource(capacity=1)`, there is no resource competition.
|
||||
|
||||
```
DMA A timeline:
t=0.00    pe_dma dequeues txn
t=0.00    xbar.pe0: overhead_ns=2.0                 → t=2.00
t=2.00    wire prop (2.5mm × 0.01)                  → t=2.025
t=2.025   hbm_ctrl.slice0: yield req → immediate (no contention)
t=2.025   hbm_ctrl.slice0: overhead_ns=0            → t=2.025
t=2.025   drain_ns = 4096/256 = 16.0                → t=18.025
t=18.025  done

DMA B timeline: (identical, on its own slice)
t=0.00 → ... → t=18.025 done
```
|
||||
|
||||
Both complete at 18.025 ns. `actual == formula` for both.
|
||||
|
||||
### With Contention (same HBM slice)
|
||||
|
||||
Now suppose both PE0 and PE1 read from **slice0**:
|
||||
|
||||
```
|
||||
DMA A: pe0.pe_dma → xbar.pe0 → hbm_ctrl.slice0
|
||||
DMA B: pe1.pe_dma → xbar.pe1 → xbar.pe0 → hbm_ctrl.slice0
|
||||
(chain traversal to reach slice0)
|
||||
```
|
||||
|
||||
```
DMA A timeline:
t=0.00    xbar.pe0(2.0) → wire(0.025) → hbm_ctrl.slice0
t=2.025   yield req → immediate (first to arrive)
t=18.025  drain 4096/256 = 16.0 → release resource → done
actual_A  = 18.025 ns (== formula)

DMA B timeline:
t=0.00    xbar.pe1(2.0) → xbar.pe0(2.0) → wires(0.035) → hbm_ctrl.slice0
t=4.035   yield req → BLOCKED (A holds resource until t=18.025)
t=18.025  acquire resource
t=50.025  drain 4096/128 = 32.0 → release → done
actual_B  = 50.025 ns

B's drain uses the bottleneck BW of B's own path, not A's:
  A's path bottleneck = 256 GB/s
  B's path bottleneck = xbar_x_bw = 128 GB/s → drain = 4096/128 = 32.0 ns

formula_B  = wire(0.035) + overhead(4.0) + drain(32.0) = 36.035 ns
queueing_B = actual_B - formula_B = 50.025 - 36.035 = 13.99 ns
           = time spent waiting for A to release hbm_ctrl (18.025 - 4.035)
```
|
||||
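The B-side numbers can be re-derived directly from the timeline inputs above. This sketch uses plain arithmetic, with hbm_ctrl.slice0 modeled as one capacity-1 FIFO resource that A reaches first. The helper name `arrival_at_target` is hypothetical. B's 0.035 ns wire total is taken from the formula line above.

```python
NS_PER_MM = 0.01


def arrival_at_target(overheads_ns, wire_total_ns):
    """Time a DMA reaches hbm_ctrl: upstream overheads plus wire delay."""
    return sum(overheads_ns) + wire_total_ns


a_arrive = arrival_at_target([2.0], 2.5 * NS_PER_MM)  # xbar.pe0, one 2.5 mm edge
b_arrive = arrival_at_target([2.0, 2.0], 0.035)       # xbar.pe1 -> xbar.pe0
a_drain = 4096 / 256  # A's bottleneck: 256 GB/s
b_drain = 4096 / 128  # B's bottleneck: xbar_x_bw = 128 GB/s

# hbm_ctrl overhead is 0, so service time == drain; A holds the resource first.
a_done = a_arrive + a_drain
b_start = max(b_arrive, a_done)  # B blocks until A releases
b_done = b_start + b_drain

formula_b = b_arrive + b_drain   # same path with no contention
print(round(a_done, 3), round(b_done, 3))  # 18.025 50.025
print(round(b_done - formula_b, 3))        # 13.99
```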
|
||||
The key insight: **queueing delay is not in the formula**. It only appears in
|
||||
the actual SimPy simulation when resources are contested. The probe reports
|
||||
`actual_ns`, which includes all queueing. To see pure queueing overhead,
|
||||
compare `actual_ns` vs `formula_ns` (available in PE DMA traces).
|
||||
|
||||
---
|
||||
|
||||
## Probe Output Explained
|
||||
|
||||
```
=== PE DMA Latency ===
Case               Target             Actual  Ovhd  Drain  Wire  Ovhd%  Drain%  Eff.BW  BN.BW  Util%
pe-local-hbm       c0.pe0->c0.slice0   18.09   2.0   16.0  0.08  11.1%   88.5%  226.49  256.0  88.5%
pe-cross-half-hbm  c0.pe0->c0.slice4   37.14   5.0   32.0  0.14  13.5%   86.1%  110.27  128.0  86.1%
```
|
||||
|
||||
| Column | Meaning |
|--------|---------|
| **Actual** | SimPy-measured `env.now` delta in ns (includes contention, if any) |
| **Ovhd** | Sum of `overhead_ns` for all components on the forward path (ns) |
| **Drain** | `nbytes / bottleneck_bw`: serialization at the terminal node (ns) |
| **Wire** | Sum of `distance_mm × ns_per_mm` over all edges (ns) |
| **Ovhd%** | `Ovhd / Actual × 100`: fraction of time spent in component processing |
| **Drain%** | `Drain / Actual × 100`: fraction of time spent in data transfer |
| **Eff.BW** | `nbytes / Actual`: achieved bandwidth (GB/s) |
| **BN.BW** | Bottleneck bandwidth, the minimum `bw_gbs` on the path (GB/s) |
| **Util%** | `Eff.BW / BN.BW × 100`: how close the transfer gets to the theoretical maximum BW |
|
||||
|
||||
### Why Util% < 100%
|
||||
|
||||
`Util% = Eff.BW / BN.BW = (nbytes / actual_ns) / (nbytes / drain_ns) = drain_ns / actual_ns = Drain%`. The gap from 100% is
|
||||
the overhead fraction (plus the small wire term and any queueing). For small transfers (4KB), overhead is significant relative to drain.
|
||||
For large transfers, drain dominates and utilization approaches 100%.
|
||||
|
||||
```
|
||||
4 KB: Ovhd=2.0, Drain=16.0 → Util=88.5% (overhead is 11% of time)
|
||||
64 KB: Ovhd=2.0, Drain=256.0 → Util=99.2% (overhead is <1% of time)
|
||||
```
|
||||
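The 4 KB and 64 KB utilization figures above can be recomputed from the column definitions. This sketch assumes `actual = ovhd + drain + wire (+ queueing)` with nothing else on the path. It takes `wire_ns = 0.08` from the probe's `pe-local-hbm` row and treats 1 GB/s as 1 byte/ns. The function `probe_columns` is a hypothetical name, not the probe's code.

```python
def probe_columns(nbytes, ovhd_ns, drain_ns, wire_ns, bn_bw_gbs, queueing_ns=0.0):
    """Derived probe columns; 1 GB/s is treated as 1 byte/ns."""
    actual = ovhd_ns + drain_ns + wire_ns + queueing_ns
    eff_bw = nbytes / actual
    return {
        "actual": actual,
        "ovhd_pct": 100.0 * ovhd_ns / actual,
        "drain_pct": 100.0 * drain_ns / actual,
        "eff_bw": eff_bw,
        "util_pct": 100.0 * eff_bw / bn_bw_gbs,  # algebraically equal to drain_pct
    }


for nbytes in (4096, 65536):
    cols = probe_columns(nbytes, 2.0, nbytes / 256, 0.08, 256.0)
    print(nbytes, f"{cols['util_pct']:.1f}%")  # 4096 88.5%, then 65536 99.2%
```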
|
||||
### H2D Path: Why Ovhd% is ~40%
|
||||
|
||||
H2D traverses many components (pcie_ep → io_cpu → ucie → noc → m_cpu → noc →
|
||||
xbar → hbm_ctrl + response path). Total forward overhead is ~23 ns vs drain
|
||||
of 32 ns for 4KB, so overhead is comparable to data transfer time—resulting
|
||||
in ~55% utilization. This is expected for small command-path transfers.
|
||||