adr: add ADR-0050-0053 — close /report's second-pass G4 candidates
Documents four cross-cutting surfaces one layer deeper than the prior G4 batch:

- 0050 par-ccl-algorithm-module-contract: how to author a new CCL algorithm in src/kernbench/ccl/algorithms/. Pairs with ADR-0045's bench-module contract. Pins the four required public symbols (kernel, kernel_args, TOPO_NAME_TO_KIND constants, kernel alias), the 9 + tl standardized kernel signature, the kernel_args tuple format, sip_topo_kind dispatch, and the ccl.yaml entry workflow.
- 0051 lat-routing-helper-api: every public method of AddressResolver (resolve, find_m_cpu, find_pcie_ep, find_io_cpu, find_all_pcie_eps) and PathRouter (find_path, find_path_with_distance, find_mcpu_dma_path, find_memory_path, find_node_path + 2 shims). Pins the four adjacency graphs (_adj_all / _adj / _adj_mcpu_dma / _adj_local) and the edge-kind exclusion sets they use, plus the single-owner naming convention.
- 0052 dev-oplog-memory-store-schemas: OpRecord's 7 fields, the per-op_name params matrix (dma_read, dma_write, gemm_*, math, math reduction, composite_gemm, ipcq_copy, unknown), snapshot timing rules (math = all inputs, dma_write = HBM-only, per ADR-0027 race avoidance), TileToken stage_type capture, and MemoryStore's (space, addr) two-level dict with reference-store semantics.
- 0053 dev-topology-builder-algorithms: the 6-stage compile pipeline, cube_mesh.yaml's source_hash cache and its 5 input fields, the cube NoC auto-layout algorithm (row/col placement, HBM exclusion zone, PE/M_CPU/SRAM attachment via nearest-router, UCIe N/S/E/W distribution), the node naming convention (single-owner with router.py), the edge-kind catalog, the 4 view projections, and a table of spec-field changes vs mesh regeneration.

Bilingual pair verifier passes for all four EN/KO pairs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,288 @@
# ADR-0051: Routing Helper API — `AddressResolver` + `PathRouter`

## Status

Accepted (2026-05-22).

Pins down every public API, argument, return value, and adjacency-graph
selection of the two helper classes (`AddressResolver`, `PathRouter`)
exposed by `policy/routing/router.py`. ADR-0002 defines routing
distance, ordering, and bypass rules, but **the helper API surface
itself** has had no ADR-level coverage.

## First action

### `AddressResolver(graph)`

On construction, caches two pieces of state:

1. `self._node_ids = set(graph.nodes)`: a set of all node ids for
   lookup.
2. `self._hbm_slice_bytes = hbm_total_gb * (1 << 30) // slices_per_cube`,
   derived from `graph.spec.cube.memory_map` (default
   `48 GB / 8 slices = 6 GB`). `resolve()` uses this value to decode
   `pe_id` from an HBM PA's `hbm_offset`.

In short, **AddressResolver's first act is "precompute the full set of
node ids and the HBM slice size"**. It does not retain the graph
itself.

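The slice arithmetic can be sketched as follows. This is a minimal illustration, not the code in `router.py`: the function names `hbm_slice_bytes` and `decode_pe_id` are hypothetical, and the defaults simply mirror the `48 GB / 8 slices` figures quoted above.

```python
GIB = 1 << 30  # bytes per GiB


def hbm_slice_bytes(hbm_total_gb: int = 48, slices_per_cube: int = 8) -> int:
    """Bytes of HBM owned by one PE slice (hypothetical helper)."""
    return hbm_total_gb * GIB // slices_per_cube


def decode_pe_id(hbm_offset: int, slice_bytes: int) -> int:
    """Recover the owning PE index from a cube-local HBM offset."""
    return hbm_offset // slice_bytes


slice_bytes = hbm_slice_bytes()
print(slice_bytes // GIB)                       # 6
print(decode_pe_id(0, slice_bytes))             # 0
print(decode_pe_id(6 * GIB, slice_bytes))       # 1
print(decode_pe_id(48 * GIB - 1, slice_bytes))  # 7
```

Because the division is an integer floor, every offset inside a slice maps to the same `pe_id`, and the first byte of the next slice maps to the next one.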
### `PathRouter(graph)`

On construction, **builds four separate adjacency graphs in one pass**:

1. `self._adj_all`: every edge (used for component-to-component
   routing).
2. `self._adj`: edges with `kind != "command"` (PE DMA / generic data
   paths).
3. `self._adj_mcpu_dma`: excludes
   `_MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"}` (M_CPU DMA
   must not pass through PE pipeline nodes).
4. `self._adj_local`: excludes the 8-element `_UCIE_KINDS` set (UCIe
   would look like a zero-distance bus to Dijkstra, which would prefer
   it over the mesh; for cube-local routing this must be avoided).

Each graph is a `defaultdict(list)` of `(neighbor, weight)`. The
weight is `edge.routing_weight_mm or edge.distance_mm`.

In short, **PathRouter's first act is "classify topology edges into
four policy-specific adjacency lists simultaneously"**. Each `find_*()`
call picks the appropriate graph and runs Dijkstra.

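The one-pass classification described above might look roughly like the sketch below. It is written under stated assumptions: the `Edge` dataclass, the `"mesh"` edge kind, and the function name `build_adjacency` are hypothetical stand-ins; edges are treated as directed; and only the exclusions named in this ADR are applied to each graph.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:  # hypothetical stand-in for the topology edge type
    src: str
    dst: str
    kind: str
    distance_mm: float
    routing_weight_mm: Optional[float] = None


MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"}
UCIE_KINDS = {
    "ucie_internal", "ucie_conn_to_router", "router_to_ucie_conn",
    "ucie_conn_to_noc", "noc_to_ucie_conn", "ucie_mesh",
    "io_to_cube", "cube_to_io",
}


def build_adjacency(edges):
    """Classify every edge into the four adjacency graphs in a single loop."""
    adj_all, adj, adj_mcpu_dma, adj_local = (defaultdict(list) for _ in range(4))
    for e in edges:
        # Explicit routing weight wins; otherwise fall back to physical distance.
        entry = (e.dst, e.routing_weight_mm or e.distance_mm)
        adj_all[e.src].append(entry)
        if e.kind != "command":
            adj[e.src].append(entry)
        if e.kind not in MCPU_DMA_EXCLUDE:
            adj_mcpu_dma[e.src].append(entry)
        if e.kind not in UCIE_KINDS:
            adj_local[e.src].append(entry)
    return adj_all, adj, adj_mcpu_dma, adj_local


edges = [
    Edge("m_cpu", "r0", "command", 1.0),
    Edge("pe0.pe_dma", "r0", "pe_to_router", 0.5),
    Edge("r0", "r1", "mesh", 2.0),  # "mesh" is an illustrative kind
    Edge("r1", "ucie_n", "router_to_ucie_conn", 1.0, routing_weight_mm=4.0),
]
adj_all, adj, adj_mcpu_dma, adj_local = build_adjacency(edges)
```

With these four sample edges, the command edge appears only outside `adj`, the `pe_to_router` edge is missing from `adj_mcpu_dma`, and the UCIe edge is missing from `adj_local`.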
## Context

`policy/routing/router.py` carries two responsibilities:

- **Naming**: it is the sole owner of the topology naming convention
  (`sip{S}.cube{C}.<comp>`, `sip{S}.io{I}.pcie_ep`, etc.). Components /
  probe / IPCQ install / runtime API do not build node-id strings
  themselves; they call helpers.
- **Path decisions**: policy separation by `edge.kind`. For the same
  src→dst, different routing intents (PE DMA vs M_CPU DMA vs general
  component routing) call for different adjacencies and so produce
  different paths.

This helper API is widely consumed (probe.py / distributed.py /
install.py / various components / tests), yet **the exact signatures /
return semantics / adjacency picks** are not gathered in any ADR. This
ADR closes that gap.

## Decision

### D1. `AddressResolver` exposes five public methods

#### D1.1. `resolve(addr: PhysAddr) -> str`

Translates a `PhysAddr` to a destination node id in the topology:

```
addr.kind == "hbm"         → f"sip{s}.cube{d}.hbm_ctrl.pe{pe_id}"
    where pe_id = addr.hbm_offset // self._hbm_slice_bytes  (ADR-0017 D4/D9)

addr.kind == "pe_resource":
    addr.unit_type == PE   → f"sip{s}.cube{d}.pe{addr.pe_id}.pe_tcm"
    addr.unit_type == SRAM → f"sip{s}.cube{d}.sram"
    addr.unit_type == MCPU → f"sip{s}.cube{d}.m_cpu"
    others                 → RoutingError("unsupported unit_type")

other kinds                → RoutingError("unsupported address kind")
```

If the derived node id is not in `self._node_ids`, raises
`RoutingError(f"node {node_id} not found in topology")`. So even when
the address is syntactically valid, a node that is absent from the
topology fails loudly.

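As a hedged illustration of the mapping above, the sketch below implements the same decision table as a free function. `PhysAddr`, `UnitType`, and the field names `sip` and `cube` are simplified, hypothetical stand-ins for the real types (the table above only calls them `s` and `d`); the error messages are the ones quoted in this section.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class RoutingError(Exception):
    pass


class UnitType(Enum):  # hypothetical subset of unit types
    PE = auto()
    SRAM = auto()
    MCPU = auto()


@dataclass(frozen=True)
class PhysAddr:  # simplified stand-in; field names are assumptions
    kind: str
    sip: int
    cube: int
    hbm_offset: int = 0
    unit_type: Optional[UnitType] = None
    pe_id: int = 0


def resolve(addr: PhysAddr, node_ids: set, hbm_slice_bytes: int) -> str:
    base = f"sip{addr.sip}.cube{addr.cube}"
    if addr.kind == "hbm":
        node_id = f"{base}.hbm_ctrl.pe{addr.hbm_offset // hbm_slice_bytes}"
    elif addr.kind == "pe_resource":
        if addr.unit_type is UnitType.PE:
            node_id = f"{base}.pe{addr.pe_id}.pe_tcm"
        elif addr.unit_type is UnitType.SRAM:
            node_id = f"{base}.sram"
        elif addr.unit_type is UnitType.MCPU:
            node_id = f"{base}.m_cpu"
        else:
            raise RoutingError("unsupported unit_type")
    else:
        raise RoutingError("unsupported address kind")
    # A syntactically valid address still fails if the node is not in the topology.
    if node_id not in node_ids:
        raise RoutingError(f"node {node_id} not found in topology")
    return node_id


SLICE = 6 * (1 << 30)
NODE_IDS = {"sip0.cube0.hbm_ctrl.pe1", "sip0.cube0.pe2.pe_tcm", "sip0.cube0.m_cpu"}
print(resolve(PhysAddr("hbm", 0, 0, hbm_offset=7 * (1 << 30)), NODE_IDS, SLICE))
# sip0.cube0.hbm_ctrl.pe1
```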
#### D1.2. `find_m_cpu(sip, cube) -> str`

Returns `f"sip{sip}.cube{cube}.m_cpu"`; absent → `RoutingError`.

#### D1.3. `find_pcie_ep(sip, io_id="io0") -> str`

Returns `f"sip{sip}.{io_id}.pcie_ep"`; absent → `RoutingError`.

#### D1.4. `find_io_cpu(sip, io_id="io0") -> str`

Returns `f"sip{sip}.{io_id}.io_cpu"`; absent → `RoutingError`.

#### D1.5. `find_all_pcie_eps() -> list[str]`

All PCIE_EP node ids across all SIPs, sorted. Filtered by
`endswith(".pcie_ep")`. Cross-SIP IPCQ uses this when enumerating
PCIE_EPs.

This class is the sole owner of the naming convention
(`sip{S}.cube{C}.<comp>`, `sip{S}.{io_id}.<comp>`), per ADR-0015 D4.
The topology builder produces nodes with the same naming convention;
components never build node-id strings directly; they go through
these helpers.

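The lookup helpers in D1.2 to D1.5 share one pattern: format the id, then check membership. A minimal sketch, written as free functions over a plain set of node ids (the real methods live on `AddressResolver`; the `_require` helper is hypothetical):

```python
class RoutingError(Exception):
    pass


def _require(node_id: str, node_ids: set) -> str:
    """Hypothetical helper: return node_id only if it exists in the topology."""
    if node_id not in node_ids:
        raise RoutingError(f"node {node_id} not found in topology")
    return node_id


def find_m_cpu(node_ids: set, sip: int, cube: int) -> str:
    return _require(f"sip{sip}.cube{cube}.m_cpu", node_ids)


def find_pcie_ep(node_ids: set, sip: int, io_id: str = "io0") -> str:
    return _require(f"sip{sip}.{io_id}.pcie_ep", node_ids)


def find_all_pcie_eps(node_ids: set) -> list:
    return sorted(n for n in node_ids if n.endswith(".pcie_ep"))


nodes = {
    "sip0.io0.pcie_ep", "sip0.io0.io_cpu", "sip0.cube0.m_cpu",
    "sip2.io0.pcie_ep", "sip10.io0.pcie_ep",
}
print(find_all_pcie_eps(nodes))
# ['sip0.io0.pcie_ep', 'sip10.io0.pcie_ep', 'sip2.io0.pcie_ep']
```

Note that a plain string sort, as assumed here, orders `sip10` before `sip2`; whether the real implementation sorts lexicographically or numerically is not stated in this ADR.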
### D2. `PathRouter`'s four adjacency graphs

Constructed in one pass. `edge.kind` drives policy:

| graph           | excluded edge kinds | use case |
|-----------------|---------------------|----------|
| `_adj_all`      | (none) | M_CPU↔NOC command included, IO_CPU/M_CPU routes |
| `_adj`          | `"command"` | PE DMA / generic data paths |
| `_adj_mcpu_dma` | `"pe_internal"`, `"pe_to_router"` | M_CPU DMA (skips PE pipeline) |
| `_adj_local`    | `_UCIE_KINDS` (`ucie_internal`, `ucie_conn_to_router`, `router_to_ucie_conn`, `ucie_conn_to_noc`, `noc_to_ucie_conn`, `ucie_mesh`, `io_to_cube`, `cube_to_io`) | same-cube routing (UCIe bus excluded) |

Each graph is `dict[node_id, list[(neighbor, weight)]]` with weight =
`edge.routing_weight_mm or edge.distance_mm`. Excluding command edges
prevents them from influencing routing; isolating `_adj_local` keeps
UCIe's "zero-distance bus" from out-competing the mesh, consistent
with ADR-0017 D7's cross-PE-slice mesh-distance requirement.

### D3. `PathRouter` exposes five public methods (+ two backward-compat shims)

#### D3.1. `find_path(src_pe: str, dst_node: str) -> list[str]`

**PE DMA routing**. `src_pe` is a PE prefix (e.g.,
`"sip0.cube0.pe0"`); the function auto-appends `.pe_dma`, making the
true start node `"sip0.cube0.pe0.pe_dma"`.

Adjacency depends on cube-locality (`_same_cube`):

- **Same-cube** (src and dst share `sip{S}.cube{C}.` prefix): uses
  `_adj_local`. Excluding UCIe lets cross-PE-slice access pay accurate
  mesh distance (ADR-0017 D7).
- **Cross-cube**: uses `_adj`. UCIe naturally becomes the right choice
  for the cross-cube portion.

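The cube-locality branch in D3.1 can be illustrated as below. This is a sketch under assumptions: the regular expression, `cube_prefix`, and `pick_adjacency` are hypothetical, and the function returns the name of the adjacency graph rather than running Dijkstra.

```python
import re
from typing import Optional, Tuple

# Matches the "sip{S}.cube{C}." prefix; the trailing dot keeps cube1 distinct from cube10.
_CUBE_PREFIX = re.compile(r"^(sip\d+\.cube\d+)\.")


def cube_prefix(node_id: str) -> Optional[str]:
    m = _CUBE_PREFIX.match(node_id)
    return m.group(1) if m else None


def same_cube(a: str, b: str) -> bool:
    pa = cube_prefix(a)
    return pa is not None and pa == cube_prefix(b)


def pick_adjacency(src_pe: str, dst_node: str) -> Tuple[str, str]:
    """Return (start node, adjacency graph name) for a PE DMA route."""
    start = f"{src_pe}.pe_dma"  # the PE prefix is extended to the DMA node
    graph = "_adj_local" if same_cube(start, dst_node) else "_adj"
    return start, graph


print(pick_adjacency("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.pe3"))
# ('sip0.cube0.pe0.pe_dma', '_adj_local')
print(pick_adjacency("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.pe3"))
# ('sip0.cube0.pe0.pe_dma', '_adj')
```

Nodes outside any cube, such as `sip0.io0.pcie_ep`, have no cube prefix in this sketch and therefore always take the cross-cube branch.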
#### D3.2. `find_path_with_distance(src_pe, dst_node) -> tuple[list[str], float]`

Same adjacency policy as D3.1, but returns `(path, total_distance)`.
Used by probe and analysis tools that need the distance metric.

#### D3.3. `find_mcpu_dma_path(m_cpu_id: str, dst_hbm_id: str) -> list[str]`

**M_CPU DMA path**. Same cube → `_adj_local` (stay within the mesh);
different cube → `_adj_all` (cross via UCIe). The
`_MCPU_DMA_EXCLUDE` set ensures PE-pipeline nodes never appear on
M_CPU's routes.

#### D3.4. `find_memory_path(src: str, dst: str) -> list[str]`

Direct memory path like
`pcie_ep → io_noc → cube → router mesh → hbm_ctrl`. Uses
`_adj_mcpu_dma` to exclude `pe_internal` and `pe_to_router`, so
host-issued reads/writes never leak into the PE pipeline. Probe
(ADR-0049 D1's H2D/D2H cases) calls this directly.

#### D3.5. `find_node_path(src: str, dst: str) -> list[str]`

Generic routing between arbitrary nodes, **including command edges**
(via `_adj_all`). IoCpuComponent / MCpuComponent use this when they
need to route through M_CPU ↔ NOC command-kind links.

#### D3.6. Backward-compat shims

- `_dijkstra(start, goal) -> list[str]`: thin wrapper for
  `_run_dijkstra(self._adj, …)`.
- `_dijkstra_with_dist(start, goal) -> tuple[list[str], float]`:
  distance-aware variant.

Despite the underscore prefixes (suggesting internal API), existing
tests call these directly. New code should prefer D3.1–D3.5; these two
shims are deprecation candidates.

### D4. Dijkstra: single-source shortest path

`_run_dijkstra_with_dist(adj, start, goal)`:

- `heapq` priority queue.
- `best: dict[node, distance]`: best known distance to each node.
- `prev: dict[node, predecessor]`: for path reconstruction.
- Edge weight = `routing_weight_mm or distance_mm`. The separation
  matters because UCIe (and a few others) declare an explicit
  `routing_weight_mm` distinct from physical `distance_mm`.

`start == goal` short-circuits to `([start], 0.0)`. Unreachable target
→ `RoutingError(f"no path from {start} to {goal}")`.

The algorithm is **deterministic**: identical graph + start/goal gives
the same path, satisfying SPEC R1 ("routing MUST be deterministic").
Tie-breaks follow `heapq`'s push order (Python list order is
deterministic).

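A minimal sketch of a Dijkstra routine with the `best` / `prev` bookkeeping, the `start == goal` short-circuit, and the error message described above. It is not the code in `router.py`: the function name is illustrative, and in this sketch ties on distance are broken by comparing node-id strings (the heap entries are `(distance, node_id)` tuples), which may differ from the real tie-break order.

```python
import heapq


class RoutingError(Exception):
    pass


def run_dijkstra_with_dist(adj, start, goal):
    """Return (path, total_distance) over adj: dict[node, list[(neighbor, weight)]]."""
    if start == goal:
        return [start], 0.0
    best = {start: 0.0}  # best known distance to each node
    prev = {}            # predecessor on the best known path
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break  # the first pop of goal carries its final distance
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was found later
        for nbr, weight in adj.get(node, ()):
            cand = dist + weight
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    if goal not in best:
        raise RoutingError(f"no path from {start} to {goal}")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


adj = {"a": [("b", 1.0), ("c", 5.0)], "b": [("c", 1.0)]}
print(run_dijkstra_with_dist(adj, "a", "c"))  # (['a', 'b', 'c'], 2.0)
```

Determinism here follows from two facts: the adjacency lists are iterated in a fixed order, and relaxation uses a strict `<`, so an equally short alternative never replaces an already recorded predecessor.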
### D5. Single-owner principle for helper-API decisions

The following decisions live only inside router.py:

- Naming convention: `sip{S}.cube{C}.<comp>`,
  `sip{S}.{io_id}.<comp>`,
  `sip{S}.cube{C}.hbm_ctrl.pe{pe_id}`.
- Adjacency policy: which edge kinds belong to which graph.
- Algorithm for recovering the PE id from an HBM offset and the HBM
  slice size.
- Dijkstra weight selection
  (`routing_weight_mm or distance_mm`).

Breaking single ownership (e.g., a component starting to build
`f"sip{s}..."` itself) would greatly widen the blast radius of
naming-convention changes. This aligns with ADR-0015 D4.

### D6. Consumers of the helper API

Methods listed in this ADR are called from (current corpus):

- `probes/probe.py` (ADR-0049): `find_pcie_ep`, `find_io_cpu`,
  `find_m_cpu`, `find_node_path`, `find_mcpu_dma_path`,
  `find_memory_path`, `find_path`, `resolve`.
- `runtime_api/distributed.py` (ADR-0047): indirectly (engine-internal
  routing).
- `ccl/install.py` (ADR-0023): `find_all_pcie_eps`, `resolve`.
- `sim_engine/event_log.py`: like probe, uses `find_pcie_ep` and
  `find_memory_path`.
- `components/builtin/m_cpu.py`, `components/builtin/io_cpu.py`:
  `find_node_path`, `find_mcpu_dma_path`.
- Tests (test_routing.py, test_cross_sip_routing.py, …): most of
  D3.1–D3.5.

When a new consumer arrives, D1/D3 act as a first-pass guide on
whether an existing method matches the intent or a new one is needed.

## Alternatives Considered

### A1. One adjacency graph + per-call edge-kind filtering

Rejected. Re-filtering the graph on every `find_*()` call hurts
Dijkstra cache locality. Constructing four graphs in one pass (D2)
has modest memory cost (edges ≤ a few × 10⁴), and selection happens
in O(1) at call time.

### A2. Drive adjacency separation by separate edge metadata rather than `kind`

Rejected. `edge.kind` is already assigned by the topology builder
(ADR-0015 D4 + ADR-0017); a parallel metadata field would force
synchronization between two systems.

### A3. Use BFS with uniform weights instead of Dijkstra

Rejected. With per-edge `routing_weight_mm` (mesh link / UCIe /
IO-internal), BFS minimizes hop count rather than total
latency/distance. SPEC R1 + R2 require deterministic and accurate
routing, which BFS does not deliver.

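The A3 argument can be made concrete with a toy graph in which the fewest-hop route is not the lightest one. The node names and weights below are invented for illustration only; `bfs_path` and `path_weight` are hypothetical helpers.

```python
from collections import deque


def bfs_path(adj, start, goal):
    """Fewest-hop path; edge weights are ignored."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nbr, _weight in adj.get(node, ()):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if goal not in prev:
        raise ValueError(f"no path from {start} to {goal}")
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def path_weight(adj, path):
    """Sum of edge weights along a path."""
    return sum(dict(adj[a])[b] for a, b in zip(path, path[1:]))


# One long direct link versus two short hops.
adj = {"src": [("dst", 10.0), ("mid", 1.0)], "mid": [("dst", 1.0)]}

hop_optimal = bfs_path(adj, "src", "dst")
print(hop_optimal, path_weight(adj, hop_optimal))      # ['src', 'dst'] 10.0
print(path_weight(adj, ["src", "mid", "dst"]))          # 2.0
```

BFS picks the single 10.0-weight hop, while the two-hop route costs only 2.0, which is the route a weight-aware search would select.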
### A4. Express the helper API as module functions instead of classes

Rejected. Each class (`AddressResolver`, `PathRouter`) maintains caches
(`_node_ids`, `_hbm_slice_bytes`, four adjacency graphs) reused across
many routing queries on the same graph. Module functions would have
to rebuild state per call or go global, hurting safety and
performance.

## Consequences

- When components / probe / IPCQ install / runtime API all go through
  router.py helpers, a naming-convention change (e.g., `.io0.` →
  `.iochiplet0.`) is a one-file edit (D5).
- D2's four-graph split is now ADR-locked, so when a new edge kind is
  added (e.g., a new inter-die UCIe-link kind), the right adjacency
  category is decided explicitly rather than by default.
- D3.1's same-cube vs cross-cube branching (ADR-0017 D7) is explicit,
  so anyone changing routing knows which adjacency to touch.
- D6's consumer list bounds PR-review scope for helper-API changes,
  and the backward-compat shims (D3.6) are flagged as deprecation
  candidates.