ADR-0051: Routing Helper API — AddressResolver + PathRouter
Status
Accepted (2026-05-22).
Pins down every public API, argument, return value, and adjacency-graph
selection of the two helper classes (AddressResolver, PathRouter)
exposed by policy/routing/router.py. ADR-0002 defines routing
distance, ordering, and bypass rules, but the helper API surface
itself has had no ADR-level coverage.
First action
AddressResolver(graph)
On construction, caches two pieces of state:
- self._node_ids = set(graph.nodes): a set of all node ids for lookup.
- self._hbm_slice_bytes = hbm_total_gb * (1 << 30) // slices_per_cube: derived from graph.spec.cube.memory_map (default 48 GB / 8 slices = 6 GB). resolve() uses this value to decode pe_id from an HBM PA's hbm_offset.
In short, AddressResolver's first act is "precompute the full set of node ids and the HBM slice size". It does not retain the graph itself.
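The slice-size arithmetic can be checked with a short sketch. The function names here are hypothetical; only the formula and the 48 GB / 8-slice defaults come from the text above.

```python
GIB = 1 << 30  # the formula multiplies by (1 << 30), so "GB" here means GiB


def hbm_slice_bytes(hbm_total_gb: int = 48, slices_per_cube: int = 8) -> int:
    """Bytes per HBM slice: hbm_total_gb * (1 << 30) // slices_per_cube."""
    return hbm_total_gb * GIB // slices_per_cube


def pe_id_from_offset(hbm_offset: int, slice_bytes: int) -> int:
    """Decode the owning PE slice from an offset inside the cube's HBM."""
    return hbm_offset // slice_bytes


slice_bytes = hbm_slice_bytes()
assert slice_bytes == 6 * GIB                             # 48 / 8 = 6 per slice
assert pe_id_from_offset(0, slice_bytes) == 0             # first byte of slice 0
assert pe_id_from_offset(slice_bytes - 1, slice_bytes) == 0
assert pe_id_from_offset(slice_bytes, slice_bytes) == 1   # first byte of slice 1
assert pe_id_from_offset(48 * GIB - 1, slice_bytes) == 7  # last byte of slice 7
```

Because the division is an integer floor, every offset inside a slice maps to the same pe_id, and the boundary byte belongs to the next slice.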
PathRouter(graph)
On construction, builds four separate adjacency graphs in one pass:
- self._adj_all: every edge (used for component-to-component routing).
- self._adj: edges with kind != "command" (PE DMA / generic data paths).
- self._adj_mcpu_dma: excludes _MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"} (M_CPU DMA must not pass through PE pipeline nodes).
- self._adj_local: excludes the 8-element _UCIE_KINDS set (UCIe would look like a zero-distance bus to Dijkstra, which would prefer it over the mesh; cube-local routing must avoid this).
Each graph is a defaultdict(list) of (neighbor, weight). The
weight is edge.routing_weight_mm or edge.distance_mm.
In short, PathRouter's first act is "classify topology edges into
four policy-specific adjacency lists simultaneously". Each find_*()
call picks the appropriate graph and runs Dijkstra.
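A minimal sketch of the one-pass classification follows. The Edge dataclass and the build_adjacency name are hypothetical stand-ins, and the sketch assumes edges are directed (src to dst); the exclusion sets and the weight rule are taken from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:  # hypothetical stand-in for the topology edge type
    src: str
    dst: str
    kind: str
    distance_mm: float
    routing_weight_mm: Optional[float] = None


MCPU_DMA_EXCLUDE = {"pe_internal", "pe_to_router"}
UCIE_KINDS = {
    "ucie_internal", "ucie_conn_to_router", "router_to_ucie_conn",
    "ucie_conn_to_noc", "noc_to_ucie_conn", "ucie_mesh",
    "io_to_cube", "cube_to_io",
}


def build_adjacency(edges):
    """Classify every edge into the four policy graphs in a single pass."""
    adj_all, adj, adj_mcpu_dma, adj_local = (defaultdict(list) for _ in range(4))
    for e in edges:
        weight = e.routing_weight_mm or e.distance_mm
        entry = (e.dst, weight)
        adj_all[e.src].append(entry)            # no exclusions
        if e.kind != "command":
            adj[e.src].append(entry)            # data paths only
        if e.kind not in MCPU_DMA_EXCLUDE:
            adj_mcpu_dma[e.src].append(entry)   # skips the PE pipeline
        if e.kind not in UCIE_KINDS:
            adj_local[e.src].append(entry)      # cube-local, no UCIe
    return adj_all, adj, adj_mcpu_dma, adj_local


edges = [
    Edge("a", "b", "command", 1.0),
    Edge("b", "c", "ucie_mesh", 5.0, routing_weight_mm=0.5),
    Edge("c", "d", "pe_internal", 2.0),
]
adj_all, adj, adj_mcpu_dma, adj_local = build_adjacency(edges)
assert adj_all["b"] == [("c", 0.5)]   # explicit routing weight wins
assert "a" not in adj                 # command edge filtered out
assert "c" not in adj_mcpu_dma        # pe_internal filtered out
assert "b" not in adj_local           # UCIe edge filtered out
```

One property of the `or` expression worth keeping in mind: a routing_weight_mm of 0.0 is falsy in Python, so it falls back to distance_mm rather than producing a zero-weight edge.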
Context
policy/routing/router.py performs two responsibilities together:
- Naming: it is the sole owner of the topology naming convention (sip{S}.cube{C}.<comp>, sip{S}.io{I}.pcie_ep, etc.). Components / probe / IPCQ install / runtime API do not build node-id strings themselves; they call helpers.
- Path decisions: policy separation by edge.kind. For the same src→dst, different routing intents (PE DMA vs M_CPU DMA vs general component routing) call for different adjacencies and so produce different paths.
This helper API is widely consumed (probe.py / distributed.py / install.py / various components / tests), yet the exact signatures / return semantics / adjacency picks are not gathered in any ADR. This ADR closes that gap.
Decision
D1. AddressResolver exposes five public methods
D1.1. resolve(addr: PhysAddr) -> str
Translates a PhysAddr to a destination node id in the topology:
- addr.kind == "hbm" → f"sip{s}.cube{d}.hbm_ctrl.pe{pe_id}", where pe_id = addr.hbm_offset // self._hbm_slice_bytes (ADR-0017 D4/D9)
- addr.kind == "pe_resource":
  - addr.unit_type == PE → f"sip{s}.cube{d}.pe{addr.pe_id}.pe_tcm"
  - addr.unit_type == SRAM → f"sip{s}.cube{d}.sram"
  - addr.unit_type == MCPU → f"sip{s}.cube{d}.m_cpu"
  - others → RoutingError("unsupported unit_type")
- other kinds → RoutingError("unsupported address kind")
If the derived node id is not in self._node_ids, raises
RoutingError(f"node {node_id} not found in topology"). So even a
syntactically valid address fails loudly when its node is absent
from the topology.
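The dispatch above can be sketched as a free function. The PhysAddr field names (sip, cube), the string values used for unit_type, and the standalone RoutingError class are assumptions made for illustration; the node-id formats and error messages follow the rules listed in D1.1.

```python
from dataclasses import dataclass
from typing import Optional


class RoutingError(Exception):
    """Stand-in for the project's RoutingError."""


@dataclass(frozen=True)
class PhysAddr:  # hypothetical, reduced field set
    kind: str
    sip: int
    cube: int
    hbm_offset: int = 0
    unit_type: Optional[str] = None
    pe_id: int = 0


def resolve(addr: PhysAddr, node_ids: set, hbm_slice_bytes: int) -> str:
    base = f"sip{addr.sip}.cube{addr.cube}"
    if addr.kind == "hbm":
        # The HBM slice index doubles as the owning PE id.
        node_id = f"{base}.hbm_ctrl.pe{addr.hbm_offset // hbm_slice_bytes}"
    elif addr.kind == "pe_resource":
        if addr.unit_type == "PE":
            node_id = f"{base}.pe{addr.pe_id}.pe_tcm"
        elif addr.unit_type == "SRAM":
            node_id = f"{base}.sram"
        elif addr.unit_type == "MCPU":
            node_id = f"{base}.m_cpu"
        else:
            raise RoutingError("unsupported unit_type")
    else:
        raise RoutingError("unsupported address kind")
    # A well-formed id is not enough: the node must exist in the topology.
    if node_id not in node_ids:
        raise RoutingError(f"node {node_id} not found in topology")
    return node_id


SLICE = 6 * (1 << 30)
nodes = {"sip0.cube1.hbm_ctrl.pe2", "sip0.cube1.sram"}
hbm = PhysAddr("hbm", sip=0, cube=1, hbm_offset=2 * SLICE + 5)
assert resolve(hbm, nodes, SLICE) == "sip0.cube1.hbm_ctrl.pe2"
```

Checking membership last means every branch, including ones added later, gets the same existence check without repeating it.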
D1.2. find_m_cpu(sip, cube) -> str
Returns f"sip{sip}.cube{cube}.m_cpu"; absent → RoutingError.
D1.3. find_pcie_ep(sip, io_id="io0") -> str
Returns f"sip{sip}.{io_id}.pcie_ep"; absent → RoutingError.
D1.4. find_io_cpu(sip, io_id="io0") -> str
Returns f"sip{sip}.{io_id}.io_cpu"; absent → RoutingError.
D1.5. find_all_pcie_eps() -> list[str]
All PCIE_EP node ids across all SIPs, sorted. Filtered by
endswith(".pcie_ep"). Cross-SIP IPCQ uses this when enumerating
PCIE_EPs.
This class is the sole owner of the naming convention
(sip{S}.cube{C}.<comp>, sip{S}.{io_id}.<comp>) — ADR-0015 D4.
The topology builder produces nodes with the same naming convention;
components never build node-id strings directly — they go through
these helpers.
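The lookup helpers in D1.2 to D1.5 share one shape: format the id, then check that it exists. A minimal sketch, written as free functions over a node-id set rather than as methods; the _require helper and the wording of the error message are assumptions.

```python
class RoutingError(Exception):
    """Stand-in for the project's RoutingError."""


def _require(node_ids, node_id):
    # Hypothetical shared check; the message text is an assumption.
    if node_id not in node_ids:
        raise RoutingError(f"node {node_id} not found in topology")
    return node_id


def find_m_cpu(node_ids, sip, cube):
    return _require(node_ids, f"sip{sip}.cube{cube}.m_cpu")


def find_pcie_ep(node_ids, sip, io_id="io0"):
    return _require(node_ids, f"sip{sip}.{io_id}.pcie_ep")


def find_io_cpu(node_ids, sip, io_id="io0"):
    return _require(node_ids, f"sip{sip}.{io_id}.io_cpu")


def find_all_pcie_eps(node_ids):
    # Suffix filter plus sort gives a stable enumeration order.
    return sorted(n for n in node_ids if n.endswith(".pcie_ep"))


nodes = {
    "sip0.cube0.m_cpu", "sip0.io0.pcie_ep", "sip0.io0.io_cpu",
    "sip1.io0.pcie_ep", "sip1.cube0.m_cpu",
}
assert find_m_cpu(nodes, 1, 0) == "sip1.cube0.m_cpu"
assert find_pcie_ep(nodes, 0) == "sip0.io0.pcie_ep"
assert find_all_pcie_eps(nodes) == ["sip0.io0.pcie_ep", "sip1.io0.pcie_ep"]
```

If the sort is a plain string sort, as assumed here, ordering is lexicographic, so "sip10" would come before "sip2" in a topology with ten or more SIPs.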
D2. PathRouter's four adjacency graphs
Constructed in one pass. edge.kind drives policy:
| graph | excluded edge kinds | use case |
|---|---|---|
| _adj_all | (none) | M_CPU↔NOC command included, IO_CPU/M_CPU routes |
| _adj | "command" | PE DMA / generic data paths |
| _adj_mcpu_dma | "pe_internal", "pe_to_router" | M_CPU DMA (skips PE pipeline) |
| _adj_local | _UCIE_KINDS (ucie_internal, ucie_conn_to_router, router_to_ucie_conn, ucie_conn_to_noc, noc_to_ucie_conn, ucie_mesh, io_to_cube, cube_to_io) | same-cube routing (UCIe bus excluded) |
Each graph is dict[node_id, list[(neighbor, weight)]] with weight =
edge.routing_weight_mm or edge.distance_mm. Excluding command edges
prevents them from influencing routing; isolating _adj_local keeps
UCIe's "zero-distance bus" from out-competing the mesh — consistent
with ADR-0017 D7's cross-PE-slice mesh-distance requirement.
D3. PathRouter exposes five public methods (+ two backward-compat shims)
D3.1. find_path(src_pe: str, dst_node: str) -> list[str]
PE DMA routing. src_pe is a PE prefix (e.g.,
"sip0.cube0.pe0"); the function auto-prepends .pe_dma, making the
true start node "sip0.cube0.pe0.pe_dma".
Adjacency depends on cube-locality (_same_cube):
- Same-cube (src and dst share the sip{S}.cube{C}. prefix): uses _adj_local. Excluding UCIe lets cross-PE-slice access pay accurate mesh distance (ADR-0017 D7).
- Cross-cube: uses _adj. UCIe naturally becomes the right choice for the cross-cube portion.
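The start-node rewrite and the cube-locality branch can be sketched as follows. The regex-based prefix extraction is an assumption about how _same_cube might be implemented; pick_adjacency_name is a hypothetical helper that returns the graph's name instead of running Dijkstra.

```python
import re

# Anchored on the trailing dot so "cube1" never matches "cube10".
_CUBE_PREFIX = re.compile(r"^(sip\d+\.cube\d+)\.")


def cube_prefix(node_id):
    """Return 'sip{S}.cube{C}' for cube-owned nodes, else None."""
    match = _CUBE_PREFIX.match(node_id)
    return match.group(1) if match else None


def same_cube(src, dst):
    prefix = cube_prefix(src)
    return prefix is not None and prefix == cube_prefix(dst)


def pick_adjacency_name(src_pe, dst_node):
    """Mirror find_path's first two steps: prepend .pe_dma, pick a graph."""
    start = f"{src_pe}.pe_dma"
    graph = "_adj_local" if same_cube(start, dst_node) else "_adj"
    return start, graph


assert pick_adjacency_name("sip0.cube0.pe0", "sip0.cube0.hbm_ctrl.pe3") == (
    "sip0.cube0.pe0.pe_dma", "_adj_local")
assert pick_adjacency_name("sip0.cube0.pe0", "sip0.cube1.hbm_ctrl.pe3")[1] == "_adj"
assert pick_adjacency_name("sip0.cube0.pe0", "sip0.io0.pcie_ep")[1] == "_adj"
```

A plain startswith comparison on "sip0.cube1" would also accept "sip0.cube10", which is why the sketch compares whole prefixes instead.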
D3.2. find_path_with_distance(src_pe, dst_node) -> tuple[list[str], float]
Same adjacency policy as D3.1, but returns (path, total_distance).
Used by probe and analysis tools that need the distance metric.
D3.3. find_mcpu_dma_path(m_cpu_id: str, dst_hbm_id: str) -> list[str]
M_CPU DMA path. Same cube → _adj_local (stay within the mesh);
different cube → _adj_all (cross via UCIe). The
_MCPU_DMA_EXCLUDE set ensures PE-pipeline nodes never appear on
M_CPU's routes.
D3.4. find_memory_path(src: str, dst: str) -> list[str]
Direct memory path like
pcie_ep → io_noc → cube → router mesh → hbm_ctrl. Uses
_adj_mcpu_dma to exclude pe_internal and pe_to_router, so
host-issued reads/writes never leak into the PE pipeline. Probe
(ADR-0049 D1's H2D/D2H cases) calls this directly.
D3.5. find_node_path(src: str, dst: str) -> list[str]
Generic routing between arbitrary nodes, including command edges
(via _adj_all). IoCpuComponent / MCpuComponent use this when they
need to route through M_CPU ↔ NOC command-kind links.
D3.6. Backward-compat shims
- _dijkstra(start, goal) -> list[str]: thin wrapper for _run_dijkstra(self._adj, …).
- _dijkstra_with_dist(start, goal) -> tuple[list[str], float]: distance-aware variant.
Despite the underscore prefixes (suggesting internal API), existing tests call these directly. New code should prefer D3.1–D3.5; these two shims are deprecation candidates.
D4. Dijkstra — single-source shortest path
_run_dijkstra_with_dist(adj, start, goal):
- heapq priority queue.
- best: dict[node, distance], the best known distance to each node.
- prev: dict[node, predecessor], used for path reconstruction.
- Edge weight = routing_weight_mm or distance_mm. The separation matters because UCIe (and a few others) declare an explicit routing_weight_mm distinct from physical distance_mm.
start == goal short-circuits to ([start], 0.0). Unreachable target
→ RoutingError(f"no path from {start} to {goal}").
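The algorithm is deterministic: identical graph + start/goal gives
the same path, satisfying SPEC R1 ("routing MUST be deterministic").
Tie-breaks are deterministic as well: heapq itself is not a stable
queue, so equal-distance entries are ordered by the remaining fields
of the heap entry, and adjacency lists keep a fixed edge order.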
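A minimal sketch of a Dijkstra routine with the contract described in D4 (short-circuit on start == goal, RoutingError on an unreachable goal). This is not the code in router.py; in particular, the explicit push counter is an assumption of the sketch, one common way to make equal-distance ties resolve in push order.

```python
import heapq
from itertools import count


class RoutingError(Exception):
    """Stand-in for the project's RoutingError."""


def run_dijkstra_with_dist(adj, start, goal):
    if start == goal:
        return [start], 0.0
    best = {start: 0.0}   # best known distance per node
    prev = {}             # predecessor per node, for path reconstruction
    tie = count()         # assumption: counter so ties pop in push order
    heap = [(0.0, next(tie), start)]
    while heap:
        dist, _, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue      # stale entry, a shorter route was found later
        for neighbor, weight in adj.get(node, ()):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, next(tie), neighbor))
    if goal not in best:
        raise RoutingError(f"no path from {start} to {goal}")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


adj = {"a": [("b", 1.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}
assert run_dijkstra_with_dist(adj, "a", "c") == (["a", "b", "c"], 2.0)
assert run_dijkstra_with_dist(adj, "a", "a") == (["a"], 0.0)
```

Using adj.get(node, ()) rather than adj[node] keeps lookups from inserting empty entries when adj is a defaultdict.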
D5. Single-owner principle for helper-API decisions
The following decisions live only inside router.py:
- Naming convention: sip{S}.cube{C}.<comp>, sip{S}.{io_id}.<comp>, sip{S}.cube{C}.hbm_ctrl.pe{pe_id}.
- Adjacency policy: which edge kinds belong to which graph.
- Algorithm for recovering the PE id from an HBM offset and the slice size.
- Dijkstra weight selection (routing_weight_mm or distance_mm).
Breaking single ownership (e.g., a component starting to build
f"sip{s}..." itself) would widen the blast radius of
naming-convention changes from one file to every such call site.
This aligns with ADR-0015 D4.
D6. Consumers of the helper API
Methods listed in this ADR are called from (current corpus):
- probes/probe.py (ADR-0049): find_pcie_ep, find_io_cpu, find_m_cpu, find_node_path, find_mcpu_dma_path, find_memory_path, find_path, resolve.
- runtime_api/distributed.py (ADR-0047): indirectly (engine-internal routing).
- ccl/install.py (ADR-0023): find_all_pcie_eps, resolve.
- sim_engine/event_log.py: like probe, find_pcie_ep, find_memory_path.
- components/builtin/m_cpu.py, components/builtin/io_cpu.py: find_node_path, find_mcpu_dma_path.
- Tests (test_routing.py, test_cross_sip_routing.py, …): most of D3.1–D3.5.
When a new consumer arrives, D1/D3 act as a first-pass guide on whether an existing method matches the intent or a new one is needed.
Alternatives Considered
A1. One adjacency graph + per-call edge-kind filtering
Rejected. Re-filtering the graph on every find_*() call hurts
Dijkstra cache locality. Constructing four graphs in one pass (D2)
has modest memory cost (edges ≤ a few × 10⁴), and selection happens
in O(1) at call time.
A2. Drive adjacency separation by separate edge metadata rather than kind
Rejected. edge.kind is already assigned by the topology builder
(ADR-0015 D4 + ADR-0017); a parallel metadata field would force
synchronization between two systems.
A3. Use BFS with uniform weights instead of Dijkstra
Rejected. With per-edge routing_weight_mm (mesh link / UCIe /
IO-internal), BFS minimizes hop count rather than total
latency/distance. SPEC R1 + R2 require deterministic and accurate
routing; BFS can be made deterministic, but it does not deliver
accurate distances.
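The difference shows up on any graph where the fewest-hop route is not the lightest one. The toy graph, node names, and weights below are invented for illustration, and both searches assume the goal is reachable.

```python
import heapq
from collections import deque


def _walk_back(prev, goal):
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def bfs_path(adj, start, goal):
    """Fewest hops; edge weights are ignored."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, _weight in adj.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return _walk_back(prev, goal)


def dijkstra_path(adj, start, goal):
    """Smallest total weight."""
    best, prev = {start: 0.0}, {start: None}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:
            continue  # stale heap entry
        for neighbor, weight in adj.get(node, ()):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return _walk_back(prev, goal), best[goal]


def path_weight(adj, path):
    return sum(dict(adj[a])[b] for a, b in zip(path, path[1:]))


# One long direct link (1 hop, weight 10) vs three short links (weight 3).
adj = {
    "src": [("dst", 10.0), ("r1", 1.0)],
    "r1": [("r2", 1.0)],
    "r2": [("dst", 1.0)],
}
assert bfs_path(adj, "src", "dst") == ["src", "dst"]
assert path_weight(adj, ["src", "dst"]) == 10.0
assert dijkstra_path(adj, "src", "dst") == (["src", "r1", "r2", "dst"], 3.0)
```

BFS returns the one-hop route at weight 10, while Dijkstra takes three hops at total weight 3, which is the behavior the weighted edges call for.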
A4. Express the helper API as module functions instead of classes
Rejected. Each class
(AddressResolver, PathRouter) maintains caches
(_node_ids, _hbm_slice_bytes, four adjacency graphs) reused across
many routing queries on the same graph. Module functions would have
to rebuild state per call or go global, hurting safety and
performance.
Consequences
- When components / probe / IPCQ install / runtime API all go through router.py helpers, a naming-convention change (e.g., .io0. → .iochiplet0.) is a one-file edit (D5).
- D2's four-graph split is now ADR-locked, so when a new edge kind is added (e.g., a new inter-die UCIe-link kind), the right adjacency category is decided explicitly rather than by default.
- D3.1's same-cube vs cross-cube branching (ADR-0017 D7) is explicit, so anyone changing routing knows which adjacency to touch.
- D6's consumer list bounds PR-review scope for helper-API changes, and the backward-compat shims (D3.6) are flagged as deprecation candidates.