# ADR-0002: Routing Distance, Ordering & Bypass Rules

## Status

Accepted

## Date

2026-02-27

## Context

The KernBench Graph Latency Simulator must compare kernel execution time across different architectures and topologies by computing end-to-end latency from graph traversal.

To support meaningful comparison:

- routing must be deterministic
- latency must reflect actual interconnect structure
- local vs remote traffic must be distinguishable
- “bypass” optimizations must not undermine debuggability or correctness

The simulator also aims to avoid software-managed metadata and hidden shortcuts that obscure control paths.

## Decision

### D1. Distance is accumulated latency, not hop count

- Routing “distance” is defined as the **sum of per-node and per-link latency**.
- Hop count alone must not be used for ordering or path selection.
- Size-aware serialization latency (bytes / BW) contributes to distance.

### D2. Routing order is derived from graph traversal

- The chosen route is the path with minimum accumulated latency given the constructed graph and routing policy.
- Deterministic ordering must be guaranteed for identical inputs (topology + policy + request).

### D3. Bypass is explicit and graph-represented

- Any bypass (e.g., local cube HBM access via XBAR instead of NOC) must be:
  - explicitly represented as a graph path, and
  - subject to latency accumulation like any other path.
- Example: PE_DMA has dual egress, one to XBAR (HBM path) and one to NOC (non-HBM path). Both are explicit graph edges; neither is a “bypass”: they are distinct data paths serving different memory domains.
- Implicit or “magic” bypass paths are disallowed.

### D4. No zero-latency end-to-end paths

- Every routed request must incur **end-to-end** latency > 0.
- Individual fabric segments (e.g., NOC hops) MAY have distance_mm = 0 when the fabric is distributed and distance is not meaningful at that granularity.
  This is allowed because other components on the same path (e.g., PE_DMA, SRAM, UCIe endpoints) contribute non-zero latency, ensuring the end-to-end invariant holds.
- Fully zero-latency end-to-end paths are disallowed, except for explicit test-only stubs clearly marked as such.

### D5. Policy vs topology responsibility split

- Topology builder:
  - defines nodes and links and their latency/BW parameters
- Routing policy:
  - selects among available graph paths based on decoded domains
- Routing policy must not assume missing links; missing connectivity is a topology construction error.

### D6. No software-managed routing metadata

- Routing decisions must not rely on per-request software-managed metadata that tracks distance, hop count, or ordering outside the graph model.
- All distance/order computation is derived from traversal itself.

## Alternatives Considered

1) **Hop-count based routing**
   - Rejected: ignores heterogeneous latency/BW and misrepresents architectural differences.
2) **Implicit local shortcuts**
   - Rejected: breaks debuggability and violates traversal-based latency.
3) **Software-managed distance metadata**
   - Rejected: increases control overhead and obscures routing semantics.

## Consequences

### Positive

- Clear, debuggable hop-by-hop traces (SPEC R2, R4).
- Architecture comparisons reflect real interconnect structure.
- Routing behavior is reproducible and deterministic.

### Tradeoffs / Costs

- Graph construction must be correct and complete.
- Bypass modeling requires explicit graph representation, which slightly increases topology description complexity.

## Implementation Notes (Non-normative)

- Recommended responsibilities:
  - Graph builder: ensure all required paths exist.
  - Router: select next hop based on decoded domains and policy.
- Tests should assert:
  - non-zero end-to-end latency
  - deterministic routing for identical inputs
  - bypass paths appear explicitly in emitted traces

## Links

- SPEC.md: R1 (routing), R2 (latency), R3 (topology), R5 (multi-domain comm)
- ADR-0001: PhysAddr layout & decoding contract
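
## Illustrative Sketch (Non-normative)

The rules in D1, D2, D4 and D5 can be sketched as a small shortest-path search. This is a minimal illustration under stated assumptions, not the simulator's implementation. The names `Link`, `route`, `NODES` and `LINKS`, the nanosecond units, and every latency and bandwidth value are hypothetical. Node names are borrowed from this ADR, but the toy connectivity is invented. The sketch assumes that both the source and destination node latencies count toward distance, and it breaks ties on the path itself so that identical inputs always yield the same route.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    dst: str
    latency_ns: float        # fixed per-link latency (hypothetical value)
    bw_bytes_per_ns: float   # bandwidth used for serialization latency


def route(nodes, links, src, dst, size_bytes):
    """Return (total_latency_ns, path) with minimum accumulated latency.

    nodes: dict of node name -> per-node latency in ns
    links: dict of node name -> list of outgoing Link objects
    """
    # Heap entries are (accumulated latency, path). Including the path
    # in the key gives a deterministic tie-break for equal latencies (D2).
    heap = [(nodes[src], (src,))]
    settled = set()
    while heap:
        dist, path = heapq.heappop(heap)
        node = path[-1]
        if node in settled:
            continue
        settled.add(node)
        if node == dst:
            if dist <= 0:
                raise ValueError("zero-latency end-to-end path (D4)")
            return dist, list(path)
        for link in links.get(node, ()):
            if link.dst in settled:
                continue
            # D1: link latency + serialization (bytes / BW) + node latency.
            step = (link.latency_ns
                    + size_bytes / link.bw_bytes_per_ns
                    + nodes[link.dst])
            heapq.heappush(heap, (dist + step, path + (link.dst,)))
    # D5: the router does not invent links; missing connectivity is an error.
    raise LookupError(f"no path from {src} to {dst}")


# Toy topology with invented numbers: HBM is reachable through XBAR or NOC,
# and both options are ordinary graph edges (D3).
NODES = {"PE_DMA": 2.0, "XBAR": 1.0, "NOC": 3.0, "HBM": 20.0}
LINKS = {
    "PE_DMA": [Link("XBAR", 1.0, 64.0), Link("NOC", 1.0, 32.0)],
    "XBAR": [Link("HBM", 2.0, 64.0)],
    "NOC": [Link("HBM", 4.0, 32.0)],
}

if __name__ == "__main__":
    print(route(NODES, LINKS, "PE_DMA", "HBM", 256))
    # -> (34.0, ['PE_DMA', 'XBAR', 'HBM'])
```

For a 256-byte request in this toy graph, the XBAR path accumulates 34.0 ns and the NOC path 46.0 ns, so the XBAR path is chosen purely by accumulated latency and without any special-case shortcut. The returned path doubles as a hop-by-hop trace, so a test can assert that it is non-empty, has positive latency, and is identical across repeated calls.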