kernbench2/docs/adr/ADR-0001-physaddr-layout.md
2026-03-18 11:47:48 -07:00

ADR-0001: PhysAddr Layout & Address Decoding Contract

Status

Accepted

Date

2026-02-27

Context

The KernBench Graph Latency Simulator must route requests deterministically and compute end-to-end latency strictly by graph traversal. To model local vs remote traffic (same/different SIP, same/different CUBE, optional PE-group), requests need a stable, parsable address/location scheme that:

  • can be decoded into routing domains (SIP/CUBE/HBM/PE-resource, etc.)
  • remains topology-agnostic (no hardcoded counts)
  • supports swappable policy and DI-first components without leaking topology assumptions into node implementations

Decision

We define a PhysAddr value object and an address decoding contract that converts an integer address into routing domains.

D1. PhysAddr is an immutable value object

  • PhysAddr is immutable and comparable as a pure value.
  • Every allocator must return a fully specified PhysAddr (not partial metadata).
  • No global state may be required to interpret a PhysAddr.
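
A minimal sketch of what D1 could look like in Python, assuming a frozen dataclass is an acceptable way to express the value object. The field names are taken from D2; the validation rule (non-negative integers) is an assumption made for illustration and is not part of the contract.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class PhysAddr:
    """Immutable, fully specified address value (illustrative sketch)."""

    rack_id: int
    sip_id: int
    sip_seg: int
    local_offset: int

    def __post_init__(self) -> None:
        # Assumption: every field is a non-negative int. Rejecting bad
        # values here means a partially specified PhysAddr cannot exist.
        for name in ("rack_id", "sip_id", "sip_seg", "local_offset"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative int, got {value!r}")


a = PhysAddr(rack_id=0, sip_id=1, sip_seg=2, local_offset=0x40)
b = PhysAddr(rack_id=0, sip_id=1, sip_seg=2, local_offset=0x40)

# Pure value semantics: equal fields compare equal and hash equal.
same_value = a == b and hash(a) == hash(b)

try:
    a.sip_id = 7  # frozen dataclasses forbid attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because equality and hashing depend only on the fields, two components can compare or cache addresses without sharing any state.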

D2. PhysAddr fields (logical contract)

PhysAddr must be able to represent at least:

  • rack_id (optional but reserved for scale-out)
  • sip_id (device / SIP domain)
  • sip_seg (SIP-level segment/window selection, e.g., cube window)
  • local_offset (offset within the chosen segment/window)

Decoded/derived fields may include (optional):

  • cube_id
  • kind (e.g., HBM vs PE-resource vs raw)
  • unit_type / pe_id (if PE-level addressing is modeled)

Important: The exact bit allocation may evolve, but the semantic fields above must remain decodable without hidden assumptions.
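
One way to keep the bit allocation free to evolve is to describe it as data rather than as scattered shifts and masks. The sketch below assumes a hypothetical `BitLayout` with made-up field widths; neither the class nor the widths come from this ADR. It shows the same semantic fields surviving a round trip under two different layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitLayout:
    """Hypothetical bit widths; the ADR does not fix these values."""

    rack_bits: int
    sip_bits: int
    seg_bits: int
    offset_bits: int

    def fields(self):
        # Ordered from most-significant to least-significant field.
        return (
            ("rack_id", self.rack_bits),
            ("sip_id", self.sip_bits),
            ("sip_seg", self.seg_bits),
            ("local_offset", self.offset_bits),
        )


def pack(layout: BitLayout, **values: int) -> int:
    """Pack the semantic fields into one integer according to the layout."""
    raw = 0
    for name, width in layout.fields():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        raw = (raw << width) | value
    return raw


def unpack(layout: BitLayout, raw: int) -> dict:
    """Recover the semantic fields, reading the lowest bits first."""
    out = {}
    for name, width in reversed(layout.fields()):
        out[name] = raw & ((1 << width) - 1)
        raw >>= width
    return out


semantic = dict(rack_id=0, sip_id=3, sip_seg=5, local_offset=0x1000)
layout_a = BitLayout(rack_bits=4, sip_bits=8, seg_bits=8, offset_bits=28)
layout_b = BitLayout(rack_bits=0, sip_bits=6, seg_bits=4, offset_bits=22)

raw_a = pack(layout_a, **semantic)
raw_b = pack(layout_b, **semantic)
```

The raw integers differ between the two layouts, but `unpack` returns the same semantic fields in both cases, which is the property D2 asks to preserve.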

D3. Decoding is deterministic and policy-compatible

  • Decoding must deterministically map an integer address to:
    • destination SIP domain (sip_id)
    • destination sub-domain (cube_id if applicable)
    • destination target kind (HBM/PE-resource/other)
  • Decoding must not depend on topology sizes discovered at runtime. It may depend on explicit topology parameters supplied through configuration (e.g., segment size, slice size), and those parameters must live in the topology/config layer, not in individual components.
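
The decoding rule in D3 can be sketched as a pure function of the address and an explicit parameter object. Everything named here (`DecodeParams`, `Decoded`, `Kind`, and the rule that the first `hbm_slice_bytes` of a segment are HBM) is an assumption chosen for illustration, not the simulator's actual address map.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HBM = "hbm"
    PE_RESOURCE = "pe_resource"


@dataclass(frozen=True)
class DecodeParams:
    """Explicit parameters supplied by configuration (hypothetical names)."""

    sip_window_bytes: int  # address span owned by one SIP
    segment_bytes: int     # address span of one cube window inside a SIP
    hbm_slice_bytes: int   # leading part of each segment that maps to HBM


@dataclass(frozen=True)
class Decoded:
    sip_id: int
    cube_id: int
    kind: Kind
    local_offset: int


def decode(addr: int, p: DecodeParams) -> Decoded:
    """Pure function: the result depends only on addr and p."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    sip_id, within_sip = divmod(addr, p.sip_window_bytes)
    cube_id, local_offset = divmod(within_sip, p.segment_bytes)
    # Assumed rule: offsets below hbm_slice_bytes target HBM, the rest
    # of the segment targets PE resources.
    kind = Kind.HBM if local_offset < p.hbm_slice_bytes else Kind.PE_RESOURCE
    return Decoded(sip_id, cube_id, kind, local_offset)


PARAMS = DecodeParams(
    sip_window_bytes=1 << 20,
    segment_bytes=1 << 18,
    hbm_slice_bytes=1 << 17,
)

hbm_addr = 2 * (1 << 20) + 1 * (1 << 18) + 0x10
pe_addr = hbm_addr + (1 << 17)

hbm_dst = decode(hbm_addr, PARAMS)
pe_dst = decode(pe_addr, PARAMS)
```

Since `decode` reads nothing but its arguments, calling it twice with the same inputs always yields the same `Decoded` value, and no component needs to know how many SIPs or cubes exist.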

D4. Topology-derived constants live in the topology layer

Constants such as segment sizes (e.g., HBM slice size / window size) are derived from topology configuration (YAML/JSON/dict) and are provided to the decoder via DI/config. They must not be hardcoded in node implementations.
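
As an illustration of D4, the sketch below derives decoder constants from a topology dict and injects the resulting decoder into a consumer. The config keys (`cubes_per_sip`, `hbm_slice_bytes`, `pe_window_bytes`), the derivation formula, and the `Router` class are all hypothetical; the real topology schema is not defined in this ADR.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple


@dataclass(frozen=True)
class DecodeParams:
    segment_bytes: int
    sip_window_bytes: int


def params_from_topology(cfg: Mapping[str, int]) -> DecodeParams:
    """Derive sizes in the topology layer (assumed keys and formula)."""
    segment_bytes = cfg["hbm_slice_bytes"] + cfg["pe_window_bytes"]
    return DecodeParams(
        segment_bytes=segment_bytes,
        sip_window_bytes=segment_bytes * cfg["cubes_per_sip"],
    )


def make_decoder(p: DecodeParams) -> Callable[[int], Tuple[int, int, int]]:
    """Build a decoder closed over explicit params; no globals involved."""

    def decode(addr: int) -> Tuple[int, int, int]:
        sip_id, within_sip = divmod(addr, p.sip_window_bytes)
        cube_id, local_offset = divmod(within_sip, p.segment_bytes)
        return sip_id, cube_id, local_offset

    return decode


class Router:
    """Node-side consumer: holds no sizes, only the injected decoder."""

    def __init__(self, decode: Callable[[int], Tuple[int, int, int]]) -> None:
        self._decode = decode

    def destination(self, addr: int) -> Tuple[int, int, int]:
        return self._decode(addr)


topo_small = {"cubes_per_sip": 2, "hbm_slice_bytes": 4096, "pe_window_bytes": 4096}
topo_large = {"cubes_per_sip": 4, "hbm_slice_bytes": 4096, "pe_window_bytes": 4096}

small_router = Router(make_decoder(params_from_topology(topo_small)))
large_router = Router(make_decoder(params_from_topology(topo_large)))

small_dst = small_router.destination(20000)
large_dst = large_router.destination(20000)
```

The same `Router` code serves both topologies; only the configuration passed at construction time changes, so the same address decodes to different domains without any edit to the node.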

D5. Routing consumes decoded domains, not raw bits

Routing policy uses decoded domains:

  • src location (sip/cube/pe or node_id)
  • dst domains derived from PhysAddr decoding
  • size_bytes for size-aware link latency

Routing must not inspect raw bit-fields directly except inside the decoding module.
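
A sketch of a routing-side classification that works only on decoded domains, as D5 requires. The `Location`, `RouteRequest`, and `Locality` names are hypothetical, and the three-way split (same cube, same SIP, remote SIP) is one plausible reading of the local vs remote cases listed in the Context section.

```python
from dataclasses import dataclass
from enum import Enum


class Locality(Enum):
    SAME_CUBE = "same_cube"
    SAME_SIP = "same_sip"
    REMOTE_SIP = "remote_sip"


@dataclass(frozen=True)
class Location:
    """Decoded routing domains; no raw address bits appear here."""

    sip_id: int
    cube_id: int


@dataclass(frozen=True)
class RouteRequest:
    src: Location
    dst: Location    # produced by the decoding module, not by the router
    size_bytes: int  # carried along for size-aware link latency


def classify(req: RouteRequest) -> Locality:
    """Compare decoded domains only; never shift or mask an address."""
    if req.src.sip_id != req.dst.sip_id:
        return Locality.REMOTE_SIP
    if req.src.cube_id != req.dst.cube_id:
        return Locality.SAME_SIP
    return Locality.SAME_CUBE


src = Location(sip_id=0, cube_id=1)
local_req = RouteRequest(src, Location(sip_id=0, cube_id=1), size_bytes=64)
cross_cube_req = RouteRequest(src, Location(sip_id=0, cube_id=3), size_bytes=64)
remote_req = RouteRequest(src, Location(sip_id=2, cube_id=1), size_bytes=4096)
```

Because `classify` takes no integer address, a change to the bit layout cannot silently change routing behavior as long as the decoder keeps producing the same domains.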

Alternatives Considered

  1. Use raw integers everywhere, decode ad-hoc in routing
    • Rejected: leads to duplicated logic, inconsistent routing, and hidden assumptions embedded in multiple components.
  2. Hardcode topology sizes (SIP/CUBE/PE counts) into decoding
    • Rejected: violates SPEC (R3) and breaks swappability and configuration-driven topologies.
  3. Put decoding inside memory controllers or routers
    • Rejected: leaks policy into components and undermines DI-first, swappable implementations (SPEC R4).

Consequences

Positive

  • Deterministic routing domains enable clear test invariants for local vs remote paths (SPEC R1, R5).
  • Keeps topology variability (SPEC R3) while preserving consistent semantics.
  • DI-first: decoder can be swapped or extended without changing components or tests (SPEC R4).

Tradeoffs / Costs

  • Requires explicit configuration for any topology-derived sizes.
  • Introduces a single “blessed” decoding module that must remain stable and well-tested.

Implementation Notes (Non-normative)

  • Recommended module boundary:

    • src/kernbench/policy/address/phyaddr.py
  • Tests should cover:

    • deterministic decoding
    • local vs remote classification from decoded fields
    • invariants: “allocator returns full PhysAddr”, “decoding requires no global state”
  • Related requirements in SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first), R5 (multi-domain communication)
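
The test invariants listed in the notes above could be sketched as plain assertion-based test functions. The `decode` function, `DecodeParams`, and `BumpAllocator` below are stand-ins written for this example, not the project's real implementations, and `rack_id` is fixed at 0 as a simplifying assumption.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PhysAddr:
    rack_id: int
    sip_id: int
    sip_seg: int
    local_offset: int


@dataclass(frozen=True)
class DecodeParams:
    segment_bytes: int
    segments_per_sip: int


def decode(addr: int, p: DecodeParams) -> PhysAddr:
    """Stand-in decoder: depends only on its arguments."""
    seg_index, local_offset = divmod(addr, p.segment_bytes)
    sip_id, sip_seg = divmod(seg_index, p.segments_per_sip)
    return PhysAddr(rack_id=0, sip_id=sip_id, sip_seg=sip_seg, local_offset=local_offset)


class BumpAllocator:
    """Stand-in allocator that hands out consecutive addresses."""

    def __init__(self, params: DecodeParams, base: int = 0) -> None:
        self._params = params
        self._next = base

    def alloc(self, size: int) -> PhysAddr:
        addr = self._next
        self._next += size
        return decode(addr, self._params)


PARAMS = DecodeParams(segment_bytes=4096, segments_per_sip=4)


def test_decoding_is_deterministic():
    assert decode(0x5010, PARAMS) == decode(0x5010, PARAMS)


def test_decoding_requires_no_global_state():
    # The result changes only when the explicit params change.
    other = DecodeParams(segment_bytes=8192, segments_per_sip=2)
    assert decode(0x5010, PARAMS) != decode(0x5010, other)


def test_allocator_returns_full_physaddr():
    addr = BumpAllocator(PARAMS).alloc(64)
    assert isinstance(addr, PhysAddr)
    assert all(getattr(addr, f.name) is not None for f in fields(addr))


def test_local_vs_remote_from_decoded_fields():
    dst = decode(0x5010, PARAMS)
    assert dst.sip_id == 1      # same SIP as a source on sip 1: local
    assert dst.sip_id != 0      # different SIP from a source on sip 0: remote


for test in (
    test_decoding_is_deterministic,
    test_decoding_requires_no_global_state,
    test_allocator_returns_full_physaddr,
    test_local_vs_remote_from_decoded_fields,
):
    test()
```

The functions follow the `test_*` naming convention so a runner such as pytest can collect them, while the loop at the end lets the file run standalone.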