Files
kernbench2/docs/adr/ADR-0001-physaddr-layout.md
2026-03-18 11:47:48 -07:00

109 lines
4.1 KiB
Markdown

# ADR-0001: PhysAddr Layout & Address Decoding Contract
## Status
Accepted
## Date
2026-02-27
## Context
KernBench Graph Latency Simulator must route requests deterministically and compute end-to-end latency strictly by graph traversal.
To model local vs remote traffic (same/different SIP, same/different CUBE, optional PE-group), requests need a stable, parsable address/location scheme that:
- can be decoded into routing domains (SIP/CUBE/HBM/PE-resource, etc.)
- remains topology-agnostic (no hardcoded counts)
- supports swappable policy and DI-first components without leaking topology assumptions into node implementations
## Decision
We define a **PhysAddr value object** and an **address decoding contract** that converts an integer address into routing domains.
### D1. PhysAddr is an immutable value object
- PhysAddr is immutable and comparable as a pure value.
- Any allocator returns a **fully specified PhysAddr** (not partial metadata).
- No global state may be required to interpret a PhysAddr.
### D2. PhysAddr fields (logical contract)
PhysAddr must be able to represent at least:
- `rack_id` (optional but reserved for scale-out)
- `sip_id` (device / SIP domain)
- `sip_seg` (SIP-level segment/window selection, e.g., cube window)
- `local_offset` (offset within the chosen segment/window)
Decoded/derived fields may include (optional):
- `cube_id`
- `kind` (e.g., HBM vs PE-resource vs raw)
- `unit_type` / `pe_id` (if PE-level addressing is modeled)
**Important:** The exact bit allocation may evolve, but the *semantic fields above* must remain decodable without hidden assumptions.
### D3. Decoding is deterministic and policy-compatible
- Decoding must deterministically map an integer address to:
- destination SIP domain (`sip_id`)
- destination sub-domain (`cube_id` if applicable)
- destination target kind (HBM/PE-resource/other)
- Decoding must not depend on runtime topology sizes; it may depend on **explicit topology parameters** provided through configuration (e.g., segment size, slice size), and those parameters must live in the topology/config layer (not in random components).
### D4. Topology-derived constants live in the topology layer
Constants such as segment sizes (e.g., HBM slice size / window size) are derived from topology configuration (YAML/JSON/dict) and are provided to the decoder via DI/config.
They must not be hardcoded in node implementations.
### D5. Routing consumes decoded domains, not raw bits
Routing policy uses decoded domains:
- `src` location (sip/cube/pe or node_id)
- `dst` domains derived from PhysAddr decoding
- `size_bytes` for size-aware link latency
Routing must not inspect raw bit-fields directly except inside the decoding module.
## Alternatives Considered
1) **Use raw integers everywhere, decode ad-hoc in routing**
- Rejected: leads to duplicated logic, inconsistent routing, and hidden assumptions embedded in multiple components.
1) **Hardcode topology sizes (SIP/CUBE/PE counts) into decoding**
- Rejected: violates SPEC (R3) and breaks swappability and configuration-driven topologies.
1) **Put decoding inside memory controllers or routers**
- Rejected: leaks policy into components and undermines DI-first, swappable implementations (SPEC R4).
## Consequences
### Positive
- Deterministic routing domains enable clear test invariants for local vs remote paths (SPEC R1, R5).
- Keeps topology variability (SPEC R3) while preserving consistent semantics.
- DI-first: decoder can be swapped or extended without changing components or tests (SPEC R4).
### Tradeoffs / Costs
- Requires explicit configuration for any topology-derived sizes.
- Introduces a single “blessed” decoding module that must remain stable and well-tested.
## Implementation Notes (Non-normative)
- Recommended module boundary:
- `src/kernbench/policy/address/phyaddr.py`
- Tests should cover:
- deterministic decoding
- local vs remote classification from decoded fields
- invariants: “allocator returns full PhysAddr”, “decoding requires no global state”
## Links
- SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first), R5 (multi-domain comm)