ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables
Remove rack_id (4 bits), rename sip_seg→die_id, shift fields to enable 42-bit local_offset (4 TB per die). Define PE_LOCAL/MCPU_LOCAL/CUBE_SRAM sub-unit tables for AHBM dies and IOCPU sub-unit table for IOCHIPLET dies (1 TB window). Supersedes ADR-0031.

Also fixes latent VA/PA confusion in pe_dma pipeline DMA path where virtual addresses were decoded as physical addresses without MMU translation; this was previously masked by coincidental bit-position alignment.

529 passed (+6 recovered), 10 pre-existing failures unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,25 +1,39 @@
# ADR-0001: PhysAddr Layout & Address Decoding Contract
# ADR-0001: 51-bit Physical Address Layout & Decoding Contract

## Status

Accepted
Accepted (Revision 2, 2026-04-27: concrete bit layout, rack_id removal,
Tray->SIP / SIP->DIE renaming, PE/MCPU/IOCPU sub-unit tables.
Supersedes ADR-0031.)

## Date

2026-02-27
2026-04-27 (original: 2026-02-27)

## Context

KernBench Graph Latency Simulator must route requests deterministically and compute end-to-end latency strictly by graph traversal.
To model local vs remote traffic (same/different SIP, same/different CUBE, optional PE-group), requests need a stable, parsable address/location scheme that:
KernBench requires a stable, parsable physical address scheme that:

- can be decoded into routing domains (SIP/CUBE/HBM/PE-resource, etc.)
- can be decoded into routing domains (SIP / die / HBM / PE-resource / IOCPU)
- remains topology-agnostic (no hardcoded counts)
- supports swappable policy and DI-first components without leaking topology assumptions into node implementations
- supports swappable policy and DI-first components
- covers multiple SIPs, AHBM dies, and IO chiplet dies in a unified space

### History

- Original ADR-0001 defined a 51-bit layout with `rack_id(4) + sip_id(4) +
  sip_seg(5) + local_offset(38)`. `rack_id` was never used in practice.
- ADR-0031 (stub) requested PE-resource range partition but was never
  implemented.

Revision 2 removes `rack_id`, renames `sip_seg -> die_id`, and provides
concrete sub-unit tables for PE, MCPU, CUBE_SRAM, and IOCPU resources.
ADR-0031 is superseded.

## Decision

We define a **PhysAddr value object** and an **address decoding contract** that converts an integer address into routing domains.
We define a **PhysAddr value object** and an **address decoding contract**
that converts an integer address into routing domains.

### D1. PhysAddr is an immutable value object

@@ -27,82 +41,322 @@ We define a **PhysAddr value object** and an **address decoding contract** that
- Any allocator returns a **fully specified PhysAddr** (not partial metadata).
- No global state may be required to interpret a PhysAddr.

### D2. PhysAddr fields (logical contract)
### D2. 51-bit Physical Address Layout

PhysAddr must be able to represent at least:
A 51-bit physical address is adopted.

- `rack_id` (optional but reserved for scale-out)
- `sip_id` (device / SIP domain)
- `sip_seg` (SIP-level segment/window selection, e.g., cube window)
- `local_offset` (offset within the chosen segment/window)
#### 2.1 Top-Level Address Map

Decoded/derived fields may include (optional):
```text
[50:47]  sip_id       (4)   -- 16 SIPs
[46:42]  die_id       (5)   -- 32 dies per SIP
[41: 0]  local_offset (42)  -- 4 TB per die
```

- `cube_id`
- `kind` (e.g., HBM vs PE-resource vs raw)
- `unit_type` / `pe_id` (if PE-level addressing is modeled)
```text
 50     47 46      42 41                      0
+---------+----------+-------------------------+
| sip_id  | die_id   | local_offset            |
+---------+----------+-------------------------+
```
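
Because the split above is purely positional, it can be sketched with shifts and masks. The following is a minimal illustration under stated assumptions, not the project's actual `PhysAddr` class: the names `TopFields`, `encode_top`, and `decode_top` are hypothetical and do not appear in the ADR.

```python
from typing import NamedTuple

# Field widths taken from the 2.1 address map.
SIP_BITS, DIE_BITS, OFFSET_BITS = 4, 5, 42
DIE_SHIFT = OFFSET_BITS             # die_id starts at bit 42
SIP_SHIFT = OFFSET_BITS + DIE_BITS  # sip_id starts at bit 47


class TopFields(NamedTuple):
    """Hypothetical container for the three top-level fields."""
    sip_id: int
    die_id: int
    local_offset: int


def encode_top(sip_id: int, die_id: int, local_offset: int) -> int:
    """Pack the three fields into a 51-bit integer, rejecting overflow."""
    if not 0 <= sip_id < (1 << SIP_BITS):
        raise ValueError("sip_id does not fit in 4 bits")
    if not 0 <= die_id < (1 << DIE_BITS):
        raise ValueError("die_id does not fit in 5 bits")
    if not 0 <= local_offset < (1 << OFFSET_BITS):
        raise ValueError("local_offset does not fit in 42 bits")
    return (sip_id << SIP_SHIFT) | (die_id << DIE_SHIFT) | local_offset


def decode_top(addr: int) -> TopFields:
    """Split a 51-bit integer back into its fields by position only."""
    if not 0 <= addr < (1 << (SIP_SHIFT + SIP_BITS)):
        raise ValueError("address does not fit in 51 bits")
    return TopFields(
        sip_id=addr >> SIP_SHIFT,
        die_id=(addr >> DIE_SHIFT) & ((1 << DIE_BITS) - 1),
        local_offset=addr & ((1 << OFFSET_BITS) - 1),
    )
```

A round trip such as `decode_top(encode_top(2, 5, 0x1000))` should return the same three values, which is the kind of invariant the encode/decode tests could pin down.
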

**Important:** The exact bit allocation may evolve, but the *semantic fields above* must remain decodable without hidden assumptions.
#### 2.2 die_id Allocation

### D3. Decoding is deterministic and policy-compatible
| die_id | Meaning |
|--------|---------|
| 0..15 | AHBM dies |
| 16..20 | IOCHIPLET dies |
| 21..31 | Reserved |
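
The die_id ranges in the table translate into a small dispatch step. This is a hedged sketch only; `DieKind` and `classify_die` are illustrative names that are not defined by the ADR.

```python
from enum import Enum


class DieKind(Enum):
    """Hypothetical labels for the three die_id ranges."""
    AHBM = "ahbm"
    IOCHIPLET = "iochiplet"
    RESERVED = "reserved"


def classify_die(die_id: int) -> DieKind:
    """Map a 5-bit die_id to its range: 0..15, 16..20, or 21..31."""
    if not 0 <= die_id <= 31:
        raise ValueError("die_id does not fit in 5 bits")
    if die_id <= 15:
        return DieKind.AHBM
    if die_id <= 20:
        return DieKind.IOCHIPLET
    return DieKind.RESERVED
```

Whether a reserved die_id is reported as a value (as here) or raised as a decode fault is a design choice this sketch leaves open.
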

- Decoding must deterministically map an integer address to:
  - destination SIP domain (`sip_id`)
  - destination sub-domain (`cube_id` if applicable)
  - destination target kind (HBM/PE-resource/other)
- Decoding must not depend on runtime topology sizes; it may depend on **explicit topology parameters** provided through configuration (e.g., segment size, slice size), and those parameters must live in the topology/config layer (not in random components).
#### 2.3 AHBM Die Layout

### D4. Topology-derived constants live in the topology layer
Only the lower 256 GB of the 4 TB die-local window is assigned.

Constants such as segment sizes (e.g., HBM slice size / window size) are derived from topology configuration (YAML/JSON/dict) and are provided to the decoder via DI/config.
They must not be hardcoded in node implementations.
```text
[41:38]  MBZ         (4)
[37]     addr_space  (1)   -- 0 = local resource, 1 = HBM memory
[36: 0]  sub-address (37)
```

| addr_space | Meaning |
|------------|---------|
| 0 | Local resource |
| 1 | HBM memory |

##### 2.3.1 HBM Window (addr_space = 1)

```text
[36:0]  hbm_offset (37)  -- 128 GB decode window
```

The architectural decode window is fixed at 128 GB. Implemented capacity
may be smaller depending on SKU/topology (see D4).

##### 2.3.2 Resource Window (addr_space = 0)

```text
[36:34]  resource_kind (3)
[33: 0]  kind_local    (34)  -- 16 GB per kind
```

| resource_kind | Meaning |
|---------------|---------|
| 000 | PE_LOCAL |
| 001 | MCPU_LOCAL |
| 010 | CUBE_SRAM |
| 011..111 | Reserved |

Each kind gets a 16 GB decode region.
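
The two-level AHBM split (addr_space first, then resource_kind) could be decoded along these lines. This is a sketch under assumptions: `decode_ahbm` and the string labels are hypothetical, and raising `ValueError` is only one possible reaction, since the ADR does not prescribe fault behavior.

```python
AHBM_MBZ_MASK = 0xF << 38  # bits [41:38] must be zero
RESOURCE_KINDS = {0b000: "PE_LOCAL", 0b001: "MCPU_LOCAL", 0b010: "CUBE_SRAM"}


def decode_ahbm(local_offset: int) -> tuple[str, int]:
    """Return (target kind, offset inside that kind's decode window)."""
    if not 0 <= local_offset < (1 << 42):
        raise ValueError("local_offset does not fit in 42 bits")
    if local_offset & AHBM_MBZ_MASK:
        raise ValueError("non-zero MBZ bits [41:38]")
    if (local_offset >> 37) & 1:
        # addr_space = 1: the low 37 bits are the HBM offset.
        return "HBM", local_offset & ((1 << 37) - 1)
    # addr_space = 0: bits [36:34] select the resource kind.
    kind = (local_offset >> 34) & 0b111
    if kind not in RESOURCE_KINDS:
        raise ValueError(f"reserved resource_kind {kind:#05b}")
    return RESOURCE_KINDS[kind], local_offset & ((1 << 34) - 1)
```
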

##### 2.3.3 PE_LOCAL (resource_kind = 000)

```text
[33]     MBZ         (1)
[32:29]  pe_id       (4)   -- 0..15
[28:25]  pe_sub_unit (4)
[24: 0]  sub_offset  (25)  -- 32 MB per slot
```

16 PEs x 16 sub-unit slots x 32 MB = 8 GB active decode.

| pe_sub_unit | Name | Budget |
|-------------|------|--------|
| 0 | PE_CPU_DTCM | 8 KB |
| 1 | MATH_ENGINE_DTCM | 8 KB |
| 2 | IPCQ | 256 KB |
| 3 | PE_CPU_SFR | 16 KB |
| 4 | MATH_ENGINE_SFR | 16 KB |
| 5 | DMA_ENGINE_SFR | 192 KB |
| 6 | PE_TCM | 2 MB |
| 7..15 | Reserved | -- |
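
To illustrate how the PE_LOCAL fields and the budget column interact, here is a minimal sketch that keeps positional decode separate from the budget check. `decode_pe_local` and `check_pe_access` are hypothetical helpers, and raising `ValueError` is an assumption rather than prescribed behavior.

```python
KB, MB = 1 << 10, 1 << 20

# pe_sub_unit -> (name, budget in bytes), copied from the table above.
PE_SUB_UNITS = {
    0: ("PE_CPU_DTCM", 8 * KB),
    1: ("MATH_ENGINE_DTCM", 8 * KB),
    2: ("IPCQ", 256 * KB),
    3: ("PE_CPU_SFR", 16 * KB),
    4: ("MATH_ENGINE_SFR", 16 * KB),
    5: ("DMA_ENGINE_SFR", 192 * KB),
    6: ("PE_TCM", 2 * MB),
}


def decode_pe_local(kind_local: int) -> tuple[int, int, int]:
    """Positional split of a 34-bit value into (pe_id, pe_sub_unit, sub_offset)."""
    if not 0 <= kind_local < (1 << 34):
        raise ValueError("kind_local does not fit in 34 bits")
    if kind_local >> 33:
        raise ValueError("non-zero MBZ bit [33]")
    pe_id = (kind_local >> 29) & 0xF
    pe_sub_unit = (kind_local >> 25) & 0xF
    sub_offset = kind_local & ((1 << 25) - 1)
    return pe_id, pe_sub_unit, sub_offset


def check_pe_access(pe_sub_unit: int, sub_offset: int) -> str:
    """Return the sub-unit name, or raise for a reserved slot or an offset past the budget."""
    if pe_sub_unit not in PE_SUB_UNITS:
        raise ValueError(f"reserved pe_sub_unit {pe_sub_unit}")
    name, budget = PE_SUB_UNITS[pe_sub_unit]
    if sub_offset >= budget:
        raise ValueError(f"offset {sub_offset:#x} exceeds the {name} budget")
    return name
```

Every slot is 32 MB wide while the largest budget is 2 MB, so most of each slot is a hole that the second helper rejects.
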

##### 2.3.4 MCPU_LOCAL (resource_kind = 001)

```text
[33:30]  MBZ           (4)
[29:25]  mcpu_sub_unit (5)
[24: 0]  sub_offset    (25)  -- 32 MB per slot
```

32 sub-unit slots x 32 MB = 1 GB active decode.

| mcpu_sub_unit | Name | Budget |
|---------------|------|--------|
| 0 | MCPU_ITCM | 512 KB |
| 1 | MCPU_DTCM | 512 KB |
| 2 | IPCQ | 256 KB |
| 3 | MCPU_SFR | 8 KB |
| 4 | MCPU_DMA_SFR | 16 KB |
| 5 | MCPU_SRAM | 10 MB |
| 6..31 | Reserved | -- |

##### 2.3.5 CUBE_SRAM (resource_kind = 010)

```text
[33:25]  MBZ         (9)
[24: 0]  sram_offset (25)  -- flat 32 MB
```

#### 2.4 IOCHIPLET Die Layout

Only the lower 1 TB of the 4 TB die-local window is assigned.

```text
[41:40]  MBZ            (2)
[39: 0]  chiplet_offset (40)  -- 1 TB
```

Region split by address range:

| Range | Meaning | Decode condition |
|-------|---------|------------------|
| [0, 2 GB) | IOCPU resource | chiplet_offset < 0x8000_0000 |
| [2 GB, 1 TB) | UAL | chiplet_offset >= 0x8000_0000 |

##### 2.4.1 IOCPU Region

```text
[30:27]  iocpu_sub_unit (4)
[26: 0]  sub_offset     (27)  -- 128 MB per slot
```

16 slots x 128 MB = 2 GB active decode.

| iocpu_sub_unit | Name | Budget |
|----------------|------|--------|
| 0 | IOCPU_ITCM | 512 KB |
| 1 | IOCPU_DTCM | 512 KB |
| 2 | IPCQ | 2 MB |
| 3 | IOCPU_SFR | 8 KB |
| 4 | IO_DMA_SFR | 16 KB |
| 5 | IO_SRAM | 64 MB |
| 6..15 | Reserved | -- |
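
Unlike the AHBM layout, the IOCHIPLET split is a range comparison followed by a bitfield extract. The sketch below shows one way to express that; `decode_iochiplet` is a hypothetical name, the UAL branch returns the raw `chiplet_offset` because its sub-layout is still TBD, and `ValueError` stands in for whatever fault behavior an implementation chooses.

```python
from typing import Optional

IOCPU_LIMIT = 0x8000_0000  # 2 GB boundary between the IOCPU and UAL regions
IOCPU_SUB_UNITS = {
    0: "IOCPU_ITCM",
    1: "IOCPU_DTCM",
    2: "IPCQ",
    3: "IOCPU_SFR",
    4: "IO_DMA_SFR",
    5: "IO_SRAM",
}


def decode_iochiplet(local_offset: int) -> tuple[str, Optional[str], int]:
    """Return (region, sub-unit name or None, offset) for an IOCHIPLET die."""
    if not 0 <= local_offset < (1 << 42):
        raise ValueError("local_offset does not fit in 42 bits")
    if local_offset >> 40:
        raise ValueError("non-zero MBZ bits [41:40]")
    if local_offset >= IOCPU_LIMIT:
        # UAL sub-layout is not defined yet, so the offset is passed through.
        return "UAL", None, local_offset
    # Below 2 GB, bits [30:27] select one of 16 slots of 128 MB each.
    sub_unit = local_offset >> 27
    if sub_unit not in IOCPU_SUB_UNITS:
        raise ValueError(f"reserved iocpu_sub_unit {sub_unit}")
    return "IOCPU", IOCPU_SUB_UNITS[sub_unit], local_offset & ((1 << 27) - 1)
```
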

##### 2.4.2 UAL Region

Sub-layout TBD (separate ADR).

#### 2.5 Addressing Rules

1. MBZ bits must be zero. An address with non-zero MBZ bits is
   **architecturally invalid**. An implementation may raise a decode fault
   or return an error -- behavior is not prescribed by this ADR.
2. Fixed slot sizes are chosen for simple hardware decode; actual
   implemented capacity may be smaller than the slot.
3. Access beyond a sub-unit's implemented budget within a slot is
   **architecturally invalid** (same policy as MBZ).

### D3. Bitfield decoding is deterministic

Given an integer address, field extraction (`sip_id`, `die_id`, `kind`,
`sub_unit`, `offset`) is purely positional. No runtime state is required.
Decoding deterministically maps an integer address to destination domains:
`sip_id`, `die_id`, target kind (HBM / PE_LOCAL / MCPU_LOCAL / CUBE_SRAM /
IOCPU / UAL).

### D4. Capacity validation may depend on topology config

Whether a decoded address falls within **implemented capacity** (e.g.,
HBM 96 GB on a specific SKU) is checked against topology parameters
provided via DI/config. Decode itself (D3) never consults topology --
only validation does. These parameters must live in the topology/config
layer, not in node implementations.
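
The split between positional decode (D3) and capacity validation (D4) can be illustrated as two separate functions, only one of which receives configuration. Everything named here (`TopologyConfig`, `hbm_capacity_bytes`, `decode_hbm_offset`, `validate_hbm_access`) is a hypothetical stand-in; the real topology layer may expose its parameters differently.

```python
from dataclasses import dataclass

GB = 1 << 30
HBM_WINDOW_BYTES = 128 * GB  # architectural decode window from 2.3.1


@dataclass(frozen=True)
class TopologyConfig:
    """Hypothetical config object injected by the topology layer."""
    hbm_capacity_bytes: int


def decode_hbm_offset(local_offset: int) -> int:
    """D3: positional extract of hbm_offset [36:0]; no topology lookup."""
    return local_offset & ((1 << 37) - 1)


def validate_hbm_access(hbm_offset: int, size_bytes: int, topo: TopologyConfig) -> bool:
    """D4: check that [hbm_offset, hbm_offset + size_bytes) lies in implemented capacity."""
    if topo.hbm_capacity_bytes > HBM_WINDOW_BYTES:
        raise ValueError("configured capacity exceeds the 128 GB decode window")
    if hbm_offset < 0 or size_bytes <= 0:
        return False
    return hbm_offset + size_bytes <= topo.hbm_capacity_bytes
```

With a 96 GB configuration, an access at offset `0x1000` validates while one starting at 100 GB does not, even though both decode without error.
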

### D5. Routing consumes decoded domains, not raw bits

Routing policy uses decoded domains:

- `src` location (sip/cube/pe or node_id)
- `src` location (sip / die / pe or node_id)
- `dst` domains derived from PhysAddr decoding
- `size_bytes` for size-aware link latency
Routing must not inspect raw bit-fields directly except inside the decoding module.

Routing must not inspect raw bit-fields directly except inside the
decoding module.
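
As a rough illustration of routing on decoded domains only, the sketch below classifies a path from two already-decoded locations and never touches address bits. The `Location` type and the three labels are assumptions made for this example, not categories defined by the ADR.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical decoded location; produced by the decoding module."""
    sip_id: int
    die_id: int


def classify_path(src: Location, dst: Location) -> str:
    """Label a request by comparing decoded domains, outermost first."""
    if src.sip_id != dst.sip_id:
        return "remote_sip"
    if src.die_id != dst.die_id:
        return "remote_die"
    return "local"
```

A routing policy built this way keeps working if the bit layout changes, because only the decoder would need to be updated.
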

## Alternatives Considered

1) **Use raw integers everywhere, decode ad-hoc in routing**
1. **Keep `rack_id` (4 bits)**: Rejected -- never used in practice, and it
   consumes 4 bits that are needed to widen `local_offset` to 42 bits
   (IOCHIPLET 1 TB).

   - Rejected: leads to duplicated logic, inconsistent routing, and hidden assumptions embedded in multiple components.
2. **Uniform 256 GB per die**: Rejected -- IOCHIPLET UAL requires ~1 TB.
   Freed rack_id bits enable 42-bit local_offset.

1) **Hardcode topology sizes (SIP/CUBE/PE counts) into decoding**
3. **Variable-width die windows (AHBM 256 GB, CHIPLET 1 TB via multi-seg
   spanning)**: Rejected -- complicates D3 (deterministic decoding).
   Uniform 4 TB window with MBZ padding is simpler.

   - Rejected: violates SPEC (R3) and breaks swappability and configuration-driven topologies.
4. **Use raw integers everywhere, decode ad-hoc in routing**: Rejected --
   leads to duplicated logic, inconsistent routing, and hidden
   assumptions.

1) **Put decoding inside memory controllers or routers**
5. **Hardcode topology sizes (SIP/CUBE/PE counts) into decoding**:
   Rejected -- violates SPEC R3 and breaks swappability.

   - Rejected: leaks policy into components and undermines DI-first, swappable implementations (SPEC R4).
6. **Put decoding inside memory controllers or routers**: Rejected --
   leaks policy into components, violates SPEC R4 / D5.

## Consequences

### Positive

- Deterministic routing domains enable clear test invariants for local vs remote paths (SPEC R1, R5).
- Keeps topology variability (SPEC R3) while preserving consistent semantics.
- DI-first: decoder can be swapped or extended without changing components or tests (SPEC R4).
- Simple hierarchical decoder: SIP -> die -> kind -> sub-unit.
- Clean separation of memory (HBM) vs local resource (PE/MCPU/SRAM/IOCPU).
- Deterministic routing domains enable clear test invariants (SPEC R1, R5).
- Expandable: 11 reserved die_id slots, reserved resource_kind / sub-unit
  slots, reserved MBZ bits.
- DI-first: decoder can be swapped without changing components (SPEC R4).

### Tradeoffs / Costs
### Tradeoffs

- Requires explicit configuration for any topology-derived sizes.
- Introduces a single “blessed” decoding module that must remain stable and well-tested.
- Sparse address holes due to power-of-2 slot alignment.
- Large reserved/MBZ regions (intentional for future extension).
- Requires explicit configuration for topology-derived sizes (D4).
- Introduces a single "blessed" decoding module that must remain stable
  and well-tested.

## Supersedes

- **ADR-0031 (PhysAddr PE-Resource Extension)**: stub status. The
  PE_LOCAL / MCPU_LOCAL / CUBE_SRAM sub-unit tables in D2.3.3-D2.3.5
  fulfill ADR-0031's stated goals.

## Implementation Notes (Non-normative)

- Recommended module boundary:
  - `src/kernbench/policy/address/phyaddr.py`
- Recommended module: `src/kernbench/policy/address/phyaddr.py`
- Tests should cover: encode/decode round-trip per kind, MBZ enforcement,
  die_id dispatch (AHBM / IOCHIPLET / reserved), sub-unit boundary
  values, backward compatibility of factory APIs.
- Factory methods: `hbm_addr`, `pe_hbm_addr`, `pe_tcm_addr`,
  `cube_sram_addr` retain signatures (minus `rack_id`); `cube_id`
  parameter renamed to `die_id`.
- New factories: `pe_resource_addr`, `mcpu_resource_addr`,
  `iocpu_resource_addr`, `ual_addr`.

- Tests should cover:
  - deterministic decoding
  - local vs remote classification from decoded fields
  - invariants: “allocator returns full PhysAddr”, “decoding requires no global state”
## Appendix A. Address Examples

### A.1 AHBM HBM access

sip=2, die=5, HBM offset=0x1000

```text
sip_id     = 2      -> [50:47] = 0b0010
die_id     = 5      -> [46:42] = 0b00101
addr_space = 1      -> [37]    = 1 (HBM)
hbm_offset = 0x1000 -> [36:0]

51-bit addr = (2 << 47) | (5 << 42) | (1 << 37) | 0x1000
```

### A.2 AHBM PE_LOCAL -- PE3 PE_TCM, offset=0x400

```text
sip_id        = 0     -> [50:47] = 0
die_id        = 0     -> [46:42] = 0
addr_space    = 0     -> [37]    = 0
resource_kind = 0     -> [36:34] = 000 (PE_LOCAL)
pe_id         = 3     -> [32:29] = 0011
pe_sub_unit   = 6     -> [28:25] = 0110 (PE_TCM)
sub_offset    = 0x400 -> [24:0]

local_offset = (0 << 34) | (3 << 29) | (6 << 25) | 0x400
```

### A.3 AHBM MCPU_LOCAL -- MCPU_SRAM, offset=0x0

```text
sip_id        = 1 -> [50:47] = 0001
die_id        = 3 -> [46:42] = 00011
addr_space    = 0 -> [37]    = 0
resource_kind = 1 -> [36:34] = 001 (MCPU_LOCAL)
mcpu_sub_unit = 5 -> [29:25] = 00101 (MCPU_SRAM)
sub_offset    = 0 -> [24:0]  = 0

local_offset = (1 << 34) | (5 << 25)
```

### A.4 IOCHIPLET -- IOCPU IPCQ, offset=0x20000

```text
sip_id         = 1       -> [50:47] = 0001
die_id         = 17      -> [46:42] = 10001 (IOCHIPLET[1])
iocpu_sub_unit = 2       -> [30:27] = 0010 (IPCQ)
sub_offset     = 0x20000 -> [26:0]

chiplet_offset = (2 << 27) | 0x20000
                 (< 0x8000_0000 -> IOCPU region)
```

### A.5 IOCHIPLET -- UAL region, offset=4 GB

```text
sip_id         = 0  -> [50:47] = 0
die_id         = 16 -> [46:42] = 10000 (IOCHIPLET[0])
chiplet_offset = 0x1_0000_0000 (4 GB >= 2 GB -> UAL region)
```
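
The five examples can be cross-checked by packing each one into a full 51-bit integer. This is only a verification sketch: `pack` is a hypothetical helper, and the shift amounts are the bit positions from section 2.1.

```python
def pack(sip_id: int, die_id: int, local_offset: int) -> int:
    """Place the top-level fields at bits [50:47], [46:42], and [41:0]."""
    return (sip_id << 47) | (die_id << 42) | local_offset


# local_offset / chiplet_offset expressions are copied from A.1 through A.5.
examples = {
    "A.1": pack(2, 5, (1 << 37) | 0x1000),
    "A.2": pack(0, 0, (0 << 34) | (3 << 29) | (6 << 25) | 0x400),
    "A.3": pack(1, 3, (1 << 34) | (5 << 25)),
    "A.4": pack(1, 17, (2 << 27) | 0x20000),
    "A.5": pack(0, 16, 0x1_0000_0000),
}

for name, addr in examples.items():
    # Every example must fit in 51 bits and round-trip its die_id field.
    assert addr < (1 << 51), name
    print(f"{name}: {addr:#015x} (die_id={(addr >> 42) & 0x1F})")
```

Printing the packed values next to the extracted `die_id` makes it easy to confirm that A.4 and A.5 land in the IOCHIPLET range (16..20).
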

## Links

- SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first), R5 (multi-domain comm)
- SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first),
  R5 (multi-domain comm)
- ADR-0031: Superseded