ADR-0001: 51-bit Physical Address Layout & Decoding Contract
Status
Accepted (Revision 2 — 2026-04-27: concrete bit layout, rack_id removal, Tray->SIP / SIP->DIE renaming, PE/MCPU/IOCPU sub-unit tables. Supersedes ADR-0031.)
Date
2026-04-27 (original: 2026-02-27)
Context
KernBench requires a stable, parsable physical address scheme that:
- can be decoded into routing domains (SIP / die / HBM / PE-resource / IOCPU)
- remains topology-agnostic (no hardcoded counts)
- supports swappable policy and DI-first components
- covers multiple SIPs, AHBM dies, and IO chiplet dies in a unified space
History
- Original ADR-0001 defined a 51-bit layout with rack_id(4) + sip_id(4) + sip_seg(5) + local_offset(38). rack_id was never used in practice.
- ADR-0031 (stub) requested PE-resource range partition but was never implemented.
Revision 2 removes rack_id, renames sip_seg -> die_id, and provides
concrete sub-unit tables for PE, MCPU, CUBE_SRAM, and IOCPU resources.
ADR-0031 is superseded.
Decision
We define a PhysAddr value object and an address decoding contract that converts an integer address into routing domains.
D1. PhysAddr is an immutable value object
- PhysAddr is immutable and comparable as a pure value.
- Any allocator returns a fully specified PhysAddr (not partial metadata).
- No global state may be required to interpret a PhysAddr.
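A minimal sketch of D1, assuming a frozen dataclass is an acceptable way to model the value object. The single value field and the range check are illustrative assumptions, not the project's actual class definition.

```python
from dataclasses import dataclass, FrozenInstanceError

ADDR_BITS = 51  # total width from D2


@dataclass(frozen=True, order=True)
class PhysAddr:
    """Illustrative immutable address value (field name is an assumption)."""

    value: int

    def __post_init__(self) -> None:
        # Reject anything that cannot be a 51-bit physical address.
        if not 0 <= self.value < (1 << ADDR_BITS):
            raise ValueError("address does not fit in 51 bits")


a = PhysAddr(0x1000)
b = PhysAddr(0x1000)
```

Two instances built from the same integer compare equal and hash identically, and assigning to a.value raises FrozenInstanceError, so the object can be used as a dictionary key or passed between components without defensive copies.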
D2. 51-bit Physical Address Layout
A 51-bit physical address is adopted.
2.1 Top-Level Address Map
[50:47] sip_id (4) -- 16 SIPs
[46:42] die_id (5) -- 32 dies per SIP
[41: 0] local_offset (42) -- 4 TB per die
 50     47 46      42 41                      0
+---------+----------+-------------------------+
| sip_id  |  die_id  |      local_offset       |
+---------+----------+-------------------------+
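The top-level map can be sketched as three shift-and-mask operations. The function name split_top_level is hypothetical, and raising ValueError for out-of-range input is an assumption made for this example.

```python
SIP_SHIFT, SIP_BITS = 47, 4      # [50:47]
DIE_SHIFT, DIE_BITS = 42, 5      # [46:42]
LOCAL_BITS = 42                  # [41:0]


def split_top_level(addr: int) -> tuple[int, int, int]:
    """Return (sip_id, die_id, local_offset) for a 51-bit address."""
    if not 0 <= addr < (1 << 51):
        raise ValueError("address does not fit in 51 bits")
    sip_id = (addr >> SIP_SHIFT) & ((1 << SIP_BITS) - 1)
    die_id = (addr >> DIE_SHIFT) & ((1 << DIE_BITS) - 1)
    local_offset = addr & ((1 << LOCAL_BITS) - 1)
    return sip_id, die_id, local_offset
```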
2.2 die_id Allocation
| die_id | Meaning |
|---|---|
| 0..15 | AHBM dies |
| 16..20 | IOCHIPLET dies |
| 21..31 | Reserved |
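The allocation table maps onto a small range check. The DieKind enum and classify_die function below are hypothetical names used only for illustration.

```python
from enum import Enum


class DieKind(Enum):
    AHBM = "ahbm"
    IOCHIPLET = "iochiplet"
    RESERVED = "reserved"


def classify_die(die_id: int) -> DieKind:
    """Map a 5-bit die_id to its class per the table in 2.2."""
    if not 0 <= die_id <= 31:
        raise ValueError("die_id must fit in 5 bits")
    if die_id <= 15:
        return DieKind.AHBM
    if die_id <= 20:
        return DieKind.IOCHIPLET
    return DieKind.RESERVED
```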
2.3 AHBM Die Layout
Only the lower 256 GB of the 4 TB die-local window is assigned.
[41:38] MBZ (4)
[37] addr_space (1) -- 0 = local resource, 1 = HBM memory
[36: 0] sub-address (37)
| addr_space | Meaning |
|---|---|
| 0 | Local resource |
| 1 | HBM memory |
2.3.1 HBM Window (addr_space = 1)
[36:0] hbm_offset (37) -- 128 GB decode window
The architectural decode window is fixed at 128 GB. Implemented capacity may be smaller depending on SKU/topology (see D4).
2.3.2 Resource Window (addr_space = 0)
[36:34] resource_kind (3)
[33: 0] kind_local (34) -- 16 GB per kind
| resource_kind | Meaning |
|---|---|
| 000 | PE_LOCAL |
| 001 | MCPU_LOCAL |
| 010 | CUBE_SRAM |
| 011..111 | Reserved |
Each kind gets a 16 GB decode region.
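The AHBM split between the HBM window and the resource window could look like the sketch below. The function name decode_ahbm_window is hypothetical, and raising ValueError on MBZ or reserved-kind violations is an assumption, since this ADR leaves fault behavior open.

```python
RESOURCE_KINDS = {0b000: "PE_LOCAL", 0b001: "MCPU_LOCAL", 0b010: "CUBE_SRAM"}


def decode_ahbm_window(local_offset: int) -> tuple[str, int]:
    """Return (window_name, offset_within_window) for an AHBM die."""
    # Bits [41:38] are MBZ; a negative input also fails this check.
    if local_offset >> 38:
        raise ValueError("MBZ bits [41:38] are set")
    if (local_offset >> 37) & 1:
        # addr_space = 1: 37-bit HBM offset (128 GB decode window).
        return "HBM", local_offset & ((1 << 37) - 1)
    # addr_space = 0: 3-bit resource_kind, then 34-bit kind_local.
    kind = (local_offset >> 34) & 0b111
    if kind not in RESOURCE_KINDS:
        raise ValueError("reserved resource_kind")
    return RESOURCE_KINDS[kind], local_offset & ((1 << 34) - 1)
```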
2.3.3 PE_LOCAL (resource_kind = 000)
[33] MBZ (1)
[32:29] pe_id (4) -- 0..15
[28:25] pe_sub_unit (4)
[24: 0] sub_offset (25) -- 32 MB per slot
16 PEs x 16 sub-unit slots x 32 MB = 8 GB active decode.
| pe_sub_unit | Name | Budget |
|---|---|---|
| 0 | PE_CPU_DTCM | 8 KB |
| 1 | MATH_ENGINE_DTCM | 8 KB |
| 2 | IPCQ | 256 KB |
| 3 | PE_CPU_SFR | 16 KB |
| 4 | MATH_ENGINE_SFR | 16 KB |
| 5 | DMA_ENGINE_SFR | 192 KB |
| 6 | PE_TCM | 2 MB |
| 7..15 | Reserved | -- |
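A possible decoder for the PE_LOCAL region, using the budgets from the table above as upper bounds on sub_offset. The function name decode_pe_local and the choice of ValueError are assumptions for this sketch; a real implementation might take budgets from topology config instead (see D4).

```python
KB, MB = 1024, 1024 * 1024

# sub-unit id -> (name, budget in bytes), copied from the table above
PE_SUB_UNITS = {
    0: ("PE_CPU_DTCM", 8 * KB),
    1: ("MATH_ENGINE_DTCM", 8 * KB),
    2: ("IPCQ", 256 * KB),
    3: ("PE_CPU_SFR", 16 * KB),
    4: ("MATH_ENGINE_SFR", 16 * KB),
    5: ("DMA_ENGINE_SFR", 192 * KB),
    6: ("PE_TCM", 2 * MB),
}


def decode_pe_local(kind_local: int) -> tuple[int, str, int]:
    """Return (pe_id, sub_unit_name, sub_offset) from the 34-bit kind_local."""
    if not 0 <= kind_local < (1 << 33):
        raise ValueError("MBZ bit [33] is set or value is out of range")
    pe_id = (kind_local >> 29) & 0xF
    sub_unit = (kind_local >> 25) & 0xF
    sub_offset = kind_local & ((1 << 25) - 1)
    if sub_unit not in PE_SUB_UNITS:
        raise ValueError("reserved pe_sub_unit")
    name, budget = PE_SUB_UNITS[sub_unit]
    if sub_offset >= budget:
        raise ValueError("offset exceeds the sub-unit budget")
    return pe_id, name, sub_offset
```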
2.3.4 MCPU_LOCAL (resource_kind = 001)
[33:30] MBZ (4)
[29:25] mcpu_sub_unit (5)
[24: 0] sub_offset (25) -- 32 MB per slot
32 sub-unit slots x 32 MB = 1 GB active decode.
| mcpu_sub_unit | Name | Budget |
|---|---|---|
| 0 | MCPU_ITCM | 512 KB |
| 1 | MCPU_DTCM | 512 KB |
| 2 | IPCQ | 256 KB |
| 3 | MCPU_SFR | 8 KB |
| 4 | MCPU_DMA_SFR | 16 KB |
| 5 | MCPU_SRAM | 10 MB |
| 6..31 | Reserved | -- |
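Going the other direction, an encoder for MCPU_LOCAL might be sketched as follows. The name encode_mcpu_local is hypothetical and is not the mcpu_resource_addr factory mentioned in the implementation notes, whose signature is not specified here.

```python
KB, MB = 1024, 1024 * 1024

# sub-unit id -> (name, budget in bytes), copied from the table above
MCPU_SUB_UNITS = {
    0: ("MCPU_ITCM", 512 * KB),
    1: ("MCPU_DTCM", 512 * KB),
    2: ("IPCQ", 256 * KB),
    3: ("MCPU_SFR", 8 * KB),
    4: ("MCPU_DMA_SFR", 16 * KB),
    5: ("MCPU_SRAM", 10 * MB),
}

MCPU_LOCAL_KIND = 0b001  # resource_kind value for MCPU_LOCAL


def encode_mcpu_local(sub_unit: int, sub_offset: int) -> int:
    """Build the AHBM die-local offset (addr_space = 0) for an MCPU resource."""
    if sub_unit not in MCPU_SUB_UNITS:
        raise ValueError("reserved mcpu_sub_unit")
    _, budget = MCPU_SUB_UNITS[sub_unit]
    if not 0 <= sub_offset < budget:
        raise ValueError("offset exceeds the sub-unit budget")
    return (MCPU_LOCAL_KIND << 34) | (sub_unit << 25) | sub_offset
```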
2.3.5 CUBE_SRAM (resource_kind = 010)
[33:25] MBZ (9)
[24: 0] sram_offset (25) -- flat 32 MB
2.4 IOCHIPLET Die Layout
Only the lower 1 TB of the 4 TB die-local window is assigned.
[41:40] MBZ (2)
[39: 0] chiplet_offset (40) -- 1 TB
Region split by address range:
| Range | Meaning | Decode condition |
|---|---|---|
| [0, 2 GB) | IOCPU resource | chiplet_offset < 0x8000_0000 |
| [2 GB, 1 TB) | UAL | chiplet_offset >= 0x8000_0000 |
2.4.1 IOCPU Region
[30:27] iocpu_sub_unit (4)
[26: 0] sub_offset (27) -- 128 MB per slot
16 x 128 MB slots. 2 GB active decode.
| iocpu_sub_unit | Name | Budget |
|---|---|---|
| 0 | IOCPU_ITCM | 512 KB |
| 1 | IOCPU_DTCM | 512 KB |
| 2 | IPCQ | 2 MB |
| 3 | IOCPU_SFR | 8 KB |
| 4 | IO_DMA_SFR | 16 KB |
| 5 | IO_SRAM | 64 MB |
| 6..15 | Reserved | -- |
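The IOCHIPLET range split and the IOCPU sub-unit decode can be sketched together. The function name decode_iochiplet and the returned tuple shape are assumptions; for the UAL region the raw chiplet_offset is returned unchanged because its sub-layout is not yet defined.

```python
from typing import Optional

KB, MB = 1024, 1024 * 1024
IOCPU_LIMIT = 0x8000_0000  # 2 GB boundary between IOCPU and UAL

# sub-unit id -> (name, budget in bytes), copied from the table above
IOCPU_SUB_UNITS = {
    0: ("IOCPU_ITCM", 512 * KB),
    1: ("IOCPU_DTCM", 512 * KB),
    2: ("IPCQ", 2 * MB),
    3: ("IOCPU_SFR", 8 * KB),
    4: ("IO_DMA_SFR", 16 * KB),
    5: ("IO_SRAM", 64 * MB),
}


def decode_iochiplet(local_offset: int) -> tuple[str, Optional[str], int]:
    """Return (region, sub_unit_name_or_None, offset) for an IOCHIPLET die."""
    if not 0 <= local_offset < (1 << 40):
        raise ValueError("MBZ bits [41:40] are set or value is out of range")
    if local_offset >= IOCPU_LIMIT:
        return "UAL", None, local_offset
    sub_unit = (local_offset >> 27) & 0xF
    sub_offset = local_offset & ((1 << 27) - 1)
    if sub_unit not in IOCPU_SUB_UNITS:
        raise ValueError("reserved iocpu_sub_unit")
    name, budget = IOCPU_SUB_UNITS[sub_unit]
    if sub_offset >= budget:
        raise ValueError("offset exceeds the sub-unit budget")
    return "IOCPU", name, sub_offset
```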
2.4.2 UAL Region
Sub-layout TBD (separate ADR).
2.5 Addressing Rules
- MBZ bits must be zero. An address with non-zero MBZ bits is architecturally invalid. An implementation may raise a decode fault or return an error -- the behavior is not prescribed by this ADR.
- Fixed slot sizes are chosen for simple hardware decode; actual implemented capacity may be smaller than the slot.
- Access beyond a sub-unit's implemented budget within a slot is architecturally invalid (same policy as MBZ).
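One way to enforce these rules in software is a pair of small checks. DecodeFault, check_mbz, and check_budget are hypothetical names, and raising an exception is only one of the behaviors the ADR permits.

```python
class DecodeFault(ValueError):
    """Hypothetical error type; the ADR does not prescribe fault behavior."""


AHBM_MBZ_MASK = 0xF << 38       # die-local bits [41:38]
IOCHIPLET_MBZ_MASK = 0x3 << 40  # die-local bits [41:40]


def check_mbz(local_offset: int, mbz_mask: int) -> None:
    """Raise if any must-be-zero bit is set."""
    if local_offset & mbz_mask:
        raise DecodeFault(f"MBZ bits set: {local_offset & mbz_mask:#x}")


def check_budget(sub_offset: int, size_bytes: int, budget_bytes: int) -> None:
    """Raise if the access [sub_offset, sub_offset + size_bytes) leaves the budget."""
    if sub_offset < 0 or size_bytes <= 0:
        raise DecodeFault("offset must be non-negative and size positive")
    if sub_offset + size_bytes > budget_bytes:
        raise DecodeFault("access runs past the implemented budget")
```

Checking the end of the access rather than only its start catches transfers that begin inside a budget but spill into the unimplemented remainder of the slot.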
D3. Bitfield decoding is deterministic
Given an integer address, field extraction (sip_id, die_id, kind,
sub_unit, offset) is purely positional. No runtime state is required.
Decoding deterministically maps an integer address to destination domains:
sip_id, die_id, target kind (HBM / PE_LOCAL / MCPU_LOCAL / CUBE_SRAM /
IOCPU / UAL).
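As a sketch of D3, the function below derives the destination domains from nothing but the integer address. The name decode_domains is hypothetical, and treating reserved or MBZ-violating addresses as ValueError is an assumption for this example.

```python
AHBM_KINDS = ("PE_LOCAL", "MCPU_LOCAL", "CUBE_SRAM")


def decode_domains(addr: int) -> tuple[int, int, str]:
    """Return (sip_id, die_id, target_kind) using positional fields only."""
    if not 0 <= addr < (1 << 51):
        raise ValueError("address does not fit in 51 bits")
    sip_id = (addr >> 47) & 0xF
    die_id = (addr >> 42) & 0x1F
    local = addr & ((1 << 42) - 1)

    if die_id <= 15:  # AHBM die
        if local >> 38:
            raise ValueError("AHBM MBZ bits [41:38] are set")
        if (local >> 37) & 1:
            return sip_id, die_id, "HBM"
        kind = (local >> 34) & 0b111
        if kind >= len(AHBM_KINDS):
            raise ValueError("reserved resource_kind")
        return sip_id, die_id, AHBM_KINDS[kind]

    if die_id <= 20:  # IOCHIPLET die
        if local >> 40:
            raise ValueError("IOCHIPLET MBZ bits [41:40] are set")
        return sip_id, die_id, "IOCPU" if local < 0x8000_0000 else "UAL"

    raise ValueError("reserved die_id")
```

The function reads no configuration and no module-level mutable state, so the same integer always yields the same result.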
D4. Capacity validation may depend on topology config
Whether a decoded address falls within implemented capacity (e.g., HBM 96 GB on a specific SKU) is checked against topology parameters provided via DI/config. Decode itself (D3) never consults topology -- only validation does. These parameters must live in the topology/config layer, not in node implementations.
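The separation between decode and validation might be expressed as below. TopologyConfig and its hbm_bytes_per_die field are hypothetical; the 96 GB figure is the SKU example from the text, interpreted here as 96 x 2^30 bytes.

```python
from dataclasses import dataclass

GB = 1 << 30
HBM_WINDOW_BYTES = 128 * GB  # architectural decode window from 2.3.1


@dataclass(frozen=True)
class TopologyConfig:
    """Hypothetical config object supplied via DI; not part of decoding."""

    hbm_bytes_per_die: int

    def __post_init__(self) -> None:
        if not 0 < self.hbm_bytes_per_die <= HBM_WINDOW_BYTES:
            raise ValueError("implemented HBM must fit in the 128 GB window")


def hbm_access_is_implemented(hbm_offset: int, size_bytes: int,
                              topo: TopologyConfig) -> bool:
    """Validation step: does the access fall inside implemented capacity?"""
    return (hbm_offset >= 0 and size_bytes > 0
            and hbm_offset + size_bytes <= topo.hbm_bytes_per_die)


sku = TopologyConfig(hbm_bytes_per_die=96 * GB)
```

An offset of 100 GB decodes successfully under D3 because it lies inside the 128 GB window, yet fails this validation on the 96 GB configuration.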
D5. Routing consumes decoded domains, not raw bits
Routing policy uses decoded domains:
- src location (sip / die / pe or node_id)
- dst domains derived from PhysAddr decoding
- size_bytes for size-aware link latency
Routing must not inspect raw bit-fields directly except inside the decoding module.
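To illustrate D5, a routing policy could accept only already-decoded domains. Every name below (Location, RouteRequest, hop_scope) and the three scope labels are hypothetical; the actual routing policy is outside the scope of this ADR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    sip_id: int
    die_id: int


@dataclass(frozen=True)
class RouteRequest:
    src: Location     # where the request originates
    dst: Location     # domains obtained from PhysAddr decoding
    dst_kind: str     # e.g. "HBM", "PE_LOCAL", "IOCPU"
    size_bytes: int   # payload size for size-aware link latency


def hop_scope(req: RouteRequest) -> str:
    """Classify a request by comparing decoded domains, never raw bits."""
    if req.src.sip_id != req.dst.sip_id:
        return "cross_sip"
    if req.src.die_id != req.dst.die_id:
        return "cross_die"
    return "same_die"
```

Because the policy never sees the integer address, a change to the bit layout touches only the decoding module.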
Alternatives Considered
- Keep rack_id (4 bits): Rejected -- never used in practice, and it occupies 4 bits that, once freed, let local_offset grow from 38 to 42 bits (enough for the 1 TB IOCHIPLET window).
- Uniform 256 GB per die: Rejected -- IOCHIPLET UAL requires ~1 TB. Freed rack_id bits enable 42-bit local_offset.
- Variable-width die windows (AHBM 256 GB, CHIPLET 1 TB via multi-seg spanning): Rejected -- complicates D3 (deterministic decoding). Uniform 4 TB window with MBZ padding is simpler.
- Use raw integers everywhere, decode ad-hoc in routing: Rejected -- leads to duplicated logic, inconsistent routing, and hidden assumptions.
- Hardcode topology sizes (SIP/CUBE/PE counts) into decoding: Rejected -- violates SPEC R3 and breaks swappability.
- Put decoding inside memory controllers or routers: Rejected -- leaks policy into components, violates SPEC R4 / D5.
Consequences
Positive
- Simple hierarchical decoder: SIP -> die -> kind -> sub-unit.
- Clean separation of memory (HBM) vs local resource (PE/MCPU/SRAM/IOCPU).
- Deterministic routing domains enable clear test invariants (SPEC R1, R5).
- Expandable: 11 reserved die_id slots, reserved resource_kind / sub-unit slots, reserved MBZ bits.
- DI-first: decoder can be swapped without changing components (SPEC R4).
Tradeoffs
- Sparse address holes due to power-of-2 slot alignment.
- Large reserved/MBZ regions (intentional for future extension).
- Requires explicit configuration for topology-derived sizes (D4).
- Introduces a single "blessed" decoding module that must remain stable and well-tested.
Supersedes
- ADR-0031 (PhysAddr PE-Resource Extension): stub status. The PE_LOCAL / MCPU_LOCAL / CUBE_SRAM sub-unit tables in D2.3.3-D2.3.5 fulfill ADR-0031's stated goals.
Implementation Notes (Non-normative)
- Recommended module: src/kernbench/policy/address/phyaddr.py
- Tests should cover: encode/decode round-trip per kind, MBZ enforcement, die_id dispatch (AHBM / IOCHIPLET / reserved), sub-unit boundary values, backward compatibility of factory APIs.
- Factory methods: hbm_addr, pe_hbm_addr, pe_tcm_addr, cube_sram_addr retain signatures (minus rack_id); cube_id parameter renamed to die_id.
- New factories: pe_resource_addr, mcpu_resource_addr, iocpu_resource_addr, ual_addr.
Appendix A. Address Examples
A.1 AHBM HBM access
sip=2, die=5, HBM offset=0x1000
sip_id = 2 -> [50:47] = 0b0010
die_id = 5 -> [46:42] = 0b00101
addr_space = 1 -> [37] = 1 (HBM)
hbm_offset = 0x1000 -> [36:0]
51-bit addr = (2 << 47) | (5 << 42) | (1 << 37) | 0x1000
A.2 AHBM PE_LOCAL -- PE3 PE_TCM, offset=0x400
sip_id = 0 -> [50:47] = 0
die_id = 0 -> [46:42] = 0
addr_space = 0 -> [37] = 0
resource_kind = 0 -> [36:34] = 000 (PE_LOCAL)
pe_id = 3 -> [32:29] = 0011
pe_sub_unit = 6 -> [28:25] = 0110 (PE_TCM)
sub_offset = 0x400 -> [24:0]
local_offset = (0 << 34) | (3 << 29) | (6 << 25) | 0x400
A.3 AHBM MCPU_LOCAL -- MCPU_SRAM, offset=0x0
sip_id = 1 -> [50:47] = 0001
die_id = 3 -> [46:42] = 00011
addr_space = 0 -> [37] = 0
resource_kind = 1 -> [36:34] = 001 (MCPU_LOCAL)
mcpu_sub_unit = 5 -> [29:25] = 00101 (MCPU_SRAM)
sub_offset = 0 -> [24:0] = 0
local_offset = (1 << 34) | (5 << 25)
A.4 IOCHIPLET -- IOCPU IPCQ, offset=0x20000
sip_id = 1 -> [50:47] = 0001
die_id = 17 -> [46:42] = 10001 (IOCHIPLET[1])
iocpu_sub_unit = 2 -> [30:27] = 0010 (IPCQ)
sub_offset = 0x20000 -> [26:0]
chiplet_offset = (2 << 27) | 0x20000
(< 0x8000_0000 -> IOCPU region)
A.5 IOCHIPLET -- UAL region, offset=4 GB
sip_id = 0 -> [50:47] = 0
die_id = 16 -> [46:42] = 10000 (IOCHIPLET[0])
chiplet_offset = 0x1_0000_0000 (4 GB >= 2 GB -> UAL region)
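The five examples can be checked mechanically by assembling each address and reading the fields back. The field helper is a hypothetical utility written for this sketch, not part of the recommended module.

```python
def field(addr: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] (inclusive) from addr."""
    return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)


# A.1 to A.5, assembled exactly as written in the appendix
a1 = (2 << 47) | (5 << 42) | (1 << 37) | 0x1000
a2 = (0 << 47) | (0 << 42) | (0 << 34) | (3 << 29) | (6 << 25) | 0x400
a3 = (1 << 47) | (3 << 42) | (1 << 34) | (5 << 25)
a4 = (1 << 47) | (17 << 42) | (2 << 27) | 0x20000
a5 = (0 << 47) | (16 << 42) | 0x1_0000_0000

for name, addr in (("A.1", a1), ("A.2", a2), ("A.3", a3), ("A.4", a4), ("A.5", a5)):
    # Print each address as 13 hex digits (51 bits round up to 52).
    print(f"{name}: 0x{addr:013x}  sip={field(addr, 50, 47)} die={field(addr, 46, 42)}")
```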
Links
- SPEC.md: R1 (routing), R3 (configurable topology), R4 (DI-first), R5 (multi-domain comm)
- ADR-0031: Superseded