Establish English as the canonical ADR language with Korean translations held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror). Promotion from adr-proposed/ to adr/ now writes English to adr/ and the Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md. - Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English, 2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix dropped). ADR-0023 EN regenerated against KO source which had newer HW Realization Notes (D16-D23) section. - docs/adr-history/ left frozen by design (transitional state). - CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline section covering bidirectional sync, conflict resolution (EN wins), and proposed-language freedom. - tools/verify_adr_lang_pairs.py: new verification tool checking pair completeness, filename mirroring, ADR-ID match, Status byte-equality. Pre-commit hook intentionally not added; run on demand or in CI. - tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF normalization, em-dash title separator, underscore-slug edge case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11 KiB
ADR-0026: DPPolicy = Intra-Device Only — remove sip/num_sips fields
Status
Accepted (Revision 5 — Phase 2 landed 2026-04-14, 523 passed + 1 strict xfail)
Context
Goal
Clarify DPPolicy as a pure intra-device abstraction that only expresses
cube × PE distribution within a single device (SIP). Inter-SIP
distribution (TP) is split into a separate layer (handled by ADR-0024's
torch.ahbm.set_device(rank) or by ADR-0027's Megatron-style parallel
layers).
Decision
D1. Remove sip + num_sips fields from DPPolicy
@dataclass(frozen=True)
class DPPolicy:
"""Intra-device (cube × PE) data-parallel policy.
SIP-level placement is controlled by ``torch.ahbm.set_device(rank)``
(ADR-0024 D3) and, for model-level TP, by Megatron-style parallel
layers (ADR-0027). DPPolicy does not cross SIP boundaries.
"""
cube: Literal["replicate", "column_wise", "row_wise"] = "replicate"
pe: Literal["replicate", "column_wise", "row_wise"] = "replicate"
num_pes: int | None = None
num_cubes: int | None = None
Removed fields: sip, num_sips.
D2. ShardSpec — structural (sip, cube, pe) coordinates, pe_index fully removed
The current ShardSpec.pe_index is a global flat index
(sip × cubes × pes + cube × pes + pe). This is the form ADR-0024 D4
flagged as "abstraction leakage".
This ADR redefines ShardSpec in structural coordinates and does
not even leave pe_index as a property:
# src/kernbench/policy/placement/dp.py (after)
@dataclass(frozen=True)
class ShardSpec:
"""Structural shard placement — intra-SIP (cube × PE) coord.
Global-flat `pe_index` was removed in ADR-0026. Callers must use
structural coords (sip, cube, pe) directly. If a flat integer key is
needed (e.g. dict lookup), compute it explicitly at the call site.
"""
sip: int # structural — which SIP this shard lives on
cube: int # local within SIP
pe: int # local within cube
offset_bytes: int
nbytes: int
Core principle:
- The identity of ShardSpec is the
(sip, cube, pe)3-tuple. - No
pe_indexproperty either — blocks silent semantics drift. - Existing callers expecting global-flat get an immediate
AttributeErroron.pe_indexaccess → forced migration to structural coordinates. - Local contexts that genuinely need a flat integer key (e.g. internal
dict lookup) explicitly compute
spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.peat the call site.
Justification for removing the property: KernBench is an internal project with a limited number of call sites. Explicit breakage (AttributeError) is much safer than the risk of silent drift (semantics change while the type stays int).
D3. resolve_dp_policy takes target_sip and produces structural coordinates
Implements the contract of ADR-0024 D4. No post-hoc shifting.
# src/kernbench/policy/placement/dp.py (after)
@dataclass(frozen=True)
class _LocalPeShard:
"""Internal — return value of the PE resolver. Cube-local PE id + payload."""
local_pe: int # cube-local PE index (0..num_pe-1)
offset_bytes: int
nbytes: int
def resolve_dp_policy(
policy: DPPolicy,
*,
shape: tuple[int, int],
itemsize: int,
num_pe: int,
num_cubes: int = 1,
target_sip: int, # NEW — explicitly state which SIP to place on
) -> list[ShardSpec]:
"""2-level resolution (cube × PE) on a specified SIP.
Returns ShardSpecs with structural coords (sip=target_sip, cube, pe).
No SIP-level split — DPPolicy is intra-device only.
"""
resolver = _PE_RESOLVERS[policy.pe]
all_shards: list[ShardSpec] = []
# Level 1: cube within SIP
cube_splits = _split_shape(policy.cube, shape, num_cubes, itemsize)
for cube_id, (cube_shape, cube_offset) in enumerate(cube_splits):
# Level 2: PE within cube — resolver returns _LocalPeShard (local_pe)
local_shards = resolver(shape=cube_shape, itemsize=itemsize,
num_pe=num_pe)
for ls in local_shards:
all_shards.append(ShardSpec(
sip=target_sip, # from caller (current_device)
cube=cube_id, # local within SIP
pe=ls.local_pe, # local within cube (explicit name)
offset_bytes=cube_offset + ls.offset_bytes,
nbytes=ls.nbytes,
))
return all_shards
Internal resolvers (column_wise, row_wise, replicate) return a
list of _LocalPeShard — the local_pe field name makes it explicit
that this is a "cube-local PE identifier". This resolves the previous
confusion with the name ShardSpec.pe_index.
Naming convention summary (whole ADR):
ShardSpec.pe: the final external API — cube-local PE (structural coord)_LocalPeShard.local_pe: the same meaning at the internal resolver stagepe_index: removed. Not retained anywhere, internal or external (additional benefit of preventing silent drift: the name does not reappear).
D4. _create_tensor — placement directly in structural coordinates
Continuation of ADR-0024 D4. Post-hoc shifting removed; structural
coordinates are specified directly at the resolve_dp_policy call site.
# context.py _create_tensor (after)
current_sip = self.ahbm.current_device()
if current_sip is None:
# Single-driver fallback (consistent with ADR-0024 D2).
# In launcher-based code, forgetting set_device() silently sticks the
# tensor on SIP 0 — emit a warning in debug mode.
if os.environ.get("KERNBENCH_DEBUG"):
import warnings
warnings.warn(
"torch.ahbm.current_device() is None; defaulting to SIP 0. "
"If this is a multi-rank launcher context, you likely forgot "
"torch.ahbm.set_device(rank) inside the worker.",
stacklevel=2,
)
current_sip = 0
placement = resolve_dp_policy(
dp,
shape=shape_2d,
itemsize=itemsize,
num_pe=eff_num_pe,
num_cubes=eff_num_cubes,
target_sip=current_sip, # ← structural coord specified up front
)
# Each ShardSpec in placement already carries (sip=current_sip, cube=local, pe=local).
# The old post-hoc shifting block is removed entirely.
Every tensor is placed on the current device's SIP. If you need a multi-SIP tensor, use the TP primitive of ADR-0027.
Trade-off of the single-driver fallback: When set_device is not
called, defaulting to SIP 0 is kept for compatibility with existing
single-driver tests. With KERNBENCH_DEBUG=1, a warning is emitted so
that accidentally omitting set_device in a launcher context — which would
silently place the tensor on the wrong SIP — can be detected.
D5. Downstream — allocator lookup by structural tuple key
Existing deploy_tensor (src/kernbench/runtime_api/tensor.py):
for spec in placement:
alloc = allocators[spec.pe_index] # ← AttributeError (property removed)
With pe_index gone, migration to structural coordinates is forced:
for spec in placement:
alloc = allocators[(spec.sip, spec.cube, spec.pe)]
The dict population in _ensure_allocators is also tuple-keyed:
# context.py _ensure_allocators (after)
for sip_id in sip_range:
for cube_id in range(cubes_per_sip):
for pe_id in range(pes_per_cube):
self._allocators[(sip_id, cube_id, pe_id)] = PEMemAllocator(
rack_id=0, sip_id=sip_id, cube_id=cube_id, pe_id=pe_id, cfg=cfg,
)
_free_tensor is the same: the old
flat_idx = sip * ... + cube * ... + pe computation block is removed,
and (shard.sip, shard.cube, shard.pe) is used directly.
Tuple vs dataclass PEIdentity: Recommend the tuple — it is simple
and hashable out of the box. A PEIdentity value object has the upside
of an explicit type, but the boilerplate is large and it is currently
the only key of the allocator dict, so it would be over-engineering.
Keep the tuple.
D7. Backward compatibility — none (cleanup ADR)
This ADR is a breaking change.
DPPolicy(sip=...)orDPPolicy(num_sips=...)→TypeErrorShardSpec.pe_indexaccess →AttributeError
Both are immediate, explicit breakage. No deprecation warning / fallback path. KernBench is an internal project with a bounded set of call sites, so migration happens in one pass.
Blocking silent drift is the main upside of fully removing the property: code that expected a global flat could otherwise silently receive a SIP-local result and index incorrectly — that possibility is eliminated.
Dependencies
- ADR-0024 (launcher):
set_device(rank)and current-device scoping provide the SIP placement mechanism. This ADR sits on top and narrows DPPolicy to pure intra-device. - ADR-0027 (Megatron TP): the alternative path when a tensor spans multiple SIPs. After this ADR is applied, multi-SIP use cases move to ADR-0027.
Non-goals
- Redesign of
DPPolicy.cube/pe: existing replicate/column_wise/row_wise semantics are kept. - Tiling policy consolidation:
tiled_column_major/tiled_row_majorstay as they are. - New multi-device tensor abstraction: a DTensor-like is ADR-0028.
Open questions
- Default value of current_sip in
_create_tensor: for calls without set_device, whether to fall back to rank=0 (SIP 0) or to raise an error. The recommendation is fallback (compatibility with existing single-driver tests). - Scope of
test_sip_parallel.pyrewrite: porting the existing unit tests to the launcher base while preserving their intent requires additional fixtures. Scoped as separate work. - Meaning of
num_sips=NoneonDPPolicy: once the field is gone, the concept ofnum_sipsdisappears entirely. The explicit answer for expressing multi-SIP is to use the TP primitive of ADR-0027.
Resolved (items that were open in earlier revs):
Whether to keep the→ fully removed (D2)ShardSpec.pe_indexpropertyForm of→ tuple_ensure_allocatorsdict key(sip, cube, pe)(D5)
Consequences
Positive
- Clean conceptual separation: DPPolicy = intra-device, TP = inter-device.
- API simplification: about a 33% reduction in DPPolicy constructor fields.
- Structural-coordinate consistency: ShardSpec is expressed as a
(sip, cube, pe)tuple → abstraction leakage resolved (the ADR-0024 D4 contract is satisfied). - Clear meaning of
pe_index: the single interpretation is SIP-local. If global-flat is needed, it must be made explicit. - Launcher-model consistency: ADR-0024's "1 worker per SIP" model is the sole SIP-boundary control mechanism.
Negative
- Breaking change (explicit):
DPPolicy(sip=...)→TypeError,spec.pe_index→AttributeError. All callers need to be fixed at once. - ShardSpec schema change: a single
pe_indexfield becomes three fieldssip/cube/pe. Cascading edits downstream (deploy_tensor,_free_tensor,_ensure_allocators,allocatorsdict key, etc.). - No silent drift: with the property fully removed, runtime failure is immediate → migration leakage is blocked at the source. (Not a negative but an explicit tradeoff.)
- The cost of rewriting
test_sip_parallel.py.
Neutral
- The meaning of the existing
cube/pefields is unchanged.