ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/

Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00
parent 687c98086d
commit a796c1d2f7
42 changed files with 10515 additions and 3422 deletions
+130 -103
@@ -1,4 +1,4 @@
# ADR-0026: DPPolicy = Intra-Device Only — remove sip/num_sips fields
## Status
@@ -6,16 +6,17 @@ Accepted (Revision 5 — Phase 2 landed 2026-04-14, 523 passed + 1 strict xfail)
## Context
### Goal
Clarify `DPPolicy` as a pure intra-device abstraction that only expresses
**cube × PE distribution within a single device (SIP)**. Inter-SIP
distribution (TP) is split into a separate layer (handled by ADR-0024's
`torch.ahbm.set_device(rank)` or by ADR-0027's Megatron-style parallel
layers).
## Decision
### D1. Remove `sip` + `num_sips` fields from `DPPolicy`
```python
@dataclass(frozen=True)
@@ -32,15 +33,16 @@ class DPPolicy:
    num_cubes: int | None = None
```
Removed fields: `sip`, `num_sips`.
### D2. `ShardSpec` — structural (sip, cube, pe) coordinates, `pe_index` fully removed
The current `ShardSpec.pe_index` is a **global flat index**
(`sip × cubes × pes + cube × pes + pe`). This is the form ADR-0024 D4
flagged as "abstraction leakage".
This ADR **redefines ShardSpec in structural coordinates** and **does
not even leave `pe_index` as a property**:
```python
# src/kernbench/policy/placement/dp.py (after)
@@ -59,28 +61,32 @@ class ShardSpec:
    nbytes: int
```
**Core principles**:
- The identity of a ShardSpec is the `(sip, cube, pe)` 3-tuple.
- **No `pe_index` property either**: this blocks silent semantics drift.
- Existing callers that expect a global flat index get an **immediate
  `AttributeError`** on `.pe_index` access → forced migration to
  structural coordinates.
- Local contexts that genuinely need a flat integer key (e.g. an internal
  dict lookup) compute
  `spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.pe` explicitly at
  the call site.
**Justification for removing the property**: KernBench is an internal
project with a limited number of call sites. Explicit breakage
(AttributeError) is much safer than the risk of silent drift (semantics
change while the type stays int).
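The behaviour described in D2 can be sketched as follows. This is a minimal illustration, not the project's code: the exact field set of `ShardSpec` (only `nbytes` is visible in the diff above), the topology constants `N_CUBES` and `N_PE`, and the helper `flat_key` are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical topology constants, chosen only for this example.
N_CUBES = 4
N_PE = 8


@dataclass(frozen=True)
class ShardSpec:
    # Assumed field set: structural coordinates plus a byte range.
    sip: int
    cube: int
    pe: int
    offset_bytes: int
    nbytes: int


def flat_key(spec: ShardSpec) -> int:
    """Hypothetical helper: compute a flat key explicitly at the call site."""
    return spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.pe


spec = ShardSpec(sip=1, cube=2, pe=3, offset_bytes=0, nbytes=256)
key = flat_key(spec)  # 1*32 + 2*8 + 3

# Legacy access fails loudly instead of returning an int with new semantics.
try:
    spec.pe_index
    legacy_access_failed = False
except AttributeError:
    legacy_access_failed = True
```

Because the dataclass defines no `pe_index` attribute at all, a stale caller cannot receive a plausible-looking integer; it fails on first access.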
### D3. `resolve_dp_policy` takes `target_sip` and produces structural coordinates
Implements the contract of ADR-0024 D4. No post-hoc shifting.
```python
# src/kernbench/policy/placement/dp.py (after)
@dataclass(frozen=True)
class _LocalPeShard:
    """Internal: return value of the PE resolver. Cube-local PE id + payload."""
    local_pe: int  # cube-local PE index (0..num_pe-1)
    offset_bytes: int
    nbytes: int
@@ -93,7 +99,7 @@ def resolve_dp_policy(
    itemsize: int,
    num_pe: int,
    num_cubes: int = 1,
    target_sip: int,  # NEW: explicitly state which SIP to place on
) -> list[ShardSpec]:
    """2-level resolution (cube × PE) on a specified SIP.
@@ -123,28 +129,30 @@ def resolve_dp_policy(
    return all_shards
```
**Internal resolvers** (`column_wise`, `row_wise`, `replicate`) return a
list of `_LocalPeShard` — the `local_pe` field name makes it **explicit
that this is a "cube-local PE identifier"**. This resolves the previous
confusion with the name `ShardSpec.pe_index`.
**Naming convention summary** (whole ADR):
- `ShardSpec.pe`: the final external API; a cube-local PE (structural coord).
- `_LocalPeShard.local_pe`: the same meaning at the internal resolver stage.
- `pe_index`: **removed**. Not retained anywhere, internal or external
  (a side benefit of blocking silent drift: the name never reappears).
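One way to picture the relationship between the two names is a lifting step that attaches `(sip, cube)` to each cube-local shard. The sketch below is an assumption about how that step could look; `lift_to_shard_specs` is a hypothetical helper, and the `ShardSpec` field set is assumed as in the rest of this ADR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class _LocalPeShard:
    local_pe: int  # cube-local PE index
    offset_bytes: int
    nbytes: int


@dataclass(frozen=True)
class ShardSpec:
    # Assumed field set: structural coordinates plus a byte range.
    sip: int
    cube: int
    pe: int
    offset_bytes: int
    nbytes: int


def lift_to_shard_specs(local_shards, *, target_sip, cube):
    """Hypothetical helper: attach (sip, cube) to each cube-local shard."""
    return [
        ShardSpec(sip=target_sip, cube=cube, pe=s.local_pe,
                  offset_bytes=s.offset_bytes, nbytes=s.nbytes)
        for s in local_shards
    ]


local = [
    _LocalPeShard(local_pe=0, offset_bytes=0, nbytes=128),
    _LocalPeShard(local_pe=1, offset_bytes=128, nbytes=128),
]
specs = lift_to_shard_specs(local, target_sip=2, cube=1)
```

The PE value is carried over unchanged, so `local_pe` and `ShardSpec.pe` denote the same cube-local index at different stages.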
### D4. `_create_tensor` — placement directly in structural coordinates
Continuation of ADR-0024 D4. Post-hoc shifting removed; structural
coordinates are specified directly at the `resolve_dp_policy` call site.
```python
# context.py _create_tensor (after)
current_sip = self.ahbm.current_device()
if current_sip is None:
    # Single-driver fallback (consistent with ADR-0024 D2).
    # In launcher-based code, forgetting set_device() silently places the
    # tensor on SIP 0, so emit a warning in debug mode.
    if os.environ.get("KERNBENCH_DEBUG"):
        import warnings
        warnings.warn(
@@ -161,38 +169,39 @@ placement = resolve_dp_policy(
    itemsize=itemsize,
    num_pe=eff_num_pe,
    num_cubes=eff_num_cubes,
    target_sip=current_sip,  # ← structural coord specified up front
)
# Each ShardSpec in placement already carries (sip=current_sip, cube=local, pe=local).
# The old post-hoc shifting block is removed entirely.
```
**Every** tensor is placed on the current device's SIP. If you need a
multi-SIP tensor, use the TP primitive of ADR-0027.
**Trade-off of the single-driver fallback**: When set_device is not
called, defaulting to SIP 0 is kept for compatibility with existing
single-driver tests. With `KERNBENCH_DEBUG=1`, a warning is emitted so
that accidentally omitting set_device in a launcher context — which would
silently place the tensor on the wrong SIP — can be detected.
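The fallback and its debug warning can be sketched in isolation. This is an illustrative stand-alone function, not the code in `context.py`: the name `resolve_current_sip`, the warning text, and the `RuntimeWarning` category are assumptions; only the `KERNBENCH_DEBUG` variable and the SIP 0 default come from the ADR.

```python
import os
import warnings


def resolve_current_sip(current_device):
    """Hypothetical helper: return the SIP to place on, defaulting to SIP 0."""
    if current_device is not None:
        return current_device
    # Single-driver fallback: warn only when debugging is requested, so
    # existing single-driver tests stay quiet.
    if os.environ.get("KERNBENCH_DEBUG"):
        warnings.warn(
            "set_device() was not called; falling back to SIP 0",
            RuntimeWarning,
            stacklevel=2,
        )
    return 0
```

A launcher-based caller that forgets `set_device()` still gets SIP 0, but with `KERNBENCH_DEBUG=1` the omission shows up as a warning rather than passing unnoticed.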
### D5. Downstream — allocator lookup by structural tuple key
Existing `deploy_tensor` (`src/kernbench/runtime_api/tensor.py`):
```python
for spec in placement:
    alloc = allocators[spec.pe_index]  # ← AttributeError (property removed)
```
With `pe_index` gone, migration to structural coordinates is **forced**:
```python
for spec in placement:
    alloc = allocators[(spec.sip, spec.cube, spec.pe)]
```
The dict population in `_ensure_allocators` is also tuple-keyed:
```python
# context.py _ensure_allocators (after)
@@ -204,59 +213,71 @@ for sip_id in sip_range:
)
```
`_free_tensor` is the same: the old
`flat_idx = sip * ... + cube * ... + pe` computation block is removed,
and `(shard.sip, shard.cube, shard.pe)` is used directly.
**Tuple vs dataclass `PEIdentity`**: The tuple is recommended: it is
simple and hashable out of the box. A `PEIdentity` value object would add
an explicit type, but it carries more boilerplate and would be used only
as the allocator dict key, so it would be over-engineering. Keep the
tuple.
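A tuple-keyed allocator table might look like the sketch below. `BumpAllocator` and the topology sizes are hypothetical stand-ins; the real allocator type and the body of `_ensure_allocators` are not shown in this diff.

```python
from itertools import product


class BumpAllocator:
    """Hypothetical stand-in for a per-PE allocator."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise MemoryError("allocator exhausted")
        offset = self.used
        self.used += nbytes
        return offset


# Hypothetical topology sizes, chosen only for this example.
N_SIPS, N_CUBES, N_PE = 2, 2, 4

# One allocator per PE, keyed by the structural (sip, cube, pe) tuple.
allocators = {
    (sip, cube, pe): BumpAllocator(capacity=1024)
    for sip, cube, pe in product(range(N_SIPS), range(N_CUBES), range(N_PE))
}

first = allocators[(1, 0, 3)].alloc(256)
second = allocators[(1, 0, 3)].alloc(256)
```

Plain tuples hash and compare by value, so no extra key type is needed for the lookup to work.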
### D7. Backward compatibility — none (cleanup ADR)
This ADR is a **breaking change**.
1. `DPPolicy(sip=...)` or `DPPolicy(num_sips=...)` → `TypeError`
2. `ShardSpec.pe_index` access → `AttributeError`
Both are **immediate, explicit breakage**. No deprecation warning /
fallback path. KernBench is an internal project with a bounded set of
call sites, so migration happens in one pass.
**Blocking silent drift** is the main benefit of fully removing the
property: code that expected a global flat index can no longer silently
receive a SIP-local value and index incorrectly.
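The constructor-side breakage follows from how dataclasses generate `__init__`: an unknown keyword argument raises `TypeError`. The sketch below assumes a reduced `DPPolicy` whose `cube` and `pe` fields are strings with the defaults shown; those types and defaults, and the helper `try_construct`, are illustrative assumptions. `Optional[int]` is used here as an equivalent spelling of `int | None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DPPolicy:
    # Assumed fields and defaults, for illustration only.
    cube: str = "replicate"
    pe: str = "column_wise"
    num_cubes: Optional[int] = None


def try_construct(**kwargs):
    """Hypothetical helper: report how construction with kwargs ends."""
    try:
        DPPolicy(**kwargs)
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


results = {
    "sip": try_construct(sip=0),
    "num_sips": try_construct(num_sips=2),
    "num_cubes": try_construct(num_cubes=2),
}
```

No deprecation shim is involved: removing the fields from the dataclass is enough to make stale call sites fail at construction time.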
## Dependencies
- **ADR-0024** (launcher): `set_device(rank)` and current-device scoping
provide the SIP placement mechanism. This ADR sits on top and narrows
DPPolicy to pure intra-device.
- **ADR-0027** (Megatron TP): the alternative path when a tensor spans
multiple SIPs. After this ADR is applied, multi-SIP use cases move to
ADR-0027.
---
## Non-goals
- **Redesign of `DPPolicy.cube` / `pe`**: existing
  replicate/column_wise/row_wise semantics are kept.
- **Tiling policy consolidation**: `tiled_column_major` /
  `tiled_row_major` stay as they are.
- **New multi-device tensor abstraction**: a DTensor-like abstraction is
  the subject of ADR-0028.
---
## Open questions
- **Default value of `current_sip` in `_create_tensor`**: for calls
  without `set_device`, whether to fall back to rank=0 (SIP 0) or to
  raise an error. The recommendation is to fall back (compatibility with
  existing single-driver tests).
- **Scope of the `test_sip_parallel.py` rewrite**: porting the existing
  unit tests to the launcher-based model while preserving their intent
  requires additional fixtures. Scoped as separate work.
- **Meaning of `num_sips=None` on `DPPolicy`**: once the field is gone,
  the concept of `num_sips` disappears entirely. The explicit answer for
  expressing multi-SIP is to use the TP primitive of ADR-0027.
**Resolved (items that were open in earlier revs)**:
- ~~Whether to keep the `ShardSpec.pe_index` property~~ → **fully
removed** (D2)
- ~~Form of `_ensure_allocators` dict key~~ → **tuple `(sip, cube, pe)`**
(D5)
---
@@ -264,25 +285,31 @@ KernBench is an internal project with a bounded set of call sites, so in one pass
### Positive
- **Clean conceptual separation**: DPPolicy = intra-device, TP =
inter-device.
- **API simplification**: about a 33% reduction in DPPolicy constructor
fields.
- **Structural-coordinate consistency**: ShardSpec is expressed as a
`(sip, cube, pe)` tuple → abstraction leakage resolved (the ADR-0024
D4 contract is satisfied).
- **Clear meaning of `pe_index`**: the single interpretation is
SIP-local. If global-flat is needed, it must be made explicit.
- **Launcher-model consistency**: ADR-0024's "1 worker per SIP" model is
the sole SIP-boundary control mechanism.
### Negative
- **Breaking change (explicit)**: `DPPolicy(sip=...)` → `TypeError`,
  `spec.pe_index` → `AttributeError`. All callers need to be fixed at
  once.
- **ShardSpec schema change**: a single `pe_index` field becomes three
  fields `sip`/`cube`/`pe`. Cascading edits downstream (`deploy_tensor`,
  `_free_tensor`, `_ensure_allocators`, `allocators` dict key, etc.).
- **No silent drift**: with the property fully removed, runtime failure
  is immediate → migration leakage is blocked at the source. (Not a
  negative but an explicit tradeoff.)
- The cost of rewriting `test_sip_parallel.py`.
### Neutral
- The meaning of the existing `cube` / `pe` fields is unchanged.