ADR: introduce docs/history/, merge 0011+0018, prune migration cruft

- CLAUDE.md: add an ADR Lifecycle subsection (superseded → docs/history/, immutable numbering, no renumbering)
- ADR-0011: merge ADR-0018 content as an "Address Model: LA" section alongside PA / VA; status notes that the VA model is the one currently implemented
- ADR-0018 / 0029 / 0031: moved to docs/history/ with status updates (0018 merged into 0011, 0029 superseded by 0032, 0031 absorbed into 0001 rev 2)
- ADR-0019: rewrite Context as a PE-HBM connectivity decision (self-contained, no LA model framing)
- ADR-0019/0020/0021/0023/0025/0027: Status Proposed → Accepted (code verified); prune the Implementation Notes / Affected files / Test strategy / "현재 상태" (Current state) sub-sections that describe the pre-implementation state
- ADR-0024/0026: same migration-flavored cleanup; 0026 also drops the D6 Migration and D8 docs-update sub-decisions
- ADR-0030: status simplified (blocker ADR-0031 is now superseded)
- SPEC.md: R10 and §0.2 reflect the PA / VA / LA model names
- ADR-0008/0012/0013: refresh the ADR-0011 subtitle in Links

21 files changed, 553 insertions(+), 1290 deletions(-).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:42:45 -07:00
parent ecc57d050d
commit 22fd0d2b9d
23 changed files with 553 additions and 1290 deletions
@@ -13,53 +13,6 @@ clarified as an intra-device abstraction. Cross-SIP distribution (TP) belongs to a separate lay
 (handled by ADR-0024's `torch.ahbm.set_device(rank)` or by ADR-0027's Megatron parallel
 layers).

-### Current state
-
-`src/kernbench/policy/placement/dp.py`:
-
-```python
-@dataclass(frozen=True)
-class DPPolicy:
-    sip: Literal["replicate", "column_wise", "row_wise"] = "replicate"
-    cube: Literal["replicate", "column_wise", "row_wise"] = "replicate"
-    pe: Literal["replicate", "column_wise", "row_wise"] = "replicate"
-    num_pes: int | None = None
-    num_cubes: int | None = None
-    num_sips: int | None = None    # ← to be removed
-```
-
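-The `sip` / `num_sips` fields provide a path for distributing a tensor **across** SIP
-boundaries. This path is problematic:
-
-- **Conflicts with ADR-0024's launcher model**: ADR-0024 uses a "rank = SIP = 1 worker per SIP"
-  model. Each worker creates tensors on its own SIP. A tensor that spans multiple SIPs must
-  be handled by Megatron-style TP as a separate primitive.
-- **Mismatch with user intent**: "DPPolicy is a way to distribute across PEs within a single
-  device" (user's statement).
-- **Conceptual confusion**: `DPPolicy.sip="column_wise"` is actually **TP**. The name says DP,
-  but what it does is TP, which confuses new users.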
-
-### Affected call sites (grep results at rollback time)
-
-**Construction sites** (`DPPolicy(sip=...` or `num_sips=...`):
-- `tests/test_runtime_api_tensor.py`
-- `benches/ccl_allreduce.py` (already reworked within ADR-0024 scope)
-- `tests/test_va_offset.py`
-- `benches/va_offset_verify.py`
-- `tests/test_sip_parallel.py`
-
-**Reference sites** (`dp.sip`, `policy.sip`, `num_sips`, etc.):
-- `src/kernbench/runtime_api/context.py` (`_create_tensor`, `launch`)
-- `src/kernbench/components/builtin/pe_cpu.py`
-- `src/kernbench/components/legacy/builtin/pe_cpu.py`
-- `src/kernbench/policy/placement/dp.py` (the implementation itself)
-- `tests/test_tensor.py`, `test_ipcq_types.py`
-
-**Key test**: `test_sip_parallel.py` is, as its name says, the test that "expresses SIP
-parallelism through DPPolicy". After this ADR it must be **rewritten on the new launcher model**.
-
---
-
 ## Decision

 ### D1. `DPPolicy`에서 `sip` + `num_sips` 필드 제거
@@ -258,66 +211,6 @@ for sip_id in sip_range:
 recommended. A `PEIdentity` value object offers an explicit type, but its boilerplate is large, and
 since it would be the only key of the allocator dict, it is over-engineering. Keep the tuple.

-### D6. Migration: existing call sites
-
-**(A) Code that used `DPPolicy(sip=..., num_sips=..., ...)`**:
-
-- `DPPolicy(sip="column_wise", cube=..., pe=...)` pattern → **rewrite that bench on the ADR-0024
-  launcher**. Each worker selects its SIP with `set_device(rank)`; DPPolicy covers only
-  cube/PE.
-- `DPPolicy(sip="replicate", num_sips=1, ...)` pattern → reduce to `DPPolicy(cube=..., pe=...)`
-  (this follows naturally once the fields are gone).
-
-**(B) Code that read `dp.sip`, `dp.num_sips`**:
-
-- Remove it. Delete the `dp.sip` branch in `launch()`'s `_compute_local_shape`.
-- Also clean up the places where `pe_cpu.py` referenced `dp.sip`.
-
-**(C) Code that used `ShardSpec.pe_index`: every call site must change**:
-
-- Accessing `.pe_index` now raises `AttributeError`, so every call site must be fixed.
-- Allocator lookup: `allocators[spec.pe_index]` →
-  `allocators[(spec.sip, spec.cube, spec.pe)]`
-- Local contexts that genuinely need a flat integer: compute `spec.sip * N_CUBES * N_PE + spec.cube *
-  N_PE + spec.pe` explicitly. **Use it only as a local variable and never expose it in a public
-  API**.
-
-**Grep audit checklist before starting implementation**:
-
-1. **Property references**:
-   - `\.pe_index\b`: both field and property access (regex)
-   - `pe_index=`: keyword argument at construction time
-   - `pe_index:`: dataclass field declaration
-2. **Allocator / dict indexing**:
-   - `allocators\[`: dict lookup pattern; check whether it catches cases like
-     `allocators[spec.pe_index]`
-   - `_allocators\[`: same pattern (with the `_` prefix)
-3. **Blocks that compute a flat index by hand**:
-   - `flat_idx =`
-   - `pe_index =` (as the left-hand side)
-   - `* pes_per_cube +` (typical flat-index pattern)
-   - `* self._num_cubes \* self._pes_per_cube` (global flat computation)
-4. **Serialization / logging**:
-   - `asdict(.*shard`: whether dataclass serialization includes `pe_index` automatically
-   - `repr(.*ShardSpec`: whether log formats depend on it
-   - Whether JSON/YAML storage formats use a `pe_index` key
-5. **Tests asserting integer PE identity**:
-   - `assert .*pe_index`: assertions of integer identity
-   - `spec.pe_index ==`: comparisons (tests may break once the meaning becomes SIP-local)
-
-For each match, decide whether the caller expected a global flat index, a SIP-local index,
-or an internal lookup, then replace it with structural coordinates.
-
-**(D) `test_sip_parallel.py`**:
-
-- Keep the name; rewrite the contents on ADR-0024's multi-greenlet launcher.
-- Verify "SIP parallelism = one worker per rank × its own DPPolicy".
-
-**(E) `test_va_offset.py`, `benches/va_offset_verify.py`**:
-
-- Most cases use only `num_sips=1`; simply drop the field.
-- Where the SIP offset test is the point, port it to `set_device(rank)` plus observation of structural coordinates.
-
 ### D7. Backward compatibility: not possible (cleanup ADR)

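 This ADR is a **breaking change**.
@@ -331,17 +224,6 @@ KernBench is an internal project with a limited set of call sites, so in one go
 **Blocking silent drift** is the main benefit of removing the property entirely: it eliminates
 the chance that code expecting a global flat index receives a SIP-local result and silently mis-indexes.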

-### D8. Documentation updates
-
-- `ADR-0008` (tensor deploy): add a note on the updated DPPolicy meaning and state that ShardSpec
-  switches to structural coordinates
-- State "intra-device only" in the DPPolicy docstring (the docstring in the D1 code snippet)
-- State in the ShardSpec docstring that it **uses structural coordinates `(sip, cube, pe)` directly
-  and no longer provides `pe_index`** (D2)
-- Remove `sip=...` examples from tutorials such as `docs/ccl-author-guide`
-
---
-
 ## Dependencies

 - **ADR-0024** (launcher): `set_device(rank)` and current-device scoping
@@ -378,56 +260,6 @@ KernBench is an internal project with a limited set of call sites, so in one go

 ---

-## Test strategy
-
-### T1. Update unit tests
-
-- `tests/test_tensor.py`, `tests/test_ipcq_types.py`, `tests/test_runtime_api_tensor.py`
-  (clean up DPPolicy constructor arguments; verify ShardSpec structural coordinates)
-- `tests/test_va_offset.py`: behavior is preserved after removing `num_sips=1`
-
-### T2. `resolve_dp_policy` returns structural coordinates
-
-`tests/test_dp_policy.py` (new or extended):
-- Every ShardSpec returned by `resolve_dp_policy(dp, ..., target_sip=1)` has `sip=1`
-- Each spec's `(cube, pe)` is local (0..num_cubes-1, 0..num_pe-1)
-- On the same topology, the `target_sip=0` and `target_sip=1` results differ only in the sip field
-
-### T3. Rewrite `test_sip_parallel.py`
-
-Verify SIP parallelism on top of the launcher:
-
-```python
-def test_sip_parallel_via_launcher(topology):
-    ...
-    def worker(rank, ws, torch):
-        torch.ahbm.set_device(rank)
-        t = torch.zeros((1, 128), dtype="f16",
-                         dp=DPPolicy(cube="column_wise", pe="column_wise"))
-        # verify shard.sip == rank (structural coord)
-
-    spawn(worker, nprocs=n_sips, ...)
-```
-
-### T4. Allocator key migration
-
-`tests/test_allocator_structural_key.py` (new, or extend an existing test):
-- The `PEMemAllocator` dict works with `(sip, cube, pe)` tuple keys
-- `deploy_tensor` looks up allocators by structural coordinates
-- Same for `_free_tensor`
-
-### T5. E2E regression
-
-ADR-0024's `test_ccl_allreduce_matrix.py` passes unchanged.
-
-### T6. Error checks
-
-- Calling `DPPolicy(sip="column_wise")` → `TypeError`, asserted explicitly in a test.
-- Calling `DPPolicy(num_sips=2)` → `TypeError`.
-- Accessing `spec.pe_index` → `AttributeError` (verifies the property is fully removed).
-
---
-
 ## Consequences

 ### Positive
@@ -454,23 +286,3 @@ ADR-0024's `test_ccl_allreduce_matrix.py` passes unchanged.
 ### Neutral

 - The meaning of the existing `cube` / `pe` fields is unchanged.
-
---
-
-## Affected files
-
-| File | Change |
-|------|--------|
-| `src/kernbench/policy/placement/dp.py` | D1: remove `sip`/`num_sips` / D2: add `sip`/`cube`/`pe` structural fields to `ShardSpec`, **remove the `pe_index` property** / D3: add `target_sip` to `resolve_dp_policy`, remove the SIP-level loop / also rename the shard type returned by the internal resolver to `local_pe` for clarity (avoids a name clash) |
-| `src/kernbench/runtime_api/context.py` | D4: `_create_tensor` passes `target_sip` / D5: `_ensure_allocators` dict key → `(sip, cube, pe)` tuple / remove the `dp.sip` branch in `launch` |
-| `src/kernbench/runtime_api/tensor.py` | D5: `deploy_tensor` looks up allocators by structural coordinates |
-| `src/kernbench/components/builtin/pe_cpu.py` | D6: remove `dp.sip` references |
-| `src/kernbench/components/legacy/builtin/pe_cpu.py` | D6: same |
-| `benches/ccl_allreduce.py` | Already handled within ADR-0024 scope |
-| `benches/va_offset_verify.py` | D6: remove `num_sips=1` |
-| `tests/test_runtime_api_tensor.py` | D6 |
-| `tests/test_va_offset.py` | D6 |
-| `tests/test_tensor.py`, `test_ipcq_types.py` | D6 |
-| `tests/test_sip_parallel.py` | T3: rewrite on the launcher |
-| `tests/test_dp_policy.py` (new or extended) | T2 |
-| `tests/test_allocator_structural_key.py` (new) | T4 |
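
The removed D6 (C), T2, T4 and T6 text describes a structural-coordinate scheme. The sketch below shows it in Python. It is a minimal, hypothetical sketch and is not KernBench code. `local_shards` and `flat_pe` are illustrative names. They stand in for `resolve_dp_policy(..., target_sip=...)` and for the local flat-index computation. The string allocator values are placeholders for real per-PE allocator objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional, Tuple

Split = Literal["replicate", "column_wise", "row_wise"]
PEKey = Tuple[int, int, int]  # structural coordinates: (sip, cube, pe)


@dataclass(frozen=True)
class DPPolicy:
    """Intra-device placement only: no sip / num_sips fields (sketch)."""
    cube: Split = "replicate"
    pe: Split = "replicate"
    num_pes: Optional[int] = None
    num_cubes: Optional[int] = None


@dataclass(frozen=True)
class ShardSpec:
    """Structural coordinates only; deliberately no pe_index attribute."""
    sip: int
    cube: int
    pe: int


def local_shards(target_sip: int, num_cubes: int, num_pes: int) -> List[ShardSpec]:
    """Hypothetical stand-in for resolve_dp_policy(..., target_sip=...):
    every shard is pinned to target_sip and carries SIP-local (cube, pe)."""
    return [
        ShardSpec(sip=target_sip, cube=c, pe=p)
        for c in range(num_cubes)
        for p in range(num_pes)
    ]


def flat_pe(spec: ShardSpec, n_cubes: int, n_pe: int) -> int:
    """Explicit flat index for a local computation; never part of a public API."""
    return spec.sip * n_cubes * n_pe + spec.cube * n_pe + spec.pe


# Allocator registry keyed by the structural tuple instead of a flat integer.
# The string values are placeholders for real per-PE allocator objects.
N_SIPS, N_CUBES, N_PE = 2, 2, 4
allocators: Dict[PEKey, str] = {
    (s.sip, s.cube, s.pe): f"alloc-{s.sip}-{s.cube}-{s.pe}"
    for sip in range(N_SIPS)
    for s in local_shards(sip, N_CUBES, N_PE)
}

if __name__ == "__main__":
    spec = local_shards(1, N_CUBES, N_PE)[5]  # sip=1, cube=1, pe=1
    print(allocators[(spec.sip, spec.cube, spec.pe)])  # alloc-1-1-1
    print(flat_pe(spec, N_CUBES, N_PE))                # 13
```

Passing `sip=` to this `DPPolicy` fails with `TypeError`, because the dataclass has no such field. Reading `.pe_index` on a `ShardSpec` fails with `AttributeError`. These are the two failure modes that T6 describes. The plain tuple key follows the trade-off stated in the context lines above, which rejected a `PEIdentity` value object.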