docs: add ADRs 0024–0031 for SIP-TP launcher stack

ADR-0024 (SIP-level TP launcher): rank = SIP abstraction, engine-routed install, mp.spawn parity, epoch barrier, ShardSpec structural coords.
ADR-0025 (IPCQ direction addressing): address-based matching for meta arrival and credit return; fixes 2-rank bidirectional ring deadlock.
ADR-0026 (DPPolicy intra-device only): remove sip/num_sips fields; ShardSpec uses structural (sip, cube, pe); pe_index property removed.
ADR-0027 (Megatron-style TP API): ColumnParallelLinear / RowParallelLinear on top of ADR-0024 launcher. Backlog until 0024/0025/0026 land.
ADR-0028 (DTensor support): stub / future work.
ADR-0029 (Hierarchical all-reduce): 3-level reduce using all_pes mapper and multi_pe_sip_local validator from ADR-0024. Backlog.
ADR-0030 (IPCQ PhysAddr integration): blocked on ADR-0031.
ADR-0031 (PhysAddr PE-resource extension): stub; local_offset range-based partition approach; specific ranges TBD.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:38:27 -07:00
parent b2c52f0e34
commit e1084800ab
8 changed files with 3366 additions and 0 deletions
@@ -0,0 +1,476 @@
+# ADR-0026: DPPolicy = Intra-Device Only (remove the sip/num_sips fields)
+
+## Status
+
+Proposed (Revision 4: document consistency + more concrete grep audit)
+
+## Context
+
+### Goal
+
+Clarify `DPPolicy` as a pure intra-device abstraction that expresses only the
+**cube × PE distribution inside a single device (SIP)**. Distribution across
+SIPs (TP) moves to a separate layer, handled by ADR-0024's
+`torch.ahbm.set_device(rank)` or by ADR-0027's Megatron parallel layers.
+
+### Current state
+
+`src/kernbench/policy/placement/dp.py`:
+
+```python
+@dataclass(frozen=True)
+class DPPolicy:
+    sip: Literal["replicate", "column_wise", "row_wise"] = "replicate"
+    cube: Literal["replicate", "column_wise", "row_wise"] = "replicate"
+    pe: Literal["replicate", "column_wise", "row_wise"] = "replicate"
+    num_pes: int | None = None
+    num_cubes: int | None = None
+    num_sips: int | None = None    # ← to be removed
+```
+
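+The `sip` / `num_sips` fields provide a path for distributing a tensor
+**across** SIP boundaries. This is a problem for three reasons:
+
+- **Conflicts with the ADR-0024 launcher model**: ADR-0024 uses a
+  "rank = SIP = 1 worker per SIP" model. Each worker creates tensors on its
+  own SIP. A tensor that spans multiple SIPs must be handled by Megatron-style
+  TP as a separate primitive.
+- **Mismatch with user intent**: "DPPolicy is the way to distribute across PEs
+  within a single device" (user statement).
+- **Conceptual confusion**: `DPPolicy.sip="column_wise"` is actually **TP**.
+  The name says DP but the behavior is TP, which confuses new users.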
+
+### Affected call sites (grep results as of the rollback point)
+
+**Construction sites** (`DPPolicy(sip=...` or `num_sips=...`):
+- `tests/test_runtime_api_tensor.py`
+- `benches/ccl_allreduce.py` (already reworked within ADR-0024 scope)
+- `tests/test_va_offset.py`
+- `benches/va_offset_verify.py`
+- `tests/test_sip_parallel.py`
+
+**Reference sites** (`dp.sip`, `policy.sip`, `num_sips`, etc.):
+- `src/kernbench/runtime_api/context.py` (`_create_tensor`, `launch`)
+- `src/kernbench/components/builtin/pe_cpu.py`
+- `src/kernbench/components/legacy/builtin/pe_cpu.py`
+- `src/kernbench/policy/placement/dp.py` (the implementation itself)
+- `tests/test_tensor.py`, `tests/test_ipcq_types.py`
+
+**Key test**: `test_sip_parallel.py` is, as the name says, a test that
+"expresses SIP parallelism through DPPolicy". After this ADR it must be
+**rewritten on the new launcher model**.
+
+---
+
+## Decision
+
+### D1. Remove the `sip` and `num_sips` fields from `DPPolicy`
+
+```python
+@dataclass(frozen=True)
+class DPPolicy:
+    """Intra-device (cube × PE) data-parallel policy.
+
+    SIP-level placement is controlled by ``torch.ahbm.set_device(rank)``
+    (ADR-0024) and, for model-level TP, by Megatron-style parallel layers
+    (ADR-0027). DPPolicy does not cross SIP boundaries.
+    """
+    cube: Literal["replicate", "column_wise", "row_wise"] = "replicate"
+    pe: Literal["replicate", "column_wise", "row_wise"] = "replicate"
+    num_pes: int | None = None
+    num_cubes: int | None = None
+```
+
+Removed fields: `sip`, `num_sips`.
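
A minimal sketch of what the field removal means for callers. The `DPPolicy` below is a local stand-in that copies the four fields from the D1 snippet; `try_construct` is a hypothetical helper for illustration, not part of KernBench. It relies only on standard dataclass behavior: an unknown keyword passed to the generated `__init__` raises `TypeError`.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Literal

Mode = Literal["replicate", "column_wise", "row_wise"]


@dataclass(frozen=True)
class DPPolicy:
    """Local stand-in for the post-ADR DPPolicy (fields copied from D1)."""
    cube: Mode = "replicate"
    pe: Mode = "replicate"
    num_pes: int | None = None
    num_cubes: int | None = None


def try_construct(**kwargs) -> str:
    """Return 'ok', or the exception name if DPPolicy(**kwargs) is rejected."""
    try:
        DPPolicy(**kwargs)
        return "ok"
    except TypeError as exc:
        return type(exc).__name__


field_names = [f.name for f in fields(DPPolicy)]
results = {
    "intra_device": try_construct(cube="column_wise", pe="row_wise"),
    "sip": try_construct(sip="column_wise"),   # removed field
    "num_sips": try_construct(num_sips=2),     # removed field
}
print(field_names)
print(results)
```

Because the failure comes from the generated constructor itself, no extra validation code is needed to reject the removed keywords.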
+
+### D2. `ShardSpec`: structural (sip, cube, pe) coordinates, `pe_index` removed entirely
+
+Today `ShardSpec.pe_index` is a **global flat index** (`sip × cubes × pes +
+cube × pes + pe`). This is the form that ADR-0024 D11 calls out as
+"abstraction leakage".
+
+This ADR redefines ShardSpec in terms of **structural coordinates** and does
+**not keep** `pe_index`, not even as a property:
+
+```python
+# src/kernbench/policy/placement/dp.py (after)
+@dataclass(frozen=True)
+class ShardSpec:
+    """Structural shard placement — intra-SIP (cube × PE) coord.
+
+    Global-flat `pe_index` was removed in ADR-0026. Callers must use
+    structural coords (sip, cube, pe) directly. If a flat integer key is
+    needed (e.g. dict lookup), compute it explicitly at the call site.
+    """
+    sip: int              # structural — which SIP this shard lives on
+    cube: int             # local within SIP
+    pe: int               # local within cube
+    offset_bytes: int
+    nbytes: int
+```
+
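+**Key principles**:
+- The identity of a ShardSpec is the 3-tuple `(sip, cube, pe)`.
+- **There is no `pe_index` property either**, which blocks silent semantics drift.
+- Existing callers that expected a global flat index get an **immediate
+  `AttributeError`** on `.pe_index` access, so they must migrate to structural
+  coordinates.
+- In local contexts that need a flat integer key (e.g. an internal dict
+  lookup), the caller computes
+  `spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.pe` explicitly.
+
+**Justification for removing the property**: KernBench is an internal project
+with a bounded set of call sites. Compared with the risk of silent drift (the
+meaning changes while the type stays `int`), explicit breakage
+(`AttributeError`) is much safer.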
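
The principles above can be sketched with a local copy of the `ShardSpec` fields. `N_CUBES` and `N_PE` are hypothetical topology constants chosen for illustration; the real values come from the topology configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardSpec:
    """Local copy of the post-ADR fields; there is no pe_index."""
    sip: int
    cube: int
    pe: int
    offset_bytes: int
    nbytes: int


# Hypothetical topology constants, for illustration only.
N_CUBES = 4
N_PE = 8

spec = ShardSpec(sip=1, cube=2, pe=3, offset_bytes=0, nbytes=256)

# The structural identity is the 3-tuple.
key = (spec.sip, spec.cube, spec.pe)

# A flat integer, computed explicitly at the call site and kept local.
flat = spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.pe

# Legacy access fails loudly instead of returning a value whose meaning changed.
try:
    spec.pe_index
    legacy_access = "ok"
except AttributeError:
    legacy_access = "AttributeError"

print(key, flat, legacy_access)
```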
+
+### D3. `resolve_dp_policy` takes `target_sip` and produces structural coordinates
+
+Implements the ADR-0024 D11 contract. No post-hoc shifting.
+
+```python
+# src/kernbench/policy/placement/dp.py (after)
+
+@dataclass(frozen=True)
+class _LocalPeShard:
+    """Internal: return type of the PE resolvers. Cube-local PE identifier + payload."""
+    local_pe: int                  # cube-local PE index (0..num_pe-1)
+    offset_bytes: int
+    nbytes: int
+
+
+def resolve_dp_policy(
+    policy: DPPolicy,
+    *,
+    shape: tuple[int, int],
+    itemsize: int,
+    num_pe: int,
+    num_cubes: int = 1,
+    target_sip: int,       # NEW: states explicitly which SIP to place on
+) -> list[ShardSpec]:
+    """2-level resolution (cube × PE) on a specified SIP.
+
+    Returns ShardSpecs with structural coords (sip=target_sip, cube, pe).
+    No SIP-level split — DPPolicy is intra-device only.
+    """
+    resolver = _PE_RESOLVERS[policy.pe]
+    all_shards: list[ShardSpec] = []
+
+    # Level 1: cube within SIP
+    cube_splits = _split_shape(policy.cube, shape, num_cubes, itemsize)
+
+    for cube_id, (cube_shape, cube_offset) in enumerate(cube_splits):
+        # Level 2: PE within cube — resolver returns _LocalPeShard (local_pe)
+        local_shards = resolver(shape=cube_shape, itemsize=itemsize,
+                                 num_pe=num_pe)
+
+        for ls in local_shards:
+            all_shards.append(ShardSpec(
+                sip=target_sip,                   # from caller (current_device)
+                cube=cube_id,                     # local within SIP
+                pe=ls.local_pe,                   # local within cube (explicit name)
+                offset_bytes=cube_offset + ls.offset_bytes,
+                nbytes=ls.nbytes,
+            ))
+
+    return all_shards
+```
+
+**Internal resolvers** (`column_wise`, `row_wise`, `replicate`) return a list
+of `_LocalPeShard`. The field name `local_pe` makes it **explicit that this is
+a cube-local PE identifier**, which removes the earlier naming confusion with
+`ShardSpec.pe_index`.
+
+**Naming conventions** (across this ADR):
+- `ShardSpec.pe`: the final external API; cube-local PE (structural coord)
+- `_LocalPeShard.local_pe`: the same meaning at the internal resolver stage
+- `pe_index`: **removed**. It remains nowhere, externally or internally (a side
+  effect of blocking silent drift: the name never reappears).
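
A runnable toy version of the two-level resolution, to make the `target_sip` contract concrete. This is a sketch under stated assumptions, not the KernBench implementation: `_split` is a hypothetical stand-in for both `_split_shape` and the PE resolvers, it assumes evenly divisible shapes, and its byte offsets are only meaningful for the single-row shape used here. The `_LocalPeShard` intermediate is omitted for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DPPolicy:
    cube: str = "replicate"
    pe: str = "replicate"


@dataclass(frozen=True)
class ShardSpec:
    sip: int
    cube: int
    pe: int
    offset_bytes: int
    nbytes: int


def _split(mode, shape, parts, itemsize):
    """Hypothetical helper: return [(sub_shape, byte_offset), ...] for one level."""
    rows, cols = shape
    if mode == "replicate":
        return [(shape, 0) for _ in range(parts)]
    if mode == "column_wise":
        step = cols // parts
        return [((rows, step), i * step * itemsize) for i in range(parts)]
    if mode == "row_wise":
        step = rows // parts
        return [((step, cols), i * step * cols * itemsize) for i in range(parts)]
    raise ValueError(f"unknown mode: {mode}")


def resolve_dp_policy(policy, *, shape, itemsize, num_pe, num_cubes=1, target_sip):
    """Cube level, then PE level; every shard is stamped with target_sip."""
    shards = []
    cube_splits = _split(policy.cube, shape, num_cubes, itemsize)
    for cube_id, (cube_shape, cube_off) in enumerate(cube_splits):
        pe_splits = _split(policy.pe, cube_shape, num_pe, itemsize)
        for local_pe, (pe_shape, pe_off) in enumerate(pe_splits):
            shards.append(ShardSpec(
                sip=target_sip,
                cube=cube_id,
                pe=local_pe,
                offset_bytes=cube_off + pe_off,
                nbytes=pe_shape[0] * pe_shape[1] * itemsize,
            ))
    return shards


dp = DPPolicy(cube="column_wise", pe="column_wise")
args = dict(shape=(1, 128), itemsize=2, num_pe=4, num_cubes=2)
on_sip0 = resolve_dp_policy(dp, target_sip=0, **args)
on_sip1 = resolve_dp_policy(dp, target_sip=1, **args)

# Relabelling sip is the only change needed to turn one result into the other.
relabelled = [replace(s, sip=0) for s in on_sip1]
print(len(on_sip0), on_sip1[5], relabelled == on_sip0)
```

The final comparison is the property T2 is meant to pin down: on the same topology, results for two target SIPs differ only in the `sip` field.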
+
+### D4. `_create_tensor`: direct placement with structural coordinates
+
+Continuation of ADR-0024 D11. Post-hoc shifting is removed; the structural
+coordinate is specified directly at the `resolve_dp_policy` call.
+
+```python
+# context.py _create_tensor (after)
+current_sip = self.ahbm.current_device()
+if current_sip is None:
+    # Single-driver fallback (consistent with ADR-0024 D9).
+    # If launcher-based code forgets set_device(), tensors silently land on
+    # SIP 0, so warn in debug mode.
+    if os.environ.get("KERNBENCH_DEBUG"):
+        import warnings
+        warnings.warn(
+            "torch.ahbm.current_device() is None; defaulting to SIP 0. "
+            "If this is a multi-rank launcher context, you likely forgot "
+            "torch.ahbm.set_device(rank) inside the worker.",
+            stacklevel=2,
+        )
+    current_sip = 0
+
+placement = resolve_dp_policy(
+    dp,
+    shape=shape_2d,
+    itemsize=itemsize,
+    num_pe=eff_num_pe,
+    num_cubes=eff_num_cubes,
+    target_sip=current_sip,          # ← structural coordinate set up front
+)
+
+# Each ShardSpec in placement already carries (sip=current_sip, cube=local, pe=local).
+# The old post-hoc shifting block is removed entirely.
+```
+
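+**Every** tensor is placed on the current-device SIP. To build a multi-SIP
+tensor, use the TP primitives from ADR-0027.
+
+**Trade-off of the single-driver fallback**: defaulting to SIP 0 when
+`set_device` was never called is kept for compatibility with existing
+single-driver tests. With `KERNBENCH_DEBUG=1`, a warning makes it possible to
+detect a launcher context that forgot `set_device` and would otherwise
+silently place tensors on the wrong SIP.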
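
The fallback behavior can be isolated into a small function for testing. The sketch below is illustrative: `pick_target_sip` and `run` are hypothetical names, and the `debug` flag is passed explicitly in place of reading `KERNBENCH_DEBUG`, so the three cases are deterministic.

```python
import warnings


def pick_target_sip(current_device, *, debug):
    """Return the SIP to place on, defaulting to 0 when no device is set."""
    if current_device is not None:
        return current_device
    if debug:
        warnings.warn(
            "current_device() is None; defaulting to SIP 0. "
            "Did the worker forget set_device(rank)?",
            stacklevel=2,
        )
    return 0


def run(current_device, debug):
    """Return (chosen SIP, number of warnings emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        sip = pick_target_sip(current_device, debug=debug)
    return sip, len(caught)


outcomes = {
    "device set": run(3, True),
    "missing, debug off": run(None, False),
    "missing, debug on": run(None, True),
}
print(outcomes)
```

Only the last case emits a warning; the first two stay silent, which is why the debug flag matters for catching a forgotten `set_device` in launcher code.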
+
+### D5. Downstream: allocator lookup uses a structural tuple key
+
+Existing `deploy_tensor` (`src/kernbench/runtime_api/tensor.py`):
+
+```python
+for spec in placement:
+    alloc = allocators[spec.pe_index]       # ← AttributeError (`pe_index` removed)
+```
+
+Because `pe_index` is gone, migration to structural coordinates is **forced**:
+
+```python
+for spec in placement:
+    alloc = allocators[(spec.sip, spec.cube, spec.pe)]
+```
+
+The dict population in `_ensure_allocators` also switches to tuple keys:
+
+```python
+# context.py _ensure_allocators (after)
+for sip_id in sip_range:
+    for cube_id in range(cubes_per_sip):
+        for pe_id in range(pes_per_cube):
+            self._allocators[(sip_id, cube_id, pe_id)] = PEMemAllocator(
+                rack_id=0, sip_id=sip_id, cube_id=cube_id, pe_id=pe_id, cfg=cfg,
+            )
+```
+
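+`_free_tensor` follows the same pattern: the existing
+`flat_idx = sip * ... + cube * ... + pe` block is removed and
+`(shard.sip, shard.cube, shard.pe)` is used directly.
+
+**Tuple vs. a `PEIdentity` dataclass**: a tuple is recommended because it is
+simple and hashable out of the box. A `PEIdentity` value object would add
+explicit typing, but it carries more boilerplate and would currently serve
+only as the key of the allocator dict, which is over-engineering. Keep the
+tuple.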
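
A small sketch of the tuple-keyed allocator table. `StubAllocator` is a hypothetical stand-in for `PEMemAllocator` (a bump pointer only), and `build_allocators` is an illustrative helper; the point is the key shape, not the allocation strategy.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class StubAllocator:
    """Hypothetical stand-in for PEMemAllocator: a single bump pointer."""
    sip_id: int
    cube_id: int
    pe_id: int
    used: int = 0

    def alloc(self, nbytes: int) -> int:
        """Reserve nbytes and return the start offset of the reservation."""
        addr = self.used
        self.used += nbytes
        return addr


def build_allocators(n_sips, cubes_per_sip, pes_per_cube):
    """Populate one allocator per PE, keyed by the structural (sip, cube, pe)."""
    coords = product(range(n_sips), range(cubes_per_sip), range(pes_per_cube))
    return {(s, c, p): StubAllocator(s, c, p) for s, c, p in coords}


allocators = build_allocators(n_sips=2, cubes_per_sip=2, pes_per_cube=4)

# Lookup uses the same tuple that a ShardSpec exposes as (sip, cube, pe).
first = allocators[(1, 0, 3)].alloc(64)
second = allocators[(1, 0, 3)].alloc(32)

# A bare integer is not a valid key any more.
flat_key_present = 3 in allocators
print(len(allocators), first, second, flat_key_present)
```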
+
+### D6. Migration of existing call sites
+
+**(A) Code that used `DPPolicy(sip=..., num_sips=..., ...)`**:
+
+- `DPPolicy(sip="column_wise", cube=..., pe=...)` pattern → **rewrite the
+  bench on the ADR-0024 launcher**. The worker selects its SIP with
+  `set_device(rank)`; DPPolicy covers cube/PE only.
+- `DPPolicy(sip="replicate", num_sips=1, ...)` pattern → reduce to
+  `DPPolicy(cube=..., pe=...)` (this follows naturally once the fields are gone).
+
+**(B) Code that read `dp.sip` or `dp.num_sips`**:
+
+- Remove it. Delete the `dp.sip` branch in `_compute_local_shape` of `launch()`.
+- Clean up the places where `pe_cpu.py` referenced `dp.sip`.
+
+**(C) Code that used `ShardSpec.pe_index` (all of it must change)**:
+
+- `.pe_index` access now raises `AttributeError`, so every call site must be fixed.
+- Allocator lookup: `allocators[spec.pe_index]` →
+  `allocators[(spec.sip, spec.cube, spec.pe)]`
+- Local contexts that really need a flat integer: compute
+  `spec.sip * N_CUBES * N_PE + spec.cube * N_PE + spec.pe` explicitly. **Use it
+  only as a local variable and do not expose it in a public API**.
+
+**Grep audit checklist before starting implementation**:
+
+1. **Attribute references**:
+   - `\.pe_index\b`: any field/property access (regex)
+   - `pe_index=`: keyword argument at construction
+   - `pe_index:`: dataclass field declaration
+2. **Allocator / dict indexing**:
+   - `allocators\[`: dict lookup pattern; check whether things like
+     `allocators[spec.pe_index]` match
+   - `_allocators\[`: same pattern (with the `_` prefix)
+3. **Hand-written flat index computations**:
+   - `flat_idx =`
+   - `pe_index =` (left-hand side)
+   - `* pes_per_cube +` (typical flat computation pattern)
+   - `* self._num_cubes \* self._pes_per_cube` (global flat computation)
+4. **Serialization / logging**:
+   - `asdict(.*shard`: whether dataclass serialization automatically includes `pe_index`
+   - `repr(.*ShardSpec`: whether any log format depends on it
+   - whether any JSON/YAML storage format uses a `pe_index` key
+5. **Tests asserting integer PE identity**:
+   - `assert .*pe_index`: integer equality assertions
+   - `spec.pe_index ==`: comparisons written against the old global-flat
+     meaning; with the attribute removed they fail with `AttributeError`
+
+For each match, decide whether the caller expected a global flat index, a
+SIP-local index, or an internal lookup key, then replace it with structural
+coordinates.
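
Part of the checklist above can be automated with a short script. The sketch below is one possible shape, not a prescribed tool: the patterns are adapted to Python's `re` syntax (the checklist entries are written grep-style), the category labels are ours, and it scans an in-memory sample instead of walking the repository.

```python
import re

# Patterns adapted from the checklist for Python's `re`; labels are illustrative.
AUDIT_PATTERNS = {
    "attribute access": re.compile(r"\.pe_index\b"),
    "keyword argument": re.compile(r"\bpe_index="),
    "allocator lookup": re.compile(r"\b_?allocators\["),
    "flat index math": re.compile(r"\bflat_idx\s*=|\*\s*pes_per_cube\s*\+"),
    "serialization": re.compile(r"asdict\(.*shard"),
}


def audit(lines):
    """Return (line_no, label) for every pattern that matches a line."""
    hits = []
    for line_no, text in enumerate(lines, start=1):
        for label, pattern in AUDIT_PATTERNS.items():
            if pattern.search(text):
                hits.append((line_no, label))
    return hits


sample = [
    "alloc = allocators[spec.pe_index]",
    "flat_idx = sip * cubes * pes_per_cube + cube * pes_per_cube + pe",
    "spec = ShardSpec(pe_index=3, offset_bytes=0, nbytes=64)",
    "alloc = self._allocators[(spec.sip, spec.cube, spec.pe)]",
    "payload = asdict(shard)",
    "x = cube + pe",
]
hits = audit(sample)
for line_no, label in hits:
    print(f"{line_no}: {label}: {sample[line_no - 1]}")
```

Line 4 of the sample is already migrated but is still reported under "allocator lookup". That is intentional for an audit: each hit is a prompt for a manual decision, not an automatic rewrite.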
+
+**(D) `test_sip_parallel.py`**:
+
+- Keep the name; rewrite the contents on ADR-0024's multi-greenlet launcher.
+- Verify "SIP parallelism = one worker per rank × each worker's own DPPolicy".
+
+**(E) `test_va_offset.py`, `benches/va_offset_verify.py`**:
+
+- Most uses only pass `num_sips=1`. Simply remove the field.
+- Where the SIP offset test is the point, port it to `set_device(rank)` plus
+  observation of structural coordinates.
+
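+### D7. Backward compatibility: none (cleanup ADR)
+
+This ADR is a **breaking change**.
+
+1. Calling `DPPolicy(sip=...)` or `DPPolicy(num_sips=...)` → `TypeError`
+2. Accessing `ShardSpec.pe_index` → `AttributeError`
+
+Both are **immediate, explicit breakage**. There is no deprecation warning or
+fallback path. KernBench is an internal project with a bounded set of call
+sites, so migration happens in one pass.
+
+**Blocking silent drift** is the main benefit of removing the property
+entirely: it eliminates the possibility that code expecting a global flat index
+receives a SIP-local value and silently indexes the wrong entry.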
+
+### D8. Documentation updates
+
+- `ADR-0008` (tensor deploy): add a note updating the meaning of DPPolicy and
+  state the switch of ShardSpec to structural coordinates
+- DPPolicy docstring: state "intra-device only" (the docstring in the D1 snippet)
+- ShardSpec docstring: state that **structural coordinates `(sip, cube, pe)`
+  are used directly and `pe_index` is no longer provided** (D2)
+- Remove `sip=...` examples from tutorials such as `docs/ccl-author-guide`
+
+---
+
+## Dependencies
+
+- **ADR-0024** (launcher): `set_device(rank)` and current-device scoping
+  provide the SIP placement mechanism. This ADR builds on that and narrows
+  DPPolicy to pure intra-device.
+- **ADR-0027** (Megatron TP): the alternative path when a tensor must span
+  multiple SIPs. Once this ADR lands, multi-SIP use cases move to ADR-0027.
+
+---
+
+## Non-goals
+
+- **Redesigning `DPPolicy.cube` / `pe`**: the existing
+  replicate/column_wise/row_wise semantics are kept.
+- **Unifying tiling policies**: `tiled_column_major` / `tiled_row_major` stay as they are.
+- **A new multi-device tensor abstraction**: DTensor-like support is ADR-0028.
+
+---
+
+## Open questions
+
+- **Default `current_sip` in `_create_tensor`**: when `set_device` was never
+  called, fall back to rank=0 (SIP 0) or raise an error? The recommendation is
+  to fall back (compatibility with existing single-driver tests).
+- **Scope of the `test_sip_parallel.py` rewrite**: moving to the launcher while
+  preserving the intent of the existing unit tests needs additional fixtures.
+  Scoped as separate work.
+- **Meaning of `num_sips=None` on `DPPolicy`**: once the field is gone, the
+  concept of `num_sips` disappears with it. The explicit answer is to use the
+  TP primitives from ADR-0027 to express multi-SIP.
+
+**Resolved (open in earlier revisions)**:
+- ~~Whether to keep a `ShardSpec.pe_index` property~~ → **removed entirely** (D2)
+- ~~Dict key format for `_ensure_allocators`~~ → **tuple `(sip, cube, pe)`** (D5)
+
+---
+
+## Test strategy
+
+### T1. Update unit tests
+
+- `tests/test_tensor.py`, `tests/test_ipcq_types.py`, `tests/test_runtime_api_tensor.py`:
+  clean up DPPolicy constructor arguments and verify ShardSpec structural coordinates
+- `tests/test_va_offset.py`: behavior is preserved after removing `num_sips=1`
+
+### T2. `resolve_dp_policy` returns structural coordinates
+
+`tests/test_dp_policy.py` (new or extended):
+- Every ShardSpec returned by `resolve_dp_policy(dp, ..., target_sip=1)` has `sip=1`
+- Each spec's `(cube, pe)` is local (0..num_cubes-1, 0..num_pe-1)
+- On the same topology, the results for `target_sip=0` and `target_sip=1` differ only in the `sip` field
+
+### T3. Rewrite `test_sip_parallel.py`
+
+Verify SIP parallelism on the launcher:
+
+```python
+def test_sip_parallel_via_launcher(topology):
+    ...
+    def worker(rank, ws, torch):
+        torch.ahbm.set_device(rank)
+        t = torch.zeros((1, 128), dtype="f16",
+                         dp=DPPolicy(cube="column_wise", pe="column_wise"))
+        # verify shard.sip == rank (structural coord)
+
+    spawn(worker, nprocs=n_sips, ...)
+```
+
+### T4. Allocator key migration
+
+`tests/test_allocator_structural_key.py` (new or extension of an existing file):
+- The `PEMemAllocator` dict works with `(sip, cube, pe)` tuple keys
+- `deploy_tensor` looks up allocators by structural coordinates
+- `_free_tensor` does the same
+
+### T5. E2E regression
+
+ADR-0024's `test_ccl_allreduce_matrix.py` passes unchanged.
+
+### T6. Error checks
+
+- Calling `DPPolicy(sip="column_wise")` → `TypeError`. Pin this with a test.
+- Calling `DPPolicy(num_sips=2)` → `TypeError`.
+- Accessing `spec.pe_index` → `AttributeError` (verifies complete removal).
+
+---
+
+## Consequences
+
+### Positive
+
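+- **Clear separation of concepts**: DPPolicy = intra-device, TP = inter-device.
+- **Simpler API**: the DPPolicy constructor shrinks from 6 fields to 4 (~33%).
+- **Consistent structural coordinates**: ShardSpec is expressed as the
+  `(sip, cube, pe)` tuple, which resolves the abstraction leakage (satisfies the
+  ADR-0024 D11 contract).
+- **No ambiguous `pe_index`**: the attribute is gone, so there is no
+  global-flat vs. SIP-local ambiguity. Code that needs a flat integer computes
+  it explicitly.
+- **Consistent launcher model**: ADR-0024's "1 worker per SIP" model is the only
+  mechanism that controls SIP boundaries.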
+
+### Negative
+
+- **Breaking change (explicit)**: `DPPolicy(sip=...)` → `TypeError`,
+  `spec.pe_index` → `AttributeError`. All callers must be fixed in one pass.
+- **ShardSpec schema change**: the single `pe_index` field becomes three fields,
+  `sip`/`cube`/`pe`. Downstream code (`deploy_tensor`, `_free_tensor`,
+  `_ensure_allocators`, the `allocators` dict key, etc.) needs cascading changes.
+- **No silent drift**: with the property removed entirely, failures are
+  immediate at runtime, which cuts off migration leakage at the source. (Not a
+  negative as such; an explicit trade-off.)
+- Cost of rewriting `test_sip_parallel.py`.
+
+### Neutral
+
+- The meaning of the existing `cube` / `pe` fields is unchanged.
+
+---
+
+## Affected files
+
+| File | Change |
+|------|--------|
+| `src/kernbench/policy/placement/dp.py` | D1: remove `sip`/`num_sips` / D2: add structural `sip`/`cube`/`pe` fields to `ShardSpec`, **remove `pe_index`** / D3: add `target_sip` to `resolve_dp_policy`, remove the SIP-level loop / the shard type returned by internal resolvers is also clarified to use the name `local_pe` (avoids a name clash) |
+| `src/kernbench/runtime_api/context.py` | D4: `_create_tensor` passes `target_sip` / D5: `_ensure_allocators` dict key → `(sip, cube, pe)` tuple / remove the `dp.sip` branch in `launch` |
+| `src/kernbench/runtime_api/tensor.py` | D5: `deploy_tensor` looks up allocators by structural coordinates |
+| `src/kernbench/components/builtin/pe_cpu.py` | D6: remove `dp.sip` references |
+| `src/kernbench/components/legacy/builtin/pe_cpu.py` | D6: same |
+| `benches/ccl_allreduce.py` | Already handled in ADR-0024 scope |
+| `benches/va_offset_verify.py` | D6: remove `num_sips=1` |
+| `tests/test_runtime_api_tensor.py` | D6 |
+| `tests/test_va_offset.py` | D6 |
+| `tests/test_tensor.py`, `tests/test_ipcq_types.py` | D6 |
+| `tests/test_sip_parallel.py` | T3: rewrite on the launcher |
+| `tests/test_dp_policy.py` (new or extended) | T2 |
+| `tests/test_allocator_structural_key.py` (new) | T4 |