ADR: translate adr-ko/ to Korean, fix ADR-0013 slug, refine Status check

Follow-up to the bilingual-structure commit: docs/adr-ko/ now holds only Korean versions (24 files translated from English placeholders), ADR-0013 slug uses kebab-case in both folders, and the verify tool allows translated parenthetical commentary in the Status block.

- Translate 24 English files in docs/adr-ko/ to Korean. The previous bilingual-structure commit had left these as English copies because their source content was already English; this commit fulfills the policy that docs/adr-ko/ contains only Korean.
- Rename ADR-0013 in both adr/ and adr-ko/ from ver-verification_strategy.md to ver-verification-strategy.md (kebab-case consistency with other ADRs).
- CLAUDE.md (ADR Translation Discipline): clarify that only the Status lifecycle keyword (Accepted / Proposed / Stub / Draft / Superseded by ADR-NNNN / Merged into ADR-NNNN) must match across EN and KO; parenthetical commentary and trailing list items may be translated.
- tools/verify_adr_lang_pairs.py: replace byte-equal Status check with normalize_status_keyword() which strips parenthetical commentary and takes only the first non-empty line.
- tests/test_verify_adr_lang_pairs.py: update existing test names, add coverage for translated parenthetical, translated trailing list, and Superseded-by-NNNN keyword equality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
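
The Status normalization described in the message can be sketched roughly as follows. This is an illustrative guess at the behavior, not the code in tools/verify_adr_lang_pairs.py: only the function name comes from the commit message, while the regex, the handling of full-width parentheses, and the sample Status strings are assumptions.

```python
import re

# Assumption: "parenthetical commentary" means any (...) run, written with
# either ASCII or full-width parentheses. The real tool may define it differently.
_PAREN = re.compile(r"\s*[\(（][^\)）]*[\)）]")


def normalize_status_keyword(status_block: str) -> str:
    """Return the lifecycle keyword of a Status block (hypothetical sketch).

    Takes the first non-empty line and strips parenthetical commentary,
    so translated commentary and trailing list items are ignored.
    """
    for line in status_block.splitlines():
        stripped = line.strip()
        if stripped:
            return _PAREN.sub("", stripped).strip()
    return ""


# Made-up EN/KO Status blocks: the keyword matches although the commentary differs.
en = "Accepted (see notes)\n- follow-up item"
ko = "Accepted (참고 사항 참조)\n- 후속 항목"
print(normalize_status_keyword(en) == normalize_status_keyword(ko))
```

Comparing normalized keywords instead of raw bytes is what lets the Korean file translate everything except the lifecycle keyword itself.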
2026-05-20 08:17:56 -07:00
parent a796c1d2f7
commit 168b0c89f0
29 changed files with 2631 additions and 2651 deletions
@@ -1,4 +1,4 @@
-# ADR-0017: Cube NOC and HBM Connectivity
+# ADR-0017: 큐브 NoC와 HBM 연결성

 ## Status

@@ -6,78 +6,74 @@ Accepted

 ## Context

-The CUBE-level NOC is a 2D router mesh that carries every intra-cube
-request: PE-to-HBM data, PE-to-PE traffic, command paths
-(M_CPU↔PE_CPU), shared SRAM access, and inter-cube UCIe traffic.
+CUBE 레벨의 NoC는 모든 큐브 내부 요청을 운반하는 2D 라우터 메시이다:
+PE-HBM 데이터, PE-PE 트래픽, 명령 경로(M_CPU↔PE_CPU), 공유 SRAM 접근,
+큐브 간 UCIe 트래픽.

-The CUBE's HBM is exposed through per-PE controller endpoints attached
-to PE routers. This per-PE partitioning makes local-vs-remote HBM
-distinguishable by mesh distance: a PE's own HBM partition sits at its
-own router (switching overhead only); another PE's HBM partition is
-reachable by mesh hops to that PE's router.
+CUBE의 HBM은 PE 라우터에 부착된 PE별 컨트롤러 엔드포인트를 통해 노출된다.
+이러한 PE별 분할 덕분에 로컬-vs-원격 HBM이 메시 거리로 구분 가능하다:
+PE 자신의 HBM 파티션은 자신의 라우터에 위치하고(스위칭 오버헤드만 발생),
+다른 PE의 HBM 파티션은 해당 PE의 라우터로 메시 hop을 거쳐 도달 가능하다.

-Two channel-mapping modes are supported in the design space:
+설계 공간에서는 두 가지 채널 매핑 모드를 지원한다:

- **n:1 (default, implemented)** — each PE's HBM partition aggregates
-  `channels_per_pe` pseudo-channels into one endpoint. Effective
-  per-PE BW = N × per-channel BW.
- **1:1 (future)** — each PE router decomposes into per-channel
-  mini-routers; per-channel BW contention is modeled directly.
+- **n:1 (default, 구현됨)** — 각 PE의 HBM 파티션이 `channels_per_pe`
+  pseudo-channel을 하나의 엔드포인트로 집계한다. 유효 PE당 BW =
+  N × per-channel BW.
+- **1:1 (future)** — 각 PE 라우터가 채널별 미니 라우터로 분해된다;
+  채널별 BW 경합을 직접 모델링한다.

-In both modes the per-PE effective BW is identical; only the connectivity
-granularity differs.
+두 모드 모두 PE당 유효 BW는 동일하다; 연결 입도만 다르다.

 ## Decision

-### D1. 2D router mesh
+### D1. 2D 라우터 메시

-Each cube contains a 2D mesh of NOC routers generated by `mesh_gen.py`.
+각 큐브는 `mesh_gen.py`가 생성하는 2D 라우터 메시를 포함한다.

- Node naming: `sip{S}.cube{C}.r{row}c{col}` (e.g., `sip0.cube0.r0c0`).
- Implementation: `forwarding_v1`. NOC `overhead_ns = 0`.
- Default 6×6 grid (sized from PE corner placement + UCIe attachment
-  count); larger PE counts scale the grid up.
- HBM exclusion zone: center rows/columns are excluded where HBM die
-  physically occupies space (e.g., r2c2, r2c3, r3c2, r3c3 for a 6×6).
- Latency = Manhattan distance × `ns_per_mm`.
+- 노드 명명: `sip{S}.cube{C}.r{row}c{col}` (예: `sip0.cube0.r0c0`).
+- 구현: `forwarding_v1`. NoC `overhead_ns = 0`.
+- 기본 6×6 그리드 (PE 코너 배치 + UCIe 부착 개수로 산정); 더 큰 PE
+  개수는 그리드를 확장한다.
+- HBM 제외 영역: HBM 다이가 물리적으로 점유하는 중앙 행/열을 제외한다
+  (예: 6×6의 경우 r2c2, r2c3, r3c2, r3c3).
+- 레이턴시 = Manhattan 거리 × `ns_per_mm`.

-### D2. XY routing algorithm
+### D2. XY 라우팅 알고리즘

-Deterministic XY routing:
+결정론적 XY 라우팅:

-1. Horizontal segment: route from source X to destination X at source Y.
-2. Vertical segment: route from destination X at source Y to destination Y.
+1. 수평 구간: 소스 X에서 목적지 X까지 소스 Y에서 라우팅.
+2. 수직 구간: 소스 Y의 목적지 X에서 목적지 Y까지 라우팅.

-Each directed segment carries a unique key:
+각 유향 구간은 고유 키를 운반한다:

- Horizontal: `("H", y_band, x_min, x_max, direction)`
- Vertical:   `("V", x_band, y_min, y_max, direction)`
+- 수평: `("H", y_band, x_min, x_max, direction)`
+- 수직:   `("V", x_band, y_min, y_max, direction)`

-Grid positions are snapped to the router grid, excluding the HBM zone.
+그리드 위치는 HBM 영역을 제외하고 라우터 그리드에 스냅된다.

-### D3. Per-segment contention model
+### D3. 구간별 경합 모델

-Each directed XY segment is a `simpy.Resource(capacity=1)`. Transactions
-sharing a segment (same row or column band, same direction) contend for
-the resource — modelling link-level serialization in a wormhole-routed
-mesh.
+각 유향 XY 구간은 `simpy.Resource(capacity=1)`이다. 동일 구간을 공유하는
+트랜잭션(동일한 행 또는 열 밴드, 동일한 방향)은 자원을 두고 경합한다 —
+wormhole 라우팅 메시에서의 링크 수준 직렬화를 모델링한다.

-With no contention, NOC traversal latency equals Manhattan distance ×
-`ns_per_mm`. Under contention, SimPy's resource scheduling adds queueing
-delay.
+경합이 없을 때 NoC 순회 레이턴시는 Manhattan 거리 × `ns_per_mm`이다.
+경합이 있을 때는 SimPy의 자원 스케줄링이 큐잉 지연을 추가한다.

-### D4. NOC attachment points (per-PE HBM partition)
+### D4. NoC 부착 지점 (PE별 HBM 파티션)

-Every PE router carries three attachments: `pe{idx}.dma`, `pe{idx}.cpu`,
-and `pe{idx}.hbm`. The last is the per-PE HBM controller endpoint —
-`sip{S}.cube{C}.hbm_ctrl.pe{idx}` — which owns one slice of the cube's
-HBM (one pseudo-channel group; see D8).
+모든 PE 라우터는 세 개의 부착을 갖는다: `pe{idx}.dma`, `pe{idx}.cpu`,
+그리고 `pe{idx}.hbm`. 마지막은 PE별 HBM 컨트롤러 엔드포인트로
+`sip{S}.cube{C}.hbm_ctrl.pe{idx}`이며, 큐브 HBM의 한 슬라이스를
+소유한다 (하나의 pseudo-channel 그룹; D8 참조).

-Other attachments:
+기타 부착:

- M_CPU and shared SRAM each occupy a dedicated edge router.
- UCIe endpoints (N/S/E/W) each expose 4 connection routers distributed
-  along that edge (see D6).
+- M_CPU와 공유 SRAM은 각각 전용 edge 라우터를 점유한다.
+- UCIe 엔드포인트(N/S/E/W)는 각각 해당 변에 분산된 4개의 연결 라우터를
+  노출한다 (D6 참조).

 ```text
                    UCIe-N (conn x4)
@@ -102,35 +98,34 @@ PE4.cpu <--+ +hbm.pe4|       | +hbm.pe6+--< PE6.cpu
                    UCIe-S (conn x4)
 ```

-Per-PE HBM partitioning is the key invariant that makes local vs
-cross-PE HBM distinguishable by mesh distance (see D7).
+PE별 HBM 분할은 로컬 vs 크로스-PE HBM을 메시 거리로 구분 가능하게 만드는
+핵심 불변식이다 (D7 참조).

-### D5. NOC edge bandwidths and distances
+### D5. NoC 엣지 대역폭과 거리

 | Connection                    | BW (GB/s)  | Distance      | Notes                                       |
 | ----------------------------- | ---------- | ------------- | ------------------------------------------- |
-| PE_DMA → NOC                  | 256.0      | Physical (PE) | Matches local-HBM aggregate BW              |
-| NOC → PE_CPU                  | —          | 0.0 mm        | Command path only                           |
-| Router ↔ hbm_ctrl.pe{idx}     | 256.0      | 0.0 mm        | Per PE router; N × per-channel BW (see D8)  |
-| NOC ↔ M_CPU                   | —          | 0.0 mm        | Command path                                |
-| NOC ↔ SRAM                    | 128.0 × 4  | 0.0 mm        | 512 GB/s aggregate                          |
-| NOC ↔ UCIe conn               | 128.0      | 0.0 mm        | Per connection; 4 conn per port             |
+| PE_DMA → NOC                  | 256.0      | Physical (PE) | 로컬-HBM 집계 BW와 일치                     |
+| NOC → PE_CPU                  | —          | 0.0 mm        | 명령 경로 전용                              |
+| Router ↔ hbm_ctrl.pe{idx}     | 256.0      | 0.0 mm        | PE 라우터당; N × per-channel BW (D8 참조)   |
+| NOC ↔ M_CPU                   | —          | 0.0 mm        | 명령 경로                                   |
+| NOC ↔ SRAM                    | 128.0 × 4  | 0.0 mm        | 512 GB/s 집계                               |
+| NOC ↔ UCIe conn               | 128.0      | 0.0 mm        | 연결당; 포트당 4개 conn                     |

-`0.0 mm` distances reflect the distributed nature of the NOC; actual
-traversal distance is computed via Manhattan distance within the router
-grid.
+`0.0 mm` 거리는 NoC의 분산 특성을 반영한다; 실제 순회 거리는 라우터
+그리드 내에서 Manhattan 거리로 계산된다.

-### D6. UCIe decomposition and inter-cube traffic
+### D6. UCIe 분해와 큐브 간 트래픽

-Each of the 4 UCIe ports (N, S, E, W) decomposes into:
+4개의 UCIe 포트(N, S, E, W) 각각은 다음으로 분해된다:

- 1 `ucie-{PORT}` node: UCIe protocol endpoint (`overhead = 8.0 ns`).
- 4 `ucie-{PORT}.conn{0-3}` nodes: connection bridges between NOC and UCIe.
+- `ucie-{PORT}` 노드 1개: UCIe 프로토콜 엔드포인트 (`overhead = 8.0 ns`).
+- `ucie-{PORT}.conn{0-3}` 노드 4개: NoC와 UCIe 간 연결 브리지.

-This decomposition gives 4 independent NOC↔UCIe connections per port,
-each with 128 GB/s bandwidth (512 GB/s aggregate per port).
+이 분해로 포트당 4개의 독립 NoC↔UCIe 연결이 생성되며, 각각 128 GB/s
+대역폭을 갖는다 (포트당 집계 512 GB/s).

-Inter-cube traffic path:
+큐브 간 트래픽 경로:

 ```text
 Source: PE_DMA → NOC → conn{i} → ucie-{PORT}
@@ -138,56 +133,56 @@ Source: PE_DMA → NOC → conn{i} → ucie-{PORT}
 Target: ucie-{PORT} → conn{i} → r{x}c{y} → (mesh hops) → hbm_ctrl.pe{idx}
 ```

-UCIe overhead (8.0 ns) is applied at each `ucie-{PORT}` node, so a full
-crossing incurs 16 ns (TX port + RX port).
+UCIe 오버헤드(8.0 ns)는 각 `ucie-{PORT}` 노드에서 적용되므로 전체 횡단은
+16 ns(TX 포트 + RX 포트)가 소요된다.

-### D7. Data paths through the NOC
+### D7. NoC를 통한 데이터 경로

-All intra-cube traffic uses the same router mesh — no separate fast
-paths.
+모든 큐브 내부 트래픽은 동일한 라우터 메시를 사용한다 — 별도의 fast path는
+없다.

-**Local HBM** (same PE's own partition; 0 mesh hops):
+**로컬 HBM** (동일 PE의 자신 파티션; 0 메시 hop):

 ```text
 PE_DMA → r{x}c{y} → hbm_ctrl.pe{idx}   (switching overhead only)
 ```

-**Cross-PE HBM within cube** (target PE's partition, reached by mesh):
+**큐브 내 크로스-PE HBM** (대상 PE의 파티션, 메시로 도달):

 ```text
 PE_DMA → r{x}c{y} → (mesh hops) → r{x'}c{y'} → hbm_ctrl.pe{idx'}
 ```

-Example: PE0 (on `r0c0`) accessing PE2's HBM (PE2 on `r1c4`):
+예시: PE0(`r0c0` 위)이 PE2의 HBM(PE2는 `r1c4` 위)에 접근:

 ```text
 PE0.pe_dma → r0c0 → r0c1 → r0c2 → r0c3 → r0c4 → r1c4 → hbm_ctrl.pe2
 ```

-Dijkstra computes the shortest path within the mesh.
+Dijkstra가 메시 내 최단 경로를 계산한다.

-**Cross-cube HBM** (UCIe traversal):
+**큐브 간 HBM** (UCIe 횡단):

 ```text
 PE_DMA → r{x}c{y} → conn → ucie-{PORT} → [seam] → ucie-{PORT'} → conn
       → r{x'}c{y'} → hbm_ctrl.pe{idx'}
 ```

-**Kernel launch command to PE**:
+**PE로의 커널 launch 명령**:

 ```text
 [from io_noc] → ucie → conn → r{x}c{y} → (mesh) → M_CPU → (mesh) → PE_CPU
 ```

-**Shared SRAM access**:
+**공유 SRAM 접근**:

 ```text
 PE_DMA → r{x}c{y} → (mesh) → SRAM
 ```

-### D8. HBM channel mapping mode
+### D8. HBM 채널 매핑 모드

-Channel mapping is configured at cube scope:
+채널 매핑은 큐브 범위에서 구성된다:

 ```yaml
 cube:
@@ -200,37 +195,35 @@ cube:
    hbm_total_gb_per_cube: 48
 ```

-**n:1 mode (default, implemented).** Each PE's HBM partition is a single
-endpoint `hbm_ctrl.pe{idx}` that aggregates `channels_per_pe` pseudo-
-channels. The `Router ↔ hbm_ctrl.pe{idx}` link bandwidth equals
-`channels_per_pe × hbm_channel_bw_gbs`. Pseudo-channels are assumed to
-interleave; only aggregate per-PE BW is modeled. No separate aggregated
-router node exists — the per-PE router itself serves that role.
+**n:1 모드 (default, 구현됨).** 각 PE의 HBM 파티션은 `channels_per_pe`
+pseudo-channel을 집계하는 단일 엔드포인트 `hbm_ctrl.pe{idx}`이다.
+`Router ↔ hbm_ctrl.pe{idx}` 링크 대역폭은 `channels_per_pe ×
+hbm_channel_bw_gbs`와 같다. Pseudo-channel은 인터리브된다고 가정하며,
+PE당 집계 BW만 모델링한다. 별도의 집계 라우터 노드는 존재하지 않는다 —
+PE별 라우터 자체가 그 역할을 한다.

-**1:1 mode (future).** Each PE router decomposes into N channel
-mini-routers; per-channel routing carries fully-resolved PA + channel ID.
-A `ChannelSplitter` resolves a logical access to N per-channel physical
-requests. Per-channel link models BW contention. Cross-PE channel
-access semantics are deferred to the implementation ADR.
+**1:1 모드 (future).** 각 PE 라우터가 N개의 채널 미니 라우터로
+분해된다; 채널별 라우팅이 완전히 해석된 PA + channel ID를 운반한다.
+`ChannelSplitter`가 논리적 접근을 N개의 채널별 물리 요청으로 해결한다.
+채널별 링크가 BW 경합을 모델링한다. 크로스-PE 채널 접근 시맨틱은
+구현 ADR로 연기된다.

-**BW math (defaults).**
+**BW 계산 (default 값).**

 | Parameter                          | Value                      |
 | ---------------------------------- | -------------------------- |
-| pseudo channels per cube           | 64 (parameter)             |
-| PEs per cube                       | 8 (parameter)              |
-| channels per PE (N)                | 64 / 8 = 8                 |
-| per-channel BW                     | 32 GB/s (parameter)        |
-| per-PE local BW                    | N × 32 = 256 GB/s          |
-| cube total HBM BW                  | 64 × 32 = 2048 GB/s        |
+| 큐브당 pseudo channel              | 64 (parameter)             |
+| 큐브당 PE                          | 8 (parameter)              |
+| PE당 channel (N)                   | 64 / 8 = 8                 |
+| 채널당 BW                          | 32 GB/s (parameter)        |
+| PE당 로컬 BW                       | N × 32 = 256 GB/s          |
+| 큐브 전체 HBM BW                   | 64 × 32 = 2048 GB/s        |

-Both modes give the same per-PE effective BW; only the request shape and
-contention model differ.
+두 모드 모두 PE당 유효 BW는 동일하다; 요청 형태와 경합 모델만 다르다.

-### D9. AddressResolver — per-PE HBM endpoint
+### D9. AddressResolver — PE별 HBM 엔드포인트

-The address resolver decodes a PA's HBM offset to the owning PE's
-partition:
+주소 리졸버는 PA의 HBM 오프셋을 소유 PE의 파티션으로 디코딩한다:

 ```python
 # policy/routing/router.py
@@ -241,51 +234,49 @@ if addr.kind == "hbm":
    return f"sip{s}.cube{d}.hbm_ctrl.pe{pe_id}"
 ```

-The pe_id computation is intrinsic to the routing layer (not a
-topology-time concern). Any HBM PA falls within exactly one partition,
-yielding deterministic routing.
+pe_id 계산은 라우팅 레이어의 본질적 일부이다 (토폴로지 시점 관심사가
+아니다). 모든 HBM PA는 정확히 하나의 파티션에 속하므로 결정론적 라우팅이
+보장된다.

-External callers (e.g., M_CPU DMA, Memory R/W from PCIE_EP) follow the
-same resolver path — there is no separate fast path.
+외부 호출자(예: M_CPU DMA, PCIE_EP로부터의 Memory R/W)도 동일한 리졸버
+경로를 따른다 — 별도의 fast path는 존재하지 않는다.

-### D10. Mesh generation parameters
+### D10. 메시 생성 파라미터

-`mesh_gen.py` produces `cube_mesh.yaml` from:
+`mesh_gen.py`는 다음으로부터 `cube_mesh.yaml`을 생성한다:

- `cube.pe_layout`: corner placement (NW, NE, SW, SE) and PEs per corner.
- `cube.geometry`: cube physical dimensions and HBM zone.
- `cube.ucie.n_connections`: determines router count for UCIe attachment.
+- `cube.pe_layout`: 코너 배치(NW, NE, SW, SE)와 코너당 PE 개수.
+- `cube.geometry`: 큐브 물리 치수와 HBM 영역.
+- `cube.ucie.n_connections`: UCIe 부착용 라우터 개수를 결정.

-Output `mesh_data` dictionary contains:
+출력 `mesh_data` 딕셔너리는 다음을 포함한다:

- Router grid with positions and HBM exclusion zones.
- PE-to-router attachments (`pe{idx}.dma`, `pe{idx}.cpu`, `pe{idx}.hbm`
-  per PE).
- UCIe-to-router attachments (N/S/E/W distributed across edge routers).
- M_CPU and SRAM router attachments.
+- 위치 및 HBM 제외 영역을 갖는 라우터 그리드.
+- PE-라우터 부착 (PE별 `pe{idx}.dma`, `pe{idx}.cpu`, `pe{idx}.hbm`).
+- UCIe-라우터 부착 (N/S/E/W가 edge 라우터에 분산).
+- M_CPU와 SRAM 라우터 부착.

 ## Consequences

- Local HBM (0 mesh hops, switching overhead only) and cross-PE HBM
-  (mesh hops) are naturally distinguishable, satisfying SPEC R5
-  (multi-domain communication) and ADR-0002 (no zero-latency end-to-end
-  paths).
- All cube-internal traffic routes through one mesh — single contention
-  model, single layout, single set of edge BWs.
- Per-PE HBM partitioning maps cleanly to the LA model (ADR-0011): each
-  PE's partition is the n:1 aggregate of its assigned pseudo-channels.
- 1:1 mode extension is structurally natural — split each PE router into
-  N channel routers.
- Mesh generation is fully parameterised by `topology.yaml`; PE/cube
-  geometry changes propagate without code edits.
+- 로컬 HBM(0 메시 hop, 스위칭 오버헤드만)과 크로스-PE HBM(메시 hop)이
+  자연스럽게 구분되어 SPEC R5(다중 도메인 통신)와 ADR-0002(end-to-end
+  제로 레이턴시 경로 금지)를 만족한다.
+- 모든 큐브 내부 트래픽이 하나의 메시를 통해 라우팅된다 — 단일 경합
+  모델, 단일 레이아웃, 단일 엣지 BW 집합.
+- PE별 HBM 분할이 LA 모델(ADR-0011)에 깔끔하게 매핑된다: 각 PE의
+  파티션은 할당된 pseudo-channel의 n:1 집계이다.
+- 1:1 모드 확장이 구조적으로 자연스럽다 — 각 PE 라우터를 N개의 채널
+  라우터로 분해한다.
+- 메시 생성이 `topology.yaml`로 완전히 파라미터화된다; PE/큐브 기하
+  변경이 코드 수정 없이 전파된다.

 ## Links

- ADR-0002 (Routing distance, ordering, no zero-latency paths)
- ADR-0003 D3 (cube-level NOC definition — extended here)
- ADR-0004 (Memory semantics, local HBM)
- ADR-0011 (Memory addressing — LA model consumes per-PE partition)
- ADR-0014 D1 (PE_DMA egress via router mesh)
- ADR-0015 D4 (fabric paths for Memory R/W and Kernel Launch)
- ADR-0016 (IOChiplet io_noc — analogous pattern at IO chiplet level)
- ADR-0033 (Latency model: per-PC parallelism, switch penalty)
+- ADR-0002 (라우팅 거리, 순서, 제로 레이턴시 경로 금지)
+- ADR-0003 D3 (큐브 레벨 NoC 정의 — 본 ADR에서 확장)
+- ADR-0004 (메모리 시맨틱, 로컬 HBM)
+- ADR-0011 (메모리 주소 지정 — LA 모델이 PE별 파티션을 소비)
+- ADR-0014 D1 (라우터 메시를 통한 PE_DMA egress)
+- ADR-0015 D4 (Memory R/W와 Kernel Launch의 패브릭 경로)
+- ADR-0016 (IOChiplet io_noc — IO 칩렛 레벨에서의 유사 패턴)
+- ADR-0033 (레이턴시 모델: PC당 병렬성, 스위치 패널티)
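
As a reading aid for D2, D3, and D8 of the ADR in this diff, the segment keys, the uncontended latency, and the default bandwidth arithmetic can be sketched as below. This is a minimal sketch under stated assumptions, not the simulator's code: the function names, the (x, y) = (column, row) convention, the direction labels, and the `pitch_mm` router spacing are all hypothetical, and HBM-zone snapping is ignored.

```python
def xy_segments(src, dst):
    """Directed XY segment keys for a route, horizontal first (per D2).

    src/dst are (x, y) router-grid coordinates; direction labels
    "E"/"W"/"S"/"N" are an assumption for illustration.
    """
    sx, sy = src
    dx, dy = dst
    segments = []
    if sx != dx:
        # Horizontal segment at the source Y band.
        segments.append(("H", sy, min(sx, dx), max(sx, dx), "E" if dx > sx else "W"))
    if sy != dy:
        # Vertical segment at the destination X band.
        segments.append(("V", dx, min(sy, dy), max(sy, dy), "S" if dy > sy else "N"))
    return segments


def manhattan_hops(src, dst):
    """Manhattan distance in router hops."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def uncontended_latency_ns(src, dst, pitch_mm, ns_per_mm):
    """Manhattan distance x ns_per_mm (per D3), assuming a uniform router pitch."""
    return manhattan_hops(src, dst) * pitch_mm * ns_per_mm


def bw_math(pseudo_channels, pes_per_cube, per_channel_bw_gbs):
    """Channels per PE, per-PE local BW, and cube total HBM BW (per D8)."""
    n = pseudo_channels // pes_per_cube
    return n, n * per_channel_bw_gbs, pseudo_channels * per_channel_bw_gbs


# D7 example: PE0 on r0c0 reaching PE2 on r1c4, i.e. (x=0, y=0) -> (x=4, y=1).
print(xy_segments((0, 0), (4, 1)))
# D8 defaults: 64 pseudo-channels, 8 PEs, 32 GB/s per channel.
print(bw_math(64, 8, 32.0))
```

With the D8 defaults, `bw_math` reproduces the table values (8 channels per PE, 256 GB/s per PE, 2048 GB/s per cube). In the ADR's model each segment key would additionally back a `simpy.Resource(capacity=1)`, which this sketch leaves out.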