ADR: translate adr-ko/ to Korean, fix ADR-0013 slug, refine Status check

Follow-up to the bilingual-structure commit: docs/adr-ko/ now holds only Korean versions (24 files translated from English placeholders), the ADR-0013 slug now uses kebab-case in both folders, and the verify tool allows translated parenthetical commentary in the Status block.

- Translate 24 English files in docs/adr-ko/ to Korean. The previous bilingual-structure commit had left these as English copies because their source content was already English; this commit fulfills the policy that docs/adr-ko/ contains only Korean.
- Rename ADR-0013 in both adr/ and adr-ko/ from ver-verification_strategy.md to ver-verification-strategy.md (kebab-case consistency with other ADRs).
- CLAUDE.md (ADR Translation Discipline): clarify that only the Status lifecycle keyword (Accepted / Proposed / Stub / Draft / Superseded by ADR-NNNN / Merged into ADR-NNNN) must match across EN and KO; parenthetical commentary and trailing list items may be translated.
- tools/verify_adr_lang_pairs.py: replace the byte-equal Status check with normalize_status_keyword(), which strips parenthetical commentary and takes only the first non-empty line.
- tests/test_verify_adr_lang_pairs.py: update existing test names; add coverage for a translated parenthetical, a translated trailing list, and Superseded-by-NNNN keyword equality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
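The commit message describes `normalize_status_keyword()` only in prose. A minimal sketch of that behaviour follows, assuming the Status block arrives as a plain string and that commentary uses ASCII parentheses; the real signature in `tools/verify_adr_lang_pairs.py` may differ.

```python
import re


def normalize_status_keyword(status_block: str) -> str:
    """Hypothetical sketch: reduce an ADR Status block to its lifecycle keyword."""
    # Keep only the first non-empty line; trailing list items are ignored.
    first = next((ln.strip() for ln in status_block.splitlines() if ln.strip()), "")
    # Drop parenthetical commentary, which the KO copy is allowed to translate.
    without_parens = re.sub(r"\([^)]*\)", "", first)
    # Collapse the whitespace left behind by the removal.
    return " ".join(without_parens.split())
```

With this normalization, an EN block `Accepted (implemented in hbm_ctrl.py)` and a KO block whose parenthetical and trailing list are in Korean both reduce to `Accepted`, so the pair check compares only the keyword.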
2026-05-20 08:17:56 -07:00
parent a796c1d2f7
commit 168b0c89f0
29 changed files with 2631 additions and 2651 deletions
@@ -1,4 +1,4 @@
-# ADR-0034: HBM Controller Internal Design
+# ADR-0034: HBM Controller Internal Design

 ## Status

@@ -6,111 +6,108 @@ Accepted

 ## Context

-`HbmCtrlComponent` is the per-PE HBM partition endpoint at the leaf of
-the cube NOC. One instance is created per PE under the topology node
-`sip{S}.cube{C}.hbm_ctrl.pe{idx}` and attaches to that PE's router
-(ADR-0017 D4). The component models per-pseudo-channel (PC) scheduling,
-burst-granular commit timing, address-based PC selection, and response
-routing back to the requester.
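+`HbmCtrlComponent` is the per-PE HBM partition endpoint located at the
+leaf of the cube NOC. One instance is created per PE under the topology
+node `sip{S}.cube{C}.hbm_ctrl.pe{idx}` and connects to that PE's router
+(ADR-0017 D4). The component models per-pseudo-channel (PC) scheduling,
+burst-granular commit timing, address-based PC selection, and the
+routing of responses back to the requester.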

-This ADR documents the component as currently implemented. ADR-0017 D4/D8
-defines *where* HBM CTRL attaches and *what* aggregate BW it must
-deliver. ADR-0033 D1/D2 defines *what fidelity* of HBM modelling is in
-scope. This ADR fills the gap between those two — the per-instance
-internal scheduling model.
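+This ADR documents the component as currently implemented. ADR-0017 D4/D8
+defines *where* HBM CTRL attaches and *what* aggregate bandwidth it must
+provide. ADR-0033 D1/D2 defines *what fidelity* of HBM modelling is in
+scope. This ADR fills the gap between the two: the per-instance internal
+scheduling model.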

 ## Decision

-### D1. Role
+### D1. Role

-`HbmCtrlComponent` is a per-PE HBM partition endpoint. One instance per
-PE (default 8 per cube, set by `cube.memory_map.hbm_slices_per_cube`)
-attaches to that PE's router via the `peX.hbm` attachment list in
-`cube_mesh.yaml` (ADR-0017 D4). In the default n:1 channel mapping
-(ADR-0017 D8) the instance aggregates `channels_per_pe` pseudo-channels
-into one endpoint.
+`HbmCtrlComponent` is a per-PE HBM partition endpoint. One instance per
+PE (8 per cube by default, set by `cube.memory_map.hbm_slices_per_cube`)
+connects to that PE's router via the `peX.hbm` attachment list in
+`cube_mesh.yaml` (ADR-0017 D4). In the default n:1 channel mapping
+(ADR-0017 D8), the instance aggregates `channels_per_pe` pseudo-channels
+into a single endpoint.

-The component models:
+The component models:

- Per-PC scheduling (D2) with R/W command-bus sharing.
- Address-based PC selection (D3).
- Burst-granular commit timing (D4).
- Flit-aware per-flit PC commit and async finalize (D5, D6).
- Command-only Transaction handling for read-data drain (D7).
- Response routing back to the requester (D8).
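+- Per-PC scheduling (D2) with R/W command-bus sharing.
+- Address-based PC selection (D3).
+- Burst-granular commit timing (D4).
+- Flit-aware per-flit PC commit and asynchronous finalize (D5, D6).
+- Command-only Transaction handling for read-data drain (D7).
+- Routing of responses back to the requester (D8).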

-It does not model:
+It does not model:

- Bank-level row-buffer conflicts, refresh, ECC, thermal throttling
+- Bank-level row-buffer conflicts, refresh, ECC, or thermal throttling
  (ADR-0033 D3).
- Cross-PE HBM contention beyond its own router edge (handled by the
-  router mesh — ADR-0017 D3).
- 1:1 channel mode (ADR-0017 D8 future work).
+- HBM contention between PEs beyond its own router edge (handled by the
+  router mesh; ADR-0017 D3).
+- 1:1 channel mode (future work under ADR-0017 D8).

-### D2. Per-PC scheduling model
+### D2. Per-PC scheduling model

-Per-instance state initialised in `start()`:
+Per-instance state initialised in `start()`:

- `_pc_avail: list[float]` — earliest sim-time each PC is free; length
-  `num_pcs`, initial 0.0.
- `_pc_last_dir: list["R"|"W"|None]` — direction of the last commit on
-  each PC, used for switch-penalty detection (D4); initial `None`.
+- `_pc_avail: list[float]`: the earliest sim-time at which each PC is
+  free; length `num_pcs`, initial value 0.0.
+- `_pc_last_dir: list["R"|"W"|None]`: the direction of the last commit on
+  each PC, used for switch-penalty detection (D4); initial value `None`.

-`num_pcs` and `burst_bytes` must each be a positive power of two so
-that address-based PC selection (D3) reduces to a shift-and-mask.
+`num_pcs` and `burst_bytes` must each be a positive power of two so that
+address-based PC selection (D3) reduces to a shift-and-mask.

-Read and write requests share the same `_pc_avail` slot per PC — the
-real HW per-PC command bus is shared between read and write traffic, so
-issuing a write to PC k blocks a subsequent read to PC k by exactly the
-burst time.
+Read and write requests share the same per-PC `_pc_avail` slot. In real
+HW the per-PC command bus is shared between read and write traffic, so
+issuing a write to PC k blocks a subsequent read to PC k by exactly the
+burst time.

-Direction `dir` for a request is inferred from the request type:
+The direction `dir` of a request is inferred from its request type:

 - `MemoryWriteMsg` → `"W"`.
- `PeDmaMsg` with `is_write=True` → `"W"`.
- All others (`MemoryReadMsg`, `PeDmaMsg` read) → `"R"`.
+- `PeDmaMsg` with `is_write=True` → `"W"`.
+- All others (`MemoryReadMsg`, read `PeDmaMsg`) → `"R"`.

-### D3. Address-based PC selection
+### D3. Address-based PC selection

-PC index for an access is derived from the access address by shift and
-mask:
+The PC index for an access is derived from the access address by shift
+and mask:

 ```text
-pc_shift = log2(burst_bytes)         # default 8  (burst=256B)
-pc_mask  = num_pcs - 1               # default 7  (8 PCs)
+pc_shift = log2(burst_bytes)         # default 8  (burst=256B)
+pc_mask  = num_pcs - 1               # default 7  (8 PCs)
 pc       = (address >> pc_shift) & pc_mask
 ```

-Computed once in `start()` from topology config so alternative
-`(burst_bytes, num_pcs)` pairs stay consistent. For the canonical
-default `(256, 8)` this places the PC select field at bits `[10:8]` of
-the HBM byte offset: bits `[7:0]` are within-burst (same PC), bits
-`[10:8]` are the 3-bit PC index, bits `[36:11]` are row/bank/column
-within the PC slice (see `phyaddr.py` comment).
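+Computed once in `start()` from the topology config so that alternative
+`(burst_bytes, num_pcs)` pairs stay consistent. For the canonical default
+`(256, 8)`, this places the PC select field at bits `[10:8]` of the HBM
+byte offset: bits `[7:0]` are within-burst (same PC), bits `[10:8]` are
+the 3-bit PC index, and bits `[36:11]` are row/bank/column within the PC
+slice (see the `phyaddr.py` comment).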

-Address-based striping — as opposed to address-blind global
-round-robin — preserves PC parallelism for offset-disjoint concurrent
-transfers: each transfer's bursts land deterministically on the PC set
-implied by its byte addresses, so multi-PE workloads accessing disjoint
-regions do not collide on a single PC.
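+Address-based striping, unlike an address-blind global round-robin,
+preserves PC parallelism for offset-disjoint concurrent transfers: each
+transfer's bursts land deterministically on the PC set implied by its
+byte addresses, so multi-PE workloads accessing disjoint regions do not
+collide on a single PC.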
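The shift-and-mask selection and the power-of-two precondition can be sketched as below. `make_pc_selector` is a hypothetical helper for illustration, not the component's `_pc_for_address`; it only assumes the two attributes named in D2 and D3.

```python
from typing import Callable


def make_pc_selector(burst_bytes: int = 256, num_pcs: int = 8) -> Callable[[int], int]:
    """Sketch: build an address -> PC-index function using shift and mask (D3)."""
    for name, value in (("burst_bytes", burst_bytes), ("num_pcs", num_pcs)):
        # A positive power of two has exactly one bit set.
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a positive power of two, got {value}")
    pc_shift = burst_bytes.bit_length() - 1  # log2(burst_bytes): 8 for 256 B
    pc_mask = num_pcs - 1                    # 7 for 8 PCs

    def pc_for_address(address: int) -> int:
        return (address >> pc_shift) & pc_mask

    return pc_for_address


pc = make_pc_selector()
# Two transfers of four bursts each at disjoint offsets 0x000 and 0x400.
print([pc(0x000 + i * 256) for i in range(4)])  # PCs 0..3
print([pc(0x400 + i * 256) for i in range(4)])  # PCs 4..7
```

With the default `(256, 8)`, the two example transfers land on disjoint PC sets. Offsets that differ by a multiple of `num_pcs * burst_bytes` (2 KiB here, e.g. `0x000` and `0x800`) still map to the same PC.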

-### D4. Burst granularity and PC commit timing
+### D4. Burst granularity and PC commit timing

-A single PC commit takes:
+A single PC commit takes:

 ```text
 chunk_time = burst_bytes / pc_bw_gbs    # ns
 ```

- `burst_bytes` (default 256) is the burst granularity matching the
-  flit size (ADR-0033 D1).
- `pc_bw_gbs` is **builder-derived** from
-  `hbm_to_router_bw_gbs / num_pcs` (`topology/builder.py`), enforcing
-  the ADR-0017 D8 invariant that aggregate per-PE BW equals the
-  router-to-HBM link BW.
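+- `burst_bytes` (default 256) is the burst granularity, matching the flit
+  size (ADR-0033 D1).
+- `pc_bw_gbs` is **builder-derived** as `hbm_to_router_bw_gbs / num_pcs`
+  (`topology/builder.py`). This enforces the ADR-0017 D8 invariant that
+  the aggregate per-PE bandwidth equals the router-to-HBM link bandwidth.
+  Since 1 GB/s equals 1 byte/ns, `chunk_time` comes out in ns (256 B at
+  32 GB/s is 8 ns with the defaults).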

-Per-PC commit scheduling for an arriving access on PC `pc` with
-direction `dir`:
+Per-PC commit scheduling for an access arriving on PC `pc` with
+direction `dir`:

 ```text
 switch_cost = switch_penalty_ns
@@ -121,33 +118,32 @@ pc_avail[pc]    = finish
 pc_last_dir[pc] = dir
 ```

-Default `switch_penalty_ns = 0` — Tier 0 assumption that an ideal HBM
-scheduler amortises R/W switching cost (ADR-0033 D2). Non-zero values
-model pessimistic per-alternation cost.
+Default `switch_penalty_ns = 0`: the Tier 0 assumption that an ideal HBM
+scheduler amortises the R/W switching cost (ADR-0033 D2). Non-zero values
+model a pessimistic cost on every direction alternation.
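The middle of the D4 pseudocode falls outside this diff's hunks, so the sketch below assumes a conventional formulation: the penalty is charged only when a PC reverses direction, and a commit starts at `max(now, pc_avail[pc])` plus that penalty. `PcScheduler` is a hypothetical class used only to illustrate the shared R/W slot and the `chunk_time` arithmetic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PcScheduler:
    """Sketch of the D4 per-PC commit schedule under an assumed formulation."""

    num_pcs: int = 8
    burst_bytes: int = 256
    pc_bw_gbs: float = 32.0          # 1 GB/s == 1 byte/ns, so bytes / GB/s yields ns
    switch_penalty_ns: float = 0.0
    pc_avail: List[float] = field(init=False)
    pc_last_dir: List[Optional[str]] = field(init=False)

    def __post_init__(self) -> None:
        self.pc_avail = [0.0] * self.num_pcs
        self.pc_last_dir = [None] * self.num_pcs

    @property
    def chunk_time(self) -> float:
        return self.burst_bytes / self.pc_bw_gbs  # 256 / 32.0 = 8.0 ns

    def commit(self, now: float, pc: int, direction: str) -> float:
        """Schedule one burst on `pc` in direction "R" or "W"; return its finish time."""
        last = self.pc_last_dir[pc]
        # Assumption: the penalty is charged only when the PC reverses direction.
        switch_cost = self.switch_penalty_ns if last is not None and last != direction else 0.0
        # R and W share one availability slot per PC (shared command bus, D2).
        start = max(now, self.pc_avail[pc]) + switch_cost
        finish = start + self.chunk_time
        self.pc_avail[pc] = finish
        self.pc_last_dir[pc] = direction
        return finish
```

With the defaults, a write to PC 0 at t=0 finishes at 8 ns and a read to PC 0 issued at the same time finishes at 16 ns, one `chunk_time` later, while a read to PC 1 still finishes at 8 ns.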

-### D5. Flit-aware per-flit PC commit (primary path)
+### D5. Flit-aware per-flit PC commit (primary path)

-`_handle_flit` is the primary worker path. For each arriving `Flit`:
+`_handle_flit` is the primary worker path. For each arriving `Flit`:

-1. On the **first** flit of a transaction (`tid = id(txn)` not in
-   `_txn_state`):
-   - Apply `overhead_ns` once via `run(env, nbytes)` — header decode
-     model, first-flit overhead pattern (ADR-0033 D1).
-   - Initialise `_txn_state[tid] = {"last_finish": env.now}`.
-2. Compute `pc = _pc_for_address(flit.address)` (D3).
-3. Apply the per-PC schedule (D4) using the request direction (D2).
-4. Update `state["last_finish"] = max(state["last_finish"], finish)`.
-5. If `flit.is_last`: pop `_txn_state[tid]` and spawn `_finalize_txn`
-   (D6).
+1. On the **first** flit of a transaction (`tid = id(txn)` not in
+   `_txn_state`):
+   - Apply `overhead_ns` once via `run(env, nbytes)` (header decode
+     model, first-flit overhead pattern; ADR-0033 D1).
+   - Initialise `_txn_state[tid] = {"last_finish": env.now}`.
+2. Compute `pc = _pc_for_address(flit.address)` (D3).
+3. Apply the per-PC schedule (D4) using the request direction (D2).
+4. Update `state["last_finish"] = max(state["last_finish"], finish)`.
+5. If `flit.is_last`: pop `_txn_state[tid]` and spawn `_finalize_txn`
+   (D6).

-Per-flit address-aware commit is the mechanism that lets concurrent
-multi-PE traffic to disjoint HBM offsets pipeline through distinct PCs
-in parallel.
+Per-flit, address-aware commit is the mechanism that lets concurrent
+multi-PE traffic to disjoint HBM offsets pipeline through distinct PCs in
+parallel.
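The five steps above can be sketched without a simulator. `FlitPathSketch` and the `Flit` dataclass here are hypothetical stand-ins; direction and switch penalty are dropped (Tier 0 default penalty is 0), and appending to `finalized` stands in for spawning `_finalize_txn`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Flit:
    txn: object      # owning transaction; its identity is the _txn_state key
    address: int
    is_last: bool


class FlitPathSketch:
    """Simulator-free sketch of the D5 per-flit path (hypothetical class)."""

    def __init__(self, overhead_ns: float = 0.0, burst_bytes: int = 256,
                 num_pcs: int = 8, pc_bw_gbs: float = 32.0) -> None:
        self.overhead_ns = overhead_ns
        self.pc_shift = burst_bytes.bit_length() - 1
        self.pc_mask = num_pcs - 1
        self.chunk_time = burst_bytes / pc_bw_gbs
        self.pc_avail = [0.0] * num_pcs
        self.txn_state: Dict[int, dict] = {}
        # (txn, last_finish) pairs; stands in for spawning _finalize_txn (D6).
        self.finalized: List[Tuple[object, float]] = []

    def handle_flit(self, now: float, flit: Flit) -> float:
        """Process one flit arriving at `now`; return the worker's time afterwards."""
        tid = id(flit.txn)
        if tid not in self.txn_state:
            now += self.overhead_ns              # step 1: first-flit overhead, once
            self.txn_state[tid] = {"last_finish": now}
        state = self.txn_state[tid]
        pc = (flit.address >> self.pc_shift) & self.pc_mask    # step 2
        finish = max(now, self.pc_avail[pc]) + self.chunk_time  # step 3
        self.pc_avail[pc] = finish
        state["last_finish"] = max(state["last_finish"], finish)  # step 4
        if flit.is_last:                                          # step 5
            self.txn_state.pop(tid)
            self.finalized.append((flit.txn, state["last_finish"]))
        return now
```

Four flits of one transaction at addresses 0, 256, 512 and 768 occupy PCs 0 to 3 and the transaction finalizes at 8 ns; the same four flits at 0, 2048, 4096 and 6144 all alias onto PC 0 and finalize at 32 ns.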

-### D6. Async finalize per transaction
+### D6. Async finalize per transaction

-When a transaction's last flit has been scheduled, finalisation runs in
-a separately-spawned process:
+Once a transaction's last flit has been scheduled, finalisation runs in a
+separately spawned process:

 ```python
 def _finalize_txn(env, txn, last_finish):
@@ -157,115 +153,111 @@ def _finalize_txn(env, txn, last_finish):
    yield from _send_response(env, txn)
 ```

-`_handle_flit` spawns this via `env.process(...)` and returns
-immediately, so the worker can pick up the next inbox message while the
-last PC commit drains.
+`_handle_flit` spawns this via `env.process(...)` and returns immediately,
+so the worker can pick up the next inbox message while the last PC
+commit drains.

-Without this split — i.e. if the worker itself did
-`yield env.timeout(wait)` — concurrent single-flit transactions whose
-addresses hit distinct PCs would still serialise at `chunk_time` each
-inside the worker, hiding the PC parallelism that D3 and D5 are
-designed to expose.
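+Without this split, i.e. if the worker itself did
+`yield env.timeout(wait)`, concurrent single-flit transactions whose
+addresses hit distinct PCs would still serialise inside the worker at
+`chunk_time` each, hiding the PC parallelism that D3 and D5 are designed
+to expose.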
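The serialisation argument reduces to simple arithmetic. The sketch below assumes a hypothetical scenario of `n` single-flit transactions that all arrive at t=0 and hit distinct PCs, with overheads ignored; `completion_times` is an illustrative helper, not part of the component.

```python
from typing import List


def completion_times(n_txns: int, chunk_time: float, async_finalize: bool) -> List[float]:
    """Sketch: finish times of n single-flit transactions that all arrive at t=0
    and hit distinct PCs (hypothetical scenario; overheads ignored)."""
    times = []
    worker_now = 0.0
    for _ in range(n_txns):
        finish = worker_now + chunk_time  # distinct PCs, so no per-PC queueing
        times.append(finish)
        if not async_finalize:
            # The worker itself waits out the commit (yield env.timeout(wait)),
            # so the next inbox message is only picked up at `finish`.
            worker_now = finish
        # With async finalize the worker returns immediately; worker_now stays put.
    return times
```

With `chunk_time = 8.0`, four transactions finish at 8, 16, 24 and 32 ns when the worker blocks, but all finish at 8 ns when finalize is spawned separately.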

-### D7. Non-flit fallback for command-only transactions
+### D7. Non-flit fallback for command-only transactions

-`_handle_txn` runs when the inbox delivers a `Transaction` rather than a
-`Flit`. This is the path for command-only requests that the wire does
-not chunk into flits — most notably `MemoryReadMsg` whose command txn
-carries `nbytes=0` (data drain is modelled at HBM CTRL post-processing,
-not as inbound flits).
+`_handle_txn` runs when the inbox delivers a `Transaction` rather than a
+`Flit`. This is the path for command-only requests that the wire does
+not chunk into flits, most notably `MemoryReadMsg`, whose command txn
+carries `nbytes=0` (data drain is modelled in HBM CTRL post-processing,
+not as inbound flits).

-Procedure:
+Procedure:

 1. `work_bytes = txn.nbytes if txn.nbytes > 0 else int(request.nbytes or 0)`
-   — for read commands, work is sized by the request.
-2. `n_chunks = ceil(work_bytes / burst_bytes)` if `work_bytes > 0` else
-   0.
-3. `chunk_interval = drain_ns / n_chunks` (when both > 0) — chunks are
-   scheduled over time at `drain/n_chunks` ns intervals to model the
-   bottleneck-link's data arrival rate (ADR-0033 D1 chunk-loop drain).
-4. Apply `run(env, txn.nbytes)` once for `overhead_ns`.
-5. For each chunk `i`, advance `chunk_interval` ns then apply the D4
-   schedule with `pc = _pc_for_address(base_address + i * burst_bytes)`.
-6. After scheduling all chunks, wait `last_finish - env.now` then call
-   `_send_response`.
+   For read commands, the work is sized by the request.
+2. `n_chunks = ceil(work_bytes / burst_bytes)` if `work_bytes > 0`, else
+   0.
+3. `chunk_interval = drain_ns / n_chunks` (when both are > 0): chunks are
+   scheduled over time at `drain_ns / n_chunks` ns intervals to model the
+   bottleneck link's data arrival rate (ADR-0033 D1 chunk-loop drain).
+4. Apply `run(env, txn.nbytes)` once for `overhead_ns`.
+5. For each chunk `i`, advance `chunk_interval` ns, then apply the D4
+   schedule with `pc = _pc_for_address(base_address + i * burst_bytes)`.
+6. After all chunks are scheduled, wait `last_finish - env.now`, then call
+   `_send_response`.
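Steps 1, 2, 3 and 5 of the procedure can be sketched as a pure function that returns when each chunk is handed to the D4 schedule and which PC it targets. `chunk_loop_schedule` is hypothetical; overhead (step 4) and the D4 commit itself are left out, and a zero interval when `drain_ns` or `n_chunks` is 0 is an assumption.

```python
import math
from typing import List, Optional, Tuple


def chunk_loop_schedule(now: float, txn_nbytes: int, request_nbytes: Optional[int],
                        base_address: int, drain_ns: float,
                        burst_bytes: int = 256, num_pcs: int = 8) -> List[Tuple[float, int]]:
    """Sketch of D7 steps 1-3 and 5: return (issue_time_ns, pc) per chunk."""
    # Step 1: command-only reads carry nbytes=0, so size the work from the request.
    work_bytes = txn_nbytes if txn_nbytes > 0 else int(request_nbytes or 0)
    # Step 2: number of bursts.
    n_chunks = math.ceil(work_bytes / burst_bytes) if work_bytes > 0 else 0
    # Step 3: spread chunks over the drain time (assumed 0 when either is 0).
    chunk_interval = drain_ns / n_chunks if drain_ns > 0 and n_chunks > 0 else 0.0
    pc_shift = burst_bytes.bit_length() - 1
    pc_mask = num_pcs - 1
    schedule = []
    t = now
    for i in range(n_chunks):
        # Step 5: advance one interval, then pick the PC for this chunk's address.
        t += chunk_interval
        pc = ((base_address + i * burst_bytes) >> pc_shift) & pc_mask
        schedule.append((t, pc))
    return schedule
```

A 1 KiB read command (`txn_nbytes=0`, request of 1024 bytes) at base address 0 with a 40 ns drain yields four chunks issued at 10, 20, 30 and 40 ns on PCs 0 to 3.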

-`_handle_txn` shares the same `_pc_avail` / `_pc_last_dir` state with
-`_handle_flit` — there is exactly one source of PC scheduling truth
-across both paths.
+`_handle_txn` shares the same `_pc_avail` / `_pc_last_dir` state with
+`_handle_flit`, so there is exactly one source of PC scheduling truth
+across both paths.

-### D8. Response routing
+### D8. Response routing

-`_send_response` dispatches on request type and path geometry:
+`_send_response` dispatches on request type and path geometry:

-| Case | Trigger | Response |
+| Case | Trigger | Response |
 | --- | --- | --- |
-| PE_DMA | `isinstance(txn.request, PeDmaMsg)` | New reverse-path Transaction (`is_response=True`, `nbytes=0`), same `done` |
-| Bypass — Memory Read | `"m_cpu" not in any(txn.path)` AND `MemoryReadMsg` | Reverse-path Transaction with `nbytes=request.nbytes` (data return) |
-| Bypass — Memory Write | `"m_cpu" not in any(txn.path)` AND not Memory Read | `txn.done.succeed()` (write completes locally) |
-| Default | otherwise | New `ResponseMsg(correlation_id, request_id, src_cube, src_pe, success=True)` on reverse path |
+| PE_DMA | `isinstance(txn.request, PeDmaMsg)` | New reverse-path Transaction (`is_response=True`, `nbytes=0`), same `done` |
+| Bypass (Memory Read) | `not any("m_cpu" in hop for hop in txn.path)` AND `MemoryReadMsg` | Reverse-path Transaction with `nbytes=request.nbytes` (data return) |
+| Bypass (Memory Write) | `not any("m_cpu" in hop for hop in txn.path)` AND not Memory Read | `txn.done.succeed()` (write completes locally) |
+| Default | otherwise | New `ResponseMsg(correlation_id, request_id, src_cube, src_pe, success=True)` on the reverse path |

-The "bypass" classification matches the Memory R/W fabric path defined
-in ADR-0015 D4 (PCIE_EP → io_noc → ucie → cube router → hbm_ctrl,
-without M_CPU). The PE_DMA case is its own dedicated reverse-path to
-keep the inner-loop DMA fast (PE_DMA reads/writes do not synthesise a
-ResponseMsg envelope).
+The "bypass" classification matches the Memory R/W fabric path defined
+in ADR-0015 D4 (PCIE_EP → io_noc → ucie → cube router → hbm_ctrl, without
+M_CPU). The PE_DMA case has its own dedicated reverse path to keep the
+inner-loop DMA fast (PE_DMA reads/writes do not synthesise a ResponseMsg
+envelope).

-In all reverse-path cases, the response Transaction is put onto
-`out_ports[reverse_path[1]]` — the first hop back along the recorded
-forward path. If `reverse_path` has fewer than 2 entries (degenerate
-path), the original `txn.done` is signalled directly.
+In all reverse-path cases, the response Transaction is put onto
+`out_ports[reverse_path[1]]`, the first hop back along the recorded
+forward path. If `reverse_path` has fewer than 2 entries (a degenerate
+path), the original `txn.done` is signalled directly.
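The dispatch table and the first-hop rule can be sketched as two small functions. `TxnStub`, the returned case labels, and the hop names are hypothetical; the sketch reads the bypass trigger as "no hop name contains `m_cpu`" and assumes `reverse_path` is the recorded forward path reversed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TxnStub:
    """Hypothetical stand-in for a Transaction: request class name plus forward path."""
    request_type: str  # e.g. "PeDmaMsg", "MemoryReadMsg", "MemoryWriteMsg"
    path: List[str]    # forward path, requester first, hbm_ctrl last


def classify_response(txn: TxnStub) -> str:
    """Mirror the D8 table: which response case applies to this transaction."""
    if txn.request_type == "PeDmaMsg":
        return "pe_dma_reverse"
    # Assumed reading of the bypass trigger: no hop name mentions m_cpu.
    bypass = not any("m_cpu" in hop for hop in txn.path)
    if bypass and txn.request_type == "MemoryReadMsg":
        return "bypass_read_data_return"
    if bypass:
        return "bypass_write_local_done"
    return "default_response_msg"


def first_reverse_hop(path: List[str]) -> Optional[str]:
    """reverse_path[1], or None for a degenerate path (signal txn.done directly)."""
    reverse_path = list(reversed(path))
    return reverse_path[1] if len(reverse_path) >= 2 else None
```

Note that the PE_DMA check comes first, so a PE_DMA request takes its dedicated reverse path regardless of whether M_CPU appears on the path.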

-### D9. Configurable attributes
+### D9. Configurable attributes

-| Attribute | Default | Source | Notes |
+| Attribute | Default | Source | Notes |
 | --- | --- | --- | --- |
-| `num_pcs` | 8 | topology cube `hbm_ctrl.attrs` | Must be power of 2 |
-| `pc_bw_gbs` | 32.0 | builder-derived: `hbm_to_router_bw_gbs / num_pcs` | Enforces ADR-0017 D8 invariant |
-| `burst_bytes` | 256 | topology attrs | Must be power of 2; equals `flit_bytes` (ADR-0033 D1) |
-| `switch_penalty_ns` | 0.0 | topology attrs | Tier 0 default; non-zero models pessimistic R/W switching |
-| `efficiency` | 1.0 | topology attrs | Applied at builder time to `hbm_to_router_bw_gbs` (router-edge BW scaling only) |
-| `overhead_ns` | 0.0 | topology attrs | First-flit decode overhead (D5) |
+| `num_pcs` | 8 | topology cube `hbm_ctrl.attrs` | Must be a power of 2 |
+| `pc_bw_gbs` | 32.0 | builder-derived: `hbm_to_router_bw_gbs / num_pcs` | Enforces the ADR-0017 D8 invariant |
+| `burst_bytes` | 256 | topology attrs | Must be a power of 2; equals `flit_bytes` (ADR-0033 D1) |
+| `switch_penalty_ns` | 0.0 | topology attrs | Tier 0 default; non-zero values model pessimistic R/W switching |
+| `efficiency` | 1.0 | topology attrs | Applied at builder time to `hbm_to_router_bw_gbs` (scales router-edge BW only) |
+| `overhead_ns` | 0.0 | topology attrs | First-flit decode overhead (D5) |

-`pc_bw_gbs` is derived by `topology/builder.py` rather than configured
-directly so the aggregate per-PE BW matches the router-to-HBM link BW
-without yaml-side duplication.
+`pc_bw_gbs` is derived in `topology/builder.py` rather than configured
+directly, so the aggregate per-PE bandwidth matches the router-to-HBM
+link bandwidth without duplication on the yaml side.
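A builder-side derivation consistent with the table might look like the sketch below. `derive_hbm_ctrl_attrs` is a hypothetical function, not the actual `topology/builder.py` code. The 256 GB/s input used in the example is only what the 32.0 default implies for 8 PCs, and `efficiency` is left out because the table says it scales router-edge BW only.

```python
def derive_hbm_ctrl_attrs(hbm_to_router_bw_gbs: float, num_pcs: int = 8,
                          burst_bytes: int = 256, flit_bytes: int = 256) -> dict:
    """Hypothetical builder-side derivation mirroring the D9 table."""

    def is_pow2(v: int) -> bool:
        return v > 0 and (v & (v - 1)) == 0

    if not is_pow2(num_pcs):
        raise ValueError(f"num_pcs must be a power of 2, got {num_pcs}")
    if not is_pow2(burst_bytes):
        raise ValueError(f"burst_bytes must be a power of 2, got {burst_bytes}")
    if burst_bytes != flit_bytes:
        raise ValueError("burst_bytes must equal flit_bytes (ADR-0033 D1)")
    # Derived, not configured: the PCs together deliver exactly the router-edge BW.
    pc_bw_gbs = hbm_to_router_bw_gbs / num_pcs
    return {"num_pcs": num_pcs, "burst_bytes": burst_bytes, "pc_bw_gbs": pc_bw_gbs}
```

Because `pc_bw_gbs` is computed rather than read from yaml, `pc_bw_gbs * num_pcs` equals the router-edge bandwidth by construction, which is the ADR-0017 D8 invariant.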

 ## Consequences

 ### Positive

- Address-based PC selection preserves multi-stream HBM parallelism
-  that an address-blind round-robin would collapse — important for
-  multi-PE workloads with disjoint HBM regions.
- Flit-aware path (D5) + async finalize (D6) preserves wormhole
-  pipelining and exposes PC parallelism for back-to-back single-flit
-  transactions.
- Single source of PC scheduling truth (D4 mechanism, used by both D5
-  flit path and D7 chunk-loop path).
- Builder-derived `pc_bw_gbs` enforces ADR-0017 D8 in code, not yaml
-  discipline.
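+- Address-based PC selection preserves the multi-stream HBM parallelism
+  that an address-blind round-robin would collapse; this matters for
+  multi-PE workloads with disjoint HBM regions.
+- The flit-aware path (D5) plus async finalize (D6) preserves wormhole
+  pipelining and exposes PC parallelism for back-to-back single-flit
+  transactions.
+- A single source of PC scheduling truth (the D4 mechanism, used by both
+  the D5 flit path and the D7 chunk-loop path).
+- Builder-derived `pc_bw_gbs` enforces ADR-0017 D8 in code rather than
+  through yaml discipline.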

 ### Negative

- No bank-level conflict modelling within a PC; address-blind to
-  bank/row-buffer reuse (ADR-0033 D3).
- No HBM scheduler (FR-FCFS / write-buffer / watermark drain); fixed
-  FIFO per PC. Bursty mixed R/W is approximated by `switch_penalty_ns`
+- No bank-level conflict modelling within a PC; address-blind to
+  bank/row-buffer reuse (ADR-0033 D3).
+- No HBM scheduler (FR-FCFS / write-buffer / watermark drain); fixed FIFO
+  per PC. Bursty mixed R/W traffic is approximated by `switch_penalty_ns`
  (ADR-0033 D2).
- `_txn_state` is a regular dict keyed by `id(txn)`; in-flight state
-  accumulates per concurrent transaction and is removed only on
-  `is_last`. Adequate for current workloads.
+- `_txn_state` is a regular dict keyed by `id(txn)`; in-flight state
+  accumulates per concurrent transaction and is removed only on
+  `is_last`. This is adequate for current workloads.

 ## Links

- ADR-0001 (Physical address layout — PC bit field comment)
- ADR-0015 D4 (Memory R/W fabric path — bypass response case)
- ADR-0017 D4 (Per-PE HBM partitioning — attachment to PE routers)
- ADR-0017 D8 (HBM channel mapping mode — n:1 aggregate this ADR
-  implements)
- ADR-0017 D9 (AddressResolver — `hbm_ctrl.pe{pe_id}` endpoint
-  resolution)
- ADR-0033 D1 (Modelled precisely — per-PC parallelism, switch penalty,
-  flit-aware PC commit, first-flit overhead, chunk-loop drain)
- ADR-0033 D2 (Switch-penalty default 0 — ideal scheduler amortisation)
+- ADR-0001 (Physical address layout: PC bit field comment)
+- ADR-0015 D4 (Memory R/W fabric path: bypass response case)
+- ADR-0017 D4 (Per-PE HBM partitioning: attachment to PE routers)
+- ADR-0017 D8 (HBM channel mapping mode: the n:1 aggregate this ADR
+  implements)
+- ADR-0017 D9 (AddressResolver: `hbm_ctrl.pe{pe_id}` endpoint resolution)
+- ADR-0033 D1 (Modelled precisely: per-PC parallelism, switch penalty,
+  flit-aware PC commit, first-flit overhead, chunk-loop drain)
+- ADR-0033 D2 (Switch-penalty default 0: ideal-scheduler amortisation)