ywkang 168b0c89f0 ADR: translate adr-ko/ to Korean, fix ADR-0013 slug, refine Status check

Follow-up to the bilingual-structure commit: docs/adr-ko/ now holds
only Korean versions (24 files translated from English placeholders),
ADR-0013 slug uses kebab-case in both folders, and the verify tool
allows translated parenthetical commentary in the Status block.

- Translate 24 English files in docs/adr-ko/ to Korean. The previous
  bilingual-structure commit had left these as English copies because
  their source content was already English; this commit fulfills the
  policy that docs/adr-ko/ contains only Korean.
- Rename ADR-0013 in both adr/ and adr-ko/ from
  ver-verification_strategy.md to ver-verification-strategy.md
  (kebab-case consistency with other ADRs).
- CLAUDE.md (ADR Translation Discipline): clarify that only the
  Status lifecycle keyword (Accepted / Proposed / Stub / Draft /
  Superseded by ADR-NNNN / Merged into ADR-NNNN) must match across
  EN and KO; parenthetical commentary and trailing list items may be
  translated.
- tools/verify_adr_lang_pairs.py: replace byte-equal Status check
  with normalize_status_keyword() which strips parenthetical
  commentary and takes only the first non-empty line.
- tests/test_verify_adr_lang_pairs.py: update existing test names,
  add coverage for translated parenthetical, translated trailing
  list, and Superseded-by-NNNN keyword equality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 08:17:56 -07:00


ADR-0015: Component Port/Wire Model and Fabric Routing

Status

Accepted

Context

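Realistic hardware modeling (queues, contention, fan-out) requires that components own fabric traversal, while the simulation engine handles only initialization and completion observation. Direct method calls between components, or path-finding inside the engine, defeat queuing and contention semantics.

This ADR defines:

how components communicate through typed port queues,
how propagation delay is modeled (wire processes, including BW occupancy),
the fabric paths for Memory R/W (bypassing M_CPU) and Kernel Launch (via M_CPU),
the reduced role of the engine (wire initialization and completion observation only),
M_CPU.DMA as an internal subcomponent of M_CPU.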

Decision

D1. Component port model

Each component has typed input and output ports modeled as SimPy Stores:

in_ports:  dict[str, simpy.Store]   # keyed by source node_id
out_ports: dict[str, simpy.Store]   # keyed by destination node_id

Ports are created at engine initialization from the graph edges. Each directed edge (src → dst) creates:

src.out_ports[dst]: the sending side
dst.in_ports[src]: the receiving side
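The edge-to-port rule above can be sketched as follows. This is a minimal illustration, not the engine's actual wiring code: `build_ports` is a hypothetical helper, and a plain `deque` stands in for `simpy.Store` so the snippet runs without SimPy.

```python
from collections import deque


def build_ports(edges):
    """Create per-component port dicts from directed edges (src, dst).

    Hypothetical helper: deque() is a stand-in for simpy.Store.
    Returns (in_ports, out_ports), each keyed by component node_id.
    """
    in_ports = {}
    out_ports = {}
    for src, dst in edges:
        # Sending side: src.out_ports[dst]
        out_ports.setdefault(src, {})[dst] = deque()
        # Receiving side: dst.in_ports[src]
        in_ports.setdefault(dst, {})[src] = deque()
    return in_ports, out_ports


edges = [("pcie_ep", "io_noc"), ("io_noc", "pcie_ep"), ("io_noc", "io_ucie")]
in_ports, out_ports = build_ports(edges)
print(sorted(out_ports["io_noc"]))  # destinations reachable from io_noc
print(sorted(in_ports["io_noc"]))   # sources that feed io_noc
```

Note that the two queues created for one edge are distinct objects; the wire process of D2 is what moves items from the sending side to the receiving side.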

D2. Wire process (propagation delay + BW occupancy)

For each directed edge (src, dst) in the topology graph, a SimPy wire process models propagation delay and BW occupancy:

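def wire_process(env, out_port, in_port, delay_ns, bw_gbs):
    # One process per directed edge; available_at is when the link is next free.
    available_at = 0.0
    while True:
        cmd = yield out_port.get()
        if bw_gbs > 0:
            nbytes = getattr(cmd, "nbytes", 0)
            if nbytes > 0:
                # Wait until the previous transaction has released the link.
                wait = available_at - env.now
                if wait > 0:
                    yield env.timeout(wait)
                # Occupy the link. Assuming bw_gbs is in GB/s (1 GB/s = 1 byte/ns),
                # nbytes / bw_gbs is a duration in ns.
                available_at = env.now + (nbytes / bw_gbs)
        # Propagation delay, then deliver to the receiving port.
        yield env.timeout(delay_ns)
        yield in_port.put(cmd)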

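Wire processes are started at engine initialization. Each directed edge keeps an available_at timestamp that tracks when the link becomes free for the next transaction. While one transaction occupies the link, the next transaction on the same directed link must wait until the occupancy is released (back-to-back serialization). The TX and RX directions are independent: each is a separate wire process with its own available_at state.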
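To make the serialization behavior concrete, the loop of wire_process can be replayed with plain arithmetic. The sketch below is an illustration under stated assumptions, not part of the simulator: `replay_wire` is a hypothetical helper, commands are assumed to arrive in FIFO order, `in_port.put` is assumed never to block, and `bw_gbs` is assumed to be in GB/s so that `nbytes / bw_gbs` is in ns.

```python
def replay_wire(arrivals, delay_ns, bw_gbs):
    """Replay wire_process for one directed edge without SimPy.

    arrivals: list of (arrival_ns, nbytes), sorted by arrival time.
    Returns the time at which each command is put on the receiving port.
    """
    available_at = 0.0
    now = 0.0  # time at which the wire loop is ready for its next get()
    delivered = []
    for arrival_ns, nbytes in arrivals:
        # out_port.get() completes once the command is present and the loop is free.
        now = max(now, arrival_ns)
        if bw_gbs > 0 and nbytes > 0:
            # Wait for the link, then occupy it for nbytes / bw_gbs ns.
            now = max(now, available_at)
            available_at = now + nbytes / bw_gbs
        # Propagation delay before delivery.
        now += delay_ns
        delivered.append(now)
    return delivered


# Two 4000-byte commands arriving together on a 100 GB/s link with 10 ns delay.
print(replay_wire([(0.0, 4000), (0.0, 4000)], delay_ns=10.0, bw_gbs=100.0))
# → [10.0, 50.0]
```

The first command is delivered after the propagation delay alone, while the second waits until the 40 ns occupancy of the first has ended before it starts its own 10 ns of propagation. Under this loop, occupancy delays the next transaction on the link rather than the one that caused it.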

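D3. Engine role (reduced)

The simulation engine must:

wire components at initialization (create port Stores, start wire processes),
identify the entry component for each request type (PCIE_EP),
put the request on the entry component's in_port,
wait for the completion event.

The simulation engine must not:

search for topology paths while a request is executing,
call a component's run() method directly,
track per-hop latency or decompose fan-out.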

D4. Fabric paths for Memory R/W and Kernel Launch

Memory R/W and Kernel Launch use different fabric paths. Memory operations bypass M_CPU and are routed through the crossbar directly to HBM. Kernel Launch goes through M_CPU for PE fan-out.

Memory R/W forward path (pcie_ep → hbm_ctrl, bypassing M_CPU):

pcie_ep → io_noc → io_ucie
  → [transit cubes: ucie_in → noc → ucie_out]  (zero or more)
  → target cube: ucie_in → router mesh → hbm_ctrl

Memory R/W completion path:

hbm_ctrl → router mesh → [transit cubes: ucie → router mesh → ucie]
  → io_ucie → io_noc → pcie_ep

Kernel Launch forward path (pcie_ep → io_cpu → M_CPU → PE):

pcie_ep → io_noc → io_cpu → io_noc → io_ucie
  → [transit cubes: ucie_in → noc → ucie_out]  (zero or more)
  → target cube: ucie_in → noc → M_CPU → PE[0..n] (parallel fan-out)

Kernel Launch completion path:

PE[0..n] all complete → M_CPU (aggregation)
  → noc → [transit cubes: ucie → noc → ucie]
  → io_ucie → io_noc → io_cpu → io_noc → pcie_ep

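Rationale for Memory R/W bypassing M_CPU:

Memory write/read operations need no command interpretation or PE dispatch; they are direct data transfers to or from HBM. Going through M_CPU would add unnecessary overhead (5 ns) with no functional benefit. The io_noc inside the IO chiplet makes the routing decision: memory operations go straight to the cube fabric, while kernel launches are delivered to io_cpu first.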
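The forward-path routing decision at io_noc can be sketched as a small function. This is an illustrative sketch only: `RequestKind` and `io_noc_next_hop` are hypothetical names, and the rule is read directly off the forward paths listed in D4 (completion paths are not covered).

```python
from enum import Enum


class RequestKind(Enum):
    # Hypothetical request categories for illustration.
    MEMORY_READ = "memory_read"
    MEMORY_WRITE = "memory_write"
    KERNEL_LAUNCH = "kernel_launch"


def io_noc_next_hop(kind, came_from):
    """Pick the next hop for a forward-direction request arriving at io_noc."""
    if kind is RequestKind.KERNEL_LAUNCH and came_from == "pcie_ep":
        # Kernel launches visit io_cpu before entering the cube fabric.
        return "io_cpu"
    # Memory operations, and kernel launches returning from io_cpu,
    # go straight to the cube fabric through io_ucie.
    return "io_ucie"


print(io_noc_next_hop(RequestKind.MEMORY_WRITE, "pcie_ep"))   # io_ucie
print(io_noc_next_hop(RequestKind.KERNEL_LAUNCH, "pcie_ep"))  # io_cpu
print(io_noc_next_hop(RequestKind.KERNEL_LAUNCH, "io_cpu"))   # io_ucie
```

Keeping the decision inside the io_noc component, rather than in the engine, is consistent with D3: the engine only puts the request on the entry port.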

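D5. M_CPU.DMA is an internal subcomponent of M_CPU

M_CPU.DMA is not a separate topology node. It is an internal subcomponent owned by the M_CPU component implementation.

M_CPU.DMA:

owns the DMA READ and DMA WRITE queues (capacity=1 each, ADR-0014 D4),
issues memory requests to hbm_ctrl over the NoC,
receives completions from hbm_ctrl over the NoC,
reports completions to M_CPU,
is created and managed inside M_CPU's __init__ and run().

M_CPU.DMA does not appear as a node in the compiled topology graph.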
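The ownership relation can be illustrated with a minimal sketch. The class and function names below (`DmaEngine`, `MCpu`, `topology_node_ids`) are hypothetical, and `queue.Queue(maxsize=1)` stands in for a capacity-1 `simpy.Store`; the real component would also drive the DMA from run().

```python
import queue


class DmaEngine:
    """Hypothetical stand-in for M_CPU.DMA; it has no node_id of its own."""

    def __init__(self):
        # One-deep READ and WRITE queues (capacity=1 each).
        self.read_q = queue.Queue(maxsize=1)
        self.write_q = queue.Queue(maxsize=1)


class MCpu:
    """Hypothetical M_CPU component that owns its DMA subcomponent."""

    def __init__(self, node_id):
        self.node_id = node_id
        # Created here and never registered with the topology graph.
        self.dma = DmaEngine()


def topology_node_ids(components):
    """Only top-level components contribute nodes."""
    return [c.node_id for c in components]


m_cpu = MCpu("m_cpu")
print(topology_node_ids([m_cpu]))  # the DMA does not add an entry
```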

D6. Transit cube forwarding

A cube that is not the target of a memory or kernel request acts as a transit node. A transit cube forwards the request without consuming it:

ucie_in (from upstream) → noc → ucie_out (to downstream)

Transit forwarding is implemented entirely inside the ucie_in component. The noc and ucie_out components of a transit cube forward packets unmodified.
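A sketch of the decision ucie_in would make, assuming each request carries its target cube identifier. `ucie_in_local_hops` and its parameters are hypothetical names, and the hop lists are taken from the forward paths in D4.

```python
def ucie_in_local_hops(this_cube, target_cube, is_kernel_launch):
    """Return the hops a request takes inside this cube after ucie_in.

    Hypothetical helper: cube identifiers and the is_kernel_launch flag
    are assumptions made for illustration.
    """
    if target_cube != this_cube:
        # Transit cube: forward downstream without consuming the request.
        return ["noc", "ucie_out"]
    if is_kernel_launch:
        # Target cube, kernel launch: M_CPU then fans out to the PEs.
        return ["noc", "M_CPU"]
    # Target cube, memory operation: bypass M_CPU.
    return ["router mesh", "hbm_ctrl"]


print(ucie_in_local_hops("cube1", "cube3", is_kernel_launch=False))
print(ucie_in_local_hops("cube3", "cube3", is_kernel_launch=False))
print(ucie_in_local_hops("cube3", "cube3", is_kernel_launch=True))
```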

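D7. _formula_latency is retained as a lower-bound cross-check

The path-based formula latency function (_formula_latency) is retained in the engine as a lower bound for correctness verification.

Invariants:

Phase 0: _formula_latency == component model total_ns
Phase 1+: _formula_latency <= component model total_ns (contention adds queuing)

The function is independent of the port/wire model and requires only the topology graph. It is used for shard comparison in _route_kernel and as a regression guard.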
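The two invariants could be checked in a regression test along these lines. `check_formula_lower_bound` and the tolerance `tol` are hypothetical; the ADR does not say how floating-point equality is compared, so the tolerance is an assumption.

```python
def check_formula_lower_bound(formula_ns, total_ns, phase, tol=1e-9):
    """Return True if the invariant for the given phase holds.

    formula_ns: result of the path-based formula latency.
    total_ns:   latency measured by the component model.
    tol:        assumed floating-point tolerance (not from the ADR).
    """
    if phase == 0:
        # Phase 0: no contention, so both values must agree.
        return abs(formula_ns - total_ns) <= tol
    # Phase 1+: contention can only add queuing on top of the formula.
    return formula_ns <= total_ns + tol


print(check_formula_lower_bound(120.0, 120.0, phase=0))  # True
print(check_formula_lower_bound(120.0, 135.5, phase=1))  # True
print(check_formula_lower_bound(120.0, 110.0, phase=1))  # False
```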

Consequences

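Components model realistic hardware behavior (queues, contention, fan-out).
Propagation delay is modeled accurately per edge.
The engine is decoupled from routing policy.
Component implementations remain swappable through DI (ADR-0007 D3).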
