Files
kernbench2/docs/adr-ko/ADR-0003-dev-target-system-hierarchy.md
T
ywkang a796c1d2f7 ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/
Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00

69 lines
2.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-0003: Target System Hierarchy & Modeling Scope
## Status
Accepted
## Context
We need a system-level simulator to evaluate LLM kernel performance on our AI Accelerator platform.
The platform is organized as a compute tray containing multiple identical SIPs connected via PCIe or UAL
through switching fabrics, with a host CPU issuing commands/kernels.
## Decision
We model the system hierarchy explicitly:
### D1. Tray-level
- A compute tray contains:
- Host CPU (issues requests / coordinates runtime & data placement)
- Multiple identical SIPs (accelerators)
- Interconnect fabric between SIPs (PCIe and/or UAL via switches)
### D2. SIP-level
- A SIP is a multi-die package composed of:
- Multiple CUBEs (HBM die + compute PEs + UCIe)
- One or more IO chiplets (host/SIP interfaces)
- IO chiplets:
- provide interfaces: PCIe-EP, IO_CPU, optionally UAL-EP
- can be multiple per SIP
- placement constrained to SIP shoreline (top/bottom/left/right); each shoreline may host 12 IO chiplets
### D3. CUBE-level
- A CUBE contains:
- HBM + memory controller (HBM_CTRL)
- NOC (on-die fabric): carries all intra-cube traffic including HBM data,
inter-cube (UCIe), command (M_CPU↔PE_CPU), and shared SRAM access.
Must provide: full-BW PE↔local HBM path, PE↔SRAM connectivity,
PE↔UCIe connectivity, M_CPU↔PE command path.
NOC topology is an implementation choice (e.g., 2D mesh, ring, crossbar);
current implementation uses a 2D mesh with XY routing (see ADR-0017).
HBM_CTRL is attached to each PE's local NOC port (local HBM = minimal hop).
- Shared SRAM: cube-level shared memory accessible by all PEs via NOC
- management/control CPU (M_CPU) coordinating PE command distribution and completion aggregation
- multiple PEs
- up to 4 UCIe endpoints (N/E/W/S) for CUBE↔CUBE and CUBE↔IO connectivity
### D4. PE-level
- A PE can execute one kernel instance
- PE contains internal control + accelerators (modeled at PE view granularity):
- PE_CPU, command handler, PE_TCM, DMA/GEMM/MATH engines, internal queues
## Consequences
- The simulator supports abstraction by “views”:
- SIP view hides PE internals
- CUBE view treats each PE as a single block
- PE view expands PE internals
- Topology remains parameterized; sizes/counts/links come from configuration.
## Links
- SPEC R3/R5
- ADR-0005 (diagram views)
- ADR-0017 (cube NOC 2D mesh architecture)