e33e76f2d1
Adds a section-based table of contents for the 46-ADR corpus, mirroring the /report skill's classification (Design Principles / High-level Architecture / Detailed Architecture by component / Implementation Decisions by topic). Generated for both docs/adr/ (EN titles) and docs/adr-ko/ (KO titles) from one tool.

tools/generate_adr_index.py:
- Single CLASSIFICATION dict per ADR. Add an entry when introducing a new ADR; the script fails loud if any file is missing from the table.
- DETAILED_COMPONENTS lists each builtin component and the ADR(s) that cover it (ADR-0014 appears under six PE engines; ADR-0023 under pe_dma + pe_ipcq).
- Accepts both ":" and "—" title separators (matching ADR-0033's existing format).
- --check mode for CI: exits 1 if INDEX.md is stale.

Also includes the docs/report/architecture-2026-1H.md generated by the prior /report write (the public-facing architecture document; 836 lines, 76 source-attribution comments).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
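The title-separator handling and the --check behavior described in the commit message could look roughly like the sketch below. It is not the actual contents of tools/generate_adr_index.py: the heading shape (`# ADR-NNNN: Title` or `# ADR-NNNN — Title`), the names `TITLE_RE`, `parse_title`, and `check_index`, and the choice to treat a missing INDEX.md as stale are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed heading shape: "# ADR-0001: Title" or "# ADR-0033 — Title".
# Only the first separator after the ADR id is consumed, so a later
# colon inside the title is kept as part of the title.
TITLE_RE = re.compile(r"^#\s*(ADR-\d{4})\s*(?::|—)\s*(.+?)\s*$")


def parse_title(heading: str) -> tuple[str, str]:
    """Split an ADR heading into (adr_id, title), accepting ':' or '—'."""
    match = TITLE_RE.match(heading)
    if match is None:
        raise ValueError(f"unrecognized ADR heading: {heading!r}")
    return match.group(1), match.group(2)


def check_index(index_path: Path, expected: str) -> int:
    """Return 0 if the index on disk equals the freshly generated text, else 1.

    A missing file is treated as stale (an assumption of this sketch).
    """
    current = index_path.read_text(encoding="utf-8") if index_path.exists() else ""
    return 0 if current == expected else 1
```

A CI job would pass the return value of `check_index` to `sys.exit`, so a stale INDEX.md fails the build with exit status 1.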
6.9 KiB
ADR Index
Auto-generated by tools/generate_adr_index.py. Total ADRs: 46.
Classification mirrors the /report skill's section assignment. When adding a new ADR, also add an entry to the CLASSIFICATION table in tools/generate_adr_index.py.
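The rule that every ADR file must have a CLASSIFICATION entry can be sketched as a set difference between the files on disk and the table keys. This is a minimal illustration, not the script's real code: the table excerpt, the file-name convention (names starting with the eight-character ADR id), and the helpers `adr_id`, `find_unclassified`, and `require_classified` are hypothetical.

```python
from pathlib import Path

# Hypothetical excerpt; the real table's exact keys and values are not shown here.
CLASSIFICATION = {
    "ADR-0013": "Design Principles",
    "ADR-0033": "Design Principles",
    "ADR-0003": "High-level Architecture",
}


def adr_id(path: Path) -> str:
    """Assume file names start with the ADR id, e.g. 'ADR-0013-verification.md'."""
    return path.name[:8]


def find_unclassified(adr_files: list[Path], table: dict[str, str]) -> list[str]:
    """Return the sorted ids of ADR files that have no entry in the table."""
    return sorted({adr_id(p) for p in adr_files} - set(table))


def require_classified(adr_files: list[Path], table: dict[str, str]) -> None:
    """Abort with a message instead of silently omitting an ADR from the index."""
    missing = find_unclassified(adr_files, table)
    if missing:
        raise SystemExit(f"ADRs missing from CLASSIFICATION: {', '.join(missing)}")
```

Raising `SystemExit` with a string prints the message to stderr and exits with status 1, which suits a generator that also runs in CI.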
Design Principles
- ADR-0013 — Verification Strategy and Phase 1 Test Plan
- ADR-0033 — Latency Model: Assumptions and Known Simplifications
High-level Architecture
- ADR-0003 — Target System Hierarchy & Modeling Scope (System hierarchy (Tray / SIP / CUBE / PE))
- ADR-0007 — Runtime API and Simulation Engine Boundaries (Runtime API ↔ sim_engine boundaries)
- ADR-0016 — IOChiplet NOC and Memory Data Path (IOChiplet NOC and memory data path)
- ADR-0017 — Cube NOC and HBM Connectivity (Cube NOC and HBM connectivity)
Detailed Architecture
One subsection per component file under src/kernbench/components/builtin/.
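One way to produce these per-component subsections from a component-to-ADR mapping is sketched below. The mapping excerpt, the `TITLES` lookup, the `render_detailed` helper, and the `###` heading level are assumptions for illustration; only the component names and ADR titles are taken from this index.

```python
# Hypothetical excerpt of a component -> ADR ids mapping.
DETAILED_COMPONENTS = {
    "pe_gemm": ["ADR-0014"],
    "pe_ipcq": ["ADR-0023"],
    "hbm_ctrl": ["ADR-0034"],
}

# ADR titles as they appear in this index.
TITLES = {
    "ADR-0014": "PE Pipeline Execution Model",
    "ADR-0023": "PE-level IPCQ — Inter-PE Collective Communication",
    "ADR-0034": "HBM Controller Internal Design",
}


def render_detailed(components: dict[str, list[str]], titles: dict[str, str]) -> list[str]:
    """Emit one subsection per component, in alphabetical order of component name."""
    lines: list[str] = []
    for component in sorted(components):
        lines.append(f"### {component}")  # heading level is an assumption
        for adr in components[component]:
            lines.append(f"- {adr} — {titles[adr]}")
    return lines
```

Because the mapping is keyed by component, an ADR that covers several components is simply listed under each of them.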
forwarding
- ADR-0037 — Forwarding Component (forwarding_v1)
hbm_ctrl
- ADR-0034 — HBM Controller Internal Design
io_cpu
- ADR-0036 — IO_CPU Component Model
m_cpu
- ADR-0035 — M_CPU and M_CPU.DMA Component Model
pcie_ep
- ADR-0038 — PCIE_EP Component Model
pe_cpu
- ADR-0014 — PE Pipeline Execution Model
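pe_dma
- ADR-0014 — PE Pipeline Execution Model
- ADR-0023 — PE-level IPCQ — Inter-PE Collective Communication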
pe_fetch_store
- ADR-0014 — PE Pipeline Execution Model
pe_gemm
- ADR-0014 — PE Pipeline Execution Model
pe_ipcq
- ADR-0023 — PE-level IPCQ — Inter-PE Collective Communication
pe_math
- ADR-0014 — PE Pipeline Execution Model
pe_mmu
- ADR-0039 — PE_MMU Component Model — Component + Utility Dual Role
pe_scheduler
- ADR-0014 — PE Pipeline Execution Model
pe_tcm
- ADR-0040 — PE_TCM Component Model — Dual-Channel BW Serialization
sram
- ADR-0041 — Cube SRAM Component Model — terminal scratchpad on cube NoC
tiling
- ADR-0042 — Tile Plan Generators — GEMM/Math Pipeline Plan Builders
Implementation Decisions
Address Scheme
- ADR-0001 — 51-bit Physical Address Layout & Decoding Contract
- ADR-0011 — Memory Addressing — PA / VA / LA Address Models
Routing & Helper API
- ADR-0002 — Routing Distance, Ordering & Bypass Rules
- ADR-0051 — Routing Helper API — AddressResolver + PathRouter
Memory Semantics & Local-HBM Bandwidth
- ADR-0004 — Memory Semantics & Local-HBM Bandwidth Guarantee
Topology Compilation, Diagrams & Builder Algorithms
- ADR-0005 — Diagram Views & Distance-Aware Layout Rules
- ADR-0006 — Topology Compilation, Distance Extraction, and Automatic Diagram Generation
- ADR-0053 — Topology Builder + Visualizer Algorithms
Tensor Deployment and Allocation
- ADR-0008 — Tensor Deployment and Allocation (Host Allocator, PA-first)
Kernel Execution and Host-Device Messaging
- ADR-0009 — Kernel Execution Messaging and Completion Semantics
- ADR-0012 — Host ↔ IO_CPU Message Schema (PA-first, PE-tagged)
CLI Surface and Semantics
- ADR-0010 — Command Line Interface and Execution Semantics
Component Port/Wire Fabric Model
- ADR-0015 — Component Port/Wire Model and Fabric Routing
Two-Pass Data Execution
- ADR-0020 — 2-Pass Data Execution Model (Timing / Data Separation)
2D Grid Program Identity
- ADR-0022 — 2D Grid program_id Semantics
Parallelism (Launcher, DP, TP, AHBM backend, CCL algorithm)
- ADR-0024 — SIP-level Launcher — rank = SIP
- ADR-0026 — DPPolicy = Intra-Device Only — remove sip/num_sips fields
- ADR-0027 — Megatron-style Tensor Parallelism API
- ADR-0047 — AHBM CCL Backend — torch.distributed-compat shim
- ADR-0050 — CCL Algorithm Module Contract — ccl/algorithms/*.py
IPCQ Direction Addressing
- ADR-0025 — IPCQ Direction Addressing — address-based matching
Intercube All-Reduce
- ADR-0032 — Intercube All-Reduce — pe0 cube-mesh reduce + multi-SIP exchange
Evaluation Harnesses
- ADR-0043 — Allreduce Evaluation Harness — tests/sccl/
- ADR-0044 — GEMM Evaluation Harness — scripts/gemm_sweep.py + tests/gemm/
Bench Module Contract
- ADR-0045 — Bench Module Contract — registration, dispatch, and authoring
Kernel-side tl.* API (TLContext)
- ADR-0046 — TLContext — Kernel-side tl.* API Contract
Memory Allocator Algorithms
- ADR-0048 — Memory Allocator Algorithms — VirtualAllocator + PEMemAllocator
Probe Subcommand
- ADR-0049 — kernbench probe Subcommand — Traffic-Pattern Verification Harness
Sim-engine Op Log and Memory Store Schemas
- ADR-0052 — OpLog + MemoryStore Schemas — sim_engine internals