ADR: introduce docs/history/, merge 0011+0018, prune migration cruft

- CLAUDE.md: add ADR Lifecycle subsection (superseded → docs/history/,
  immutable numbering, no renumber)
- ADR-0011: merge ADR-0018 content as "Address Model: LA" section
  alongside PA / VA; status notes VA model is currently implemented
- ADR-0018 / 0029 / 0031: moved to docs/history/ with status updates
  (0018 merged into 0011, 0029 superseded by 0032, 0031 absorbed
  into 0001 rev 2)
- ADR-0019: rewrite Context as PE-HBM connectivity decision
  (self-contained, no LA model framing)
- ADR-0019/0020/0021/0023/0025/0027: Status Proposed → Accepted
  (code verified) and prune Implementation Notes / Affected files /
  Test strategy / "Current State" ("현재 상태") sub-sections that
  described the pre-implementation state
- ADR-0024/0026: same migration-flavor cleanup; 0026 also drops D6
  Migration and D8 docs-update sub-decisions
- ADR-0030: status simplified (blocker ADR-0031 now superseded)
- SPEC.md: R10 + §0.2 reflect PA / VA / LA model names
- ADR-0008/0012/0013: refresh ADR-0011 subtitle in Links

21 files changed, 553 insertions(+), 1290 deletions(-).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:42:45 -07:00
parent ecc57d050d
commit 22fd0d2b9d
23 changed files with 553 additions and 1290 deletions
@@ -2,7 +2,7 @@
## Status
-Proposed
+Accepted
## Context
@@ -16,21 +16,6 @@ but do not actually read tensor data or perform computations.
2. PE_GEMM, PE_MATH must be able to perform actual matrix operations and verify results
3. Must minimize simulation performance degradation
-### Limitations of the Existing Kernel Execution Structure
-The current kernel execution is separated into 3 stages:
-```
-Phase 0: Kernel function execution in TLContext → PeCommand list generation (outside SimPy, no data)
-Phase 1: PE_CPU replays PeCommand list via SimPy (timing only)
-```
-Phase 0 requires the kernel to **complete execution entirely** before SimPy begins.
-`tl.load()` returns a TensorHandle (placeholder), so actual data cannot be accessed.
-Therefore, branching based on data values (dynamic control flow) is impossible.
-This ADR resolves this limitation **for memory operations only** (see D1, D3).
### Constraints
- SimPy is a single-thread event loop — running numpy matmul inside it blocks everything
@@ -532,22 +517,3 @@ Per-dtype tolerance policy:
(computations execute in Phase 2, result values are undetermined in Phase 1).
Memory-data-based branching is supported via greenlet.
- greenlet C extension dependency added (pip install greenlet)
----
-## Affected Files
-| File | Change |
-|------|--------|
-| `src/kernbench/components/base.py` | Add `_on_process_start/end` hooks |
-| `src/kernbench/common/pe_commands.py` | Add `data_op = True`, extend metadata fields |
-| `src/kernbench/sim_engine/op_log.py` | New: OpRecord, OpLogger |
-| `src/kernbench/sim_engine/data_executor.py` | New: DataExecutor, MemoryStore |
-| `src/kernbench/sim_engine/engine.py` | op_logger injection (optional) |
-| `src/kernbench/triton_emu/tl_context.py` | greenlet switch calls inside `tl.load()` etc. |
-| `src/kernbench/triton_emu/kernel_runner.py` | New: KernelRunner (greenlet ↔ SimPy bridge) |
-| `src/kernbench/components/builtin/pe_cpu.py` | Remove Phase 0, change to KernelRunner invocation |
-| `pyproject.toml` | Add greenlet dependency |
-Component implementation files (pe_gemm.py, pe_dma.py, hbm_ctrl.py, etc.): **no changes**
-Benchmark kernels (benches/*.py): **no user API changes**
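
The Limitations section removed in the second hunk explains that a trace-then-replay design hands the kernel a placeholder from `tl.load()`, so the kernel cannot branch on data, while the retained consequence lines state that memory-data-based branching is supported via greenlet. The sketch below illustrates that difference under explicit assumptions: every function and kernel name in it is hypothetical and not taken from kernbench, the `TensorHandle` field is invented for illustration, and a plain Python generator stands in for greenlet so that the example needs only the standard library.

```python
from dataclasses import dataclass


@dataclass
class TensorHandle:
    """Hypothetical placeholder: it names an address but carries no data."""
    addr: int

    def __bool__(self):
        # A placeholder has no value, so a data-dependent branch cannot be decided.
        raise TypeError("TensorHandle has no data while tracing")


def trace_kernel(kernel):
    """Trace-then-replay style: run the kernel to completion, recording commands only."""
    commands = []

    def load(addr):
        commands.append(("load", addr))
        return TensorHandle(addr)

    kernel(load)
    return commands


def run_interleaved(kernel_gen, memory):
    """Interleaved style: the kernel suspends at each load and resumes with real data.

    Assumption: a generator replaces the greenlet switch mentioned in the ADR.
    """
    commands = []
    gen = kernel_gen()
    try:
        addr = next(gen)  # run the kernel until its first load request
        while True:
            commands.append(("load", addr))
            addr = gen.send(memory[addr])  # resume the kernel with the loaded value
    except StopIteration as stop:
        return commands, stop.value


def branching_kernel(load):
    """Hypothetical kernel for the tracing path: branches on a loaded value."""
    flag = load(0)
    if flag:  # fails under tracing because flag is only a placeholder
        load(1)


def branching_kernel_gen():
    """The same kernel written as a generator: each `yield` is a load request."""
    flag = yield 0
    if flag:
        value = yield 1
        return value
    return None
```

Calling `trace_kernel(branching_kernel)` raises `TypeError` at the `if flag` line, which mirrors the stated limitation. Calling `run_interleaved(branching_kernel_gen, {0: 1, 1: 42})` issues both loads because the kernel sees the real flag value before deciding. The scheduler here is a simple loop; in the ADR's design that role would belong to the SimPy side of the bridge, which this sketch does not model.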