ADR: introduce docs/history/, merge 0011+0018, prune migration cruft

- CLAUDE.md: add ADR Lifecycle subsection (superseded → docs/history/, immutable numbering, no renumber)
- ADR-0011: merge ADR-0018 content as "Address Model: LA" section alongside PA / VA; status notes VA model is currently implemented
- ADR-0018 / 0029 / 0031: moved to docs/history/ with status updates (0018 merged into 0011, 0029 superseded by 0032, 0031 absorbed into 0001 rev 2)
- ADR-0019: rewrite Context as PE-HBM connectivity decision (self-contained, no LA model framing)
- ADR-0019/0020/0021/0023/0025/0027: Status Proposed → Accepted (code verified) and prune Implementation Notes / Affected files / Test strategy / "현재 상태" ("Current state") sub-sections describing pre-impl state
- ADR-0024/0026: same migration-flavor cleanup; 0026 also drops D6 Migration and D8 docs-update sub-decisions
- ADR-0030: status simplified (blocker ADR-0031 now superseded)
- SPEC.md: R10 + §0.2 reflect PA / VA / LA model names
- ADR-0008/0012/0013: refresh ADR-0011 subtitle in Links

21 files changed, 553 insertions(+), 1290 deletions(-).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

@@ -16,21 +16,6 @@ but do not actually read tensor data or perform computations.
 2. PE_GEMM, PE_MATH must be able to perform actual matrix operations and verify results
 3. Must minimize simulation performance degradation

-### Limitations of the Existing Kernel Execution Structure
-
-The current kernel execution is separated into 3 stages:
-
-```
-Phase 0: Kernel function execution in TLContext → PeCommand list generation (outside SimPy, no data)
-Phase 1: PE_CPU replays PeCommand list via SimPy (timing only)
-```
-
-Phase 0 requires the kernel to **complete execution entirely** before SimPy begins.
-`tl.load()` returns a TensorHandle (placeholder), so actual data cannot be accessed.
-Therefore, branching based on data values (dynamic control flow) is impossible.
-
-This ADR resolves this limitation **for memory operations only** (see D1, D3).
-
 ### Constraints

 - SimPy is a single-thread event loop — running numpy matmul inside it blocks everything
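
Review note: the "Limitations of the Existing Kernel Execution Structure" text in the hunk above describes a trace-then-replay flow in which `tl.load()` hands back a placeholder, so a kernel cannot branch on loaded values. The sketch below illustrates that limitation only. `TraceContext`, its `load`/`store` methods, and the `TensorHandle` dataclass shown here are hypothetical stand-ins written for this note; they are not the project's `TLContext` or `PeCommand` types, and the real classes may behave differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TensorHandle:
    """Placeholder returned by load(); it names a tensor but holds no data."""
    name: str

    def __bool__(self) -> bool:
        # A placeholder has no value, so it cannot steer control flow.
        raise TypeError(f"{self.name}: no data available during Phase 0")


@dataclass
class TraceContext:
    """Hypothetical stand-in for a tracing context: records commands, moves no data."""
    commands: list = field(default_factory=list)

    def load(self, name: str) -> TensorHandle:
        self.commands.append(("load", name))
        return TensorHandle(name)

    def store(self, name: str, value: TensorHandle) -> None:
        self.commands.append(("store", name, value.name))


def straight_line_kernel(ctx: TraceContext) -> None:
    # No data-dependent decisions, so tracing ahead of time works.
    x = ctx.load("x")
    ctx.store("y", x)


def branching_kernel(ctx: TraceContext) -> None:
    flag = ctx.load("flag")
    if flag:  # needs the loaded value, which the trace phase does not have
        ctx.store("y", flag)


ctx = TraceContext()
straight_line_kernel(ctx)
print(ctx.commands)

try:
    branching_kernel(TraceContext())
except TypeError as exc:
    print("cannot trace:", exc)
```

The straight-line kernel traces to a replayable command list, while the branching kernel fails as soon as it inspects the placeholder, which is the situation the removed section describes.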
@@ -532,22 +517,3 @@ Per-dtype tolerance policy:
 (computations execute in Phase 2, result values are undetermined in Phase 1).
 Memory-data-based branching is supported via greenlet.
 - greenlet C extension dependency added (pip install greenlet)
-
----
-
-## Affected Files
-
-| File | Change |
-|------|--------|
-| `src/kernbench/components/base.py` | Add `_on_process_start/end` hooks |
-| `src/kernbench/common/pe_commands.py` | Add `data_op = True`, extend metadata fields |
-| `src/kernbench/sim_engine/op_log.py` | New: OpRecord, OpLogger |
-| `src/kernbench/sim_engine/data_executor.py` | New: DataExecutor, MemoryStore |
-| `src/kernbench/sim_engine/engine.py` | op_logger injection (optional) |
-| `src/kernbench/triton_emu/tl_context.py` | greenlet switch calls inside `tl.load()` etc. |
-| `src/kernbench/triton_emu/kernel_runner.py` | New: KernelRunner (greenlet ↔ SimPy bridge) |
-| `src/kernbench/components/builtin/pe_cpu.py` | Remove Phase 0, change to KernelRunner invocation |
-| `pyproject.toml` | Add greenlet dependency |
-
-Component implementation files (pe_gemm.py, pe_dma.py, hbm_ctrl.py, etc.): **no changes**
-Benchmark kernels (benches/*.py): **no user API changes**
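
Review note: the hunk above keeps the statement that memory-data-based branching is supported via greenlet, and the table names a greenlet ↔ SimPy bridge in `kernel_runner.py`. The sketch below shows the general suspend-at-load idea under stated assumptions. It deliberately uses a plain Python generator as a stand-in for greenlet and a simple loop as a stand-in for the SimPy process, so it runs with the standard library alone. The names `kernel`, `run`, and the tuple-shaped requests are hypothetical and are not taken from the project's `KernelRunner`.

```python
def kernel(load, store):
    """Kernel as a generator: each yield hands one request to the driver."""
    flag = yield load("flag")      # suspends until the driver supplies the data
    if flag > 0:                   # data-dependent branch, possible after resume
        x = yield load("x")
        yield store("y", x * 2)
    else:
        yield store("y", 0)


def run(kernel_fn, memory):
    """Drive the kernel to completion, serving loads and stores from `memory`.

    Returns the ordered list of (op, name) pairs the kernel issued.
    """
    log = []

    def load(name):
        return ("load", name)

    def store(name, value):
        return ("store", name, value)

    gen = kernel_fn(load, store)
    try:
        request = gen.send(None)   # start the kernel; runs to its first yield
        while True:
            log.append(request[:2])
            if request[0] == "load":
                reply = memory[request[1]]
            else:
                memory[request[1]] = request[2]
                reply = None
            # A real bridge would advance simulated time here before resuming.
            request = gen.send(reply)
    except StopIteration:
        pass
    return log


taken = {"flag": 1, "x": 21}
taken_log = run(kernel, taken)

skipped = {"flag": 0, "x": 21}
skipped_log = run(kernel, skipped)

print(taken_log, taken["y"])
print(skipped_log, skipped["y"])
```

Because the kernel resumes with real data after each load, the two runs issue different command sequences. A greenlet-based version would let the kernel keep ordinary function-call syntax instead of `yield`, which is presumably why the ADR lists "no user API changes" for benchmark kernels.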