ADR: introduce docs/history/, merge 0011+0018, prune migration cruft
- CLAUDE.md: add ADR Lifecycle subsection (superseded → docs/history/, immutable numbering, no renumbering)
- ADR-0011: merge ADR-0018 content as an "Address Model: LA" section alongside PA / VA; status notes that the VA model is currently implemented
- ADR-0018 / 0029 / 0031: moved to docs/history/ with status updates (0018 merged into 0011, 0029 superseded by 0032, 0031 absorbed into 0001 rev 2)
- ADR-0019: rewrite Context as the PE-HBM connectivity decision (self-contained, no LA model framing)
- ADR-0019/0020/0021/0023/0025/0027: Status Proposed → Accepted (code verified); prune Implementation Notes / Affected Files / Test Strategy / "Current State" (현재 상태) sub-sections that describe the pre-implementation state
- ADR-0024/0026: same migration-flavored cleanup; 0026 also drops the D6 Migration and D8 docs-update sub-decisions
- ADR-0030: status simplified (blocker ADR-0031 is now superseded)
- SPEC.md: R10 and §0.2 reflect the PA / VA / LA model names
- ADR-0008/0012/0013: refresh the ADR-0011 subtitle in Links

21 files changed, 553 insertions(+), 1290 deletions(-).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,30 +2,10 @@
## Status

Proposed
Accepted

## Context

### Problems with the Current Structure

pe_accel (SchedulerV2Component) hides 5 hardware blocks (DmaIn, DmaWb, Gemm, Math, Tcm)
**inside a single component**.

```
SchedulerV2Component (single topology node)
├── DmaInBlock   ← directly connected via internal SimPy Store
├── DmaWbBlock   ← not visible in topology
├── GemmBlock    ← not replaceable
├── MathBlock    ← not replaceable
└── TcmBlock     ← not replaceable
```

Problems:
- Blocks directly reference the next block via `desc.next_block` — hardcoded routing
- Individual blocks cannot be replaced (violates ADR-0015 component replacement principle)
- PE internal structure is not visible in the topology
- GemmBlock and MathBlock each duplicate TCM load/store logic
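The last problem, duplicated TCM load/store logic in GemmBlock and MathBlock, is what a single shared TCM service removes. The sketch below is minimal and assumes a bandwidth-serialized TCM port. `TcmRequest` borrows the name of the TCM request message, but its fields, `TcmService`, and `access()` are hypothetical. Plain cycle integers stand in for SimPy time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TcmRequest:
    """Hypothetical request shape; field names are illustrative only."""
    requester: str   # e.g. "gemm" or "math"
    is_write: bool
    nbytes: int


class TcmService:
    """One shared TCM model: every requester goes through the same
    bandwidth-limited port instead of carrying its own load/store code."""

    def __init__(self, bytes_per_cycle: int) -> None:
        self.bytes_per_cycle = bytes_per_cycle
        self.busy_until = 0  # cycle at which the port becomes free again

    def access(self, req: TcmRequest, now: int) -> int:
        """Return the cycle at which `req` completes.

        Requests are served one at a time: a request issued while the port
        is busy waits until the previous transfer has finished.
        """
        start = max(now, self.busy_until)
        duration = math.ceil(req.nbytes / self.bytes_per_cycle)
        self.busy_until = start + duration
        return self.busy_until


tcm = TcmService(bytes_per_cycle=64)
# GEMM reads 4 KiB starting at cycle 0; MATH asks for 1 KiB at cycle 10
# but has to wait for the GEMM transfer to release the port.
t_gemm = tcm.access(TcmRequest("gemm", is_write=False, nbytes=4096), now=0)
t_math = tcm.access(TcmRequest("math", is_write=False, nbytes=1024), now=10)
print(t_gemm, t_math)
```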
### Actual Hardware Structure

```
@@ -374,66 +354,6 @@ Topology edges encompass both **control/dispatch visibility + runtime chaining**
Scheduler → sub-component edges are initial dispatch paths, while
inter-component edges are runtime chaining paths driven by token self-routing.

### D8. Existing Code Migration — Builtin Integration

The existing builtin v1 components and pe_accel are **replaced with new builtin components**.

#### Migration Strategy

1. Back up existing `components/builtin/` → `components/builtin_legacy/` (preserved without modification)
2. Back up existing `components/custom/pe_accel/` → likewise
3. Re-implement new `components/builtin/` with the ADR-0021 architecture
4. Maintain **only one** topology.yaml (including pe_fetch_store)
5. components.yaml points to the new builtin
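Steps 1 and 2 are verbatim directory copies. Below is a minimal standard-library sketch. `back_up_tree` is a hypothetical helper, and the commented usage line reuses the paths from step 1.

```python
import shutil
from pathlib import Path


def back_up_tree(src: Path, dst: Path) -> None:
    """Copy the directory `src` to `dst` unchanged.

    Refuses to write into an existing `dst`, so an earlier backup is never
    silently modified (shutil.copytree would also raise in that case).
    """
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")
    shutil.copytree(src, dst)


# Hypothetical usage from the package root:
# back_up_tree(Path("components/builtin"), Path("components/builtin_legacy"))
```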
```yaml
# components.yaml — new builtin
pe_scheduler_v1: kernbench.components.builtin.pe_scheduler:PeSchedulerComponent
pe_gemm_v1: kernbench.components.builtin.pe_gemm:PeGemmComponent
pe_math_v1: kernbench.components.builtin.pe_math:PeMathComponent
pe_dma_v1: kernbench.components.builtin.pe_dma:PeDmaComponent
pe_fetch_store_v1: kernbench.components.builtin.pe_fetch_store:PeFetchStoreComponent
pe_tcm_v1: kernbench.components.builtin.pe_tcm:PeTcmComponent
```
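Each components.yaml value has the form `package.module:ClassName` and is keyed by an impl name. The sketch below shows one way such a registry can be resolved with `importlib`. `resolve_impl` and `load_registry` are hypothetical and are not kernbench's actual loader. The demo entries point at standard-library classes so the sketch runs without kernbench installed. Topology and tests refer only to impl names such as `pe_gemm_v1`, so re-pointing a name at a new class leaves those references untouched.

```python
import importlib


def resolve_impl(spec: str) -> type:
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:Class', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def load_registry(entries: dict) -> dict:
    """Map impl names (e.g. 'pe_gemm_v1') to the classes they point at."""
    return {name: resolve_impl(spec) for name, spec in entries.items()}


# Stand-in entries using stdlib classes so the sketch runs anywhere;
# real entries would point at kernbench.components.builtin.* classes.
registry = load_registry({
    "ordered_v1": "collections:OrderedDict",
    "counter_v1": "collections:Counter",
})
```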
The impl names (pe_gemm_v1, etc.) are preserved, but **the implementations are replaced
with the ADR-0021 architecture**. Existing benchmarks and tests referencing topology.yaml
continue to work without changes.

#### Latency Model Inheritance

The latency modeling of the new builtin components (MAC cycle calculation, SIMD latency,
TCM BW serialization, DMA fabric latency, etc.) is **based on the current pe_accel
implementation**. The tile schedule generation logic from tiling.py is also carried over.
Only the architecture (component separation, self-routing) changes; timing accuracy
is preserved.
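The paragraph names the latency terms but not their formulas. Purely as an illustration, the sketch below gives conventional throughput-bound forms for two of them. The function names, parameters, and formulas are assumptions and are not taken from the pe_accel implementation.

```python
import math


def gemm_cycles(m: int, n: int, k: int, macs_per_cycle: int) -> int:
    """Assumed model: an (M x K) @ (K x N) tile needs M*N*K multiply-
    accumulates on an array that retires `macs_per_cycle` of them per cycle."""
    return math.ceil(m * n * k / macs_per_cycle)


def simd_cycles(num_elems: int, lanes: int, op_latency: int) -> int:
    """Assumed model: an elementwise op makes one pass per `lanes` elements,
    plus a fixed pipeline latency for the first result to emerge."""
    return math.ceil(num_elems / lanes) + op_latency


print(gemm_cycles(64, 64, 64, macs_per_cycle=1024))   # 64x64x64 GEMM tile
print(simd_cycles(4096, lanes=64, op_latency=4))      # e.g. tl.exp on 4096 elems
```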
#### Test Strategy

#### Test Plan
|
||||
|
||||
**1. Existing test pass** (regression):
|
||||
After migration is complete, all existing tests (366) must pass.
|
||||
|
||||
**2. Latency regression**:
|
||||
Verify that the new builtin produces identical latency for the same inputs as pe_accel.
|
||||
|
||||
**3. Phase 1 → Phase 2 end-to-end**:
|
||||
Integration test from SimPy simulation (Phase 1) op_log generation → DataExecutor
|
||||
(Phase 2) actual numpy computation → result correctness verification.
|
||||
- GEMM: tl.composite(gemm) → op_log → Phase 2 matmul → allclose verification
|
||||
- MATH: tl.exp / tl.add, etc. → op_log → Phase 2 numpy op → allclose verification
|
||||
- Chaining: GEMM output → MATH input → final result end-to-end verification
|
||||
|
||||
**4. TileToken self-routing**:
|
||||
- Verify that tiles chain according to the plan's stage sequence
|
||||
- Verify PipelineContext.complete_tile() exactly-once at the last stage
|
||||
- Queue backpressure: verify that only the feeder blocks when DMA queue capacity is exceeded
|
||||
|
||||
**5. Asynchronous pipeline overlap**:
|
||||
- Verify that inter-tile stage overlap occurs within the same command (tile0 in GEMM while tile1 in DMA)
|
||||
- Multiple commands: verify that cmd2 feed starts after cmd1 feed completes (FIFO order)
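The overlap property in item 5 can be checked against a simple pipeline recurrence. The sketch assumes that every stage serves one tile at a time in FIFO order, that all tiles are ready at cycle 0, and that inter-stage queues are unbounded. Under those assumptions the backpressure case from item 4 is not modeled. The stage latencies are made-up numbers.

```python
def pipeline_finish_times(num_tiles: int, stage_latencies: list) -> list:
    """Return finish[i][s], the cycle at which tile i leaves stage s.

    Assumes each stage serves one tile at a time in FIFO order, all tiles
    are available at cycle 0, and queues between stages never fill up.
    """
    finish = [[0] * len(stage_latencies) for _ in range(num_tiles)]
    for i in range(num_tiles):
        for s, lat in enumerate(stage_latencies):
            ready = finish[i][s - 1] if s > 0 else 0  # tile i done upstream
            free = finish[i - 1][s] if i > 0 else 0   # stage s done with tile i-1
            finish[i][s] = max(ready, free) + lat
    return finish


latencies = [2, 3, 1]  # illustrative: DMA-in, GEMM, DMA-writeback
finish = pipeline_finish_times(3, latencies)
makespan = finish[-1][-1]
serial = 3 * sum(latencies)  # same tiles with no inter-tile overlap
print(finish, makespan, serial)
```

With these latencies, tile 1 occupies DMA-in during cycles 2 to 4 while tile 0 is in GEMM during cycles 2 to 5. The three tiles finish at cycle 12 instead of the 18 a fully serial schedule would take. Appending cmd2's tiles after cmd1's in the same tile list models the FIFO ordering that the second bullet checks.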
### D9. TileToken Message Definition

A message used for passing tile work between components.
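The TileToken fields themselves fall outside this hunk. Below is a minimal sketch of the self-routing idea under assumed field names. The token carries its TilePlan and a stage index, and each component forwards it to whatever component the next Stage names. The last stage calls `PipelineContext.complete_tile()`, which rejects a second call for the same tile. `TilePlan`, `Stage`, `StageType`, and `PipelineContext` are names listed in the Affected Files table. Every field, enum member, method other than `complete_tile()`, and stage-to-component mapping here is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional, Set, Tuple


class StageType(Enum):  # hypothetical members
    DMA_IN = auto()
    GEMM = auto()
    MATH = auto()
    DMA_WB = auto()


@dataclass(frozen=True)
class Stage:
    kind: StageType
    component: str  # topology node that executes this stage (hypothetical)


@dataclass(frozen=True)
class TilePlan:
    stages: Tuple[Stage, ...]


@dataclass
class PipelineContext:
    num_tiles: int
    completed: Set[int] = field(default_factory=set)

    def complete_tile(self, tile_id: int) -> None:
        """Record completion; a second call for the same tile is a bug."""
        if tile_id in self.completed:
            raise RuntimeError(f"tile {tile_id} completed twice")
        self.completed.add(tile_id)

    @property
    def done(self) -> bool:
        return len(self.completed) == self.num_tiles


@dataclass
class TileToken:
    tile_id: int
    plan: TilePlan
    ctx: PipelineContext
    stage_idx: int = 0

    def current(self) -> Stage:
        return self.plan.stages[self.stage_idx]

    def advance(self) -> Optional[str]:
        """Step to the next stage and return the component to forward to.
        After the last stage, complete the tile and return None."""
        self.stage_idx += 1
        if self.stage_idx < len(self.plan.stages):
            return self.current().component
        self.ctx.complete_tile(self.tile_id)
        return None


plan = TilePlan((Stage(StageType.DMA_IN, "pe_dma"), Stage(StageType.GEMM, "pe_gemm"),
                 Stage(StageType.MATH, "pe_math"), Stage(StageType.DMA_WB, "pe_dma")))
ctx = PipelineContext(num_tiles=2)
visited = []  # (tile_id, component) hops, in order
for tid in range(2):
    tok = TileToken(tid, plan, ctx)
    dest: Optional[str] = tok.current().component
    while dest is not None:  # each "component" just forwards the token
        visited.append((tid, dest))
        dest = tok.advance()
```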
@@ -472,8 +392,6 @@ Relationship with existing PeInternalTxn:
- **Resource contention model across multiple pipelines**: the current scope focuses on
accurate modeling of a single pipeline. TCM bank conflicts across multiple pipelines
are future work.
- **builtin_legacy maintenance**: kept for backup purposes only; not a target for
bug fixes or feature additions.

## Open Questions
@@ -511,27 +429,4 @@ Relationship with existing PeInternalTxn:
- Increased number of PE internal components (5 → 6) — more topology nodes/edges
- Component separation makes intra-PE token forwarding more explicit than before
- Breaking change from existing builtin/pe_accel — migration required

---

## Affected Files

| File | Change |
|------|--------|
| `topology.yaml` | Add pe_fetch_store component, add chaining edges |
| `components.yaml` | Register new builtin components |
| `src/kernbench/topology/builder.py` | Add fetch_store + chaining edges to PE internal edges |
| `src/kernbench/common/pe_commands.py` | Add TileToken definition |
| `src/kernbench/components/builtin/pe_scheduler.py` | Re-implement (feeder + plan-based dispatch) |
| `src/kernbench/components/builtin/pe_gemm.py` | Re-implement (TileToken, _process pattern) |
| `src/kernbench/components/builtin/pe_math.py` | Re-implement (TileToken, _process pattern) |
| `src/kernbench/components/builtin/pe_dma.py` | Re-implement (TileToken, _process pattern) |
| `src/kernbench/components/builtin/pe_fetch_store.py` | New |
| `src/kernbench/components/builtin/pe_tcm.py` | Re-implement (TcmRequest service) |
| `src/kernbench/components/builtin/types.py` | New: TilePlan, Stage, StageType, PipelineContext, TileToken |
| `src/kernbench/components/builtin/tiling.py` | Ported from pe_accel: plan generation logic |

Backup:
| `src/kernbench/components/builtin_legacy/` | Full backup of existing builtin (preserved without modification) |
| `src/kernbench/components/custom/pe_accel/` | Backup of existing pe_accel (preserved without modification) |