# ADR-0042: Tile Plan Generators (GEMM/Math Pipeline Plan Builders)

## Status

Accepted (2026-05-20).

This ADR pins down `tiling.py` as a **plan-generator module**, not a SimPy
component.

ADR-0014 (PE Pipeline Execution Model) D6 (tile plan / self-routing) does not
specify the tile-plan generation algorithm itself; this ADR fills that gap.

## First action

When `generate_gemm_plan(M, K, N, tile_m, tile_k, tile_n, ..., pe_prefix,
a_pinned, b_pinned, epilogue_specs)` is called, the very first action is
**computing tile counts and constructing the PE-component ID strings**:

```
M_tiles = max(1, ceil(M / tile_m))
K_tiles = max(1, ceil(K / tile_k))
N_tiles = max(1, ceil(N / tile_n))
dma_id = f"{pe_prefix}.pe_dma"
fetch_id = f"{pe_prefix}.pe_fetch_store"
gemm_id = f"{pe_prefix}.pe_gemm"
math_id = f"{pe_prefix}.pe_math"
```

In short, **the plan generator's first act is "compute ceiling tile counts
and assemble the four sub-component IDs for this PE once"**. No SimPy event
or environment is touched: the module consists of pure functions.

`generate_math_plan(M, N, tile_m, tile_n, ..., math_op, src_addr, dst_addr,
pe_prefix)` likewise begins by computing `M_tiles` and `N_tiles` and
assembling three component IDs (`dma_id`, `fetch_id`, `math_id`).

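As an illustration only, the first action can be sketched in standalone Python. The helper names `tile_counts` and `pe_component_ids` and the prefix value `"pe0"` are hypothetical and not taken from `tiling.py`; the arithmetic and the ID suffixes follow the excerpt in this section.

```python
from math import ceil


def tile_counts(M, K, N, tile_m, tile_k, tile_n):
    """Ceiling tile counts, clamped to at least one tile per dimension."""
    return (
        max(1, ceil(M / tile_m)),
        max(1, ceil(K / tile_k)),
        max(1, ceil(N / tile_n)),
    )


def pe_component_ids(pe_prefix):
    """Assemble the four sub-component IDs for one PE (hypothetical helper)."""
    return {
        "dma": f"{pe_prefix}.pe_dma",
        "fetch": f"{pe_prefix}.pe_fetch_store",
        "gemm": f"{pe_prefix}.pe_gemm",
        "math": f"{pe_prefix}.pe_math",
    }


print(tile_counts(100, 64, 33, 32, 32, 32))  # (4, 2, 2)
print(tile_counts(0, 0, 0, 32, 32, 32))      # (1, 1, 1): the max(1, ...) clamp
print(pe_component_ids("pe0")["gemm"])       # pe0.pe_gemm
```

The clamp means that even a zero-sized dimension yields one tile, so the generated plan is never empty.
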
## Context

ADR-0014 D6 agreed that "PE_SCHEDULER, on receiving a CompositeCmd, generates
a TilePlan and feeds self-routing tile tokens". But the **concrete plan
generation algorithm** lives in `src/kernbench/components/builtin/tiling.py`,
which:

- Defines no component: it is a pair of **pure functions**
  (`generate_gemm_plan`, `generate_math_plan`).
- Does not depend on the SimPy environment, queues, op_log, or hooks.
- Returns a `PipelinePlan` (dataclass).

The original G4 analysis incorrectly described `tiling.py` as a component;
it is in fact a plan-builder helper consumed by PE_SCHEDULER. Pinning this
down in its own ADR (paired with ADR-0014 D6) prevents:

- Ambiguity over whether plan generation belongs to PE_SCHEDULER or a
  separate module.
- Inconsistent rationale for stage sequences (e.g., FETCH/STORE position)
  between GEMM and Math plans.
- Undocumented branching rationale for `a_pinned` / `b_pinned` /
  `epilogue_specs`.

## Decision

### D1. `tiling` is a pure plan-generator module, not a component

`components/builtin/tiling.py` defines no `ComponentBase` subclass. It exports
two module-level functions:

- `generate_gemm_plan(...) -> PipelinePlan`
- `generate_math_plan(...) -> PipelinePlan`

There is no `tiling` node in the topology graph. It lives in `builtin/`
because it is a direct helper for PE_SCHEDULER (ADR-0014 D6) and is
conceptually a PE_SCHEDULER internal utility.

### D2. GEMM plan stage sequence: `M → N → K` order

For each `(m, n, k)` tile (default: no operand pinning, no epilogue):

```
[DMA_READ(A)] → [DMA_READ(B)] → FETCH → GEMM
                                         ↑
                                         ↓
(last k tile only) [MATH(output_tile)]* → STORE → DMA_WRITE
```

A `k_tile` epilogue inserts a MATH stage immediately after GEMM on every
K-tile; an `output_tile` epilogue inserts MATH stages once per `(m, n)`, after
the final K-tile but before STORE/DMA_WRITE. The K-loop accumulator stays
in the register file across K-tiles, so STORE/DMA_WRITE happens only when
`last_k` is true.

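A minimal sketch of the default sequence, assuming that `M → N → K` means M is the outermost loop and K the innermost, and that stages can be represented by plain name strings. The function `default_gemm_stages` is hypothetical and covers only the no-pinning, no-epilogue case.

```python
from itertools import product


def default_gemm_stages(m_tiles, n_tiles, k_tiles):
    """Return [((m, n, k), stage_names)] with K as the innermost loop."""
    plan = []
    # product() varies its last argument fastest, so k is the inner loop.
    for m, n, k in product(range(m_tiles), range(n_tiles), range(k_tiles)):
        stages = ["DMA_READ(A)", "DMA_READ(B)", "FETCH", "GEMM"]
        last_k = k == k_tiles - 1
        if last_k:
            # The accumulator leaves the register file only after the final K-tile.
            stages += ["STORE", "DMA_WRITE"]
        plan.append(((m, n, k), tuple(stages)))
    return plan


for index, stages in default_gemm_stages(1, 2, 2):
    print(index, " -> ".join(stages))
```

With two K-tiles per output tile, only every second entry ends in `STORE -> DMA_WRITE`.
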
### D3. Operand pinning: `a_pinned` / `b_pinned`

If a caller passes `a_pinned=True`, **the A DMA_READ is omitted from every
(m, n, k) tile**. Semantically, the caller (e.g., `tl.composite`) has already
staged all of A in TCM via a prior `tl.load` and signals this to the plan
generator.

The branch is taken at plan time, not at runtime. The stage record count in
op_log therefore changes deterministically with pinning, and sweep analyses
(e.g., gemm_sweep's stage record count) see this decision directly.

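The effect on record counts can be worked out by hand. The sketch below assumes one op_log record per stage, no epilogue, and that `b_pinned` removes the B read in the same way `a_pinned` removes the A read; `expected_stage_count` is a hypothetical helper, not part of kernbench.

```python
def expected_stage_count(m_tiles, n_tiles, k_tiles, a_pinned=False, b_pinned=False):
    """Total stages in a GEMM plan without epilogue ops (assumed model)."""
    reads = (0 if a_pinned else 1) + (0 if b_pinned else 1)
    per_k_tile = reads + 2       # DMA reads + FETCH + GEMM on every K-tile
    per_output_tile = 2          # STORE + DMA_WRITE, last K-tile only
    return m_tiles * n_tiles * (k_tiles * per_k_tile + per_output_tile)


baseline = expected_stage_count(2, 2, 3)                 # 56
pinned_a = expected_stage_count(2, 2, 3, a_pinned=True)  # 44
print(baseline - pinned_a)  # 12: one omitted read per (m, n, k) tile
```

Under these assumptions, the difference between a pinned and an unpinned run equals the number of `(m, n, k)` tiles, which is what makes the count usable as a sweep signal.
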
### D4. Epilogue scope: `k_tile` vs `output_tile`

`epilogue_specs` is an iterable of op-spec objects. Each op object is
expected to have:

- `op.kind: str`, the math op name (e.g., `"dequant"`, `"bias"`, `"relu"`,
  `"scale"`). It is placed into the stage's `params["op_kind"]`.
- `op.scope: Scope`, either `Scope.K_TILE` or `Scope.OUTPUT_TILE` (`Scope`
  enum in `kernbench.common.pe_commands`).
- Op-specific extras (e.g., `bias`, `scale`, `factor`). These are currently
  not used by the plan generator; PE_MATH consumes them at runtime.

The plan generator partitions by `getattr(o, "scope", None)`:

- `scope == Scope.K_TILE`: adds a MATH stage right after GEMM on every K-tile.
- `scope == Scope.OUTPUT_TILE`: adds MATH stages just before STORE on the
  last K-tile per `(m, n)`.

Ops with neither `scope` value (e.g., missing attribute) are **dropped
silently**, because `getattr(..., None) == Scope.X` is False for both. Picking
a default (`output_tile`) is the **caller's responsibility** (e.g.,
`tl.composite`), not the plan generator's. This aligns with ADR-0014's
composite epilogue contract.

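The partitioning rule, including the silent drop, can be sketched as follows. The `Scope` enum here is a local stand-in with assumed member values, and `EpilogueOp` and `partition_epilogue` are hypothetical names; only the `getattr(o, "scope", None)` comparison is taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace


class Scope(Enum):
    """Stand-in for kernbench.common.pe_commands.Scope (values assumed)."""
    K_TILE = "k_tile"
    OUTPUT_TILE = "output_tile"


@dataclass
class EpilogueOp:
    """Hypothetical op-spec carrying only the fields the generator reads."""
    kind: str
    scope: Scope


def partition_epilogue(epilogue_specs):
    """Split ops by scope; ops with any other scope fall into neither list."""
    specs = list(epilogue_specs)
    k_tile = [o for o in specs if getattr(o, "scope", None) == Scope.K_TILE]
    output_tile = [o for o in specs if getattr(o, "scope", None) == Scope.OUTPUT_TILE]
    return k_tile, output_tile


specs = [
    EpilogueOp("dequant", Scope.K_TILE),
    EpilogueOp("bias", Scope.OUTPUT_TILE),
    SimpleNamespace(kind="relu"),  # no scope attribute: dropped silently
]
k_ops, out_ops = partition_epilogue(specs)
print([o.kind for o in k_ops], [o.kind for o in out_ops])
```

The `relu` op appears in neither list, which is why a caller that wants a default scope has to set it before calling the generator.
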
`Scope` is imported lazily inside the function to avoid the circular path
`pe_commands ← pe_types ← tiling`. This is intentional and not a refactor
target: keeping `tiling` free of import-time `pe_commands` dependencies
preserves the module boundary (D1).

### D5. Math plan stage sequence: `M → N` order

For each `(m, n)` tile:

```
DMA_READ → FETCH → MATH → STORE → DMA_WRITE
```

There is no K dimension, so concepts like epilogue or accumulator residency
do not apply. PE_FETCH_STORE's register-file accounting follows the same
pattern as the GEMM plan.

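For comparison with the GEMM case, a rough sketch of the Math plan, assuming M is the outer loop and N the inner one. `math_stage_plan` is a hypothetical name, and stages are again plain strings rather than real `Stage` objects.

```python
MATH_STAGES = ("DMA_READ", "FETCH", "MATH", "STORE", "DMA_WRITE")


def math_stage_plan(m_tiles, n_tiles):
    """Return [((m, n), stage_names)]; every tile gets the same five stages."""
    return [
        ((m, n), MATH_STAGES)
        for m in range(m_tiles)      # outer loop: M
        for n in range(n_tiles)      # inner loop: N
    ]


plan = math_stage_plan(2, 3)
print(len(plan))                      # 6 tiles
print(sum(len(s) for _, s in plan))   # 30 stages in total
```

Because every tile stores its result, the stage count is simply five times the number of `(m, n)` tiles.
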
### D6. Plans are data: no SimPy dependency

`PipelinePlan` is a dataclass in `pe_types.py` holding `tiles:
list[TilePlan]`. Each `TilePlan` holds `stages: tuple[Stage, ...]`. The plan
itself is near-immutable (only `Stage.params: dict` is mutable) and holds no
SimPy objects.

At runtime, PE_SCHEDULER takes the plan's first stage, builds a `TileToken`,
and feeds it into the pipeline. The `TileToken` carries `plan: TilePlan`,
`stage_idx: int`, and a cached `params: dict`. Self-routing proceeds by
`TileToken.advance()` caching the next stage's `params` (ADR-0014 D6).

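The token mechanics might look roughly like the sketch below. Only the `plan`, `stage_idx`, and `params` fields and the `advance()` name come from this ADR; the `component_id` field, the `start` constructor, and the boolean return value of `advance()` are assumptions made for illustration, and the real classes in `pe_types.py` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    component_id: str                           # hypothetical field name
    params: dict = field(default_factory=dict)


@dataclass
class TilePlan:
    stages: tuple


@dataclass
class TileToken:
    plan: TilePlan
    stage_idx: int
    params: dict

    @classmethod
    def start(cls, plan):
        """Build a token positioned at the plan's first stage (hypothetical)."""
        return cls(plan, 0, plan.stages[0].params)

    def advance(self):
        """Step to the next stage and cache its params; False when exhausted."""
        if self.stage_idx + 1 >= len(self.plan.stages):
            return False
        self.stage_idx += 1
        self.params = self.plan.stages[self.stage_idx].params
        return True


plan = TilePlan((Stage("pe0.pe_dma", {"op": "read"}),
                 Stage("pe0.pe_math", {"op_kind": "relu"})))
token = TileToken.start(plan)
while True:
    print(token.stage_idx, plan.stages[token.stage_idx].component_id, token.params)
    if not token.advance():
        break
```

Nothing here imports SimPy: the plan and the token are ordinary data, and any timing would be added by the components that receive the token.
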
### D7. Plan generator contract: pure, deterministic, idempotent

Two calls with identical inputs return equal `PipelinePlan` values
(including `TilePlan.stages` order). This contract aligns with ADR-0014 D6's
"deterministic tile dispatch order".

The generators have no side effects (no SimPy events, no file I/O, no global
state), so tests can call them directly without an environment object (some
cases in `tests/test_pe_pipeline.py` rely on this).

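An environment-free determinism check could be sketched as below. `toy_math_plan` is a toy stand-in for the real generators, the dataclasses are simplified local copies with an assumed `component_id` field, and `check_deterministic` is a hypothetical helper; the comparison relies on the field-by-field `==` that dataclasses generate by default.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    component_id: str   # hypothetical field name
    params: dict


@dataclass
class TilePlan:
    stages: tuple


@dataclass
class PipelinePlan:
    tiles: list


def toy_math_plan(m_tiles, n_tiles, pe_prefix):
    """Toy stand-in for a plan generator: no globals, no I/O, no SimPy."""
    tiles = []
    for m in range(m_tiles):
        for n in range(n_tiles):
            stages = (
                Stage(f"{pe_prefix}.pe_dma", {"m": m, "n": n}),
                Stage(f"{pe_prefix}.pe_math", {"m": m, "n": n}),
            )
            tiles.append(TilePlan(stages))
    return PipelinePlan(tiles)


def check_deterministic(generator, *args):
    """Two calls must give equal plans that are still separate objects."""
    first, second = generator(*args), generator(*args)
    return first == second and first is not second


print(check_deterministic(toy_math_plan, 2, 2, "pe0"))  # True
```

The same helper could be pointed at `generate_gemm_plan` or `generate_math_plan` in a test, provided their plan classes keep the default dataclass equality.
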
## Alternatives Considered

### A1. Make tiling a component (e.g., PE_PLANNER)

Rejected. Plan generation consumes no SimPy time; it is a pure decision
algorithm. Making it a component would (a) add unnecessary infrastructure
(inbox, resources), and (b) split PE_SCHEDULER's flow into "receive plan"
plus "feed tiles", inserting a meaningless hop.

### A2. Move plan generation into PE_SCHEDULER as methods

Rejected (currently). Module separation provides (1) testability and
(2) extensibility: an additional plan algorithm (e.g., DTensor-aware) is
just a new function. If plan kinds proliferate enough to require explicit
dispatch, a future ADR can introduce a plan factory on PE_SCHEDULER.

### A3. Make plans fully immutable (frozen dataclass + tuple)

Partially adopted. `Stage` and `TilePlan` are dataclasses but not frozen,
because `Stage.params: dict` is populated at plan-generation time and read
at runtime (cached by TileToken on advance). Moving from `dict` to
`frozendict` would incur migration cost without enough benefit. Convention:
do not mutate a plan after generation.

## Consequences

- `tiling.py` is documented as a plan-generator module, not a component,
  preempting future G4-style "this component lacks an ADR" analyses.
- The GEMM plan's stage sequence (D2) and pinning / epilogue branching
  (D3 / D4) are pinned, providing a clear interpretation basis for sweep
  analyses (e.g., `scripts/gemm_sweep.py`'s stage record counts).
- The plan generator's pure contract (D7) enables environment-free testing
  in line with ADR-0013 (verification strategy).
- Future plan kinds (DTensor-aware, K-major, ...) follow D1 / D6 / D7 as a
  baseline: just add a new function.