# ADR-0042: Tile Plan Generators (GEMM/Math Pipeline Plan Builders)

## Status

Accepted (2026-05-20).

This ADR establishes `tiling.py` as a **plan-generator module**, not a SimPy component. ADR-0014 (PE Pipeline Execution Model) D6 (tile plan / self-routing) does not specify the tile-plan generation algorithm itself; this ADR fills that gap.

## First action

When `generate_gemm_plan(M, K, N, tile_m, tile_k, tile_n, ..., pe_prefix, a_pinned, b_pinned, epilogue_specs)` is called, its very first action is **computing the tile counts and constructing the PE sub-component ID strings**:

```python
from math import ceil

# Ceiling tile counts; max(1, ...) guarantees at least one tile per dimension.
M_tiles = max(1, ceil(M / tile_m))
K_tiles = max(1, ceil(K / tile_k))
N_tiles = max(1, ceil(N / tile_n))

# Sub-component IDs for this PE, built once and reused by every stage.
dma_id = f"{pe_prefix}.pe_dma"
fetch_id = f"{pe_prefix}.pe_fetch_store"
gemm_id = f"{pe_prefix}.pe_gemm"
math_id = f"{pe_prefix}.pe_math"
```

In short, **the plan generator's first act is to compute ceiling tile counts and assemble the four sub-component IDs for this PE once**. No SimPy event or environment is touched: the module consists only of pure functions.

`generate_math_plan(M, N, tile_m, tile_n, ..., math_op, src_addr, dst_addr, pe_prefix)` likewise begins by computing `M_tiles` and `N_tiles` and assembling three component IDs (`dma_id`, `fetch_id`, `math_id`).

## Context

ADR-0014 D6 established that "PE_SCHEDULER, on receiving a CompositeCmd, generates a TilePlan and feeds self-routing tile tokens". However, the **concrete plan-generation algorithm** lives in `src/kernbench/components/builtin/tiling.py`, which:

- Defines no component; it is a pair of **pure functions** (`generate_gemm_plan`, `generate_math_plan`).
- Does not depend on the SimPy environment, queues, op_log, or hooks.
- Returns a `PipelinePlan` (a dataclass).

The original G4 analysis incorrectly described `tiling.py` as a component; it is in fact a plan-builder helper consumed by PE_SCHEDULER.
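
The first action is shared by both generators, which differ only in whether a `pe_gemm` ID is needed. As a minimal sketch, it can be factored into two pure helpers; `tile_count` and `pe_component_ids` are hypothetical names used for illustration, not functions that `tiling.py` is known to export. Neither helper needs a SimPy environment to exercise.

```python
from math import ceil


def tile_count(dim: int, tile: int) -> int:
    """Ceiling tile count, clamped to at least one tile (as in the first action)."""
    return max(1, ceil(dim / tile))


def pe_component_ids(pe_prefix: str, include_gemm: bool = True) -> dict[str, str]:
    """Sub-component IDs: four for a GEMM plan, three (no pe_gemm) for a math plan."""
    ids = {
        "dma_id": f"{pe_prefix}.pe_dma",
        "fetch_id": f"{pe_prefix}.pe_fetch_store",
        "math_id": f"{pe_prefix}.pe_math",
    }
    if include_gemm:
        ids["gemm_id"] = f"{pe_prefix}.pe_gemm"
    return ids
```

The `max(1, ...)` clamp means a zero-sized dimension still yields one tile, so a plan always covers at least one tile per dimension.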

Pinning `tiling.py` down in its own ADR (paired with ADR-0014 D6) prevents:

- Ambiguity over whether plan generation belongs to PE_SCHEDULER or to a separate module.
- Inconsistent rationale for stage sequences (e.g., the position of FETCH/STORE) between GEMM and Math plans.
- Undocumented rationale for the `a_pinned` / `b_pinned` / `epilogue_specs` branches.

## Decision

### D1. `tiling` is a pure plan-generator module, not a component

`components/builtin/tiling.py` defines no `ComponentBase` subclass. It exports two module-level functions:

- `generate_gemm_plan(...) -> PipelinePlan`
- `generate_math_plan(...) -> PipelinePlan`

There is no `tiling` node in the topology graph. It lives in `builtin/` because it is a direct helper for PE_SCHEDULER (ADR-0014 D6) and is conceptually an internal PE_SCHEDULER utility.

### D2. GEMM plan stage sequence: `M → N → K` order

Tiles are generated with `m` outermost and `k` innermost. For each `(m, n, k)` tile, the stages are:

```
[DMA_READ(A)] → [DMA_READ(B)] → FETCH → GEMM → [MATH(k_tile)]*
                                                 ↓ (last k tile only)
                                [MATH(output_tile)]* → STORE → DMA_WRITE
```

Bracketed stages are conditional: each DMA_READ is omitted when its operand is pinned (D3), and `[MATH(...)]*` marks zero or more epilogue stages (D4). In the default case (no operand pinning, no epilogue), every tile carries both DMA_READs and no MATH stage.

A `k_tile` epilogue inserts a MATH stage immediately after GEMM on every K-tile; an `output_tile` epilogue inserts MATH stages once per `(m, n)`, after the final K-tile but before STORE/DMA_WRITE. The K-loop accumulator stays in the register file across K-tiles, so STORE/DMA_WRITE happens only when `last_k`.

### D3. Operand pinning: `a_pinned` / `b_pinned`

If a caller passes `a_pinned=True`, **the A DMA_READ is omitted from every `(m, n, k)` tile**; `b_pinned=True` likewise omits the B DMA_READ. Semantically, the caller (e.g., `tl.composite`) has already staged the whole operand in TCM via a prior `tl.load` and signals this to the plan generator.

The branch is taken at plan time, not at runtime. The number of stage records in op_log therefore changes deterministically with pinning, and sweep analyses (e.g., the stage record count in gemm_sweep) see this decision directly.

### D4. Epilogue scope: `k_tile` vs `output_tile`

`epilogue_specs` is an iterable of op-spec objects.
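
As an illustration of such an iterable, the sketch below defines a hypothetical `EpilogueOp` dataclass and a local stand-in for the `Scope` enum (the real enum lives in `kernbench.common.pe_commands`; the member names come from this ADR, but the member values here are assumed). `partition_epilogue` is likewise a hypothetical helper that mirrors the `getattr(o, "scope", None)` partitioning rule this section specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    """Stand-in for kernbench.common.pe_commands.Scope (member values assumed)."""
    K_TILE = "k_tile"
    OUTPUT_TILE = "output_tile"


@dataclass
class EpilogueOp:
    """Hypothetical op-spec; the planner reads only `kind` and `scope`."""
    kind: str
    scope: Scope
    factor: Optional[float] = None  # op-specific extra, consumed by PE_MATH at runtime


def partition_epilogue(epilogue_specs):
    """Split op-specs into (k_tile_ops, output_tile_ops), preserving order.

    Specs whose `scope` is missing or unrecognized match neither list and
    are therefore dropped silently.
    """
    specs = list(epilogue_specs)  # materialize: the input may be a one-shot iterator
    k_tile_ops = [o for o in specs if getattr(o, "scope", None) == Scope.K_TILE]
    output_ops = [o for o in specs if getattr(o, "scope", None) == Scope.OUTPUT_TILE]
    return k_tile_ops, output_ops
```

Passing, for example, a `dequant` op with `Scope.K_TILE`, a `relu` op with `Scope.OUTPUT_TILE`, and a bare `object()` yields one op in each list; the bare object has no `scope` attribute and is dropped.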

Each op object is expected to have:

- `op.kind: str`: the math op name (e.g., `"dequant"`, `"bias"`, `"relu"`, `"scale"`), placed into the stage's `params["op_kind"]`.
- `op.scope: Scope`: `Scope.K_TILE` or `Scope.OUTPUT_TILE` (the `Scope` enum in `kernbench.common.pe_commands`).
- Op-specific extras (e.g., `bias`, `scale`, `factor`): currently unused by the plan generator; consumed at runtime by PE_MATH.

The plan generator partitions the specs by `getattr(o, "scope", None)`:

- `scope == Scope.K_TILE`: adds a MATH stage right after GEMM on every K-tile.
- `scope == Scope.OUTPUT_TILE`: adds MATH stages just before STORE on the last K-tile of each `(m, n)`.

Ops matching neither scope value (e.g., because the attribute is missing) are **silently dropped**, since `getattr(..., None) == Scope.X` is False for both. Choosing a default (`output_tile`) is the **caller's responsibility** (e.g., `tl.composite`), not the plan generator's. This aligns with ADR-0014's composite epilogue contract.

`Scope` is imported lazily inside the function to avoid the circular import path `pe_commands ← pe_types ← tiling`. This is intentional and not a refactoring target: keeping `tiling` free of module-level `pe_commands` imports preserves the module boundary (D1).

### D5. Math plan stage sequence: `M → N` order

For each `(m, n)` tile:

```
DMA_READ → FETCH → MATH → STORE → DMA_WRITE
```

There is no K dimension, so epilogue scopes and accumulator residency do not apply. PE_FETCH_STORE's register-file accounting follows the same pattern as in the GEMM plan.

### D6. Plans are data, with no SimPy dependency

`PipelinePlan` is a dataclass in `pe_types.py` holding `tiles: list[TilePlan]`. Each `TilePlan` holds `stages: tuple[Stage, ...]`. The plan is near-immutable by convention (stage sequences are tuples, but the dataclasses are not frozen and `Stage.params` is a mutable `dict`; see A3) and holds no SimPy objects.

At runtime, PE_SCHEDULER reads a tile plan's first stage, builds a `TileToken` from it, and feeds the token into the pipeline. The `TileToken` carries `plan: TilePlan`, `stage_idx: int`, and a cached `params: dict`.
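
The data shapes described above can be sketched as plain dataclasses. Only `PipelinePlan.tiles`, `TilePlan.stages`, `Stage.params`, and the three `TileToken` fields come from this ADR; the `Stage.kind` / `Stage.target` fields, the `stage` and `done` properties, and the body of `advance()` are assumptions meant to illustrate self-routing, not the contents of `pe_types.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One hop of a tile's route; `kind` and `target` are hypothetical fields."""
    kind: str                                   # e.g. "DMA_READ", "FETCH", "GEMM"
    target: str                                 # component ID the token is routed to
    params: dict = field(default_factory=dict)


@dataclass
class TilePlan:
    stages: tuple[Stage, ...]


@dataclass
class PipelinePlan:
    tiles: list[TilePlan]


@dataclass
class TileToken:
    """Self-routing token: carries its plan, its position, and cached params."""
    plan: TilePlan
    stage_idx: int = 0
    params: dict = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Cache the params of the stage the token starts at.
        if not self.done:
            self.params = self.plan.stages[self.stage_idx].params

    @property
    def done(self) -> bool:
        return self.stage_idx >= len(self.plan.stages)

    @property
    def stage(self) -> Stage:
        # Current stage; its `target` names the component the token goes to.
        return self.plan.stages[self.stage_idx]

    def advance(self) -> None:
        """Step to the next stage and cache its params; keep old params when done."""
        self.stage_idx += 1
        if not self.done:
            self.params = self.plan.stages[self.stage_idx].params
```

In this sketch the token holds both the plan and its own position, so no scheduler lookup is needed between hops: a component can read `token.stage.target` to decide where the token goes next.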
Self-routing proceeds by `TileToken.advance()` caching the next stage's `params` (ADR-0014 D6).

### D7. Plan generator contract: pure, deterministic, idempotent

Two calls with identical inputs return equal `PipelinePlan` values (including the order of `TilePlan.stages`). This contract aligns with ADR-0014 D6's "deterministic tile dispatch order". The generators have no side effects (no SimPy events, no file I/O, no global state), so tests can call them directly without an environment object (some cases in `tests/test_pe_pipeline.py` rely on this).

## Alternatives Considered

### A1. Make tiling a component (e.g., PE_PLANNER)

Rejected. Plan generation consumes no SimPy time; it is a pure decision algorithm. Making it a component would (a) add unnecessary infrastructure (inbox, resources) and (b) split PE_SCHEDULER's flow into "receive plan" and "feed tiles", inserting a meaningless hop.

### A2. Move plan generation into PE_SCHEDULER as methods

Rejected (for now). Module separation provides (1) testability and (2) extensibility: an additional plan algorithm (e.g., DTensor-aware) is just a new function. If plan kinds proliferate enough to require explicit dispatch, a future ADR can introduce a plan factory on PE_SCHEDULER.

### A3. Make plans fully immutable (frozen dataclass + tuple)

Partially adopted. `Stage` and `TilePlan` are dataclasses but not frozen, because `Stage.params: dict` is populated at plan-generation time and read at runtime (cached by `TileToken` on advance). Moving from `dict` to a frozendict would incur migration cost without enough benefit. Convention: do not mutate a plan after generation.

## Consequences

- `tiling.py` is documented as a plan-generator module, not a component, preempting future G4-style "this component lacks an ADR" analyses.
- The GEMM plan's stage sequence (D2) and its pinning / epilogue branching (D3 / D4) are pinned down, giving sweep analyses (e.g., the stage record counts in `scripts/gemm_sweep.py`) a clear basis for interpretation.
- The plan generator's pure contract (D7) enables environment-free testing, in line with ADR-0013 (verification strategy).
- Future plan kinds (DTensor-aware, K-major, ...) take D1 / D6 / D7 as their baseline: each is just a new function.
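
As a baseline for such additions, the sketch below assembles the first action, D2, D3, and D4 into one self-contained GEMM plan builder whose output can be checked against D7. It is a hypothetical illustration, not the real `generate_gemm_plan`: the name `sketch_gemm_plan`, the `EpilogueOp` type, the local `Scope` stand-in, the `Stage.kind` / `Stage.target` fields, the per-stage `params` keys, and the target component chosen for each stage (e.g., STORE routed to `pe_fetch_store`) are all assumptions, and the real function's additional parameters (elided as `...` in its signature) are omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from math import ceil


class Scope(Enum):
    """Stand-in for kernbench.common.pe_commands.Scope (member values assumed)."""
    K_TILE = "k_tile"
    OUTPUT_TILE = "output_tile"


@dataclass
class EpilogueOp:
    """Hypothetical op-spec: the planner reads only `kind` and `scope`."""
    kind: str
    scope: Scope


@dataclass
class Stage:
    """Hypothetical stage shape; the real Stage is defined in pe_types.py."""
    kind: str
    target: str
    params: dict = field(default_factory=dict)


@dataclass
class TilePlan:
    stages: tuple[Stage, ...]


@dataclass
class PipelinePlan:
    tiles: list[TilePlan]


def sketch_gemm_plan(M, K, N, tile_m, tile_k, tile_n, pe_prefix,
                     a_pinned=False, b_pinned=False, epilogue_specs=()):
    """Pure GEMM plan builder following D2-D4; no SimPy objects involved."""
    # First action: ceiling tile counts and sub-component IDs, built once.
    M_tiles = max(1, ceil(M / tile_m))
    K_tiles = max(1, ceil(K / tile_k))
    N_tiles = max(1, ceil(N / tile_n))
    dma_id = f"{pe_prefix}.pe_dma"
    fetch_id = f"{pe_prefix}.pe_fetch_store"
    gemm_id = f"{pe_prefix}.pe_gemm"
    math_id = f"{pe_prefix}.pe_math"

    # D4: partition by scope; specs with no recognized scope are dropped.
    specs = list(epilogue_specs)
    k_ops = [o for o in specs if getattr(o, "scope", None) == Scope.K_TILE]
    out_ops = [o for o in specs if getattr(o, "scope", None) == Scope.OUTPUT_TILE]

    tiles = []
    for m in range(M_tiles):              # D2: M outermost ...
        for n in range(N_tiles):
            for k in range(K_tiles):      # ... K innermost
                idx = {"m": m, "n": n, "k": k}
                stages = []
                if not a_pinned:          # D3: pinned operand -> no DMA_READ
                    stages.append(Stage("DMA_READ", dma_id, {**idx, "operand": "A"}))
                if not b_pinned:
                    stages.append(Stage("DMA_READ", dma_id, {**idx, "operand": "B"}))
                stages.append(Stage("FETCH", fetch_id, dict(idx)))
                stages.append(Stage("GEMM", gemm_id, dict(idx)))
                for op in k_ops:          # k_tile epilogue: after every GEMM
                    stages.append(Stage("MATH", math_id, {**idx, "op_kind": op.kind}))
                if k == K_tiles - 1:      # last_k: accumulator leaves the RF
                    for op in out_ops:
                        stages.append(Stage("MATH", math_id, {**idx, "op_kind": op.kind}))
                    stages.append(Stage("STORE", fetch_id, dict(idx)))
                    stages.append(Stage("DMA_WRITE", dma_id, dict(idx)))
                tiles.append(TilePlan(stages=tuple(stages)))
    return PipelinePlan(tiles=tiles)
```

Since the builder is pure, the D7 contract can be checked without an environment: two calls with the same arguments return distinct but equal `PipelinePlan` objects, and `a_pinned=True` removes exactly one stage from every tile.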