# ADR-0039: PE_MMU Component Model - Component + Utility Dual Role

## Status

Accepted (2026-05-20).

ADR-0011 (PA/VA/LA address model) only states that "the VA model translates VA→PA via PE_MMU"; this ADR pins down **the PE_MMU component's own behavior model**.

## First action

At construction, read `node.attrs["page_size"]` (default `2 MiB`) and `node.attrs["tlb_overhead_ns"]` (default `0.0`) and instantiate the internal `PeMMU` utility object (`policy.address.pe_mmu.PeMMU`) exactly once. That object is the single owner of the page table, the sub-page region lists, and the TLB overhead value.

At runtime the first action splits into two paths:

- **Component path (inbox consumption)**: `_worker` pulls a Transaction off `_inbox`; if `request` is a `MmuMapMsg`, call `self._mmu.map(va, pa, size)` for each entry and then `txn.done.succeed()`. For `MmuUnmapMsg`, call `unmap(va, size)`. Any other type falls through to standard `_forward_txn`. In other words, **the component's first act is "apply map/unmap commands to the page table"**.
- **Utility path (direct call)**: a sibling PE engine (PE_DMA / PE_GEMM) calls `pe_mmu.mmu.translate(va)` directly. This path produces no SimPy events; the caller (when `overhead_ns > 0`) issues a `yield env.timeout(mmu.overhead_ns)` in its own process.

## Context

ADR-0011 defined three address models (PA/VA/LA) and agreed that "VA model = translation via PE_MMU". But in code, `PeMmuComponent` performs two complementary roles simultaneously:

1. **A topology-graph component**: it receives `MmuMapMsg` / `MmuUnmapMsg` sideband messages over the cube NoC and updates the page table.
2. **A PE-local utility**: PE_DMA / PE_GEMM on the same PE call `translate(va)` directly with zero SimPy latency (the caller pays `overhead_ns` if any).

Without an ADR covering both roles, the following questions are ambiguous:

- "Why isn't there a SimPy event for the MMU translate?" (Answer: the caller pays it.)
- What is the sub-page region model, and why?
  (The code docstring has it, but there is no ADR, only a memory note `project_mmu_subpage_stopgap`.)
- Who sends map/unmap, and when must they be visible? (Ordering contract.)

Additionally, `PeMMU.map()` has "append, last-write-wins on overlap" semantics, which cannot be expressed with a one-PA-per-entry page table. This is a deliberate **simulator stopgap**: it supports DPPolicy sub-page sharding (e.g., 128 B payloads against 4 KiB pages), where a one-PA-per-page table would let a later mapping silently overwrite an earlier one in the same page and misroute traffic. This deviation from real HW MMU semantics must be ADR-pinned.

## Decision

### D1. Explicit dual role: component and utility

`PeMmuComponent` exposes two interfaces from a single class:

- Component interface: `_inbox` consumption, `_worker` loop (handles MMU sideband messages).
- Utility interface: the `mmu` property exposes the underlying `PeMMU` object, which PE_DMA / PE_GEMM hold directly and invoke `translate()` on.

The latter is **not a layer skip**: inside a PE, the engines and PE_MMU are siblings under the "components" layer (ADR-0007). Cross-layer violations only apply to runtime API ↔ sim_engine ↔ components boundaries.

### D2. Latency model: `translate()` is pure; caller owns the timeout

`PeMMU.translate()` is a pure function and yields nothing in SimPy. The caller (a PE engine) issues `if mmu.overhead_ns > 0: yield env.timeout(mmu.overhead_ns)` in its own process after translation.

Rationale: the PE engine process already holds its own `record_start` / `record_end` (op_log) hooks, so keeping timing inside the caller's process preserves consistent timing accounting. A separate MMU process would split the engine's processing flow and blur op_log / pipeline overlap semantics.

#### D2.1. Current implementation asymmetry: pipeline vs non-pipeline (known)

At the time of writing, `pe_dma.py` handles MMU overhead differently in its two call paths:

- **non-pipeline (`handle_command`)**: after `translate()`, applies `if self._mmu.overhead_ns > 0: yield env.timeout(self._mmu.overhead_ns)`.
- **pipeline (`_do_pipeline_dma`)**: calls `translate()` only, **omitting** the overhead timeout. The code comment says "same logic as non-pipeline path", but the behaviors differ.

In the default topology, `tlb_overhead_ns = 0.0`, so this asymmetry does not manifest. With `tlb_overhead_ns > 0`, however, GEMM/Math work that goes through the pipeline path appears faster than the equivalent non-pipeline workload by the skipped MMU overhead.

The D2 contract states that **all** callers pay the overhead; the pipeline omission is **not an intentional design**, and ADR-0014 D6 (pipeline self-routing) does not exempt it.

Remediation options (require a separate Phase 1/2):

- (a) Add `if mmu.overhead_ns > 0: yield env.timeout(...)` in `_do_pipeline_dma` to align with D2 (**preferred**).
- (b) Narrow the D2 contract to "non-pipeline only" and document the pipeline exemption in an ADR-0014 update (discouraged, since it weakens the overhead's meaning).

This ADR recommends (a) and assumes a small follow-up change either before or just after acceptance.

### D3. Page table structure: sub-page region list (stopgap)

`self._table: dict[vpn, list[(start_in_page, end_in_page, pa_at_offset_zero)]]` holds multiple disjoint regions per page.

- `map(va, pa, size)`: append one region per page touched, splitting the range where it crosses a page boundary.
- `translate(va)`: look up regions for the VPN and iterate **in reverse** so the most recent overlapping region wins (last-write-wins).
- `unmap(va, size)`: remove only regions whose extent is **fully contained** within the unmap range; partially overlapping regions are left in place, and the caller is expected to unmap on the same boundaries used for map.

This is documented as a **simulator stopgap** that supplements the VA model from ADR-0011. It prevents silent last-write-wins misrouting when DPPolicy shards below page granularity. Memory note: `project_mmu_subpage_stopgap`.

### D4. PageFault signals PA fallback

If `translate()` is called with an unmapped VA, `PageFault` is raised.
PE_DMA catches the exception and **uses the original address as a PA** (the PA-only backward-compatibility path from ADR-0011). PageFault is therefore not an error; it is the signal for "no VA mapping, interpret as PA". This path is intentional and preserves backward compatibility with the ADR-0011 PA-only mode.

### D5. MMU sideband-message reception contract

`MmuMapMsg` / `MmuUnmapMsg` arrive over the fabric at PE_MMU's `_inbox` (SPEC R10: "MMU map installation incurs measured fabric latency"). Schemas live in `runtime_api/kernel.py`:

- `MmuMapMsg.entries: tuple[dict, ...]`, where each dict is `{"va": int, "pa": int, "size": int}`.
- `MmuUnmapMsg.entries: tuple[dict, ...]`, where each dict is `{"va": int, "size": int}`.

PE_MMU reception flow:

1. `_worker` does `_inbox.get()` for one message.
2. `hasattr(msg, "request")` confirms a Transaction wrapper.
3. `isinstance(msg.request, MmuMapMsg)` → for each entry, call `self._mmu.map(va=e["va"], pa=e["pa"], size=e["size"])`.
4. `isinstance(msg.request, MmuUnmapMsg)` → for each entry, call `self._mmu.unmap(va=e["va"], size=e["size"])`.
5. Both branches call `msg.done.succeed()` after completion.

An external caller (runtime API) that waits on `done` therefore receives a SimPy-level guarantee that "the mapping is installed on-device". This is the realization of ADR-0011's "MMU map installation incurs measured fabric latency".

This ADR does **not** define the **sender or fan-out policy** for the sideband message; those are runtime API responsibilities. Only the receive contract belongs here.

### D6. Non-MMU Transactions delegate to generic forwarding

If a message pulled from `_inbox` is not `MmuMapMsg` / `MmuUnmapMsg` (or lacks a `request` attribute), `_forward_txn` handles it normally. This keeps the door open for future topologies where PE_MMU sits on a pass-through path. Current code never sends such traffic, but the routing remains safe.

## Alternatives Considered

### A1. Make `translate()` a SimPy generator

Rejected.
As D2 explains, this blurs op_log / pipeline overlap accounting in the PE engine.

### A2. Use a small page size (e.g., 128 B) instead of sub-page regions

Rejected. It would explode page-table memory and cube-wide map message size. Most mappings are 2 MiB; pushing the page size below that for the few DPPolicy sharding cases inflates average cost.

### A3. Make PE_MMU a PE_CPU helper only (not a topology node)

Rejected. ADR-0011 requires that MMU map installation incur measured fabric latency (via `MmuMapMsg`), which requires PE_MMU to be a node on the graph. It also keeps cube NoC visualizer output consistent.

## Consequences

- PE_MMU's dual role is justified at ADR level, so future "unify into one" refactor pressure has a documented counterpoint.
- The sub-page region model is explicitly labeled a stopgap, providing a basis for deprecating it when the LA model (ADR-0011) lands.
- The "`translate()` does not yield" contract is locked in (D2), so any future proposal to add an internal MMU timeout can be denied with a documented rationale.
- PA fallback (D4) is normalized, preventing defensive logic from treating PageFault as an error.
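
For readers who want the D3 region model and the D4 fallback in concrete form, the following is a minimal Python sketch, not the actual `PeMMU` code. The names `SubPageTable` and `resolve` are hypothetical, and the page-splitting behavior of `map()` is an assumption derived from the table layout in D3. It omits SimPy entirely, since D2 places the timeout in the caller.

```python
from collections import defaultdict


class PageFault(Exception):
    """Raised when no region covers the requested VA."""


class SubPageTable:
    """Illustrative stand-in for the D3 table (hypothetical, not PeMMU itself)."""

    def __init__(self, page_size: int = 2 * 1024 * 1024) -> None:
        self.page_size = page_size
        # vpn -> list of (start_in_page, end_in_page, pa_at_offset_zero)
        self._table: dict[int, list[tuple[int, int, int]]] = defaultdict(list)

    def map(self, va: int, pa: int, size: int) -> None:
        """Append one region per touched page (assumed splitting rule)."""
        end = va + size
        while va < end:
            vpn, start = divmod(va, self.page_size)
            chunk = min(end - va, self.page_size - start)
            # Store the PA that in-page offset 0 would map to, so that
            # translate() reduces to pa_at_offset_zero + offset.
            self._table[vpn].append((start, start + chunk, pa - start))
            va += chunk
            pa += chunk

    def translate(self, va: int) -> int:
        """Pure lookup: newest matching region wins (reverse iteration)."""
        vpn, off = divmod(va, self.page_size)
        for start, end, pa0 in reversed(self._table.get(vpn, [])):
            if start <= off < end:
                return pa0 + off
        raise PageFault(hex(va))

    def unmap(self, va: int, size: int) -> None:
        """Drop only regions fully contained in [va, va + size)."""
        lo, hi = va, va + size
        for vpn in range(lo // self.page_size, (hi - 1) // self.page_size + 1):
            regions = self._table.get(vpn)
            if not regions:
                continue
            base = vpn * self.page_size
            self._table[vpn] = [
                r for r in regions
                if not (lo <= base + r[0] and base + r[1] <= hi)
            ]


def resolve(table: SubPageTable, addr: int) -> int:
    """D4-style caller: a PageFault means 'interpret addr as a PA'."""
    try:
        return table.translate(addr)
    except PageFault:
        return addr


# Usage: two 128 B shards inside one 4 KiB page keep separate PAs.
table = SubPageTable(page_size=4096)
table.map(0x1000, 0x80000, 128)
table.map(0x1080, 0x90000, 128)
print(hex(table.translate(0x1004)))  # 0x80004
print(hex(table.translate(0x1084)))  # 0x90004
print(hex(resolve(table, 0x5000)))   # 0x5000 (unmapped, used as a PA)
```

With a one-PA-per-page table, the second `map()` call above would have replaced the first, and `0x1004` would have resolved into the wrong shard; the region list is what keeps both mappings alive.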