# ADR-0011: Memory Addressing Simplification (PA-first)

## Status

Accepted

## Context

A realistic system uses host-side virtual addressing and an MMU/IOMMU-style translation path for DMA. The host allocates physical memory at the PE level, maps it into a virtual address space, and installs the mappings. DMA requests then carry virtual addresses, which are translated to physical addresses.

For early development, we want a minimal, deterministic model that enables:

- correct routing and latency accounting through the graph,
- stable tensor deployment and kernel execution semantics,
- future extension toward VA/MMU without rewriting workflows.

---

## Decision

### D1. Phase 0 model is PA-only

The simulator uses a PA-first model:

- All device memory accesses (MemoryRead/MemoryWrite) operate on a device physical address (PA) plus a size.
- Tensor handles store PA-based shard mappings after deployment.
- KernelLaunch passes tensor arguments as PA-based mappings (or references to them).
- MMU/IOMMU concepts (virtual address spaces, page tables, translation latency) are NOT modeled in Phase 0.

### D2. Allocation produces PA mappings

Device allocation selects PE-local memory regions and returns PA mappings sufficient to execute kernels and issue DMA requests.

### D3. Extension path (non-breaking)

A future ADR MAY introduce an optional VA/MMU layer by:

- introducing virtual addresses in tensor handles,
- adding a mapping-install step,
- modeling translation latency and page granularity.

The Phase 0 PA model remains a valid fast-path configuration.

---

## Consequences

- Early implementation stays simple and testable.
- All latency remains explicit via graph traversal rather than hidden in address translation.
- Future VA/MMU modeling can be added without breaking existing benchmarks.

---

## Links

- ADR-0007 (runtime_api vs sim_engine boundaries)
- ADR-0008 (tensor deployment)
- ADR-0009 (kernel execution)
- SPEC R2 (latency by traversal)
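---

## Appendix: Illustrative PA-only sketch

The PA-only model in D1 and D2 can be illustrated with a few small data structures. What follows is a minimal, non-normative Python sketch. It makes four assumptions. First, each PE owns one contiguous physical region. Second, allocation is a simple aligned bump allocator. Third, a tensor is split into near-equal contiguous shards, one per PE. Fourth, a request is routed to whichever PE's region contains its PA. All names (`PAMapping`, `TensorHandle`, `PeMemory`, `deploy_tensor`, `shard_reads`, `route`) are hypothetical and are not taken from the simulator's code. `MemoryRead` borrows the request name from D1, but its fields here are an assumption.

```python
# Hypothetical sketch of the Phase 0 PA-only model (D1/D2).
# None of these names come from the simulator; they are illustrative only.
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PAMapping:
    """One contiguous range in a PE's local memory (assumed shape)."""
    pe_id: int
    pa: int    # device physical address of the first byte
    size: int  # length in bytes


@dataclass
class TensorHandle:
    """After deployment, a tensor handle holds only PA-based shard mappings."""
    name: str
    shards: List[PAMapping] = field(default_factory=list)


@dataclass(frozen=True)
class MemoryRead:
    """Read request carrying a PA plus size; no virtual address, no translation."""
    pa: int
    size: int


class PeMemory:
    """Assumed aligned bump allocator over one PE's physical region."""

    def __init__(self, pe_id: int, base: int, capacity: int, align: int = 64):
        self.pe_id = pe_id
        self.base = base
        self.capacity = capacity
        self.align = align
        self.next_free = base

    def allocate(self, size: int) -> PAMapping:
        # Round the next free address up to the alignment boundary.
        start = (self.next_free + self.align - 1) // self.align * self.align
        if size <= 0 or start + size > self.base + self.capacity:
            raise MemoryError(f"PE {self.pe_id}: cannot allocate {size} bytes")
        self.next_free = start + size
        return PAMapping(self.pe_id, start, size)

    def contains(self, pa: int, size: int) -> bool:
        # True if [pa, pa + size) lies entirely inside this PE's region.
        return self.base <= pa and pa + size <= self.base + self.capacity


def deploy_tensor(name: str, nbytes: int, pes: List[PeMemory]) -> TensorHandle:
    """Split a tensor into near-equal contiguous shards, one per PE (assumed policy)."""
    handle = TensorHandle(name)
    per_shard = -(-nbytes // len(pes))  # ceiling division
    remaining = nbytes
    for pe in pes:
        chunk = min(per_shard, remaining)
        if chunk == 0:
            break
        handle.shards.append(pe.allocate(chunk))
        remaining -= chunk
    return handle


def shard_reads(handle: TensorHandle) -> List[MemoryRead]:
    """Issue one read per shard, using the stored PAs as-is."""
    return [MemoryRead(m.pa, m.size) for m in handle.shards]


def route(req: MemoryRead, pes: List[PeMemory]) -> int:
    """Route a request to the PE whose physical region contains it."""
    for pe in pes:
        if pe.contains(req.pa, req.size):
            return pe.pe_id
    raise ValueError(f"PA {req.pa:#x} (+{req.size}) is not in any PE region")


if __name__ == "__main__":
    pes = [PeMemory(0, 0x1000_0000, 1 << 20), PeMemory(1, 0x2000_0000, 1 << 20)]
    weights = deploy_tensor("weights", 1000, pes)
    for req in shard_reads(weights):
        print(f"read pa={req.pa:#x} size={req.size} -> PE {route(req, pes)}")
```

In this sketch, requests carry PAs directly, so routing reduces to a range lookup with no translation step, consistent with D1. A VA/MMU layer along the lines of D3 would plausibly insert a translation between the tensor handle and the request. That step is deliberately absent here.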