commit - release 1

2026-03-18 11:47:48 -07:00
commit 6f43807900
109 changed files with 14909 additions and 0 deletions
@@ -0,0 +1,65 @@
+# ADR-0011: Memory Addressing Simplification (PA-first)
+
+## Status
+
+Accepted
+
+## Context
+
+A realistic system uses host-side virtual addressing and an MMU/IOMMU-style
+translation path for DMA: the host allocates physical memory at the PE level,
+maps it into a virtual address (VA) space, and installs the mappings; DMA
+requests then carry virtual addresses that are translated to physical
+addresses (PA).
+
+For early development, we want a minimal, deterministic model that enables:
+
+- correct routing and latency accounting through the graph,
+- stable tensor deployment and kernel execution semantics,
+- future extension toward VA/MMU without rewriting workflows.
+
+---
+
+## Decision
+
+### D1. Phase 0 model is PA-only
+
+The simulator uses a PA-first model:
+
+- All device memory accesses (MemoryRead/MemoryWrite) operate on a device
+  physical address (PA) plus a size.
+- Tensor handles store PA-based shard mappings after deployment.
+- KernelLaunch passes tensor arguments as PA-based mappings (or references to them).
+- MMU/IOMMU concepts (virtual address spaces, page tables, translation latency)
+  are NOT modeled in Phase 0.
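The D1 data model can be sketched as follows. This is a minimal illustration under stated assumptions: only the request names `MemoryRead`, `MemoryWrite`, and `KernelLaunch` come from this ADR. Their fields, along with the hypothetical `ShardMapping`, `TensorHandle`, and `reads_for` helpers, are invented for illustration. The point is that every address is a device PA paired with a size, with no VA or page-table state anywhere.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical Phase 0 types. Every address is a device physical address (PA);
# there is no virtual address, page table, or translation step.


@dataclass(frozen=True)
class ShardMapping:
    pe_id: int  # PE that owns this shard (assumed field)
    pa: int     # device physical base address of the shard
    size: int   # shard size in bytes


@dataclass(frozen=True)
class TensorHandle:
    name: str
    shards: Tuple[ShardMapping, ...]  # PA-based shard mappings, set at deployment


@dataclass(frozen=True)
class MemoryRead:
    pa: int    # device PA to read from
    size: int  # bytes to read


@dataclass(frozen=True)
class MemoryWrite:
    pa: int    # device PA to write to
    size: int  # bytes to write


@dataclass(frozen=True)
class KernelLaunch:
    kernel: str
    args: Tuple[TensorHandle, ...]  # tensor args passed as references to PA mappings


def reads_for(tensor: TensorHandle) -> List[MemoryRead]:
    """Expand a deployed tensor into one PA-addressed read per shard."""
    return [MemoryRead(s.pa, s.size) for s in tensor.shards]
```

Because shard mappings already hold PAs, expanding a tensor into memory requests is a direct copy of fields; no lookup sits between deployment and DMA.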
+
+### D2. Allocation produces PA mappings
+
+Device allocation selects PE-local memory regions and returns PA mappings
+sufficient to execute kernels and issue DMA requests.
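D2 does not prescribe an allocation policy. As an assumption, the sketch below uses a per-PE bump allocator, the simplest deterministic choice. `PhysicalAllocator`, `alloc_sharded`, the `(base PA, capacity)` layout per PE, and the 64-byte alignment are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ShardMapping:
    pe_id: int  # owning PE (assumed field)
    pa: int     # device physical base address
    size: int   # bytes


class PhysicalAllocator:
    """Hypothetical deterministic bump allocator over PE-local PA ranges."""

    def __init__(self, pe_ranges: Dict[int, Tuple[int, int]], align: int = 64):
        # pe_ranges maps pe_id -> (base PA, capacity in bytes); assumed layout.
        self._ranges = dict(pe_ranges)
        self._next = {pe: base for pe, (base, _) in pe_ranges.items()}
        self._align = align

    def alloc(self, pe_id: int, size: int) -> ShardMapping:
        base, capacity = self._ranges[pe_id]
        # Round the cursor up to the next alignment boundary (ceiling division).
        pa = -(-self._next[pe_id] // self._align) * self._align
        if pa + size > base + capacity:
            raise MemoryError(f"PE {pe_id}: out of PE-local memory")
        self._next[pe_id] = pa + size
        return ShardMapping(pe_id, pa, size)

    def alloc_sharded(self, total: int, pes: List[int]) -> List[ShardMapping]:
        """Split `total` bytes across `pes`; the remainder goes to the first PEs."""
        q, r = divmod(total, len(pes))
        return [self.alloc(pe, q + (1 if i < r else 0)) for i, pe in enumerate(pes)]
```

The same request sequence always yields the same PAs, which keeps runs reproducible; a fuller allocator that supports freeing would replace this sketch.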
+
+### D3. Extension path (non-breaking)
+
+A future ADR MAY introduce an optional VA/MMU layer by:
+
+- introducing virtual addresses in tensor handles,
+- adding a mapping-install step,
+- modeling translation latency and page granularity.
+
+The Phase 0 PA model remains a valid fast-path configuration.
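One possible shape for the D3 extension, offered only as a hypothetical sketch: callers resolve addresses through a small translator interface, and the Phase 0 configuration plugs in an identity translator with zero added latency. `AddressTranslator`, `IdentityTranslator`, `PageTableTranslator`, the single-level page table, and the fixed 20-cycle lookup cost are assumptions, not decisions made by this ADR.

```python
from typing import Dict, Protocol, Tuple


class AddressTranslator(Protocol):
    def translate(self, addr: int) -> Tuple[int, int]:
        """Return (physical address, added latency in cycles)."""
        ...


class IdentityTranslator:
    """Phase 0 fast path: addresses are already PAs, so no latency is added."""

    def translate(self, addr: int) -> Tuple[int, int]:
        return addr, 0


class PageTableTranslator:
    """Hypothetical VA layer: single-level page table, fixed cost per lookup."""

    def __init__(self, page_size: int = 4096, latency: int = 20):
        self.page_size = page_size
        self.latency = latency           # assumed fixed translation cost (cycles)
        self.table: Dict[int, int] = {}  # virtual page number -> physical page base

    def install(self, va: int, pa: int, size: int) -> None:
        """Mapping-install step: map [va, va+size) onto [pa, pa+size) page by page."""
        if va % self.page_size or pa % self.page_size:
            raise ValueError("va and pa must be page-aligned")
        for off in range(0, size, self.page_size):
            self.table[(va + off) // self.page_size] = pa + off

    def translate(self, va: int) -> Tuple[int, int]:
        vpn, offset = divmod(va, self.page_size)
        if vpn not in self.table:
            raise KeyError(f"translation fault at VA {va:#x}")
        return self.table[vpn] + offset, self.latency
```

Code written against `AddressTranslator` runs unchanged with either implementation, which is one way the PA model could remain a fast-path configuration once a VA layer exists.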
+
+---
+
+## Consequences
+
+- Early implementation stays simple and testable.
+- All latency is accounted for explicitly by graph traversal; none is hidden
+  in an address-translation step.
+- Future VA/MMU modeling can be added without breaking existing benchmarks.
+
+---
+
+## Links
+
+- ADR-0007 (runtime_api vs sim_engine boundaries)
+- ADR-0008 (tensor deployment)
+- ADR-0009 (kernel execution)
+- SPEC R2 (latency by traversal)