commit - release 1
@@ -0,0 +1,65 @@
# ADR-0011: Memory Addressing Simplification (PA-first)

## Status

Accepted

## Context

A realistic system uses host-side virtual addressing and an MMU/IOMMU-style
translation path for DMA: the host allocates physical memory at the PE level,
maps it into a virtual address space, and installs the mappings; DMA requests
then carry virtual addresses that are translated to physical addresses.

For early development, we want a minimal, deterministic model that enables:

- correct routing and latency accounting through the graph,
- stable tensor deployment and kernel execution semantics,
- future extension toward VA/MMU without rewriting workflows.

---

## Decision

### D1. Phase 0 model is PA-only

The simulator uses a PA-first model:

- All device memory accesses (MemoryRead/MemoryWrite) operate on device physical
  addresses (PA) plus size.
- Tensor handles store PA-based shard mappings after deployment.
- KernelLaunch passes tensor arguments as PA-based mappings (or references to them).
- MMU/IOMMU concepts (virtual address spaces, page tables, translation latency)
  are NOT modeled in Phase 0.
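
The bullets above can be pictured with a small data-model sketch. This is illustrative only: the class and field names (`PARange`, `TensorHandle.shards`, `pe_id`, and so on) are assumptions made for the example and are not taken from the simulator's code. `MemoryRead`, `MemoryWrite`, and `KernelLaunch` reuse the names from the ADR, but their fields here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class PARange:
    """Hypothetical: a contiguous device-physical span (base PA plus size in bytes)."""
    pe_id: int
    pa: int
    size: int

    def __post_init__(self) -> None:
        if self.pa < 0 or self.size <= 0:
            raise ValueError("pa must be non-negative and size must be positive")

    @property
    def end(self) -> int:
        # Exclusive end address of the range.
        return self.pa + self.size


@dataclass(frozen=True)
class MemoryRead:
    """A read request addressed purely by PA plus size (no virtual address)."""
    target: PARange


@dataclass(frozen=True)
class MemoryWrite:
    """A write request addressed purely by PA plus size."""
    target: PARange
    payload: bytes

    def __post_init__(self) -> None:
        if len(self.payload) != self.target.size:
            raise ValueError("payload length must match the target size")


@dataclass
class TensorHandle:
    """After deployment, each shard index maps to a PA range."""
    name: str
    shards: Dict[int, PARange] = field(default_factory=dict)

    def is_deployed(self) -> bool:
        return bool(self.shards)


@dataclass(frozen=True)
class KernelLaunch:
    """Tensor arguments are passed as references to PA-mapped handles."""
    kernel: str
    args: Tuple[TensorHandle, ...]
```

In this sketch no request type carries a virtual address, so nothing on the access path needs a translation step.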

### D2. Allocation produces PA mappings

Device allocation selects PE-local memory regions and returns PA mappings
sufficient to execute kernels and issue DMA requests.
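
One way such an allocation could work is sketched below, assuming a simple bump allocator per PE and one shard per PE. The allocator strategy, the 64-byte alignment, and the names `PEMemory` and `allocate_shards` are all assumptions made for illustration; the ADR does not prescribe them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PEMemory:
    """Hypothetical PE-local memory with a deterministic bump allocator."""
    pe_id: int
    capacity: int
    next_free: int = 0

    def alloc(self, size: int, align: int = 64) -> int:
        """Reserve `size` bytes and return the base PA, aligned up to `align`."""
        if size <= 0:
            raise ValueError("size must be positive")
        base = (self.next_free + align - 1) // align * align
        if base + size > self.capacity:
            raise MemoryError(f"PE {self.pe_id}: out of local memory")
        self.next_free = base + size
        return base


def allocate_shards(pes: List[PEMemory], shard_size: int) -> Dict[int, Tuple[int, int, int]]:
    """Place one shard on each PE; return shard index -> (pe_id, pa, size)."""
    return {
        index: (pe.pe_id, pe.alloc(shard_size), shard_size)
        for index, pe in enumerate(pes)
    }
```

Because the allocator only ever moves forward, the same sequence of allocations always yields the same PAs, which fits the deterministic model asked for in the Context section.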

### D3. Extension path (non-breaking)

A future ADR MAY introduce an optional VA/MMU layer by:

- introducing virtual addresses in tensor handles,
- adding a mapping-install step,
- modeling translation latency and page granularity.

The Phase 0 PA model remains a valid fast-path configuration.
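
To show why such an extension need not break Phase 0, here is a hedged sketch of a pluggable translation step. Both classes, the `translate` interface, the 4096-byte page size, and the 10-cycle latency are assumptions invented for the example; a future ADR may choose a different design. The identity translator stands in for the Phase 0 fast path: addresses pass through unchanged at zero added latency.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class IdentityTranslator:
    """Phase 0 fast path: the address is already a PA, so nothing is added."""

    def translate(self, addr: int) -> Tuple[int, int]:
        # Returns (physical address, added latency in cycles).
        return addr, 0


@dataclass
class PagedTranslator:
    """Hypothetical VA layer: page-granular mappings with a fixed lookup cost."""
    page_size: int = 4096   # assumed page granularity
    latency: int = 10       # assumed cycles per translation
    page_table: Dict[int, int] = field(default_factory=dict)  # VPN -> PPN

    def install(self, va: int, pa: int) -> None:
        """Mapping-install step: bind one virtual page to one physical page."""
        if va % self.page_size or pa % self.page_size:
            raise ValueError("va and pa must be page-aligned")
        self.page_table[va // self.page_size] = pa // self.page_size

    def translate(self, addr: int) -> Tuple[int, int]:
        vpn, offset = divmod(addr, self.page_size)
        if vpn not in self.page_table:
            raise LookupError(f"no mapping installed for VA {addr:#x}")
        return self.page_table[vpn] * self.page_size + offset, self.latency
```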

---

## Consequences

- Early implementation stays simple and testable.
- All latency remains explicit: it comes from graph traversal, with no hidden
  translation cost.
- Future VA/MMU modeling can be added without breaking existing benchmarks.

---

## Links

- ADR-0007 (runtime_api vs sim_engine boundaries)
- ADR-0008 (tensor deployment)
- ADR-0009 (kernel execution)
- SPEC R2 (latency by traversal)