kernbench2/docs/adr/ADR-0011-memory-addressing-simplification.md
2026-03-18 11:47:48 -07:00


ADR-0011: Memory Addressing Simplification (PA-first)

Status

Accepted

Context

A realistic system uses host-side virtual addressing and an MMU/IOMMU-style translation path for DMA: the host allocates physical memory at the PE level, maps it into a virtual address space, and installs the mappings; DMA requests then carry virtual addresses that are translated to physical addresses.

For early development, we want a minimal, deterministic model that enables:

  • correct routing and latency accounting through the graph,
  • stable tensor deployment and kernel execution semantics,
  • future extension toward VA/MMU without rewriting workflows.

Decision

D1. Phase 0 model is PA-only

The simulator uses a PA-first model:

  • All device memory accesses (MemoryRead/MemoryWrite) operate on device physical addresses (PA) plus size.
  • Tensor handles store PA-based shard mappings after deployment.
  • KernelLaunch passes tensor arguments as PA-based mappings (or references to them).
  • MMU/IOMMU concepts (virtual address spaces, page tables, translation latency) are NOT modeled in Phase 0.
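
As an illustration of the PA-only model above, the following is a minimal sketch, not the simulator's actual types. The field names (`pa`, `size`, `pe_id`), the `ShardMapping` and `TensorHandle` layouts, and the helper `reads_for` are all assumptions introduced here; only the names MemoryRead and MemoryWrite come from this ADR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MemoryRead:
    pa: int    # device physical address (no virtual address involved)
    size: int  # number of bytes to read


@dataclass(frozen=True)
class MemoryWrite:
    pa: int    # device physical address
    size: int  # number of bytes to write


@dataclass(frozen=True)
class ShardMapping:
    # Hypothetical layout: one shard of a tensor resident on one PE.
    pe_id: int
    pa: int
    size: int


@dataclass(frozen=True)
class TensorHandle:
    # Hypothetical layout: after deployment the handle holds only PA-based shards.
    name: str
    shards: Tuple[ShardMapping, ...]


def reads_for(handle: TensorHandle) -> List[MemoryRead]:
    """Build one PA-addressed read per shard; no translation step is needed."""
    return [MemoryRead(pa=s.pa, size=s.size) for s in handle.shards]
```

Because every request already carries a PA and a size, a kernel launch could pass `handle.shards` (or a reference to the handle) directly, which is the property D1 relies on.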

D2. Allocation produces PA mappings

Device allocation selects PE-local memory regions and returns PA mappings sufficient to execute kernels and issue DMA requests.
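
One way to picture D2 is a deterministic bump allocator over PE-local regions. This is a sketch under stated assumptions: the class `PELocalAllocator`, the `PAMapping` record, the 64-byte alignment, and the per-PE base/capacity parameters are hypothetical and not taken from the simulator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PAMapping:
    # Hypothetical record returned by allocation.
    pe_id: int
    pa: int
    size: int


class PELocalAllocator:
    """Bump allocator: each PE owns [base, base + capacity) in PA space."""

    def __init__(self, pe_bases: Dict[int, int], pe_capacity: int, align: int = 64):
        self._base = dict(pe_bases)   # first PA of each PE-local region
        self._next = dict(pe_bases)   # next free PA per PE
        self._capacity = pe_capacity  # bytes available per PE (assumed uniform)
        self._align = align           # assumed allocation alignment

    def alloc(self, pe_id: int, size: int) -> PAMapping:
        if size <= 0:
            raise ValueError("size must be positive")
        # Round the next free address up to the alignment boundary.
        start = -(-self._next[pe_id] // self._align) * self._align
        end = start + size
        if end > self._base[pe_id] + self._capacity:
            raise MemoryError(f"PE {pe_id} local memory exhausted")
        self._next[pe_id] = end
        return PAMapping(pe_id=pe_id, pa=start, size=size)


def allocate_sharded(allocator: PELocalAllocator, pe_ids: List[int],
                     shard_size: int) -> List[PAMapping]:
    """Allocate one equally sized shard on each listed PE."""
    return [allocator.alloc(pe, shard_size) for pe in pe_ids]
```

A bump allocator is chosen here only because it is trivially deterministic: the same sequence of requests always yields the same PAs, which keeps test expectations stable.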

D3. Extension path (non-breaking)

A future ADR MAY introduce an optional VA/MMU layer by:

  • introducing virtual addresses in tensor handles,
  • adding a mapping-install step,
  • modeling translation latency and page granularity.

The Phase 0 PA model remains a valid fast-path configuration.
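
The non-breaking extension path could look roughly like the sketch below, where translation is an optional layer and the PA-only configuration is an identity pass-through with zero added latency. Everything here is hypothetical: the `Translator` class, the 4096-byte page size, the flat page table, and the fixed `walk_latency` are illustrative choices, not decisions made by this ADR.

```python
from typing import Dict, Optional, Tuple

PAGE_SIZE = 4096  # assumed page granularity, for illustration only


class Translator:
    """Optional VA -> PA layer; with no page table it is the Phase 0 fast path."""

    def __init__(self, page_table: Optional[Dict[int, int]] = None,
                 walk_latency: int = 0):
        self._pt = page_table      # virtual page number -> physical page number
        self._walk = walk_latency  # assumed fixed translation latency

    def install(self, vpn: int, ppn: int) -> None:
        """Mapping-install step; only meaningful when a VA layer is enabled."""
        if self._pt is None:
            raise RuntimeError("PA-only configuration has no page table")
        self._pt[vpn] = ppn

    def translate(self, addr: int) -> Tuple[int, int]:
        """Return (physical address, added latency)."""
        if self._pt is None:
            # Phase 0: the address is already a PA and costs nothing extra.
            return addr, 0
        vpn, offset = divmod(addr, PAGE_SIZE)
        if vpn not in self._pt:
            raise KeyError(f"no mapping installed for VA {addr:#x}")
        return self._pt[vpn] * PAGE_SIZE + offset, self._walk
```

With this shape, existing PA-based workflows would construct `Translator()` and observe no change in addresses or latency, while a VA-enabled configuration would install mappings first and pay the translation cost explicitly.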


Consequences

  • Early implementation stays simple and testable.
  • All latency remains explicit via graph traversal, not hidden translation.
  • Future VA/MMU modeling can be added without breaking existing benchmarks.

Related

  • ADR-0007 (runtime_api vs sim_engine boundaries)
  • ADR-0008 (tensor deployment)
  • ADR-0009 (kernel execution)
  • SPEC R2 (latency by traversal)