Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg
Implement VA/MMU layer (ADR-0011 Phase 1) enabling Triton kernels to use
contiguous virtual addresses on sharded tensors.

Key changes:
- PE_MMU component: hybrid inbox (MmuMapMsg) + sync translate() for PE_DMA
- VirtualAllocator + PEMemAllocator: free-list with coalescing
- MmuMapMsg/MmuUnmapMsg fabric path with SIP-level routing
- DPPolicy-based mapping: replicate=local, sharded=broadcast
- Tensor lifecycle: del + weakref cleanup, context manager
- Rename: TensorHandle.pa→addr, DmaReadCmd.src_pa→src_addr, ctx→torch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
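For readers unfamiliar with the "free-list with coalescing" technique named in the commit message, here is a minimal sketch. It is not the `VirtualAllocator` or `PEMemAllocator` code from this commit: the class name `FreeListAllocator`, the methods `alloc` and `release`, and the first-fit policy are all assumptions made for illustration.

```python
import bisect


class FreeListAllocator:
    """Hypothetical first-fit allocator over the range [base, base + size)."""

    def __init__(self, base: int, size: int) -> None:
        # Free blocks as (start, length) tuples, kept sorted by start address.
        self.free = [(base, size)]

    def alloc(self, nbytes: int) -> int:
        """Return the start address of the first free block that fits."""
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        for i, (start, length) in enumerate(self.free):
            if length >= nbytes:
                if length == nbytes:
                    del self.free[i]  # block consumed entirely
                else:
                    # Carve the request off the front of the block.
                    self.free[i] = (start + nbytes, length - nbytes)
                return start
        raise MemoryError(f"no free block of {nbytes} bytes")

    def release(self, addr: int, nbytes: int) -> None:
        """Return a block to the free list and merge it with its neighbours."""
        i = bisect.bisect_left(self.free, (addr, 0))
        self.free.insert(i, (addr, nbytes))
        # Coalesce with the following block if the two are adjacent.
        if i + 1 < len(self.free):
            start, length = self.free[i]
            next_start, next_length = self.free[i + 1]
            if start + length == next_start:
                self.free[i] = (start, length + next_length)
                del self.free[i + 1]
        # Coalesce with the preceding block if the two are adjacent.
        if i > 0:
            prev_start, prev_length = self.free[i - 1]
            start, length = self.free[i]
            if prev_start + prev_length == start:
                self.free[i - 1] = (prev_start, prev_length + length)
                del self.free[i]
```

Without the two merge steps in `release`, a region that is fully freed would remain split into fragments, and a later request for the whole region would fail even though enough memory is free.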
@@ -207,12 +207,15 @@ benchmark instances by default.

 ## R10. Memory Addressing (Phase 0)

-In Phase 0, the simulator uses a **PA-first memory model**:
+The simulator uses a **VA/PA memory model** (ADR-0011):

-- All memory operations use device physical addresses (PA) only.
-- Virtual addressing, MMU/IOMMU, and address translation latency are out of scope.
+- Tensors are assigned a contiguous virtual address (VA) range at deployment.
+- PE_MMU translates VA→PA per access; TLB overhead is configurable.
+- Mapping installation (MmuMapMsg) traverses the fabric with measured latency.
+- Replicate tensors use per-cube local PA mapping; sharded tensors broadcast.
+- PA-only fallback is retained for backward compatibility.
 - Tensor placement is represented as a list of PA shards, each explicitly tagged
-with `(sip, cube, pe)`.
+with `(sip, cube, pe)`, plus a tensor-wide `va_base`.

 All memory access latency MUST be modeled explicitly via graph traversal.
 No implicit translation or hidden latency is allowed.
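
The placement model described in the hunk above (a list of PA shards tagged with `(sip, cube, pe)` plus a tensor-wide `va_base`) can be illustrated with a small sketch. This is not the PE_MMU code from the commit: the `Shard` type, the `translate` function, and the assumption that shards occupy the VA range back to back in list order are hypothetical. TLB overhead and fabric latency are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shard:
    """Hypothetical PA shard tagged with its (sip, cube, pe) location."""
    sip: int
    cube: int
    pe: int
    pa: int      # physical base address of the shard
    nbytes: int  # shard size in bytes


def translate(va_base: int, shards: List[Shard], va: int) -> Tuple[Tuple[int, int, int], int]:
    """Map a tensor VA to ((sip, cube, pe), pa).

    Assumes shards are laid out contiguously in VA space, in list order,
    starting at va_base.
    """
    offset = va - va_base
    if offset < 0:
        raise ValueError("VA below the tensor's va_base")
    for shard in shards:
        if offset < shard.nbytes:
            return (shard.sip, shard.cube, shard.pe), shard.pa + offset
        offset -= shard.nbytes  # skip past this shard's VA window
    raise ValueError("VA beyond the tensor's mapped range")


# Usage: two 256-byte shards on different cubes behind one contiguous VA range.
example_shards = [
    Shard(sip=0, cube=0, pe=0, pa=0x8000, nbytes=0x100),
    Shard(sip=0, cube=1, pe=2, pa=0x4000, nbytes=0x100),
]
location, pa = translate(0x10000, example_shards, 0x10120)
```

In this sketch a kernel only sees the contiguous range starting at `va_base`, while each access resolves to a concrete shard and physical address.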