Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg

Implement VA/MMU layer (ADR-0011 Phase 1) enabling Triton kernels to use
contiguous virtual addresses on sharded tensors.
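
To make "contiguous virtual addresses on sharded tensors" concrete, here is a minimal sketch. It assumes a fixed page size and a per-page table that maps each VA page to a (PE, physical address) pair. PAGE_SIZE, Shard, PageTable and map_sharded are hypothetical names. They are not the PE_MMU API from this commit, and only the name translate() echoes it.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # hypothetical MMU page size


@dataclass(frozen=True)
class Shard:
    pe_id: int   # PE that owns this shard
    pa: int      # physical base address in that PE's HBM
    nbytes: int  # shard size, assumed page-aligned


class PageTable:
    """Per-page VA -> (pe_id, PA) map; a hypothetical stand-in for PE_MMU state."""

    def __init__(self) -> None:
        self._entries: dict[int, tuple[int, int]] = {}

    def map_sharded(self, va_base: int, shards: list[Shard]) -> None:
        # Place the shards back to back in VA space, so a kernel sees one
        # contiguous range even though the pages live on different PEs.
        assert va_base % PAGE_SIZE == 0, "VA base must be page-aligned"
        va = va_base
        for shard in shards:
            assert shard.pa % PAGE_SIZE == 0 and shard.nbytes % PAGE_SIZE == 0
            for off in range(0, shard.nbytes, PAGE_SIZE):
                self._entries[va // PAGE_SIZE] = (shard.pe_id, shard.pa + off)
                va += PAGE_SIZE

    def translate(self, va: int) -> tuple[int, int]:
        # Split the VA into (virtual page number, offset) and look up the page.
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn not in self._entries:
            raise KeyError(f"unmapped VA {va:#x}")
        pe_id, page_pa = self._entries[vpn]
        return pe_id, page_pa + offset


# Two 8 KiB shards on PE 0 and PE 1 behind one 16 KiB VA range.
pt = PageTable()
pt.map_sharded(0x1000_0000, [Shard(0, 0x20000, 2 * PAGE_SIZE),
                             Shard(1, 0x80000, 2 * PAGE_SIZE)])
pe, pa = pt.translate(0x1000_0000 + 2 * PAGE_SIZE + 16)
print(pe, hex(pa))  # 1 0x80010
```

A dict keyed by page number keeps the sketch short. A real MMU may well use larger pages or range entries to keep the table small.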

Key changes:
- PE_MMU component: hybrid inbox (MmuMapMsg) + sync translate() for PE_DMA
- VirtualAllocator + PEMemAllocator: free-list with coalescing
- MmuMapMsg/MmuUnmapMsg fabric path with SIP-level routing
- DPPolicy-based mapping: replicate=local, sharded=broadcast
- Tensor lifecycle: del + weakref cleanup, context manager
- Rename: TensorHandle.pa→addr, DmaReadCmd.src_pa→src_addr,
  DmaWriteCmd.dst_pa→dst_addr, CompositeCmd.out_pa→out_addr, ctx→torch
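
"Free-list with coalescing" can be illustrated with a first-fit allocator that keeps free blocks sorted by start address and merges a freed block with any free neighbour it touches. This is a sketch under stated assumptions: FreeListAllocator, the 4096-byte default alignment, and the first-fit policy are all hypothetical. It is not the commit's VirtualAllocator or PEMemAllocator code.

```python
import bisect


class FreeListAllocator:
    """First-fit free-list allocator with coalescing on free.

    Hypothetical sketch; not the commit's VirtualAllocator/PEMemAllocator code.
    """

    def __init__(self, base: int, size: int, align: int = 4096) -> None:
        assert base % align == 0 and size % align == 0
        self.align = align
        # Sorted by start; free blocks never overlap or touch each other.
        self._free: list[tuple[int, int]] = [(base, size)]  # (start, length)
        self._live: dict[int, int] = {}  # allocated start -> length

    def alloc(self, nbytes: int) -> int:
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        need = -(-nbytes // self.align) * self.align  # round up to alignment
        for i, (start, length) in enumerate(self._free):
            if length >= need:  # first fit
                if length == need:
                    del self._free[i]
                else:
                    self._free[i] = (start + need, length - need)
                self._live[start] = need
                return start
        raise MemoryError(f"no free block of {need} bytes")

    def free(self, addr: int) -> None:
        length = self._live.pop(addr)  # raises KeyError on a double free
        start, end = addr, addr + length
        i = bisect.bisect_left(self._free, (addr, 0))  # insertion point
        # Coalesce with the right neighbour if it begins where we end.
        if i < len(self._free) and self._free[i][0] == end:
            end += self._free[i][1]
            del self._free[i]
        # Coalesce with the left neighbour if it ends where we begin.
        if i > 0 and self._free[i - 1][0] + self._free[i - 1][1] == start:
            start = self._free[i - 1][0]
            del self._free[i - 1]
            i -= 1
        self._free.insert(i, (start, end - start))

    def free_blocks(self) -> list[tuple[int, int]]:
        return list(self._free)


vas = FreeListAllocator(base=0x1000_0000, size=16 * 4096)
x, y, z = vas.alloc(4096), vas.alloc(5000), vas.alloc(100)
for addr in (y, x, z):  # free out of order; neighbours merge back together
    vas.free(addr)
print(vas.free_blocks() == [(0x1000_0000, 16 * 4096)])  # True
```

Coalescing on every free keeps the list short and prevents a run of small adjacent holes from failing a large request. The sorted list makes neighbour lookup a single bisect.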

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:01:47 -07:00
parent 62fb01ae18
commit 08812eda58
34 changed files with 2131 additions and 139 deletions
+6 -6
@@ -28,7 +28,7 @@ class TensorHandle:
     """
     id: str
-    pa: int # physical address in HBM/TCM
+    addr: int # address (VA when MMU enabled, PA otherwise)
     shape: tuple[int, ...]
     dtype: str
     nbytes: int # total byte size
@@ -50,19 +50,19 @@ class CompletionHandle:
 @dataclass(frozen=True)
 class DmaReadCmd:
-    """DMA READ: HBM → PE_TCM."""
+    """DMA READ: HBM → PE_TCM. src_addr is VA (translated to PA by PE_DMA)."""
     handle: TensorHandle
-    src_pa: int
+    src_addr: int
     nbytes: int
 @dataclass(frozen=True)
 class DmaWriteCmd:
-    """DMA WRITE: PE_TCM → HBM."""
+    """DMA WRITE: PE_TCM → HBM. dst_addr is VA (translated to PA by PE_DMA)."""
     handle: TensorHandle
-    dst_pa: int
+    dst_addr: int
     nbytes: int
@@ -108,7 +108,7 @@ class CompositeCmd:
     op: Literal["gemm", "math"]
     a: TensorHandle
     b: TensorHandle | None
-    out_pa: int
+    out_addr: int
     out_nbytes: int
     math_op: str | None = None # for op="math": which math operation
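
With DmaReadCmd/DmaWriteCmd now carrying a VA (src_addr/dst_addr) and PE_DMA doing the translation, a range that is contiguous in VA need not be contiguous in PA. If translation is per page, a DMA engine has to split such a transfer at page boundaries. The sketch below assumes a fixed page size and a translate(va) -> (pe_id, pa) callback. PAGE_SIZE, ReadSegment, split_va_read and toy_translate are hypothetical, and nothing here is taken from the actual PE_DMA code.

```python
from dataclasses import dataclass
from typing import Callable

PAGE_SIZE = 4096  # hypothetical MMU page size


@dataclass(frozen=True)
class ReadSegment:
    pe_id: int   # PE whose HBM holds this piece
    pa: int      # physical start address
    nbytes: int  # length; physically contiguous on one PE


def split_va_read(src_addr: int, nbytes: int,
                  translate: Callable[[int], tuple[int, int]]) -> list[ReadSegment]:
    """Split a VA-addressed read into physically contiguous segments.

    `translate(va) -> (pe_id, pa)` stands in for PE_MMU's translate(); its
    real signature is not shown in this diff, so this one is assumed.
    """
    segments: list[ReadSegment] = []
    va, remaining = src_addr, nbytes
    while remaining > 0:
        # Never let one step cross a page boundary: the next page may live
        # at an unrelated PA, or on another PE entirely.
        chunk = min(remaining, PAGE_SIZE - va % PAGE_SIZE)
        pe_id, pa = translate(va)
        last = segments[-1] if segments else None
        if last is not None and last.pe_id == pe_id and last.pa + last.nbytes == pa:
            # Physically adjacent to the previous segment: extend it.
            segments[-1] = ReadSegment(pe_id, last.pa, last.nbytes + chunk)
        else:
            segments.append(ReadSegment(pe_id, pa, chunk))
        va += chunk
        remaining -= chunk
    return segments


# Toy page table: VA pages 0x10 and 0x11 are adjacent on PE 0; page 0x12 is on PE 1.
table = {0x10: (0, 0x20000), 0x11: (0, 0x21000), 0x12: (1, 0x80000)}


def toy_translate(va: int) -> tuple[int, int]:
    pe_id, page_pa = table[va // PAGE_SIZE]
    return pe_id, page_pa + va % PAGE_SIZE


for seg in split_va_read(0x10800, 2 * PAGE_SIZE, toy_translate):
    print(seg.pe_id, hex(seg.pa), seg.nbytes)
# 0 0x20800 6144
# 1 0x80000 2048
```

Merging physically adjacent pages keeps the number of DMA descriptors proportional to the number of discontinuities rather than the number of pages touched.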