Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg

Implement the VA/MMU layer (ADR-0011 Phase 1), which lets Triton kernels
use contiguous virtual addresses on sharded tensors.

Key changes:
- PE_MMU component: hybrid inbox (MmuMapMsg) + synchronous translate()
  for PE_DMA
- VirtualAllocator + PEMemAllocator: free-list with coalescing
- MmuMapMsg/MmuUnmapMsg fabric path with SIP-level routing
- DPPolicy-based mapping: replicate=local, sharded=broadcast
- Tensor lifecycle: del + weakref cleanup, context manager
- Rename: TensorHandle.pa→addr, DmaReadCmd.src_pa→src_addr,
  DmaWriteCmd.dst_pa→dst_addr, CompositeCmd.out_pa→out_addr, ctx→torch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
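The allocator bullet names the technique but not its mechanics. Below is a minimal sketch of a free-list allocator with coalescing, assuming a first-fit policy, a fixed alignment granule, and an aligned base address. `FreeListAllocator`, `alloc` and `release` are hypothetical names for illustration, not the API of the commit's `VirtualAllocator` or `PEMemAllocator`.

```python
from bisect import insort


class FreeListAllocator:
    """First-fit free-list allocator that coalesces adjacent free ranges.

    Hypothetical sketch; the commit's VirtualAllocator / PEMemAllocator may differ.
    Assumes `base` is a multiple of `align`, so every returned address is aligned.
    """

    def __init__(self, base: int, size: int, align: int = 64) -> None:
        self.align = align
        self.free: list[tuple[int, int]] = [(base, size)]  # sorted (start, length)
        self.used: dict[int, int] = {}  # live allocation start -> rounded length

    def alloc(self, nbytes: int) -> int:
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        size = -(-nbytes // self.align) * self.align  # round up to the alignment granule
        for i, (start, length) in enumerate(self.free):
            if length >= size:  # first fit: carve from the front of this range
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, length - size)
                self.used[start] = size
                return start
        raise MemoryError(f"cannot allocate {nbytes} bytes")

    def release(self, addr: int) -> None:
        size = self.used.pop(addr)  # KeyError on double free or unknown address
        insort(self.free, (addr, size))
        # Coalesce: merge every range that ends exactly where the next one starts.
        merged: list[tuple[int, int]] = []
        for start, length in self.free:
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free = merged


if __name__ == "__main__":
    va = FreeListAllocator(base=0x1000, size=1024)
    a = va.alloc(100)  # rounded up to 128 bytes
    b = va.alloc(64)
    va.release(a)
    va.release(b)
    print(va.free)  # [(4096, 1024)]: fully coalesced back into one range
```

Coalescing on release is what keeps a long-running allocator from fragmenting into many small free ranges that can no longer satisfy a large tensor allocation, even when enough total space is free.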
2026-03-26 00:01:47 -07:00
parent 62fb01ae18
commit 08812eda58
34 changed files with 2131 additions and 139 deletions
@@ -28,7 +28,7 @@ class TensorHandle:
    """

    id: str
-    pa: int                          # physical address in HBM/TCM
+    addr: int                        # address (VA when MMU enabled, PA otherwise)
    shape: tuple[int, ...]
    dtype: str
    nbytes: int                      # total byte size
@@ -50,19 +50,19 @@ class CompletionHandle:

@dataclass(frozen=True)
class DmaReadCmd:
-    """DMA READ: HBM → PE_TCM."""
+    """DMA READ: HBM → PE_TCM. src_addr is VA (translated to PA by PE_DMA)."""

    handle: TensorHandle
-    src_pa: int
+    src_addr: int
    nbytes: int


@dataclass(frozen=True)
class DmaWriteCmd:
-    """DMA WRITE: PE_TCM → HBM."""
+    """DMA WRITE: PE_TCM → HBM. dst_addr is VA (translated to PA by PE_DMA)."""

    handle: TensorHandle
-    dst_pa: int
+    dst_addr: int
    nbytes: int


@@ -108,7 +108,7 @@ class CompositeCmd:
    op: Literal["gemm", "math"]
    a: TensorHandle
    b: TensorHandle | None
-    out_pa: int
+    out_addr: int
    out_nbytes: int
    math_op: str | None = None       # for op="math": which math operation
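The updated docstrings state that `src_addr`/`dst_addr` are virtual addresses that PE_DMA translates to physical ones. Below is a minimal sketch of that translation step. It assumes a flat page table with a 4 KiB page, and it has `map`/`unmap` stand in directly for handling MmuMapMsg/MmuUnmapMsg, with no message passing. `PeMmu`, `PAGE_SIZE` and `split_dma` are hypothetical names, not the commit's API. A contiguous VA range can be backed by non-contiguous physical pages, so the sketch splits the transfer at page boundaries and merges pieces that happen to be physically adjacent.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page granule; the real PE_MMU page size may differ


@dataclass
class PeMmu:
    """Hypothetical page-table sketch of PE_MMU's synchronous translate()."""

    page_table: dict[int, int] = field(default_factory=dict)  # VA page -> PA page

    def map(self, va: int, pa: int, nbytes: int) -> None:
        # Stand-in for applying an MmuMapMsg: install page-aligned VA -> PA pages.
        if va % PAGE_SIZE or pa % PAGE_SIZE:
            raise ValueError("va and pa must be page-aligned")
        for i in range(-(-nbytes // PAGE_SIZE)):
            self.page_table[va // PAGE_SIZE + i] = pa // PAGE_SIZE + i

    def unmap(self, va: int, nbytes: int) -> None:
        # Stand-in for applying an MmuUnmapMsg.
        for i in range(-(-nbytes // PAGE_SIZE)):
            self.page_table.pop(va // PAGE_SIZE + i, None)

    def translate(self, va: int) -> int:
        vpn, offset = divmod(va, PAGE_SIZE)
        try:
            return self.page_table[vpn] * PAGE_SIZE + offset
        except KeyError:
            raise LookupError(f"MMU fault: VA {va:#x} is not mapped") from None


def split_dma(mmu: PeMmu, src_addr: int, nbytes: int) -> list[tuple[int, int]]:
    """Translate a VA range into (pa, length) segments a DMA engine could issue."""
    segments: list[tuple[int, int]] = []
    addr, remaining = src_addr, nbytes
    while remaining > 0:
        chunk = min(remaining, PAGE_SIZE - addr % PAGE_SIZE)  # stop at page boundary
        pa = mmu.translate(addr)
        if segments and segments[-1][0] + segments[-1][1] == pa:
            segments[-1] = (segments[-1][0], segments[-1][1] + chunk)  # PA-contiguous
        else:
            segments.append((pa, chunk))
        addr += chunk
        remaining -= chunk
    return segments


if __name__ == "__main__":
    mmu = PeMmu()
    # Two adjacent VA pages backed by non-adjacent physical pages.
    mmu.map(0x10000, 0x80000, PAGE_SIZE)
    mmu.map(0x11000, 0x20000, PAGE_SIZE)
    print([(hex(pa), n) for pa, n in split_dma(mmu, 0x10F00, 0x200)])
    # [('0x80f00', 256), ('0x20000', 256)]
```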