Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg

Implement VA/MMU layer (ADR-0011 Phase 1) enabling Triton kernels to use
contiguous virtual addresses on sharded tensors.

Key changes:
- PE_MMU component: hybrid inbox (MmuMapMsg) + sync translate() for PE_DMA
- VirtualAllocator + PEMemAllocator: free-list with coalescing
- MmuMapMsg/MmuUnmapMsg fabric path with SIP-level routing
- DPPolicy-based mapping: replicate=local, sharded=broadcast
- Tensor lifecycle: del + weakref cleanup, context manager
- Rename: TensorHandle.pa→addr, DmaReadCmd.src_pa→src_addr, ctx→torch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
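
Reviewer note: the "free-list with coalescing" item can be pictured with the minimal sketch below. It is not the VirtualAllocator or PEMemAllocator code from this commit; the class name `FreeListAllocator`, its methods, and the first-fit policy are all assumptions made for illustration.

```python
from bisect import insort


class FreeListAllocator:
    """Hypothetical first-fit allocator over the range [base, base + size)."""

    def __init__(self, base: int, size: int) -> None:
        self.holes = [(base, size)]  # sorted list of free (start, length) ranges
        self.live = {}               # start address -> length of live allocations

    def alloc(self, nbytes: int) -> int:
        """Carve nbytes from the first hole that is large enough."""
        for i, (start, length) in enumerate(self.holes):
            if length >= nbytes:
                if length == nbytes:
                    self.holes.pop(i)  # hole consumed exactly
                else:
                    self.holes[i] = (start + nbytes, length - nbytes)
                self.live[start] = nbytes
                return start
        raise MemoryError(f"no free range of {nbytes} bytes")

    def free(self, addr: int) -> None:
        """Return a block to the free list and merge adjacent holes."""
        length = self.live.pop(addr)  # KeyError on double free or unknown address
        insort(self.holes, (addr, length))
        merged = []
        for start, size in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == start:
                # Previous hole ends exactly where this one begins: coalesce.
                prev_start, prev_size = merged[-1]
                merged[-1] = (prev_start, prev_size + size)
            else:
                merged.append((start, size))
        self.holes = merged
```

Freeing two non-adjacent blocks leaves two separate holes. Freeing the block between them collapses all three into one range, which is what lets a later large allocation succeed after fragmentation.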
2026-03-26 00:01:47 -07:00
parent 62fb01ae18
commit 08812eda58
34 changed files with 2131 additions and 139 deletions
@@ -22,6 +22,7 @@ _PE_COMP_OFFSETS = {
    "pe_dma": (0.0, -0.15),
    "pe_gemm": (0.0, 0.0),
    "pe_math": (0.0, 0.15),
+    "pe_mmu": (0.15, -0.15),
    "pe_tcm": (0.3, 0.0),
 }

@@ -495,6 +496,15 @@ def _instantiate_cube(
                kind="pe_response",
            ))

+            # noc → PE_MMU (MMU mapping install)
+            pe_mmu_id = f"{pp}.pe_mmu"
+            if pe_mmu_id in nodes:
+                edges.append(Edge(
+                    src=f"{cp}.noc", dst=pe_mmu_id,
+                    distance_mm=clinks.get("noc_to_pe_mmu_mm", 0.0),
+                    kind="command",
+                ))
+
            pe_idx += 1

    # ── xbar_top/bot → HBM slices ──
@@ -1073,6 +1083,7 @@ def _build_pe_view(spec: dict) -> ViewGraph:
        "pe_dma": (7.0, 1.5),
        "pe_gemm": (7.0, 4.0),
        "pe_math": (7.0, 6.5),
+        "pe_mmu": (4.0, 1.5),
        "pe_tcm": (10.0, 4.0),
    }