Add SchedulerV2 (pe_accel), DPPolicy overrides, and new benchmarks

- Add cycle-accurate PE accelerator scheduler (SchedulerV2) with tiled GEMM/Math pipelines (DMA_IN → GEMM → MATH → DMA_WB)
- Add DPPolicy num_pes/num_cubes/num_sips overrides for single-PE testing
- Support tuple target_pe for targeting specific PE subsets
- Add gemm_single_pe and gpt3_qkv benchmarks
- Switch default topology to pe_scheduler_v2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:18:49 -07:00
parent 63669f82cb
commit 114510d4b9
22 changed files with 1822 additions and 15 deletions
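
Note on the widened target_pe type: the hunks below only change the annotation to int | tuple[int, ...] | str; they do not show how the scheduler consumes the value. As a rough sketch of what the three forms mean, the helper below expands each into an explicit list of PE indices. The name resolve_target_pes, the num_pes argument, and the bounds check are assumptions made for illustration, not code from this commit.

```python
from __future__ import annotations


def resolve_target_pes(
    target_pe: int | tuple[int, ...] | str, num_pes: int
) -> list[int]:
    """Hypothetical helper: expand a target_pe value into explicit PE indices.

    int   -> that single PE
    tuple -> exactly the listed PEs, in the given order
    "all" -> every PE in range(num_pes)
    """
    if isinstance(target_pe, str):
        # "all" is the only string form mentioned in the diff comment.
        if target_pe != "all":
            raise ValueError(f"unsupported target_pe string: {target_pe!r}")
        return list(range(num_pes))

    # Treat a bare int as a one-element selection.
    indices = (target_pe,) if isinstance(target_pe, int) else tuple(target_pe)

    # Assumed validation: reject indices outside the available PEs.
    for pe in indices:
        if not 0 <= pe < num_pes:
            raise ValueError(f"PE index {pe} out of range for {num_pes} PEs")
    return list(indices)
```

Under these assumptions, resolve_target_pes((1, 3), num_pes=4) selects PEs 1 and 3 only, while the default target_pe=0 keeps the previous single-PE behaviour.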
@@ -105,7 +105,7 @@ class DPMetadata:
    dp_policy: DPPolicy | None = None
    sip: int = 0
    cube: int = 0
-    target_pe: int | str = 0  # int → single PE, "all" → all PEs
+    target_pe: int | tuple[int, ...] | str = 0  # int → single PE, tuple → specific PEs, "all" → all PEs


 class Tensor:
@@ -166,7 +166,7 @@ class Tensor:
        dp_policy: DPPolicy | None = None,
        sip: int = 0,
        cube: int = 0,
-        target_pe: int | str = 0,
+        target_pe: int | tuple[int, ...] | str = 0,
    ) -> Tensor:
        """Set DP placement metadata (like torch.Tensor.to())."""
        if placement is None: