Add SchedulerV2 (pe_accel), DPPolicy overrides, and new benchmarks

- Add cycle-accurate PE accelerator scheduler (SchedulerV2) with tiled GEMM/Math pipelines (DMA_IN → GEMM → MATH → DMA_WB)
- Add DPPolicy num_pes/num_cubes/num_sips overrides for single-PE testing
- Support tuple target_pe for targeting specific PE subsets
- Add gemm_single_pe and gpt3_qkv benchmarks
- Switch default topology to pe_scheduler_v2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:18:49 -07:00
parent 63669f82cb
commit 114510d4b9
22 changed files with 1822 additions and 15 deletions
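
Note on the tuple target_pe change: the sketch below is a standalone approximation of the dispatch that `_resolve_pe_ids` performs after this commit, written as a free function so it can be run outside the simulator. The name `resolve_pe_ids` and the `n_local_pes` parameter are hypothetical. In particular, the "all" branch here just enumerates a flat count of PEs, whereas the real method derives that count from `self.ctx.spec` in code that is not shown in this hunk.

```python
from __future__ import annotations


def resolve_pe_ids(target_pe: int | tuple | str, n_local_pes: int = 8) -> list[int]:
    """Return the PE IDs a kernel launch would fan out to.

    n_local_pes is a placeholder for illustration only; it is not
    taken from the simulator's spec.
    """
    if isinstance(target_pe, int):
        # A single PE ID targets exactly that PE.
        return [target_pe]
    if isinstance(target_pe, tuple):
        # A tuple targets the listed subset, in the order given.
        return list(target_pe)
    # Any other value (e.g. "all") falls through to every local PE.
    # Assumption: PEs are numbered 0..n_local_pes-1.
    return list(range(n_local_pes))


if __name__ == "__main__":
    print(resolve_pe_ids(3))
    print(resolve_pe_ids((0, 2, 5)))
    print(resolve_pe_ids("all", n_local_pes=4))
```

Checking the tuple case before the string fallback keeps the existing int and "all" behaviour unchanged, so callers that never pass a tuple should see no difference.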
@@ -320,10 +320,12 @@ class MCpuComponent(ComponentBase):
        else:
            txn.done.succeed()

-    def _resolve_pe_ids(self, target_pe: int | str) -> list[int]:
+    def _resolve_pe_ids(self, target_pe: int | tuple | str) -> list[int]:
        """Return list of PE IDs to fan out to (used by kernel launch fan-out)."""
        if isinstance(target_pe, int):
            return [target_pe]
+        if isinstance(target_pe, tuple):
+            return list(target_pe)
        # "all": all PEs in local cube
        n_slices = 8
        if self.ctx and self.ctx.spec: