Add SchedulerV2 (pe_accel), DPPolicy overrides, and new benchmarks

- Add cycle-accurate PE accelerator scheduler (SchedulerV2) with tiled
  GEMM/Math pipelines (DMA_IN → GEMM → MATH → DMA_WB)
- Add DPPolicy num_pes/num_cubes/num_sips overrides for single-PE testing
- Support tuple target_pe for targeting specific PE subsets
- Add gemm_single_pe and gpt3_qkv benchmarks
- Switch default topology to pe_scheduler_v2
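To make the tiled DMA_IN → GEMM → MATH → DMA_WB pipeline concrete, here is a minimal timing sketch. It is not SchedulerV2's actual cost model. It assumes each stage is a single resource that handles one tile at a time with a fixed per-tile latency, that tiles flow in order, and that buffers between stages are unbounded. The function `pipeline_schedule` and the latency values are hypothetical.

```python
# Hypothetical sketch, NOT the SchedulerV2 implementation.
# Assumptions: one resource per stage, fixed per-tile latency in cycles,
# in-order tiles, unbounded buffers between stages.
from typing import Sequence

STAGES = ("DMA_IN", "GEMM", "MATH", "DMA_WB")


def pipeline_schedule(num_tiles: int, latency: Sequence[int]) -> list[list[tuple[int, int]]]:
    """Return the (start, end) cycle of every stage for every tile."""
    if len(latency) != len(STAGES):
        raise ValueError(f"expected {len(STAGES)} stage latencies, got {len(latency)}")
    stage_free = [0] * len(STAGES)  # cycle at which each stage becomes idle
    schedule: list[list[tuple[int, int]]] = []
    for _ in range(num_tiles):
        ready = 0  # cycle at which this tile finished its previous stage
        row: list[tuple[int, int]] = []
        for s, lat in enumerate(latency):
            # A stage starts once the tile is ready AND the stage is idle.
            start = max(ready, stage_free[s])
            end = start + lat
            stage_free[s] = end
            ready = end
            row.append((start, end))
        schedule.append(row)
    return schedule


sched = pipeline_schedule(num_tiles=4, latency=[10, 40, 20, 10])
makespan = sched[-1][-1][1]  # end cycle of the last tile's DMA_WB
```

With these example latencies, GEMM is the bottleneck. The makespan equals the sum of stage latencies plus `(num_tiles - 1)` times the slowest stage: 80 + 3 × 40 = 200 cycles. A real accelerator with double-buffered SRAM would bound the buffers, which this sketch deliberately ignores.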

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:18:49 -07:00
parent 63669f82cb
commit 114510d4b9
22 changed files with 1822 additions and 15 deletions
+3 -1
@@ -320,10 +320,12 @@ class MCpuComponent(ComponentBase):
         else:
             txn.done.succeed()
 
-    def _resolve_pe_ids(self, target_pe: int | str) -> list[int]:
+    def _resolve_pe_ids(self, target_pe: int | tuple | str) -> list[int]:
         """Return list of PE IDs to fan out to (used by kernel launch fan-out)."""
         if isinstance(target_pe, int):
             return [target_pe]
+        if isinstance(target_pe, tuple):
+            return list(target_pe)
         # "all": all PEs in local cube
         n_slices = 8
         if self.ctx and self.ctx.spec: