- Add cycle-accurate PE accelerator scheduler (SchedulerV2) with tiled
GEMM/Math pipelines (DMA_IN → GEMM → MATH → DMA_WB)
- Add DPPolicy num_pes/num_cubes/num_sips overrides for single-PE testing
- Support tuple target_pe for targeting specific PE subsets
- Add gemm_single_pe and gpt3_qkv benchmarks
- Switch default topology to pe_scheduler_v2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>