Implement ADR-0021: PE pipeline refactor with token self-routing

Step 1-2: Backup existing code
- builtin/ → builtin_legacy/ (unchanged backup)
- custom/pe_accel/ → custom/pe_accel_legacy/ (unchanged backup)

Step 3-4: New pipeline types and tiling
- pe_types.py: StageType, Stage, TilePlan, PipelinePlan, PipelineContext, TileToken
- tiling.py: generate_gemm_plan, generate_math_plan (ported from pe_accel)

Step 5: Component implementations (ADR-0021 D4-D6)
- PE_SCHEDULER: _feed_loop (singleton FIFO feeder) + plan generation
- PE_FETCH_STORE: new component — TCM ↔ Register File
- PE_GEMM: TileToken pipeline + legacy PeInternalTxn dual-mode
- PE_MATH: TileToken pipeline + legacy dual-mode
- PE_DMA: TileToken pipeline + legacy + fabric Transaction triple-mode
- PE_TCM: TcmRequest handler with dual-channel BW serialization

Step 6: Infrastructure
- topology.yaml: pe_fetch_store component + chaining edges
- components.yaml: pe_fetch_store_v1 registration
- builder.py: PE_COMP_OFFSETS, _add_pe_internal_edges, PE view positions
- Tests: node/edge counts, PE component sets updated

All components handle both TileToken (pipeline) and PeInternalTxn (legacy).
Token self-routing: components read next stage from token.plan, chain via out_port.
366 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-08 23:35:31 -07:00
parent 161132cdcb
commit b6eb97c49a
40 changed files with 4055 additions and 214 deletions
+8 -7
View File
@@ -38,13 +38,14 @@ components:
sram_v1: kernbench.components.builtin.sram:SramComponent
# PE-level
pe_cpu_v1: kernbench.components.builtin.pe_cpu:PeCpuComponent
pe_scheduler_v1: kernbench.components.builtin.pe_scheduler:PeSchedulerComponent
pe_dma_v1: kernbench.components.builtin.pe_dma:PeDmaComponent
pe_gemm_v1: kernbench.components.builtin.pe_gemm:PeGemmComponent
pe_math_v1: kernbench.components.builtin.pe_math:PeMathComponent
pe_mmu_v1: kernbench.components.builtin.pe_mmu:PeMmuComponent
pe_tcm_v1: kernbench.components.builtin.pe_tcm:PeTcmComponent
pe_cpu_v1: kernbench.components.builtin.pe_cpu:PeCpuComponent
pe_scheduler_v1: kernbench.components.builtin.pe_scheduler:PeSchedulerComponent
pe_dma_v1: kernbench.components.builtin.pe_dma:PeDmaComponent
pe_gemm_v1: kernbench.components.builtin.pe_gemm:PeGemmComponent
pe_math_v1: kernbench.components.builtin.pe_math:PeMathComponent
pe_fetch_store_v1: kernbench.components.builtin.pe_fetch_store:PeFetchStoreComponent
pe_mmu_v1: kernbench.components.builtin.pe_mmu:PeMmuComponent
pe_tcm_v1: kernbench.components.builtin.pe_tcm:PeTcmComponent
# Custom — add your implementations here
pe_scheduler_v2: kernbench.components.custom.pe_accel.scheduler:SchedulerV2Component