Implement ADR-0021: PE pipeline refactor with token self-routing

Step 1-2: Backup existing code
- builtin/ → builtin_legacy/ (unchanged backup)
- custom/pe_accel/ → custom/pe_accel_legacy/ (unchanged backup)

Step 3-4: New pipeline types and tiling
- pe_types.py: StageType, Stage, TilePlan, PipelinePlan, PipelineContext, TileToken
- tiling.py: generate_gemm_plan, generate_math_plan (ported from pe_accel)

Step 5: Component implementations (ADR-0021 D4-D6)
- PE_SCHEDULER: _feed_loop (singleton FIFO feeder) + plan generation
- PE_FETCH_STORE: new component (TCM ↔ Register File)
- PE_GEMM: TileToken pipeline + legacy PeInternalTxn dual-mode
- PE_MATH: TileToken pipeline + legacy dual-mode
- PE_DMA: TileToken pipeline + legacy + fabric Transaction triple-mode
- PE_TCM: TcmRequest handler with dual-channel BW serialization

Step 6: Infrastructure
- topology.yaml: pe_fetch_store component + chaining edges
- components.yaml: pe_fetch_store_v1 registration
- builder.py: PE_COMP_OFFSETS, _add_pe_internal_edges, PE view positions
- Tests: node/edge counts, PE component sets updated

All components handle both TileToken (pipeline) and PeInternalTxn (legacy).
Token self-routing: components read next stage from token.plan, chain via out_port.

366 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
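
The token self-routing described above can be illustrated with a minimal sketch. The type names StageType, Stage, PipelinePlan and TileToken come from the commit message; every field, the stage names, the `route` and `trace` helpers, and the example plan are assumptions made for illustration and are not taken from pe_types.py.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class StageType(Enum):
    # Hypothetical stage kinds; the real enum members are not shown in the commit.
    DMA_LOAD = auto()
    FETCH = auto()
    GEMM = auto()
    STORE = auto()
    DMA_WRITEBACK = auto()


@dataclass(frozen=True)
class Stage:
    type: StageType
    component: str  # assumed: name of the component that executes this stage


@dataclass(frozen=True)
class PipelinePlan:
    stages: Tuple[Stage, ...]


@dataclass
class TileToken:
    tile_id: int
    plan: PipelinePlan
    stage_idx: int = 0  # assumed cursor: index of the stage being executed

    def next_stage(self) -> Optional[Stage]:
        nxt = self.stage_idx + 1
        return self.plan.stages[nxt] if nxt < len(self.plan.stages) else None


def route(token: TileToken) -> Optional[str]:
    """Advance the token one stage; return the next component, or None at the end."""
    stage = token.next_stage()
    if stage is None:
        return None
    token.stage_idx += 1
    return stage.component  # a component would push the token to this out_port


def trace(token: TileToken) -> List[str]:
    """Follow a token through its plan and list the components it visits."""
    path = [token.plan.stages[token.stage_idx].component]
    while (dest := route(token)) is not None:
        path.append(dest)
    return path


# Hypothetical GEMM plan mirroring the chaining edges added in topology.yaml.
GEMM_PLAN = PipelinePlan(stages=(
    Stage(StageType.DMA_LOAD, "pe_dma"),
    Stage(StageType.FETCH, "pe_fetch_store"),
    Stage(StageType.GEMM, "pe_gemm"),
    Stage(StageType.STORE, "pe_fetch_store"),
    Stage(StageType.DMA_WRITEBACK, "pe_dma"),
))

if __name__ == "__main__":
    print(trace(TileToken(tile_id=0, plan=GEMM_PLAN)))
```

In this sketch no component holds a routing table: each one only asks the token where to go next, so adding a stage such as PE_FETCH_STORE changes the plan, not the components on either side of it.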
+15
-6
@@ -63,19 +63,28 @@ cube:
pe_cpu: { kind: pe_cpu, impl: pe_cpu_v1, attrs: { overhead_ns: 2.0 } }
pe_scheduler: { kind: pe_scheduler, impl: pe_scheduler_v2, attrs: { overhead_ns: 1.0 } }
pe_dma: { kind: pe_dma, impl: pe_dma_v1, attrs: { rd_engines: 1, wr_engines: 1 } }
pe_gemm: { kind: pe_gemm, impl: pe_gemm_v1, attrs: { overhead_ns: 0.0, shared_resource: accel_slot, peak_tflops_f16: 8.0 } }
pe_math: { kind: pe_math, impl: pe_math_v1, attrs: { overhead_ns: 0.0, shared_resource: accel_slot } }
pe_mmu: { kind: pe_mmu, impl: pe_mmu_v1, attrs: { tlb_overhead_ns: 0.5, page_size: 4096 } }
pe_tcm: { kind: pe_tcm, impl: pe_tcm_v1, attrs: { size_mb: 16 } }
pe_gemm: { kind: pe_gemm, impl: pe_gemm_v1, attrs: { overhead_ns: 0.0, shared_resource: accel_slot, peak_tflops_f16: 8.0 } }
pe_math: { kind: pe_math, impl: pe_math_v1, attrs: { overhead_ns: 0.0, shared_resource: accel_slot } }
pe_fetch_store: { kind: pe_fetch_store, impl: pe_fetch_store_v1, attrs: { overhead_ns: 0.0 } }
pe_mmu: { kind: pe_mmu, impl: pe_mmu_v1, attrs: { tlb_overhead_ns: 0.5, page_size: 4096 } }
pe_tcm: { kind: pe_tcm, impl: pe_tcm_v1, attrs: { size_mb: 16, read_bw_gbs: 512.0, write_bw_gbs: 512.0 } }
links:
pe_cpu_to_scheduler_mm: 0.5
scheduler_to_dma_mm: 0.5
scheduler_to_gemm_mm: 0.5
scheduler_to_math_mm: 0.5
scheduler_to_fetch_store_mm: 0.5
dma_to_tcm_bw_gbs: 512.0
dma_to_tcm_mm: 0.5
gemm_to_tcm_bw_gbs: 512.0 # GEMM reads inputs from TCM (ADR-0014 D5)
dma_to_fetch_store_mm: 0.0 # DMA → fetch_store chaining (ADR-0021)
fetch_store_to_tcm_bw_gbs: 512.0
fetch_store_to_tcm_mm: 0.0
fetch_store_to_gemm_mm: 0.0 # fetch → GEMM chaining (ADR-0021)
fetch_store_to_math_mm: 0.0 # fetch → MATH chaining (ADR-0021)
gemm_to_fetch_store_mm: 0.0 # GEMM → store chaining (ADR-0021)
math_to_fetch_store_mm: 0.0 # MATH → store chaining (ADR-0021)
fetch_store_to_dma_mm: 0.0 # store → DMA writeback chaining (ADR-0021)
gemm_to_tcm_bw_gbs: 512.0
gemm_to_tcm_mm: 0.5
math_to_tcm_bw_gbs: 512.0
math_to_tcm_mm: 0.5
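
As a rough check on the chaining links in this hunk, the sketch below derives directed edges from `<src>_to_<dst>_mm` keys and verifies that a pipeline path only uses hops that have a link. The key-parsing rule, the treatment of links as directed, and the helper names `edges_from_links` and `missing_hops` are assumptions for illustration; how builder.py's `_add_pe_internal_edges` actually builds edges is not shown in this commit view.

```python
from typing import Dict, List, Set, Tuple

# Subset of the link keys from the hunk above (values are distances in mm).
LINKS: Dict[str, float] = {
    "scheduler_to_dma_mm": 0.5,
    "dma_to_fetch_store_mm": 0.0,
    "fetch_store_to_tcm_mm": 0.0,
    "fetch_store_to_gemm_mm": 0.0,
    "fetch_store_to_math_mm": 0.0,
    "gemm_to_fetch_store_mm": 0.0,
    "math_to_fetch_store_mm": 0.0,
    "fetch_store_to_dma_mm": 0.0,
}


def edges_from_links(links: Dict[str, float]) -> Set[Tuple[str, str]]:
    """Turn '<src>_to_<dst>_mm' keys into (src, dst) pairs; skip other keys."""
    edges = set()
    for key in links:
        if not key.endswith("_mm"):
            continue  # e.g. '*_bw_gbs' keys carry bandwidth, not an edge
        src, sep, dst = key[: -len("_mm")].partition("_to_")
        if sep and src and dst:
            edges.add((src, dst))
    return edges


def missing_hops(path: List[str], edges: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every consecutive hop in `path` that has no matching edge."""
    return [(a, b) for a, b in zip(path, path[1:]) if (a, b) not in edges]


if __name__ == "__main__":
    edges = edges_from_links(LINKS)
    gemm_path = ["dma", "fetch_store", "gemm", "fetch_store", "dma"]
    print(missing_hops(gemm_path, edges))        # no missing hops expected
    print(missing_hops(["dma", "gemm"], edges))  # direct DMA → GEMM has no link
```

Under these assumptions, the load, fetch, compute, store, writeback sequence is fully covered by the new 0.0 mm chaining links, while a direct DMA to GEMM hop is not.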