Add SIP-level tensor parallelism, component registry YAML, VA offset verification

- DPPolicy: 3-level (sip/cube/pe), unified naming (column_wise/row_wise)
- PE_CPU: auto num_programs from cube shard count
- context.launch(): per-SIP KernelLaunchMsg with local va_base + auto local shape
- deploy_tensor: removed mmus param, MMU mapping is context-only responsibility
- ComponentRegistry: YAML-based lazy loading (components.yaml), impls→builtin rename
- VA offset bench + tests: 2D/1D, standard Triton kernel pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:13:17 -07:00
parent 08812eda58
commit 63669f82cb
35 changed files with 813 additions and 219 deletions
@@ -5,7 +5,7 @@ from typing import Any
 import simpy

 from kernbench.common.types import Completion, RequestHandle, Trace
-import kernbench.components.impls  # noqa: F401 — registers built-in implementations
+import kernbench.components.builtin  # noqa: F401 — registers built-in implementations
 from kernbench.components.base import ComponentBase, ComponentRegistry
 from kernbench.components.context import ComponentContext
 from kernbench.policy.address.phyaddr import PhysAddr