Add SIP-level tensor parallelism, component registry YAML, VA offset verification

- DPPolicy: 3-level (sip/cube/pe), unified naming (column_wise/row_wise)
- PE_CPU: auto num_programs from cube shard count
- context.launch(): per-SIP KernelLaunchMsg with local va_base + auto local shape
- deploy_tensor: removed mmus param, MMU mapping is context-only responsibility
- ComponentRegistry: YAML-based lazy loading (components.yaml), impls→builtin rename
- VA offset bench + tests: 2D/1D, standard Triton kernel pattern
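The `ComponentRegistry` bullet describes YAML-based lazy loading from `components.yaml`. The sketch below illustrates that general pattern only. Each YAML entry maps a component name to a `"module:Class"` import string, and the class is imported the first time it is looked up. Several details are hypothetical and may differ from the real kernbench schema and API: the class name `LazyComponentRegistry`, the `module:Class` spec format, the top-level `components:` key, and the `is_loaded` helper. `from_yaml` assumes PyYAML (`yaml.safe_load`) is installed.

```python
import importlib
from typing import Any, Dict, Optional


class LazyComponentRegistry:
    """Hypothetical registry: maps component names to "module:Class" specs
    and imports each class only the first time it is requested."""

    def __init__(self, specs: Optional[Dict[str, str]] = None) -> None:
        self._specs: Dict[str, str] = dict(specs or {})  # name -> "pkg.module:Class"
        self._cache: Dict[str, Any] = {}                  # name -> resolved class

    @classmethod
    def from_yaml(cls, path: str) -> "LazyComponentRegistry":
        # Assumed (hypothetical) file layout:
        #   components:
        #     noc_2d_mesh_v1: kernbench.components.builtin.noc:TwoDMeshNocComponent
        import yaml  # PyYAML, imported here so the rest of the class works without it
        with open(path, encoding="utf-8") as f:
            data = yaml.safe_load(f) or {}
        return cls(data.get("components", {}))

    def register(self, name: str, component_cls: Any) -> None:
        # Eager registration, analogous to ComponentRegistry.register(name, cls).
        self._cache[name] = component_cls

    def get(self, name: str) -> Any:
        # Return the cached class, or import it now from its "module:Class" spec.
        if name in self._cache:
            return self._cache[name]
        try:
            spec = self._specs[name]
        except KeyError:
            raise KeyError(f"unknown component: {name!r}") from None
        module_name, _, attr = spec.partition(":")
        if not attr:
            raise ValueError(f"spec for {name!r} must look like 'module:Class', got {spec!r}")
        module = importlib.import_module(module_name)  # the import happens only here
        component_cls = getattr(module, attr)
        self._cache[name] = component_cls
        return component_cls

    def is_loaded(self, name: str) -> bool:
        return name in self._cache


if __name__ == "__main__":
    # Stand-in spec pointing at a stdlib class so the demo runs anywhere.
    reg = LazyComponentRegistry({"ordered_dict": "collections:OrderedDict"})
    print(reg.is_loaded("ordered_dict"))  # False: nothing imported yet
    reg.get("ordered_dict")               # resolved on first access
    print(reg.is_loaded("ordered_dict"))  # True
```

Deferring `importlib.import_module` until `get()` is called means a missing or broken component fails only when it is actually requested. It does not fail when the registry package is imported.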

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:13:17 -07:00
parent 08812eda58
commit 63669f82cb
35 changed files with 813 additions and 219 deletions
+3 -3
@@ -163,11 +163,11 @@ DefaultComponent ← safe fallback
## Slide 7: Registry registration approach
```python
-# kernbench/components/impls/__init__.py
+# kernbench/components/builtin/__init__.py
from kernbench.components.base import ComponentRegistry
-from kernbench.components.impls.noc import TwoDMeshNocComponent
-from kernbench.components.impls.io_cpu import IoCpuComponent
+from kernbench.components.builtin.noc import TwoDMeshNocComponent
+from kernbench.components.builtin.io_cpu import IoCpuComponent
# ...
ComponentRegistry.register("noc_2d_mesh_v1", TwoDMeshNocComponent)