ywkang
|
81ce55571d
|
Rename impl names: add builtin. prefix for clear provenance
- components.yaml: all builtin impls use builtin.xxx naming
- topology.yaml: all impl references updated to builtin.xxx
- builder.py: hardcoded ucie impl → builtin.ucie
- Tests: all impl string references updated
Convention: builtin.<name> for built-in, custom.<name> for user-defined.
382 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-09 00:16:24 -07:00 |
|
ywkang
|
eb792e6212
|
Remove xbar/noc remnants, rule-based cube-view connectors
- Delete xbar.py and noc.py (TwoDMeshNocComponent) — unused since router mesh
- Remove xbar_v1/noc_2d_mesh_v1 from components.yaml
- Fix pe_to_xbar → pe_to_router in routing exclusion set
- Fix xbar_to_hbm_bw_gbs → hbm_to_router_bw_gbs in report.py
- Update all docstrings/comments referencing xbar/bridge → router mesh
- Cube-view connectors: rule-based _connector_points helper
- PE↔router: single diagonal line (not chevron)
- UCIe N/S: 45°→horizontal→45°
- UCIe E/W: 45°→vertical→45°
- HBM ports: 45°→horizontal→45°
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-06 23:59:12 -07:00 |
|
ywkang
|
63669f82cb
|
Add SIP-level tensor parallelism, component registry YAML, VA offset verification
- DPPolicy: 3-level (sip/cube/pe), unified naming (column_wise/row_wise)
- PE_CPU: auto num_programs from cube shard count
- context.launch(): per-SIP KernelLaunchMsg with local va_base + auto local shape
- deploy_tensor: removed mmus param, MMU mapping is context-only responsibility
- ComponentRegistry: YAML-based lazy loading (components.yaml), impls→builtin rename
- VA offset bench + tests: 2D/1D, standard Triton kernel pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-26 01:13:17 -07:00 |
|
ywkang
|
d75da439c6
|
Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep
- Probe CLI: restructured output (tables first, routes below), per-hop
timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix
cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing
effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram
278 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-19 01:16:18 -07:00 |
|
ywkang
|
6f43807900
|
commit - release 1
|
2026-03-18 11:47:48 -07:00 |
|