ywkang
eb792e6212
Remove xbar/noc remnants, rule-based cube-view connectors
...
- Delete xbar.py and noc.py (TwoDMeshNocComponent) — unused since router mesh
- Remove xbar_v1/noc_2d_mesh_v1 from components.yaml
- Fix pe_to_xbar → pe_to_router in routing exclusion set
- Fix xbar_to_hbm_bw_gbs → hbm_to_router_bw_gbs in report.py
- Update all docstrings/comments referencing xbar/bridge → router mesh
- Cube-view connectors: rule-based _connector_points helper
- PE↔router: single diagonal line (not chevron)
- UCIe N/S: 45°→horizontal→45°
- UCIe E/W: 45°→vertical→45°
- HBM ports: 45°→horizontal→45°
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-06 23:59:12 -07:00
ywkang
5917b3497c
Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019)
...
- Remove xbar_top/bot, bridge, single noc node from topology
- Each cube_mesh.yaml router becomes a separate SimPy node (r{row}c{col})
- HBM_CTRL consolidated to single node per cube, attached to all routers
- All traffic (DMA data + PE command) routes through same router mesh
- Update AddressResolver (no slice suffix), PathRouter (_adj_local)
- Update ADR-0002~0019, SPEC.md to remove xbar/bridge references
- Regenerate SVG diagrams for new topology structure
- Skip cross-SIP PE_TCM and PE_MMU routing tests (not yet wired)
326 passed, 13 skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-04 17:51:28 -07:00
ywkang
63669f82cb
Add SIP-level tensor parallelism, component registry YAML, VA offset verification
...
- DPPolicy: 3-level (sip/cube/pe), unified naming (column_wise/row_wise)
- PE_CPU: auto num_programs from cube shard count
- context.launch(): per-SIP KernelLaunchMsg with local va_base + auto local shape
- deploy_tensor: removed mmus param, MMU mapping is context-only responsibility
- ComponentRegistry: YAML-based lazy loading (components.yaml), impls→builtin rename
- VA offset bench + tests: 2D/1D, standard Triton kernel pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-03-26 01:13:17 -07:00
ywkang
d75da439c6
Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep
...
- Probe CLI: restructured output (tables first, routes below), per-hop
timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix
cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing
effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram
278 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-19 01:16:18 -07:00
ywkang
6f43807900
commit - release 1
2026-03-18 11:47:48 -07:00