Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep

- Probe CLI: restructured output (tables first, routes below), per-hop
  timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix
  cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing
  effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram
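
The UCIe and HBM figures above follow from simple arithmetic. A minimal sketch, assuming two ports per crossing (implied by 8ns per port giving 16ns per crossing) and decimal units where 1 GB/s equals 1 byte/ns; the function names are hypothetical and not taken from the repository:

```python
# Illustrative helpers only; these names are assumptions, not the repo's API.

def effective_hbm_bw(peak_gbps: float, efficiency: float) -> float:
    """Scale peak HBM bandwidth by an efficiency factor."""
    return peak_gbps * efficiency


def ucie_crossing_overhead(per_port_ns: float, ports_per_crossing: int = 2) -> float:
    """Total UCIe overhead for one die-to-die crossing (egress + ingress port)."""
    return per_port_ns * ports_per_crossing


def transfer_time_ns(size_bytes: int, bw_gbps: float) -> float:
    """Pure serialization time, using 1 GB/s == 1 byte/ns (decimal GB)."""
    return size_bytes / bw_gbps


hbm_bw = effective_hbm_bw(256.0, 0.8)    # effective BW in GB/s
crossing = ucie_crossing_overhead(8.0)   # overhead per crossing in ns
t_32k = transfer_time_ns(32 * 1024, hbm_bw)  # new default probe size
print(hbm_bw, crossing, t_32k)
```

This ignores queuing and per-hop latency; it only shows how the headline numbers relate.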

278 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 01:16:18 -07:00
parent 6f43807900
commit d75da439c6
24 changed files with 3456 additions and 501 deletions
+2 -2
@@ -513,7 +513,7 @@ def test_pe_cpu_overhead_timing():
overhead_ns = engine2._env.now
# Overhead kernel should take 100 cycles more
-assert overhead_ns == base_ns + 100, (
+assert abs(overhead_ns - (base_ns + 100)) < 1e-6, (
f"Expected {base_ns + 100}ns with overhead, got {overhead_ns}ns"
)
clear_registry()
@@ -1072,7 +1072,7 @@ def test_multi_cube_kernel_launch():
assert comp2.ok is True
assert single_ns > 0
assert multi_ns > 0
-assert multi_ns >= single_ns, (
+assert multi_ns >= single_ns - 0.01, (
f"Multi-cube ({multi_ns}ns) should be >= single-cube ({single_ns}ns)"
)
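
Both hunks swap an exact comparison of simulated times for a tolerant one, presumably because the timestamps are now floats. A minimal sketch of the same idea as reusable helpers; the helper names are hypothetical, and the tolerances (1e-6 ns and 0.01 ns) are simply the ones that appear in the diff. Note that `math.isclose` uses `<=` where the diff uses a strict `<`:

```python
import math


def approx_equal_ns(actual: float, expected: float, abs_tol: float = 1e-6) -> bool:
    """True if two timestamps agree within an absolute tolerance (in ns)."""
    # rel_tol=0.0 makes this a purely absolute comparison.
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)


def at_least_ns(value: float, floor: float, slack: float = 0.01) -> bool:
    """True if value >= floor, allowing a small slack for float rounding."""
    return value >= floor - slack


# Accumulated float error no longer fails the equality check.
base_ns = 0.1 + 0.2
print(approx_equal_ns(base_ns + 100, 100.3))
# A value that is lower only by rounding noise still passes the ordering check.
print(at_least_ns(99.995, 100.0))
```

An absolute tolerance fits here because the compared values share one unit and a similar magnitude; a relative tolerance would loosen as simulated time grows.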