Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep
- Probe CLI: restructured output (tables first, routes below), per-hop timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram

278 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
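The tuning numbers in this commit message can be sanity-checked with a few lines of arithmetic. This is a hedged sketch: it assumes a simple first-order transfer model (fixed latency plus size divided by bandwidth), which is an illustration and not necessarily how kernbench models HBM or UCIe. Only the 256 GB/s peak, the 0.8 efficiency factor, the 8 ns per-port overhead (16 ns per crossing) and the 4KB-1MB sweep range come from the message; `transfer_ns`, `sweep`, `BASE_LATENCY_NS` and its value are hypothetical.

```python
# Sanity-check of the tuning numbers in the commit message.
# Assumption: first-order model t = fixed latency + size / bandwidth.
# Illustrative only; not kernbench's actual HBM/UCIe model.

HBM_PEAK_GBPS = 256.0        # from the commit message
HBM_EFFICIENCY = 0.8         # efficiency factor added to hbm_ctrl
UCIE_PORT_OVERHEAD_NS = 8.0  # per UCIe port (was 1 ns)
PORTS_PER_CROSSING = 2       # "16ns per crossing" implies two ports per crossing

BASE_LATENCY_NS = 100.0      # hypothetical fixed path latency for the sweep


def effective_hbm_gbps(peak_gbps: float, efficiency: float) -> float:
    """Effective HBM bandwidth after applying the efficiency factor."""
    return peak_gbps * efficiency


def ucie_crossing_ns(port_overhead_ns: float, ports: int = PORTS_PER_CROSSING) -> float:
    """Latency added by one cube-to-cube crossing (overhead paid at each port)."""
    return port_overhead_ns * ports


def transfer_ns(size_bytes: int, bw_gbps: float, fixed_ns: float) -> float:
    """First-order transfer time; 1 GB/s (decimal) moves 1 byte per ns."""
    return fixed_ns + size_bytes / bw_gbps


def sweep(sizes, bw_gbps: float, fixed_ns: float):
    """Return (size, time_ns, achieved GB/s) rows, like a saturation table."""
    rows = []
    for size in sizes:
        t = transfer_ns(size, bw_gbps, fixed_ns)
        rows.append((size, t, size / t))
    return rows


if __name__ == "__main__":
    bw = effective_hbm_gbps(HBM_PEAK_GBPS, HBM_EFFICIENCY)
    print(f"effective HBM BW: {bw:.1f} GB/s")
    print(f"UCIe crossing overhead: {ucie_crossing_ns(UCIE_PORT_OVERHEAD_NS):.0f} ns")
    sizes = [4 << 10, 32 << 10, 256 << 10, 1 << 20]  # 4KB .. 1MB
    for size, t, achieved in sweep(sizes, bw, BASE_LATENCY_NS):
        print(f"{size // 1024:>5} KB  {t:10.1f} ns  {achieved:6.1f} GB/s")
```

In this model the achieved bandwidth rises with transfer size and approaches, but never reaches, the effective 204.8 GB/s, because the fixed latency is amortized over more bytes; that is the shape a saturation table is meant to expose.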
+10
-7
@@ -327,11 +327,13 @@ def test_formula_latency_lower_bound():
     assert formula > 0, "formula must be > 0"
 
 
-def test_formula_latency_exact_no_contention():
-    """With no contention, formula should approximate actual for PE DMA.
+def test_formula_latency_lower_bound_no_contention():
+    """With no contention, formula is a lower bound for PE DMA.
 
-    PE DMA is single-request with no fan-out or aggregation,
-    so formula ≈ actual (within small tolerance for SimPy scheduling).
+    PE DMA routes through NOC, which applies internal mesh traversal
+    latency (XY routing based on physical positions) not captured by the
+    formula (NOC edges have distance_mm=0 since NOC is distributed).
+
+    Formula <= actual is the invariant.
     """
     from kernbench.runtime_api.kernel import PeDmaMsg
     from kernbench.policy.address.phyaddr import PhysAddr as PA
@@ -360,10 +362,11 @@ def test_formula_latency_exact_no_contention():
     _, trace = engine.get_completion(h)
     actual = trace["total_ns"]
 
-    # No contention: formula should equal actual
-    assert abs(formula - actual) < 0.01, (
-        f"formula ({formula:.4f}) ≈ actual ({actual:.4f}) expected with no contention"
+    # Formula is a lower bound; NOC internal traversal adds latency
+    assert formula <= actual + 0.01, (
+        f"formula ({formula:.4f}) must be <= actual ({actual:.4f})"
     )
     assert actual > 0
 
 
 # ── 10. remote cube access succeeds with higher latency ────────────
 
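The updated docstring argues that the formula can only under-estimate PE DMA latency, because NOC edges contribute distance_mm=0 while the simulated path also pays for XY mesh traversal inside the NOC. The toy model below illustrates that argument under stated assumptions: `Edge`, `formula_ns`, `actual_ns`, `xy_hops` and the per-mm and per-hop costs are hypothetical stand-ins, not kernbench APIs. Only the 0.01 ns slack mirrors the test.

```python
# Toy model of why the analytical formula is a lower bound for PE DMA.
# Hypothetical: the edge records and the per-mm / per-hop costs are illustrative.
from dataclasses import dataclass

NS_PER_MM = 0.01     # hypothetical wire delay per mm of physical distance
NOC_HOP_NS = 0.5     # hypothetical cost per XY mesh hop inside the NOC
TOLERANCE_NS = 0.01  # same slack as the test's `actual + 0.01`


@dataclass
class Edge:
    name: str
    distance_mm: float  # NOC edges use 0.0 because the NOC is distributed
    fixed_ns: float     # per-edge fixed overhead


def xy_hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Hop count under dimension-ordered (X then Y) routing on a 2D mesh."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


def formula_ns(path: list[Edge]) -> float:
    """Analytical estimate: only edge distances and fixed overheads."""
    return sum(e.fixed_ns + e.distance_mm * NS_PER_MM for e in path)


def actual_ns(path: list[Edge], src: tuple[int, int], dst: tuple[int, int]) -> float:
    """Simulated latency: the formula's terms plus NOC-internal XY traversal."""
    return formula_ns(path) + xy_hops(src, dst) * NOC_HOP_NS


path = [Edge("pe->noc", 0.0, 1.0), Edge("noc->hbm_ctrl", 0.0, 2.0)]
f = formula_ns(path)
a = actual_ns(path, src=(0, 0), dst=(3, 2))
# The invariant checked by the test: formula <= actual (within tolerance).
assert f <= a + TOLERANCE_NS, f"formula ({f:.4f}) must be <= actual ({a:.4f})"
```

Because the traversal term is never negative, the formula is always a lower bound in this model; when source and destination sit on the same NOC node (zero hops) the two coincide, so equality is a special case of the invariant rather than the rule.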