Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep

- Probe CLI: restructured output (tables first, routes below), per-hop timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram

278 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 01:16:18 -07:00
parent 6f43807900
commit d75da439c6
24 changed files with 3456 additions and 501 deletions
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

@@ -43,22 +43,33 @@ Each directed edge (src → dst) results in:

 ---

-### D2. Wire process (propagation delay)
+### D2. Wire process (propagation delay + BW occupancy)

 For each directed edge (src, dst) in the topology graph, a SimPy wire process
-models propagation delay:
+models propagation delay and BW occupancy:

 ```python
-def wire_process(env, out_port, in_port, delay_ns):
+def wire_process(env, out_port, in_port, delay_ns, bw_gbs):
+    available_at = 0.0
    while True:
        cmd = yield out_port.get()
+        if bw_gbs > 0:
+            nbytes = getattr(cmd, "nbytes", 0)
+            if nbytes > 0:
+                wait = available_at - env.now
+                if wait > 0:
+                    yield env.timeout(wait)
+                available_at = env.now + (nbytes / bw_gbs)
        yield env.timeout(delay_ns)
        yield in_port.put(cmd)
 ```

 Wire processes are started at engine initialization.
-BW constraints are enforced by the sending component's out_port capacity or token model,
-not by the wire process itself.
+Each directed edge maintains an `available_at` timestamp tracking when the link
+becomes free for the next transaction. When a transaction occupies a link, the
+next transaction on the same directed link must wait until occupancy clears
+(back-to-back serialization). TX and RX directions are independent (separate
+wire processes with separate `available_at` state).
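
The timing implied by the `available_at` bookkeeping above can be checked by hand with a small replay. The sketch below is not part of the commit: `replay_wire` is a hypothetical helper that mirrors the loop in `wire_process` without SimPy, assuming `in_port.put()` never blocks and that commands are listed in the order they were put on `out_port`. The 4096-byte size, 256 GB/s bandwidth, and 2 ns delay are illustrative values only. Since 1 GB/s equals 1 byte/ns, `nbytes / bw_gbs` is an occupancy in nanoseconds.

```python
from typing import List, Tuple


def replay_wire(
    commands: List[Tuple[float, int]],  # (time put on out_port in ns, nbytes)
    delay_ns: float,
    bw_gbs: float,
) -> List[float]:
    """Return the delivery time (ns) of each command on one directed edge.

    Hypothetical helper: replays the sequential wire_process loop,
    assuming in_port.put() completes immediately.
    """
    now = 0.0           # time at which the wire loop is ready for the next get()
    available_at = 0.0  # time at which the link becomes free again
    delivered = []
    for t_put, nbytes in commands:
        # out_port.get() returns once the command exists and the loop is idle.
        now = max(now, t_put)
        if bw_gbs > 0 and nbytes > 0:
            # Wait for the previous transaction's occupancy to clear.
            now = max(now, available_at)
            # 1 GB/s == 1 byte/ns, so this quotient is in nanoseconds.
            available_at = now + nbytes / bw_gbs
        # Propagation delay, then hand-off to in_port.
        now += delay_ns
        delivered.append(now)
    return delivered


# Two back-to-back 4 KB commands at t=0, then one at t=100 ns.
cmds = [(0.0, 4096), (0.0, 4096), (100.0, 4096)]
print(replay_wire(cmds, delay_ns=2.0, bw_gbs=256.0))  # [2.0, 18.0, 102.0]
print(replay_wire(cmds, delay_ns=2.0, bw_gbs=0.0))    # [2.0, 4.0, 102.0]
```

In this replay the first command is delayed only by propagation, while the second waits until the 16 ns occupancy of the first has cleared. With `bw_gbs=0` the occupancy check is skipped, and only the sequential loop itself (one `delay_ns` per command) spaces the commands out.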

 ---