Latency model: HBM PC striping + chunk-loop drain (ADR-0033)

Previous model double-counted slow-upstream paths (e.g., 64KB via UCIe 128 GB/s was ~2x pessimistic). HBM CTRL now distributes bursts across 8 pseudo-channels via global round-robin, with per-chunk commit timing that pipelines correctly against the bottleneck link's data arrival.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:59:07 -07:00
parent f6d262e359
commit 5fdb6f8797
11 changed files with 1192 additions and 52 deletions
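
Illustrative note on the double-counting described in the commit message. This is a minimal sketch, not the simulator's code: the function names, the 4 KiB chunk size, and the assumption that the HBM drain rate equals the 128 GB/s link rate are all hypothetical. It compares a serialized estimate (link transfer plus full drain) against a per-chunk pipelined estimate where each chunk commits as soon as it has arrived and the previous chunk has committed.

```python
# Hypothetical sketch: names, chunk size, and drain rate are assumptions.
# With GB = 1e9 bytes, bytes / (GB/s) gives nanoseconds directly.

def serial_ns(total_bytes, link_gbps, drain_gbps):
    """Serialized estimate: full link transfer, then full drain."""
    return total_bytes / link_gbps + total_bytes / drain_gbps


def pipelined_ns(total_bytes, chunk_bytes, link_gbps, drain_gbps):
    """Per-chunk estimate: a chunk drains once it has arrived and the
    previous chunk has committed, so drain overlaps link arrival."""
    arrived = 0.0   # time the current chunk has fully arrived over the link
    commit = 0.0    # time the previous chunk finished draining
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_bytes, remaining)
        arrived += n / link_gbps
        commit = max(arrived, commit) + n / drain_gbps
        remaining -= n
    return commit


if __name__ == "__main__":
    total = 64 * 1024  # 64 KiB payload
    print(serial_ns(total, 128.0, 128.0))           # serialized estimate
    print(pipelined_ns(total, 4096, 128.0, 128.0))  # pipelined estimate
```

Under these assumed numbers the serialized estimate is close to twice the pipelined one, since only the final chunk's drain is exposed after the link finishes delivering data.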
@@ -380,12 +380,18 @@ def test_pe_dma_record_start_after_channel_acquire():
    )

    durations = [r.t_end - r.t_start for r in dma_records]
-    # All three should have the same actual transfer time within ±1 ns.
+    # All three should have similar transfer time. Under the PC striping
+    # model (ADR-0033 D1), per-PC `available_at` state introduces small
+    # timing differences between consecutive same-direction reads to the
+    # same PC set (the second read's start = max(eff_start, pc_avail[pc])).
+    # Tolerance widened from ±1ns to ±3ns to absorb this variance without
+    # weakening the invariant that queue wait is excluded from the recorded
+    # interval (still validated by the t_start >= prev_end check below).
    base = durations[0]
    assert base > 0, f"first dma duration must be positive, got {base}"
    for i, d in enumerate(durations):
-        assert abs(d - base) <= 1.0, (
-            f"op {i} duration {d} differs from baseline {base} by >1 ns "
+        assert abs(d - base) <= 3.0, (
+            f"op {i} duration {d} differs from baseline {base} by >3 ns "
            f"— record_start may still be including queue wait"
        )
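
Illustrative note on the widened tolerance. The sketch below is a toy model of the behaviour the new test comment describes: bursts are striped across pseudo-channels by a global round-robin cursor, and each burst starts at max(eff_start, pc_avail[pc]). The class name PcStriper, the burst_ns value, and the issue() method are hypothetical and are not taken from the simulator. It only shows why a second read to the same PC set can record a slightly different duration than the first.

```python
# Hypothetical sketch: PcStriper, burst_ns, and issue() are illustrative names,
# not the simulator's API.
from dataclasses import dataclass, field


@dataclass
class PcStriper:
    num_pcs: int = 8         # pseudo-channels, as in the commit message
    burst_ns: float = 2.0    # assumed per-burst service time on one PC
    next_pc: int = 0         # global round-robin cursor
    pc_avail: list = field(init=False)

    def __post_init__(self):
        # Per-PC time at which the channel is next free.
        self.pc_avail = [0.0] * self.num_pcs

    def issue(self, eff_start, n_bursts):
        """Stripe n_bursts across PCs; return when the last burst commits."""
        end = eff_start
        for _ in range(n_bursts):
            pc = self.next_pc
            self.next_pc = (self.next_pc + 1) % self.num_pcs
            start = max(eff_start, self.pc_avail[pc])
            self.pc_avail[pc] = start + self.burst_ns
            end = max(end, self.pc_avail[pc])
        return end


if __name__ == "__main__":
    s = PcStriper()
    first = s.issue(0.0, 8) - 0.0    # every PC idle at start
    second = s.issue(1.0, 8) - 1.0   # PCs still busy from the first read
    print(first, second)
```

In this toy model the second read waits on PCs that are still busy from the first, so its recorded duration is longer by a small amount bounded by the per-burst service time.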