Latency model: HBM PC striping + chunk-loop drain (ADR-0033)

Previous model double-counted slow-upstream paths (e.g., 64KB via UCIe
128 GB/s was ~2x pessimistic). HBM CTRL now distributes bursts across
8 pseudo-channels via global round-robin, with per-chunk commit timing
that pipelines correctly against the bottleneck link's data arrival.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-14 21:59:07 -07:00
parent f6d262e359
commit 5fdb6f8797
11 changed files with 1192 additions and 52 deletions
+9 -3
View File
@@ -380,12 +380,18 @@ def test_pe_dma_record_start_after_channel_acquire():
)
durations = [r.t_end - r.t_start for r in dma_records]
# All three should have the same actual transfer time within ±1 ns.
# All three should have similar transfer time. Under the PC striping
# model (ADR-0033 D1), per-PC `available_at` state introduces small
# timing differences between consecutive same-direction reads to the
# same PC set (the second read's start = max(eff_start, pc_avail[pc])).
# Tolerance widened from ±1ns to ±3ns to absorb this variance without
# weakening the invariant that queue wait is excluded from the recorded
# interval (still validated by the t_start >= prev_end check below).
base = durations[0]
assert base > 0, f"first dma duration must be positive, got {base}"
for i, d in enumerate(durations):
assert abs(d - base) <= 1.0, (
f"op {i} duration {d} differs from baseline {base} by >1 ns "
assert abs(d - base) <= 3.0, (
f"op {i} duration {d} differs from baseline {base} by >3 ns "
f"— record_start may still be including queue wait"
)