a76487ca48
User asked to surface system-wide congestion (more representative
than the single-cube view), bring back the latency-breakdown plot
under a separate filename, and rename the obscure ``streaming``
category.

Scenarios:
  Renamed all_pe_to_pe0 → all_pe_cube0_to_pe0 (clarify cube scope).
  Added two SIP-wide scenarios:
    sip_local_all    Every PE in sip0 (128 total) accesses its own
                     local slice. All paths are disjoint (each PE owns
                     its own hbm_ctrl.peX), so the modeled aggregate
                     throughput should scale linearly with cube count.
    sip_hotspot_pe0  Every PE in sip0 (128 total) targets
                     sip0.cube0.pe0_slice. Worst-case hotspot: UCIe
                     inbound and r0c0→hbm_ctrl.pe0 both saturate.

Each bar now carries an ``N=...`` annotation showing the issuer
count, and the chart titles state the scope explicitly.
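
A hypothetical sketch of how the two SIP-wide issuer sets could be enumerated. The 16 cubes × 8 PEs layout of sip0 is inferred from the 128-issuer count and from cube15/pe7 appearing in the paths; the helper names and the ``<pe>_slice`` naming for local targets are illustrative assumptions, not the simulator's API.

```python
# Hypothetical helpers; the 16 x 8 layout of sip0 is an assumption
# inferred from N=128 and from cube15 / pe7 appearing in summary.csv paths.
N_CUBES = 16
PES_PER_CUBE = 8


def sip_issuers(sip: str = "sip0") -> list[str]:
    """Every PE in one SIP, e.g. 'sip0.cube3.pe5'."""
    return [f"{sip}.cube{c}.pe{p}" for c in range(N_CUBES) for p in range(PES_PER_CUBE)]


def sip_local_all(sip: str = "sip0") -> list[tuple[str, str]]:
    # Each PE targets its own local slice: no two flows share a controller.
    return [(pe, f"{pe}_slice") for pe in sip_issuers(sip)]


def sip_hotspot_pe0(sip: str = "sip0") -> list[tuple[str, str]]:
    # Every PE targets the same slice: worst-case hotspot.
    return [(pe, f"{sip}.cube0.pe0_slice") for pe in sip_issuers(sip)]


local = sip_local_all()
hot = sip_hotspot_pe0()
print(len(local), len({t for _, t in local}))  # issuers, distinct targets
print(len(hot), len({t for _, t in hot}))
```

The only difference between the two scenarios is the target mapping: 128 distinct targets versus a single shared one.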

Effective BW and aggregate utilization (util_a) at 16 KB:
  sip_local_all    N=128  eff = 27.2 TB/s  util_a = 83 %
  sip_hotspot_pe0  N=128  eff = 134 GB/s   util_a = 93 %
                   (UCIe into cube0 saturated)
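
The effective-BW and utilization figures follow from simple arithmetic on the summary.csv columns. The formulas below are inferred from the CSV values (they reproduce them), not taken from the model source; GB/s is treated as bytes per ns, and the function names are illustrative.

```python
def effective_bw_gbs(n_issuers: int, nbytes: int, makespan_ns: float) -> float:
    """Bytes moved by all issuers divided by wall-clock time (bytes/ns == GB/s)."""
    return n_issuers * nbytes / makespan_ns


def util_pct(eff_gbs: float, peak_gbs: float) -> float:
    """Effective BW as a percentage of a peak BW."""
    return 100.0 * eff_gbs / peak_gbs


# sip_local_all: 128 issuers x 16 KiB, makespan 77 ns,
# single-path peak 256 GB/s, aggregate peak 128 x 256 = 32768 GB/s.
eff_local = effective_bw_gbs(128, 16384, 77.0)
# sip_hotspot_pe0: same traffic, makespan 15618.595 ns, aggregate peak 144 GB/s.
eff_hot = effective_bw_gbs(128, 16384, 15618.595)

print(f"sip_local_all   eff={eff_local:8.1f} GB/s  "
      f"util_single={util_pct(eff_local, 256.0):.0f} %  "
      f"util_a={util_pct(eff_local, 32768.0):.0f} %")
print(f"sip_hotspot_pe0 eff={eff_hot:8.1f} GB/s  "
      f"util_single={util_pct(eff_hot, 256.0):.0f} %  "
      f"util_a={util_pct(eff_hot, 144.0):.0f} %")
```

util_single divides by one path's peak, which is why a massively parallel, disjoint scenario can report thousands of percent while util_a stays below 100 %.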

Plots:
  no_congestion.png, congestion.png
      Effective BW utilization (two bars: single vs aggregate peak).
  breakdown_no_congestion.png, breakdown_congestion.png
      Stacked latency breakdown (renamed from previous).
  summary.csv
      Columns for both views.

The visual y-cap on BW utilization is 150 %. Bars exceeding it (e.g.
sip_local_all's util_single = 10,639 %) are drawn at the cap with an
upward arrow and the real value annotated. The verification rule for
``util_single`` is loosened to ``≤ n_issuers × 100 % + 5 %`` so
massively-parallel disjoint scenarios pass.
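
A minimal sketch of the plot cap and the loosened check. The constant and function names are hypothetical; drawing the arrow and the real-value annotation (e.g. with matplotlib) is left out so the logic stays self-contained.

```python
Y_CAP_PCT = 150.0  # visual y-axis cap for BW-utilization bars


def capped_bar(util_pct: float) -> tuple[float, bool]:
    """Height to draw, and whether the bar was clipped (needs arrow + real-value label)."""
    return min(util_pct, Y_CAP_PCT), util_pct > Y_CAP_PCT


def util_single_ok(util_single_pct: float, n_issuers: int) -> bool:
    """Loosened rule: util_single <= n_issuers x 100 % + 5 %."""
    return util_single_pct <= n_issuers * 100.0 + 5.0


height, clipped = capped_bar(10638.961)  # sip_local_all util_single
print(height, clipped)                   # 150.0 True
print(util_single_ok(10638.961, 128))    # True: limit is 12805 %
print(util_single_ok(10638.961, 1))      # False: one issuer would be capped at 105 %
```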

Category renamed: ``streaming`` → ``wire_transfer``. It is the
bulk-transfer time, (n_flits − 1) × flit_bytes / bottleneck_bw: the
cost of streaming the rest of the payload through the slowest wire
after the first flit has arrived.
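
The ``wire_transfer`` formula, sketched in code. The 256-byte flit size and rounding a partial last flit up to a whole flit are assumptions; with them, a 16 KiB payload reproduces the 63.0 ns (256 GB/s bottleneck) and 126.0 ns (128 GB/s bottleneck) values in summary.csv.

```python
import math


def wire_transfer_ns(nbytes: int, flit_bytes: int, bottleneck_bw_gbs: float) -> float:
    """Time to stream the remaining flits through the slowest wire after the first arrives.

    GB/s is treated as bytes/ns, so bytes / (GB/s) yields ns.
    """
    n_flits = math.ceil(nbytes / flit_bytes)  # assumption: a partial last flit counts as a full one
    return (n_flits - 1) * flit_bytes / bottleneck_bw_gbs


FLIT_BYTES = 256  # assumed flit size, chosen because it matches the CSV values

print(wire_transfer_ns(16384, FLIT_BYTES, 256.0))  # 63.0
print(wire_transfer_ns(16384, FLIT_BYTES, 128.0))  # 126.0
```

The first flit is excluded because its latency is already counted by the per-hop categories (pe_setup, noc_mesh, ucie, fabric, hbm_ctrl).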
All checks PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| graph | scenario | label | nbytes | n_issuers | total_ns | makespan_ns | min_lat_ns | peak_single_bw_gbs | peak_aggregate_bw_gbs | effective_bw_gbs | util_single_pct | util_aggregate_pct | pe_setup | noc_mesh | ucie | fabric | wire_transfer | hbm_ctrl | contention | path | first_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| no_congestion | local | SAME_CUBE PE_LOCAL | 16384 | 1 | 77.0 |  |  | 256.0 | 256.0 | 212.7792207792208 | 83.11688311688312 | 83.11688311688312 | 1.0 | 2.0 | 0.0 | 0.0 | 63.0 | 9.0 | 2.0 | pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0 |  |
| no_congestion | same_cube_best | SAME_CUBE REMOTE_BEST (pe0→pe1) | 16384 | 1 | 82.06 |  |  | 256.0 | 256.0 | 199.6587862539605 | 77.99171338045332 | 77.99171338045332 | 1.0 | 5.03 | 0.0 | 0.0 | 63.0 | 9.0 | 4.030000000000001 | pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1 |  |
| no_congestion | same_cube_worst | SAME_CUBE REMOTE_WORST (pe0→pe7) | 16384 | 1 | 117.50000000000001 |  |  | 256.0 | 256.0 | 139.4382978723404 | 54.46808510638297 | 54.46808510638297 | 1.0 | 26.25 | 0.0 | 0.0 | 63.0 | 9.0 | 18.250000000000014 | pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7 |  |
| no_congestion | remote_cube_best | REMOTE_CUBE REMOTE_BEST (cube0→cube1) | 16384 | 1 | 202.51999999999998 |  |  | 128.0 | 128.0 | 80.90065178747778 | 63.20363420896702 | 63.20363420896702 | 1.0 | 6.0 | 32.510000000000005 | 0.0 | 126.0 | 9.0 | 28.00999999999999 | pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0 |  |
| no_congestion | remote_cube_worst | REMOTE_CUBE REMOTE_WORST (cube0→cube15.pe7) | 16384 | 1 | 573.1199999999999 |  |  | 128.0 | 128.0 | 28.587381351200452 | 22.333891680625353 | 22.333891680625353 | 1.0 | 30.0 | 219.05999999999995 | 0.0 | 126.0 | 9.0 | 188.05999999999995 | pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7 |  |
| no_congestion | remote_sip | REMOTE_SIP SAME_CUBE_SAME_PE (sip0→sip1) | 16384 | 1 | 408.5216666666663 |  |  | 128.0 | 128.0 | 40.10558395515541 | 31.332487464965165 | 31.332487464965165 | 1.0 | 4.0 | 37.040000000000006 | 22.09666666666667 | 126.0 | 9.0 | 209.38499999999962 | pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0 |  |
| congestion | ctrl_hot_1 | cube0 1×PE → pe0_slice | 16384 | 1 |  | 82.06 | 82.06 | 256.0 | 256.0 | 199.6587862539605 | 77.99171338045332 | 77.99171338045332 | 1.0 | 5.03 | 0.0 | 0.0 | 63.0 | 9.0 | 4.030000000000001 |  | pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0 |
| congestion | ctrl_hot_2 | cube0 2×PE → pe0_slice | 16384 | 2 |  | 158.3450000000001 | 134.2400000000001 | 256.0 | 256.0 | 206.94054122327813 | 80.83614891534302 | 80.83614891534302 | 1.0 | 5.03 | 0.0 | 0.0 | 63.0 | 9.0 | 80.31500000000011 |  | pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0 |
| congestion | ctrl_hot_3 | cube0 3×PE → pe0_slice | 16384 | 3 |  | 230.0750000000001 | 139.94000000000008 | 256.0 | 256.0 | 213.6346843420623 | 83.45104857111808 | 83.45104857111808 | 1.0 | 5.03 | 0.0 | 0.0 | 63.0 | 9.0 | 152.0450000000001 |  | pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0 |
| congestion | ucie_eastbound | cube0 8×PE corresp. → cube1 | 16384 | 8 |  | 962.52 | 438.52 | 128.0 | 159.99999999999997 | 136.17587167019906 | 106.387399742343 | 85.10991979387443 | 1.0 | 6.0 | 32.510000000000005 | 0.0 | 126.0 | 9.0 | 788.01 |  | pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0 |
| congestion | all_pe_cube0_to_pe0 | cube0 8×PE → pe0_slice | 16384 | 8 |  | 558.2499999999998 | 195.0 | 256.0 | 256.0 | 234.7908643081058 | 91.71518137035383 | 91.71518137035383 | 1.0 | 2.0 | 0.0 | 0.0 | 63.0 | 9.0 | 483.2499999999998 |  | pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0 |
| congestion | sip_local_all | sip0 128×PE → own slice | 16384 | 128 |  | 77.0 | 77.0 | 256.0 | 32768.0 | 27235.74025974026 | 10638.961038961039 | 83.11688311688312 | 1.0 | 2.0 | 0.0 | 0.0 | 63.0 | 9.0 | 2.0 |  | pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0 |
| congestion | sip_hotspot_pe0 | sip0 128×PE → cube0.pe0_slice | 16384 | 128 |  | 15618.595000000001 | 204.0 | 256.0 | 143.9999999999998 | 134.2727690935068 | 52.4503004271511 | 93.24497853715764 | 1.0 | 2.0 | 0.0 | 0.0 | 63.0 | 9.0 | 15543.595000000001 |  | pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0 |
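
In every row, the seven breakdown columns add up to the row's end-to-end latency. A small consistency check, assuming no_congestion rows populate total_ns and congestion rows populate makespan_ns (an inference from the values, not something the commit states). The inline CSV holds two rows copied from the table, trimmed to the relevant columns; the helper names are illustrative.

```python
import csv
import io
import math

BREAKDOWN = ["pe_setup", "noc_mesh", "ucie", "fabric",
             "wire_transfer", "hbm_ctrl", "contention"]

# Two rows from the table, trimmed to the columns this check needs.
SAMPLE = """graph,scenario,total_ns,makespan_ns,pe_setup,noc_mesh,ucie,fabric,wire_transfer,hbm_ctrl,contention
no_congestion,remote_sip,408.5216666666663,,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962
congestion,sip_hotspot_pe0,,15618.595000000001,1.0,2.0,0.0,0.0,63.0,9.0,15543.595000000001
"""


def end_to_end_ns(row: dict) -> float:
    # Assumption: no_congestion rows fill total_ns, congestion rows fill makespan_ns.
    return float(row["total_ns"] or row["makespan_ns"])


def breakdown_matches(row: dict, rel_tol: float = 1e-9) -> bool:
    parts = sum(float(row[k]) for k in BREAKDOWN)
    return math.isclose(parts, end_to_end_ns(row), rel_tol=rel_tol)


for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(row["scenario"], breakdown_matches(row))
```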