Switch PE_DMA perf plots to Effective BW utilization

Replaces the latency-breakdown stacked bars with a single utilization
bar per scenario. Each bar shows ``effective_bw / peak_bottleneck_bw``
with both values annotated, and a horizontal "single-path peak" line at
100 %. The colour band (green ≥70 %, amber ≥40 %, red <40 %) makes the
no-congestion distance roll-off scannable at a glance.

Definitions:
  effective_bw = (total bytes transferred) / wall-clock time
    no_congestion: nbytes / total_ns
    congestion:    n_issuers × nbytes / makespan_ns  (aggregate)
  peak_bw      = min(edge.bw_gbs) on first issuer's path
  util_pct     = effective_bw / peak_bw × 100

The congestion graph shows that 8×PE eastbound exceeds 100 % of a
single-path peak (106.4 %): UCIe-N's 4 connections × 128 GB/s give
512 GB/s of aggregate eastbound capacity, so concurrent issuers across
disjoint conns sum past any single conn's 128 GB/s. The 8×PE→pe0_slice
hotspot reaches 91.7 %, almost saturating the shared r0c0→hbm_ctrl.pe0
bottleneck — the simulator's address-based PC striping + per-flit
arbitration model amortises the cost cleanly.

Self-verification updated to BW invariants:
  (1) effective BW shrinks as topological distance grows
  (2) util_pct ∈ (0, 250 %]
  (3) single-issuer util_pct ≤ 100 %
  (4) effective_bw = nbytes / total_ns for single requests
  (5) congestion aggregate BW grows monotonically with issuer count
      on the hot-target series
  (6) 8-PE all-hit-pe0 saturates ≥ 70 % of shared peak

All checks PASS at the current model.

The CSV retains all breakdown components (pe_setup, noc_mesh, ucie,
fabric, streaming, hbm_ctrl, contention) so a future replot can still
recover the latency-breakdown view without re-running the simulator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-15 07:59:45 -07:00
parent a759d58007
commit 0bf220fed0
4 changed files with 155 additions and 151 deletions
Binary file not shown.

Before

Width:  |  Height:  |  Size: 52 KiB

After

Width:  |  Height:  |  Size: 58 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 60 KiB

After

Width:  |  Height:  |  Size: 67 KiB

+12 -12
View File
@@ -1,24 +1,24 @@
graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,pe_setup,noc_mesh,ucie,hbm_ctrl,contention,path,first_path
graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,bottleneck_bw_gbs,effective_bw_gbs,util_pct,pe_setup,noc_mesh,ucie,fabric,streaming,hbm_ctrl,contention,path,first_path
no_congestion,local,"SAME_CUBE
PE_LOCAL",16384,,77.0,,,1.0,2.0,0.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
PE_LOCAL",16384,1,77.0,,,256.0,212.7792207792208,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
no_congestion,same_cube_best,"SAME_CUBE
REMOTE_BEST
(pe0→pe1)",16384,,82.06,,,1.0,5.03,0.0,9.0,4.030000000000001,pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1,
(pe0→pe1)",16384,1,82.06,,,256.0,199.6587862539605,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1,
no_congestion,same_cube_worst,"SAME_CUBE
REMOTE_WORST
(pe0→pe7)",16384,,117.50000000000001,,,1.0,26.25,0.0,9.0,18.250000000000014,pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7,
(pe0→pe7)",16384,1,117.50000000000001,,,256.0,139.4382978723404,54.46808510638297,1.0,26.25,0.0,0.0,63.0,9.0,18.250000000000014,pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7,
no_congestion,remote_cube_best,"REMOTE_CUBE
REMOTE_BEST
(cube0→cube1)",16384,,202.51999999999998,,,1.0,6.0,32.510000000000005,9.0,28.00999999999999,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0,
(cube0→cube1)",16384,1,202.51999999999998,,,128.0,80.90065178747778,63.20363420896702,1.0,6.0,32.510000000000005,0.0,126.0,9.0,28.00999999999999,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0,
no_congestion,remote_cube_worst,"REMOTE_CUBE
REMOTE_WORST
(cube0→cube15.pe7)",16384,,573.1199999999999,,,1.0,30.0,219.05999999999995,9.0,188.05999999999995,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7,
(cube0→cube15.pe7)",16384,1,573.1199999999999,,,128.0,28.587381351200452,22.333891680625353,1.0,30.0,219.05999999999995,0.0,126.0,9.0,188.05999999999995,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7,
no_congestion,remote_sip,"REMOTE_SIP
SAME_CUBE_SAME_PE
(sip0→sip1)",16384,,408.5216666666663,,,1.0,4.0,37.040000000000006,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,1.0,5.03,0.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,1.0,5.03,0.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,1.0,5.03,0.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
(sip0→sip1)",16384,1,408.5216666666663,,,128.0,40.10558395515541,31.332487464965165,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,256.0,199.6587862539605,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,256.0,206.94054122327813,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,256.0,213.6346843420623,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ucie_eastbound,"8×PE corresp.
cube0→cube1",16384,8,,962.52,438.52,1.0,6.0,32.510000000000005,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,1.0,2.0,0.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
cube0→cube1",16384,8,,962.52,438.52,128.0,136.17587167019906,106.387399742343,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,256.0,234.7908643081058,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
1 graph scenario label nbytes n_issuers total_ns makespan_ns min_lat_ns bottleneck_bw_gbs effective_bw_gbs util_pct pe_setup noc_mesh ucie fabric streaming hbm_ctrl contention path first_path
2 no_congestion local SAME_CUBE PE_LOCAL 16384 1 77.0 256.0 212.7792207792208 83.11688311688312 1.0 2.0 0.0 0.0 63.0 9.0 2.0 pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
3 no_congestion same_cube_best SAME_CUBE REMOTE_BEST (pe0→pe1) 16384 1 82.06 256.0 199.6587862539605 77.99171338045332 1.0 5.03 0.0 0.0 63.0 9.0 4.030000000000001 pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1
4 no_congestion same_cube_worst SAME_CUBE REMOTE_WORST (pe0→pe7) 16384 1 117.50000000000001 256.0 139.4382978723404 54.46808510638297 1.0 26.25 0.0 0.0 63.0 9.0 18.250000000000014 pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7
5 no_congestion remote_cube_best REMOTE_CUBE REMOTE_BEST (cube0→cube1) 16384 1 202.51999999999998 128.0 80.90065178747778 63.20363420896702 1.0 6.0 32.510000000000005 0.0 126.0 9.0 28.00999999999999 pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
6 no_congestion remote_cube_worst REMOTE_CUBE REMOTE_WORST (cube0→cube15.pe7) 16384 1 573.1199999999999 128.0 28.587381351200452 22.333891680625353 1.0 30.0 219.05999999999995 0.0 126.0 9.0 188.05999999999995 pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7
7 no_congestion remote_sip REMOTE_SIP SAME_CUBE_SAME_PE (sip0→sip1) 16384 1 408.5216666666663 128.0 40.10558395515541 31.332487464965165 1.0 4.0 37.040000000000006 22.09666666666667 126.0 9.0 209.38499999999962 pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0
8 congestion ctrl_hot_1 1×PE → pe0_slice 16384 1 82.06 82.06 256.0 199.6587862539605 77.99171338045332 1.0 5.03 0.0 0.0 63.0 9.0 4.030000000000001 pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
9 congestion ctrl_hot_2 2×PE → pe0_slice 16384 2 158.3450000000001 134.2400000000001 256.0 206.94054122327813 80.83614891534302 1.0 5.03 0.0 0.0 63.0 9.0 80.31500000000011 pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
10 congestion ctrl_hot_3 3×PE → pe0_slice 16384 3 230.0750000000001 139.94000000000008 256.0 213.6346843420623 83.45104857111808 1.0 5.03 0.0 0.0 63.0 9.0 152.0450000000001 pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
11 congestion ucie_eastbound 8×PE corresp. cube0→cube1 16384 8 962.52 438.52 128.0 136.17587167019906 106.387399742343 1.0 6.0 32.510000000000005 0.0 126.0 9.0 788.01 pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
12 congestion all_pe_to_pe0 8×PE → pe0_slice 16384 8 558.2499999999998 195.0 256.0 234.7908643081058 91.71518137035383 1.0 2.0 0.0 0.0 63.0 9.0 483.2499999999998 pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
13
14
15
16
17
18
19
20
21
22
23
24