PE_DMA perf: dual-peak utilisation (single-path + aggregate)
Each scenario now shows TWO bars:
util_single = effective_bw / single-path peak × 100
(peak = min bw_gbs on first issuer's path)
util_aggregate = effective_bw / aggregate-resource peak × 100
(peak = max-min fair share across concurrent paths)
Aggregate peak uses a max-min fair-share computation: each concurrent
path's sustainable share on an edge is bw_gbs / usage_count, the
per-path throughput is the min share along its edges, and the aggregate
peak is the sum across paths. This produces the correct answer for both
shared-bottleneck scenarios (N paths converge on one wire → aggregate =
wire BW) and multi-lane shared resources (UCIe's 4 connections used in
parallel → aggregate ≈ 4 × per-conn BW), without enumerating max-flow.
Single-issuer (no_congestion) → util_single == util_aggregate by
definition. Congestion exposes the divergence:
ctrl_hot_{1,2,3}, all_pe_to_pe0 → both metrics agree (one shared
bottleneck: r0c0→hbm_ctrl.pe0 @ 256 GB/s)
8×PE eastbound → util_single=106 % (single conn @ 128 GB/s) but
util_aggregate=85 % (UCIe-W.conn0 @ 7-way shared,
aggregate peak ≈ 160 GB/s under the current
cross-cube routing that funnels via cube1.r0c0).
Verification updated to assert:
(2) util_aggregate ≤ 100 % (effective BW can't exceed the aggregate
resource peak, by construction).
(3) single-issuer util_single == util_aggregate.
(7) ucie_eastbound: util_aggregate is meaningfully smaller than
util_single (the multi-lane peak correction is observable).
CSV grows with peak_aggregate_bw_gbs and util_aggregate_pct columns;
breakdown columns retained.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Binary file not shown.
|
Before Width: | Height: | Size: 58 KiB After Width: | Height: | Size: 68 KiB |
Binary file not shown.
|
Before Width: | Height: | Size: 67 KiB After Width: | Height: | Size: 73 KiB |
@@ -1,24 +1,24 @@
|
||||
graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,bottleneck_bw_gbs,effective_bw_gbs,util_pct,pe_setup,noc_mesh,ucie,fabric,streaming,hbm_ctrl,contention,path,first_path
|
||||
graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,peak_single_bw_gbs,peak_aggregate_bw_gbs,effective_bw_gbs,util_single_pct,util_aggregate_pct,pe_setup,noc_mesh,ucie,fabric,streaming,hbm_ctrl,contention,path,first_path
|
||||
no_congestion,local,"SAME_CUBE
|
||||
PE_LOCAL",16384,1,77.0,,,256.0,212.7792207792208,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
|
||||
PE_LOCAL",16384,1,77.0,,,256.0,256.0,212.7792207792208,83.11688311688312,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
|
||||
no_congestion,same_cube_best,"SAME_CUBE
|
||||
REMOTE_BEST
|
||||
(pe0→pe1)",16384,1,82.06,,,256.0,199.6587862539605,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1,
|
||||
(pe0→pe1)",16384,1,82.06,,,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1,
|
||||
no_congestion,same_cube_worst,"SAME_CUBE
|
||||
REMOTE_WORST
|
||||
(pe0→pe7)",16384,1,117.50000000000001,,,256.0,139.4382978723404,54.46808510638297,1.0,26.25,0.0,0.0,63.0,9.0,18.250000000000014,pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7,
|
||||
(pe0→pe7)",16384,1,117.50000000000001,,,256.0,256.0,139.4382978723404,54.46808510638297,54.46808510638297,1.0,26.25,0.0,0.0,63.0,9.0,18.250000000000014,pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7,
|
||||
no_congestion,remote_cube_best,"REMOTE_CUBE
|
||||
REMOTE_BEST
|
||||
(cube0→cube1)",16384,1,202.51999999999998,,,128.0,80.90065178747778,63.20363420896702,1.0,6.0,32.510000000000005,0.0,126.0,9.0,28.00999999999999,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0,
|
||||
(cube0→cube1)",16384,1,202.51999999999998,,,128.0,128.0,80.90065178747778,63.20363420896702,63.20363420896702,1.0,6.0,32.510000000000005,0.0,126.0,9.0,28.00999999999999,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0,
|
||||
no_congestion,remote_cube_worst,"REMOTE_CUBE
|
||||
REMOTE_WORST
|
||||
(cube0→cube15.pe7)",16384,1,573.1199999999999,,,128.0,28.587381351200452,22.333891680625353,1.0,30.0,219.05999999999995,0.0,126.0,9.0,188.05999999999995,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7,
|
||||
(cube0→cube15.pe7)",16384,1,573.1199999999999,,,128.0,128.0,28.587381351200452,22.333891680625353,22.333891680625353,1.0,30.0,219.05999999999995,0.0,126.0,9.0,188.05999999999995,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7,
|
||||
no_congestion,remote_sip,"REMOTE_SIP
|
||||
SAME_CUBE_SAME_PE
|
||||
(sip0→sip1)",16384,1,408.5216666666663,,,128.0,40.10558395515541,31.332487464965165,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
|
||||
congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,256.0,199.6587862539605,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,256.0,206.94054122327813,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,256.0,213.6346843420623,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
(sip0→sip1)",16384,1,408.5216666666663,,,128.0,128.0,40.10558395515541,31.332487464965165,31.332487464965165,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
|
||||
congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,256.0,256.0,206.94054122327813,80.83614891534302,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,256.0,256.0,213.6346843420623,83.45104857111808,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,ucie_eastbound,"8×PE corresp.
|
||||
cube0→cube1",16384,8,,962.52,438.52,128.0,136.17587167019906,106.387399742343,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,256.0,234.7908643081058,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
cube0→cube1",16384,8,,962.52,438.52,128.0,159.99999999999997,136.17587167019906,106.387399742343,85.10991979387443,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
|
||||
congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,256.0,256.0,234.7908643081058,91.71518137035383,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
|
||||
|
||||
|
Reference in New Issue
Block a user