PE_DMA perf: SIP-wide scenarios + dual outputs + clearer naming
User asked to surface system-wide congestion (more accurate than
single-cube), bring back the latency-breakdown plot under a separate
filename, and rename the obscure ``streaming`` category.
Scenarios:
  Renamed all_pe_to_pe0 → all_pe_cube0_to_pe0 (clarifies cube scope).
  Added two SIP-wide scenarios:
    sip_local_all: every PE in sip0 (128 total) accesses its own
        local slice. All paths are disjoint (each PE owns its own
        hbm_ctrl.peX), so aggregate bandwidth should scale linearly
        with cube count.
    sip_hotspot_pe0: every PE in sip0 (128 total) targets
        sip0.cube0.pe0_slice. Worst-case hotspot: UCIe inbound and
        r0c0→hbm_ctrl.pe0 are both saturated.
Each bar now carries an ``N=...`` annotation showing the issuer
count, and the chart titles say the scope explicitly.
Effective BW + aggregate utilization (util_a = util_aggregate_pct) at 16 KB:
  sip_local_all    N=128  eff=27.2 TB/s  util_a=83 %
  sip_hotspot_pe0  N=128  eff=134 GB/s   util_a=93 %  (UCIe-into-cube0 saturated)
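The two figures above can be cross-checked against the summary.csv rows. A minimal sketch, assuming effective BW is n_issuers × nbytes / makespan_ns (1 byte/ns equals 1 GB/s) and that util_a divides by peak_aggregate_bw_gbs; the function names are hypothetical and not taken from the repo:

```python
def effective_bw_gbs(n_issuers: int, nbytes: int, makespan_ns: float) -> float:
    """Total bytes moved divided by the makespan; 1 byte/ns == 1 GB/s."""
    return n_issuers * nbytes / makespan_ns


def util_pct(effective_gbs: float, peak_gbs: float) -> float:
    """Utilization as a percentage of the given peak bandwidth."""
    return 100.0 * effective_gbs / peak_gbs


# Inputs copied from the sip_local_all and sip_hotspot_pe0 rows of summary.csv.
local = effective_bw_gbs(128, 16384, 77.0)
hot = effective_bw_gbs(128, 16384, 15618.595000000001)

print(round(local / 1000, 1), "TB/s", round(util_pct(local, 32768.0)), "%")
print(round(hot), "GB/s", round(util_pct(hot, 143.9999999999998)), "%")
```

Dividing the same effective BW by peak_single_bw_gbs (256.0) instead gives the util_single column, which is why sip_local_all exceeds 10,000 % there.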
Plots:
  no_congestion.png + congestion.png: effective BW utilization
      (two bars: single vs aggregate peak)
  breakdown_no_congestion.png + breakdown_congestion.png: stacked
      latency breakdown (renamed from previous)
  summary.csv with columns for both views.
The visual y-cap on BW utilization is 150 %. Bars exceeding it (e.g.
sip_local_all's util_single = 10,639 %) are drawn at the cap with an
upward arrow and the real value annotated. The verification rule for
``util_single`` is loosened to ``≤ n_issuers × 100 % + 5 %`` so
massively-parallel disjoint scenarios pass.
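The cap and the loosened check can be sketched as follows. This is an illustration under assumptions, not the repo's code: the helper names and the (height, overflowed) return shape are hypothetical; only the 150 % cap and the ``n_issuers × 100 % + 5 %`` bound come from this message.

```python
Y_CAP_PCT = 150.0  # visual y-cap stated in this commit message


def util_single_ok(util_single_pct: float, n_issuers: int,
                   slack_pct: float = 5.0) -> bool:
    """Loosened rule: util_single may reach n_issuers x 100 % plus slack."""
    return util_single_pct <= n_issuers * 100.0 + slack_pct


def clip_for_plot(util_pct: float, cap: float = Y_CAP_PCT):
    """Return (drawn bar height, whether an overflow arrow is needed)."""
    return min(util_pct, cap), util_pct > cap


# sip_local_all: 128 issuers, util_single taken from summary.csv.
print(util_single_ok(10638.961038961039, 128))
print(clip_for_plot(10638.961038961039))
```

With a single issuer the bound collapses to 105 %, so the rule stays strict for the one-PE scenarios.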
Category renamed: ``streaming`` → ``wire_transfer``. It is the
bulk-transfer time = (n_flits − 1) × flit_bytes / bottleneck_bw — the
cost of streaming the rest of the payload through the slowest wire
after the first flit has arrived.
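A worked instance of that formula, as a sketch only. The flit size is not stated in this commit; flit_bytes = 256 is an assumption chosen because it reproduces the 63.0 ns and 126.0 ns values in the wire_transfer column of summary.csv. The function name and the ceil-based flit count are likewise hypothetical.

```python
import math


def wire_transfer_ns(nbytes: int, flit_bytes: int,
                     bottleneck_bw_gbs: float) -> float:
    """(n_flits - 1) * flit_bytes / bottleneck_bw, in ns (GB/s == bytes/ns)."""
    n_flits = math.ceil(nbytes / flit_bytes)  # assumed: payload split into whole flits
    return (n_flits - 1) * flit_bytes / bottleneck_bw_gbs


FLIT_BYTES = 256  # assumption, see the note above

print(wire_transfer_ns(16384, FLIT_BYTES, 256.0))  # 256 GB/s bottleneck
print(wire_transfer_ns(16384, FLIT_BYTES, 128.0))  # 128 GB/s bottleneck
```

Halving the bottleneck bandwidth doubles the term, matching the jump from 63.0 to 126.0 between the local rows and the UCIe-crossing rows.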
All checks PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,4 +1,4 @@
-graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,peak_single_bw_gbs,peak_aggregate_bw_gbs,effective_bw_gbs,util_single_pct,util_aggregate_pct,pe_setup,noc_mesh,ucie,fabric,streaming,hbm_ctrl,contention,path,first_path
+graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,peak_single_bw_gbs,peak_aggregate_bw_gbs,effective_bw_gbs,util_single_pct,util_aggregate_pct,pe_setup,noc_mesh,ucie,fabric,wire_transfer,hbm_ctrl,contention,path,first_path
 no_congestion,local,"SAME_CUBE
 PE_LOCAL",16384,1,77.0,,,256.0,256.0,212.7792207792208,83.11688311688312,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
 no_congestion,same_cube_best,"SAME_CUBE
@@ -16,9 +16,18 @@ REMOTE_WORST
 no_congestion,remote_sip,"REMOTE_SIP
 SAME_CUBE_SAME_PE
 (sip0→sip1)",16384,1,408.5216666666663,,,128.0,128.0,40.10558395515541,31.332487464965165,31.332487464965165,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
-congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
-congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,256.0,256.0,206.94054122327813,80.83614891534302,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
-congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,256.0,256.0,213.6346843420623,83.45104857111808,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
-congestion,ucie_eastbound,"8×PE corresp.
-cube0→cube1",16384,8,,962.52,438.52,128.0,159.99999999999997,136.17587167019906,106.387399742343,85.10991979387443,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
-congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,256.0,256.0,234.7908643081058,91.71518137035383,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,ctrl_hot_1,"cube0
+1×PE → pe0_slice",16384,1,,82.06,82.06,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,ctrl_hot_2,"cube0
+2×PE → pe0_slice",16384,2,,158.3450000000001,134.2400000000001,256.0,256.0,206.94054122327813,80.83614891534302,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,ctrl_hot_3,"cube0
+3×PE → pe0_slice",16384,3,,230.0750000000001,139.94000000000008,256.0,256.0,213.6346843420623,83.45104857111808,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,ucie_eastbound,"cube0
+8×PE corresp.
+→ cube1",16384,8,,962.52,438.52,128.0,159.99999999999997,136.17587167019906,106.387399742343,85.10991979387443,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
+congestion,all_pe_cube0_to_pe0,"cube0
+8×PE → pe0_slice",16384,8,,558.2499999999998,195.0,256.0,256.0,234.7908643081058,91.71518137035383,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,sip_local_all,"sip0
+128×PE → own slice",16384,128,,77.0,77.0,256.0,32768.0,27235.74025974026,10638.961038961039,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0
+congestion,sip_hotspot_pe0,"sip0
+128×PE → cube0.pe0_slice",16384,128,,15618.595000000001,204.0,256.0,143.9999999999998,134.2727690935068,52.4503004271511,93.24497853715764,1.0,2.0,0.0,0.0,63.0,9.0,15543.595000000001,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0