kernbench2/docs/diagrams/pe_dma_perf/summary.csv
ywkang a143925a12 PE_DMA perf: dual-peak utilisation (single-path + aggregate)
Each scenario now shows TWO bars:

  util_single    = effective_bw / single-path peak × 100
                   (peak = min bw_gbs on first issuer's path)
  util_aggregate = effective_bw / aggregate-resource peak × 100
                   (peak = max-min fair share across concurrent paths)

Aggregate peak uses a max-min fair-share computation: each concurrent
path's sustainable share on an edge is bw_gbs / usage_count, the
per-path throughput is the min share along its edges, and the aggregate
peak is the sum across paths. This produces the correct answer for both
shared-bottleneck scenarios (N paths converge on one wire → aggregate =
wire BW) and multi-lane shared resources (UCIe's 4 connections used in
parallel → aggregate ≈ 4 × per-conn BW), without enumerating max-flow.
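
The fair-share computation described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `aggregate_peak_gbs`, the edge names, and the two toy topologies are hypothetical; only `bw_gbs` and `usage_count` follow the wording of this message.

```python
from collections import Counter

def aggregate_peak_gbs(paths, bw_gbs):
    """Sum of per-path fair shares.

    paths  : list of paths, each a list of edge names (hypothetical IDs)
    bw_gbs : dict mapping edge name -> edge bandwidth in GB/s
    """
    # usage_count: how many concurrent paths cross each edge
    usage_count = Counter(edge for path in paths for edge in set(path))
    # per-path throughput: the smallest share along the path's edges
    per_path = [min(bw_gbs[e] / usage_count[e] for e in path) for path in paths]
    return sum(per_path)

# Shared bottleneck: three paths converge on one 256 GB/s wire.
shared_bw = {"a": 256.0, "b": 256.0, "c": 256.0, "wire": 256.0}
shared = aggregate_peak_gbs([["a", "wire"], ["b", "wire"], ["c", "wire"]], shared_bw)

# Multi-lane: four paths, each on its own 128 GB/s connection.
lane_bw = {f"conn{i}": 128.0 for i in range(4)}
lanes = aggregate_peak_gbs([[f"conn{i}"] for i in range(4)], lane_bw)

print(shared, lanes)
```

In the first toy case the shares add back up to the wire bandwidth (about 256 GB/s); in the second the four lanes add up to 4 × 128 GB/s.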

Single-issuer (no_congestion) → util_single == util_aggregate by
definition. Congestion exposes the divergence:
  ctrl_hot_{1,2,3}, all_pe_to_pe0 → both metrics agree (one shared
                    bottleneck: r0c0→hbm_ctrl.pe0 @ 256 GB/s)
  8×PE eastbound → util_single=106 % (single conn @ 128 GB/s) but
                    util_aggregate=85 % (UCIe-W.conn0 @ 7-way shared,
                    aggregate peak ≈ 160 GB/s under the current
                    cross-cube routing that funnels via cube1.r0c0).
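
As a worked check of the eastbound numbers, the sketch below recomputes both utilisations from the ucie_eastbound row of summary.csv. The formula effective_bw = n_issuers × nbytes / makespan is an assumption inferred from the CSV values, not something this message states; the aggregate peak is rounded to 160 GB/s.

```python
# Values taken from the ucie_eastbound row of summary.csv.
nbytes = 16384
n_issuers = 8
makespan_ns = 962.52
peak_single_bw_gbs = 128.0
peak_aggregate_bw_gbs = 160.0  # CSV reports 159.99999999999997

# Assumed formula: total bytes moved / makespan (bytes per ns equals GB/s).
effective_bw_gbs = n_issuers * nbytes / makespan_ns

util_single = effective_bw_gbs / peak_single_bw_gbs * 100
util_aggregate = effective_bw_gbs / peak_aggregate_bw_gbs * 100

print(round(util_single), round(util_aggregate))  # 106 85
```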

Verification updated to assert:
  (2) util_aggregate ≤ 100 % (effective BW can't exceed the aggregate
      resource peak, by construction).
  (3) single-issuer util_single == util_aggregate.
  (7) ucie_eastbound: util_aggregate is meaningfully smaller than
      util_single (the multi-lane peak correction is observable).
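
The three assertions above could look roughly like the sketch below. It is an illustration only: `check_rows`, the tolerance, and the `min_gap_pct` threshold for "meaningfully smaller" are hypothetical, and the sample rows carry only the columns the checks need.

```python
def check_rows(rows, min_gap_pct=5.0, tol=1e-9):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    for r in rows:
        single = float(r["util_single_pct"])
        agg = float(r["util_aggregate_pct"])
        # (2) effective BW cannot exceed the aggregate resource peak
        if agg > 100.0 + tol:
            failures.append(f"(2) {r['scenario']}: util_aggregate {agg:.2f} > 100")
        # (3) with one issuer both peaks coincide, so both metrics must match
        if int(r["n_issuers"]) == 1 and abs(single - agg) > tol:
            failures.append(f"(3) {r['scenario']}: {single:.2f} != {agg:.2f}")
        # (7) the multi-lane correction must be visible for ucie_eastbound
        if r["scenario"] == "ucie_eastbound" and single - agg < min_gap_pct:
            failures.append(f"(7) {r['scenario']}: gap {single - agg:.2f} too small")
    return failures

# Two rows with values copied from summary.csv (other columns omitted).
rows = [
    {"scenario": "local", "n_issuers": "1",
     "util_single_pct": "83.11688311688312", "util_aggregate_pct": "83.11688311688312"},
    {"scenario": "ucie_eastbound", "n_issuers": "8",
     "util_single_pct": "106.387399742343", "util_aggregate_pct": "85.10991979387443"},
]
print(check_rows(rows))  # []
```

With the full file, `rows` could come from `csv.DictReader`, which yields the same string-valued dictionaries.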

CSV gains the peak_aggregate_bw_gbs and util_aggregate_pct columns;
the breakdown columns are retained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:53:00 -07:00

4.5 KiB

graph,scenario,label,nbytes,n_issuers,total_ns,makespan_ns,min_lat_ns,peak_single_bw_gbs,peak_aggregate_bw_gbs,effective_bw_gbs,util_single_pct,util_aggregate_pct,pe_setup,noc_mesh,ucie,fabric,streaming,hbm_ctrl,contention,path,first_path
no_congestion,local,SAME_CUBE PE_LOCAL,16384,1,77.0,,,256.0,256.0,212.7792207792208,83.11688311688312,83.11688311688312,1.0,2.0,0.0,0.0,63.0,9.0,2.0,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0,
no_congestion,same_cube_best,SAME_CUBE REMOTE_BEST (pe0→pe1),16384,1,82.06,,,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,pe0.pe_dma -> cube0.r0c0 -> cube0.r0c1 -> hbm_ctrl.pe1,
no_congestion,same_cube_worst,SAME_CUBE REMOTE_WORST (pe0→pe7),16384,1,117.50000000000001,,,256.0,256.0,139.4382978723404,54.46808510638297,54.46808510638297,1.0,26.25,0.0,0.0,63.0,9.0,18.250000000000014,pe0.pe_dma -> cube0.r0c0 -> cube0.r1c0 -> cube0.r1c1 -> cube0.r1c2 -> cube0.r1c3 -> cube0.r4c3 -> cube0.r4c4 -> cube0.r5c4 -> cube0.r5c5 -> hbm_ctrl.pe7,
no_congestion,remote_cube_best,REMOTE_CUBE REMOTE_BEST (cube0→cube1),16384,1,202.51999999999998,,,128.0,128.0,80.90065178747778,63.20363420896702,63.20363420896702,1.0,6.0,32.510000000000005,0.0,126.0,9.0,28.00999999999999,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0,
no_congestion,remote_cube_worst,REMOTE_CUBE REMOTE_WORST (cube0→cube15.pe7),16384,1,573.1199999999999,,,128.0,128.0,28.587381351200452,22.333891680625353,22.333891680625353,1.0,30.0,219.05999999999995,0.0,126.0,9.0,188.05999999999995,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> ucie-N.conn0 -> cube1.ucie-N -> ucie-N.conn3 -> cube1.r0c5 -> ucie-E.conn0 -> cube1.ucie-E -> cube2.ucie-W -> ucie-W.conn0 -> cube2.r0c0 -> ucie-N.conn0 -> cube2.ucie-N -> ucie-N.conn3 -> cube2.r0c5 -> ucie-E.conn0 -> cube2.ucie-E -> cube3.ucie-W -> ucie-W.conn0 -> cube3.r0c0 -> ucie-N.conn0 -> cube3.ucie-N -> ucie-N.conn3 -> cube3.r0c5 -> ucie-E.conn0 -> cube3.ucie-E -> ucie-E.conn3 -> cube3.r5c5 -> ucie-S.conn3 -> cube3.ucie-S -> cube7.ucie-N -> ucie-N.conn3 -> cube7.r0c5 -> ucie-E.conn0 -> cube7.ucie-E -> ucie-E.conn3 -> cube7.r5c5 -> ucie-S.conn3 -> cube7.ucie-S -> cube11.ucie-N -> ucie-N.conn3 -> cube11.r0c5 -> ucie-E.conn0 -> cube11.ucie-E -> ucie-E.conn3 -> cube11.r5c5 -> ucie-S.conn3 -> cube11.ucie-S -> cube15.ucie-N -> ucie-N.conn3 -> cube15.r0c5 -> ucie-E.conn0 -> cube15.ucie-E -> ucie-E.conn3 -> cube15.r5c5 -> hbm_ctrl.pe7,
no_congestion,remote_sip,REMOTE_SIP SAME_CUBE_SAME_PE (sip0→sip1),16384,1,408.5216666666663,,,128.0,128.0,40.10558395515541,31.332487464965165,31.332487464965165,1.0,4.0,37.040000000000006,22.09666666666667,126.0,9.0,209.38499999999962,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> io0.ucie-P0 -> ucie-P0.conn0 -> io0.noc -> io0.pcie_ep -> fabric.switch0 -> io0.pcie_ep -> io0.noc -> ucie-P0.conn0 -> io0.ucie-P0 -> cube0.ucie-N -> ucie-N.conn0 -> cube0.r0c0 -> hbm_ctrl.pe0,
congestion,ctrl_hot_1,1×PE → pe0_slice,16384,1,,82.06,82.06,256.0,256.0,199.6587862539605,77.99171338045332,77.99171338045332,1.0,5.03,0.0,0.0,63.0,9.0,4.030000000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_2,2×PE → pe0_slice,16384,2,,158.3450000000001,134.2400000000001,256.0,256.0,206.94054122327813,80.83614891534302,80.83614891534302,1.0,5.03,0.0,0.0,63.0,9.0,80.31500000000011,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ctrl_hot_3,3×PE → pe0_slice,16384,3,,230.0750000000001,139.94000000000008,256.0,256.0,213.6346843420623,83.45104857111808,83.45104857111808,1.0,5.03,0.0,0.0,63.0,9.0,152.0450000000001,,pe1.pe_dma -> cube0.r0c1 -> cube0.r0c0 -> hbm_ctrl.pe0
congestion,ucie_eastbound,8×PE corresp. cube0→cube1,16384,8,,962.52,438.52,128.0,159.99999999999997,136.17587167019906,106.387399742343,85.10991979387443,1.0,6.0,32.510000000000005,0.0,126.0,9.0,788.01,,pe0.pe_dma -> cube0.r0c0 -> ucie-N.conn0 -> cube0.ucie-N -> ucie-N.conn3 -> cube0.r0c5 -> ucie-E.conn0 -> cube0.ucie-E -> cube1.ucie-W -> ucie-W.conn0 -> cube1.r0c0 -> hbm_ctrl.pe0
congestion,all_pe_to_pe0,8×PE → pe0_slice,16384,8,,558.2499999999998,195.0,256.0,256.0,234.7908643081058,91.71518137035383,91.71518137035383,1.0,2.0,0.0,0.0,63.0,9.0,483.2499999999998,,pe0.pe_dma -> cube0.r0c0 -> hbm_ctrl.pe0