kernbench2/docs/diagrams/allreduce_latency_plots/summary.csv
mukesh 1c5752a9ec Intercube allreduce: center root + bidirectional reduce
Move the algorithmic root cube from the corner (cube_w-1,
cube_h-1) to the geometric center (cube_w//2, cube_h//2) and
have each phase converge bidirectionally so the intra-SIP
critical path drops from ~12 hops to ~8 hops on a 4×4 mesh
(left half W→E + right half E→W in row reduce; top half N→S +
bottom half S→N in col reduce; mirrored on broadcast).
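The ~12 to ~8 hop figures can be reproduced with a simple count. Below is a minimal Python sketch. It assumes that each reduce phase forwards partial sums one hop per step along its row or column, that the two halves of a bidirectional phase run concurrently, and that the broadcast exactly mirrors the reduce. It counts hops only and ignores packet serialization. The function names are hypothetical and not taken from kernbench2.

```python
def phase_hops(n: int, root: int) -> int:
    """Hops for one phase along a line of n cubes converging on index `root`.

    Assumption: each half forwards toward the root one hop at a time and the
    two halves run concurrently, so the phase lasts as long as the longer half.
    """
    return max(root, n - 1 - root)


def critical_path_hops(cube_w: int, cube_h: int, root_x: int, root_y: int) -> int:
    """Row reduce + column reduce, then a broadcast that mirrors both phases."""
    reduce_hops = phase_hops(cube_w, root_x) + phase_hops(cube_h, root_y)
    return 2 * reduce_hops


if __name__ == "__main__":
    w = h = 4
    corner = critical_path_hops(w, h, w - 1, h - 1)   # old root
    center = critical_path_hops(w, h, w // 2, h // 2)  # new root
    print(f"corner root: {corner} hops, center root: {center} hops")
```

On a 4×4 mesh this gives 12 hops for the corner root (3, 3) and 8 for the center root (2, 2). With a corner root, one half of every phase is empty, so converging bidirectionally only pays off once the root moves inward.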

Result on torus_2d with 6 SIPs at 96 KB/PE on TCM:
  before (corner root)  : 22.0 µs
  after  (center root)  : 17.2 µs   (−22%)

The same improvement shows up on ring_1d (−7%) and
mesh_2d_no_wrap (−12%), and it also holds across SRAM and HBM
(~−20% each).
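The quoted percentage can be checked using only the numbers above. In this sketch the function name is illustrative. CSV_LATENCY_NS is the torus_2d row at bytes_per_pe = 98304 (96 KB) in summary.csv, which appears to be the "after" measurement.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the latency dropped."""
    return (after - before) / before * 100.0


# torus_2d, 6 SIPs, 96 KB/PE on TCM, in µs, as quoted in the commit message.
BEFORE_US, AFTER_US = 22.0, 17.2
# torus_2d row with bytes_per_pe = 98304 in summary.csv, in ns.
CSV_LATENCY_NS = 17185.039999999994

if __name__ == "__main__":
    print(f"change: {pct_change(BEFORE_US, AFTER_US):+.1f}%")
    print(f"CSV row: {CSV_LATENCY_NS / 1000:.1f} µs")
```

The change works out to about −21.8%, which rounds to the quoted −22%, and the CSV row rounds to 17.2 µs.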

Phase 1 test (test_intercube_root_center.py) asserts the
torus_2d 96 KB latency drops below 20.5 µs and that all 96
cubes still validate (correctness preserved).
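The latency half of that check could look roughly like the following pytest-style function. This is a hypothetical sketch, not the contents of test_intercube_root_center.py. It runs against an excerpt of summary.csv (the 96 KB/PE row for each topology, copied from the table) rather than running the simulator, so it does not cover the 96-cube correctness validation.

```python
import csv
import io

# Excerpt of summary.csv: the 96 KB/PE row for each topology.
SUMMARY_EXCERPT = """\
algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
intercube_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,21622.489999999932
intercube_allreduce,ring_1d,6,49152,98304,1572864,20529.35999999997
intercube_allreduce,torus_2d,6,49152,98304,1572864,17185.039999999994
"""

THRESHOLD_NS = 20_500.0   # the 20.5 µs bound quoted in the commit message
BYTES_96_KB = 96 * 1024   # 98304 bytes per PE


def latency_ns(rows, topology: str, bytes_per_pe: int) -> float:
    """Return latency_ns of the first row matching topology and size."""
    for row in rows:
        if row["sip_topology"] == topology and int(row["bytes_per_pe"]) == bytes_per_pe:
            return float(row["latency_ns"])
    raise KeyError(f"no row for {topology} at {bytes_per_pe} B/PE")


def test_torus_96kb_latency_below_threshold() -> None:
    rows = list(csv.DictReader(io.StringIO(SUMMARY_EXCERPT)))
    assert latency_ns(rows, "torus_2d", BYTES_96_KB) < THRESHOLD_NS
```

The 22.0 µs corner-root result would fail this bound, and the new 17.2 µs result passes.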

Plot updates:
- overview.png: replace the constant 10.6 µs theoretical line
  with a user-supplied, hand-derived curve (per-cube packet
  count = bytes_per_pe × 8 PEs ÷ 128 B; 1346 ns startup +
  1.20 ns/pkt).
- All summary.csv numbers and per-topology PNGs regenerated.
- pe2pe_latency_plots and ipcq diagram emitter PNGs refreshed.
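The hand-derived curve reduces to a few lines of arithmetic. This is a minimal sketch. It assumes a partial packet rounds up to a whole packet, since the commit message only gives the division. Every bytes_per_pe in summary.csv divides evenly, so the rounding does not change any plotted point. The constant and function names are hypothetical.

```python
# Constants quoted in the commit message for the hand-derived curve.
STARTUP_NS = 1346.0   # fixed startup cost
NS_PER_PKT = 1.20     # cost per 128 B packet
PES_PER_CUBE = 8
PACKET_BYTES = 128


def packets_per_cube(bytes_per_pe: int) -> int:
    """Per-cube packet count = bytes_per_pe x 8 PEs / 128 B, rounded up.

    Rounding up is an assumption; every size in summary.csv divides evenly.
    """
    total = bytes_per_pe * PES_PER_CUBE
    return (total + PACKET_BYTES - 1) // PACKET_BYTES


def theoretical_latency_ns(bytes_per_pe: int) -> float:
    """Startup cost plus a linear per-packet term."""
    return STARTUP_NS + NS_PER_PKT * packets_per_cube(bytes_per_pe)


if __name__ == "__main__":
    for size in (16, 4096, 98304):
        print(f"{size:6d} B/PE: {packets_per_cube(size):5d} pkts, "
              f"{theoretical_latency_ns(size):8.1f} ns")
```

For 96 KB/PE (98304 B), the curve gives 6144 packets and 1346 + 1.20 × 6144 = 8718.8 ns. At the smallest plotted size (16 B/PE), it gives one packet and 1347.2 ns.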

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:28:58 -07:00

2.4 KiB

algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
intercube_allreduce,mesh_2d_no_wrap,6,8,16,256,2626.302499999998
intercube_allreduce,mesh_2d_no_wrap,6,32,64,1024,2634.7399999999952
intercube_allreduce,mesh_2d_no_wrap,6,64,128,2048,2645.9899999999925
intercube_allreduce,mesh_2d_no_wrap,6,128,256,4096,2668.489999999987
intercube_allreduce,mesh_2d_no_wrap,6,512,1024,16384,2812.489999999987
intercube_allreduce,mesh_2d_no_wrap,6,1024,2048,32768,3010.489999999987
intercube_allreduce,mesh_2d_no_wrap,6,2048,4096,65536,3406.489999999987
intercube_allreduce,mesh_2d_no_wrap,6,4096,8192,131072,4198.489999999965
intercube_allreduce,mesh_2d_no_wrap,6,8192,16384,262144,5782.489999999969
intercube_allreduce,mesh_2d_no_wrap,6,16384,32768,524288,8950.489999999925
intercube_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,15286.48999999986
intercube_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,21622.489999999932
intercube_allreduce,ring_1d,6,8,16,256,2302.9849999999933
intercube_allreduce,ring_1d,6,32,64,1024,2310.8599999999906
intercube_allreduce,ring_1d,6,64,128,2048,2321.359999999988
intercube_allreduce,ring_1d,6,128,256,4096,2342.3599999999824
intercube_allreduce,ring_1d,6,512,1024,16384,2479.3599999999824
intercube_allreduce,ring_1d,6,1024,2048,32768,2669.3599999999824
intercube_allreduce,ring_1d,6,2048,4096,65536,3049.3599999999824
intercube_allreduce,ring_1d,6,4096,8192,131072,3809.3599999999715
intercube_allreduce,ring_1d,6,8192,16384,262144,5329.359999999979
intercube_allreduce,ring_1d,6,16384,32768,524288,8369.35999999992
intercube_allreduce,ring_1d,6,32768,65536,1048576,14449.359999999899
intercube_allreduce,ring_1d,6,49152,98304,1572864,20529.35999999997
intercube_allreduce,torus_2d,6,8,16,256,1644.2899999999936
intercube_allreduce,torus_2d,6,32,64,1024,1651.0399999999909
intercube_allreduce,torus_2d,6,64,128,2048,1660.0399999999881
intercube_allreduce,torus_2d,6,128,256,4096,1678.0399999999827
intercube_allreduce,torus_2d,6,512,1024,16384,1795.0399999999827
intercube_allreduce,torus_2d,6,1024,2048,32768,1957.0399999999827
intercube_allreduce,torus_2d,6,2048,4096,65536,2281.0399999999827
intercube_allreduce,torus_2d,6,4096,8192,131072,2929.039999999979
intercube_allreduce,torus_2d,6,8192,16384,262144,4225.039999999986
intercube_allreduce,torus_2d,6,16384,32768,524288,6817.039999999943
intercube_allreduce,torus_2d,6,32768,65536,1048576,12001.03999999992
intercube_allreduce,torus_2d,6,49152,98304,1572864,17185.039999999994
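For each topology in this table, the rows from 1024 B/PE upward appear to fall on a straight line, to within floating-point noise. The Python sketch below is our own analysis, not part of the repository. It fits latency_ns = a + b × bytes_per_pe by ordinary least squares over those rows, which splits each curve into a fixed startup term and a per-byte term. The values are copied from the table, and the names are hypothetical.

```python
# bytes_per_pe values and matching latency_ns from summary.csv (>= 1 KiB/PE).
SIZES = [1024, 2048, 4096, 8192, 16384, 32768, 65536, 98304]
LATENCY_NS = {
    "mesh_2d_no_wrap": [2812.489999999987, 3010.489999999987, 3406.489999999987,
                        4198.489999999965, 5782.489999999969, 8950.489999999925,
                        15286.48999999986, 21622.489999999932],
    "ring_1d": [2479.3599999999824, 2669.3599999999824, 3049.3599999999824,
                3809.3599999999715, 5329.359999999979, 8369.35999999992,
                14449.359999999899, 20529.35999999997],
    "torus_2d": [1795.0399999999827, 1957.0399999999827, 2281.0399999999827,
                 2929.039999999979, 4225.039999999986, 6817.039999999943,
                 12001.03999999992, 17185.039999999994],
}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    for topo, ys in LATENCY_NS.items():
        intercept, slope = fit_line(SIZES, ys)
        # Largest distance between a measured point and the fitted line.
        worst = max(abs(y - (intercept + slope * x)) for x, y in zip(SIZES, ys))
        print(f"{topo:16s} startup ~ {intercept:7.1f} ns, "
              f"{slope * 1024:5.1f} ns per KiB/PE, max residual {worst:.1e} ns")
```

On these rows, the fit gives startups of about 1633 ns (torus_2d), 2289 ns (ring_1d) and 2614 ns (mesh_2d_no_wrap). The slopes are 162, 190 and 198 ns per KiB of bytes_per_pe, and the residuals are at floating-point noise level. The smaller sizes (16 to 256 B/PE) sit slightly above these lines, by up to about 11 ns, so they are left out of the fit.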