kernbench2/docs/diagrams/allreduce_latency_plots/summary.csv
mukesh b610cb0d9a sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/
Convert the multidevice allreduce correctness + latency/buffer-kind sweeps
to run through the real PyTorch-distributed path
(init_process_group(backend="ahbm") -> mp.spawn -> dist.all_reduce) instead
of direct ctx.launch, and reorganize the CCL/allreduce tests into a
tests/sccl/ package, split into one test per file.
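Whatever the launch path, the correctness sweeps come down to one property: after a SUM allreduce, every rank holds the elementwise sum of all ranks' inputs (dist.all_reduce uses ReduceOp.SUM by default). The following is a minimal pure-Python sketch of that reference check. reference_allreduce_sum and check_allreduce are hypothetical names, not the helpers in _allreduce_helpers.py, and the real tests operate on torch tensors rather than lists.

```python
from typing import List, Sequence


def reference_allreduce_sum(rank_inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected per-rank buffers after a SUM allreduce over len(rank_inputs) ranks."""
    if not rank_inputs:
        raise ValueError("need at least one rank")
    n = len(rank_inputs[0])
    if any(len(buf) != n for buf in rank_inputs):
        raise ValueError("every rank must contribute a buffer of the same length")
    # Elementwise sum across ranks (dist.all_reduce defaults to ReduceOp.SUM).
    total = [sum(buf[i] for buf in rank_inputs) for i in range(n)]
    # Every rank ends up holding its own identical copy of the reduced buffer.
    return [list(total) for _ in rank_inputs]


def check_allreduce(rank_outputs, rank_inputs, tol: float = 0.0) -> None:
    """Raise AssertionError naming the first rank/element that differs."""
    expected = reference_allreduce_sum(rank_inputs)
    if len(rank_outputs) != len(expected):
        raise AssertionError(f"expected {len(expected)} ranks, got {len(rank_outputs)}")
    for r, (got, want) in enumerate(zip(rank_outputs, expected)):
        if len(got) != len(want):
            raise AssertionError(f"rank {r}: length {len(got)} != {len(want)}")
        for i, (g, w) in enumerate(zip(got, want)):
            if abs(g - w) > tol:
                raise AssertionError(f"rank {r} elem {i}: got {g}, want {w}")


if __name__ == "__main__":
    # Six hypothetical ranks; rank r contributes the value r + 1 in every slot.
    inputs = [[float(r + 1)] * 4 for r in range(6)]
    outputs = reference_allreduce_sum(inputs)  # stand-in for post-all_reduce buffers
    check_allreduce(outputs, inputs)
    print(outputs[0])  # [21.0, 21.0, 21.0, 21.0]
```

In a real test, rank_outputs would be the buffers each worker reads back after dist.all_reduce, and tol would be set to match the element dtype.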

Production change (required for the distributed path on non-square SIP grids):
- AhbmCCLBackend now reads explicit system.sips.w/h from the spec, with a
  square-only sqrt fallback that raises when the SIP count is not a perfect
  square, instead of silently guessing round(sqrt(count)). This fixes the
  2x3 / 3x2 torus + mesh cases, which previously resolved to a wrong 2x2 grid.
  Mirrors the test helper's _sip_topo_dims precedence (explicit w/h > square
  fallback > raise).
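The precedence above (explicit w/h, then a perfect-square fallback, then an error) can be sketched as follows. resolve_sip_grid is a hypothetical function, not the AhbmCCLBackend code, and the spec layout system.sips.{w,h,count} is assumed for illustration; the w*h == count check is an extra safeguard this sketch adds, not something the commit describes.

```python
import math
from typing import Any, Mapping, Tuple


def resolve_sip_grid(spec: Mapping[str, Any]) -> Tuple[int, int]:
    """Resolve (w, h) of the SIP grid: explicit w/h > square fallback > raise.

    Hypothetical sketch; the key layout system.sips.{w,h,count} is assumed.
    """
    sips = spec.get("system", {}).get("sips", {})
    w, h, count = sips.get("w"), sips.get("h"), sips.get("count")

    # 1. Explicit dimensions take precedence; require both or neither.
    if (w is None) != (h is None):
        raise ValueError("set both system.sips.w and system.sips.h, or neither")
    if w is not None:
        # Extra sanity check added by this sketch (assumption, not from the commit).
        if count is not None and w * h != count:
            raise ValueError(f"w*h = {w * h} does not match count = {count}")
        return int(w), int(h)

    if count is None:
        raise ValueError("spec has neither system.sips.w/h nor system.sips.count")

    # 2. Square-only fallback: accept the count only if it is a perfect square.
    side = math.isqrt(count)
    if side * side == count:
        return side, side

    # 3. Anything else is ambiguous (6 could be 2x3 or 3x2), so refuse to guess.
    raise ValueError(
        f"{count} SIPs is not a perfect square; set system.sips.w/h explicitly"
    )


if __name__ == "__main__":
    # The old heuristic: round(sqrt(6)) == 2, i.e. a 2x2 grid covering only 4 SIPs.
    print(round(math.sqrt(6)))
    print(resolve_sip_grid({"system": {"sips": {"count": 6, "w": 2, "h": 3}}}))
```

Raising on a non-square count, rather than rounding, turns the silent 2x2 misconfiguration for 6 SIPs into an immediate, explicit error.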

Test reorganization (tests/sccl/):
- _allreduce_helpers.py: shared plumbing (distributed driver, config writers,
  direct-launch run_allreduce parity reference, sweep/buffer-kind constants,
  plot aggregators, topology-diagram + FSIM-comparison emitters).
- test_allreduce_ring_torus_mesh.py: correctness across ring/torus/mesh.
- test_distributed_default_topology.py: full distributed path on topology.yaml.
- test_plot_latency_sweep.py / test_plot_buffer_kind_sweep.py: sweep rows.
- test_plot_topology_diagram.py / test_plot_comparison_fsim.py: plot emitters.
- test_intercube_root_center.py: moved in (ADR-0032 center-root latency guard).

Also:
- Move the FSIM comparison plot generator out of scripts/ into the sccl suite.
- Delete superseded test files (test_allreduce_multidevice,
  test_distributed_lrab_hierarchical_allreduce, test_allreduce_buffer_kind_sweep)
  and repoint conftest aggregators + the ipcq buffer-kind importers.
- Regenerate the allreduce_latency_plots derived artifacts from the full sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 22:24:43 -07:00

2.7 KiB

algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8,16,256,2666.552500000015
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32,64,1024,2747.7400000000152
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,64,128,2048,2855.990000000018
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,128,256,4096,3072.490000000019
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,512,1024,16384,3337.1133333333582
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,1024,2048,32768,3708.0333333333692
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,2048,4096,65536,4449.873333333393
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,4096,8192,131072,5933.020000000124
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8192,16384,262144,8900.379999999863
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,16384,32768,524288,14835.099999999224
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,26704.540000000765
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,38573.97999999701
lrab_hierarchical_allreduce,ring_1d,6,8,16,256,2365.255833333347
lrab_hierarchical_allreduce,ring_1d,6,32,64,1024,2436.9433333333473
lrab_hierarchical_allreduce,ring_1d,6,64,128,2048,2532.526666666683
lrab_hierarchical_allreduce,ring_1d,6,128,256,4096,2723.693333333349
lrab_hierarchical_allreduce,ring_1d,6,512,1024,16384,3048.635000000021
lrab_hierarchical_allreduce,ring_1d,6,1024,2048,32768,3393.4016666666957
lrab_hierarchical_allreduce,ring_1d,6,2048,4096,65536,4082.401666666714
lrab_hierarchical_allreduce,ring_1d,6,4096,8192,131072,5458.80166666677
lrab_hierarchical_allreduce,ring_1d,6,8192,16384,262144,8216.934999999943
lrab_hierarchical_allreduce,ring_1d,6,16384,32768,524288,13733.201666665835
lrab_hierarchical_allreduce,ring_1d,6,32768,65536,1048576,24765.73500000064
lrab_hierarchical_allreduce,ring_1d,6,49152,98304,1572864,35798.268333331536
lrab_hierarchical_allreduce,torus_2d,6,8,16,256,1700.6025000000095
lrab_hierarchical_allreduce,torus_2d,6,32,64,1024,1753.2900000000102
lrab_hierarchical_allreduce,torus_2d,6,64,128,2048,1823.540000000012
lrab_hierarchical_allreduce,torus_2d,6,128,256,4096,1964.040000000012
lrab_hierarchical_allreduce,torus_2d,6,512,1024,16384,2196.8183333333463
lrab_hierarchical_allreduce,torus_2d,6,1024,2048,32768,2477.2783333333473
lrab_hierarchical_allreduce,torus_2d,6,2048,4096,65536,3038.1983333333583
lrab_hierarchical_allreduce,torus_2d,6,4096,8192,131072,4159.5050000000665
lrab_hierarchical_allreduce,torus_2d,6,8192,16384,262144,6403.185000000109
lrab_hierarchical_allreduce,torus_2d,6,16384,32768,524288,10890.5449999995
lrab_hierarchical_allreduce,torus_2d,6,32768,65536,1048576,19865.265000000378
lrab_hierarchical_allreduce,torus_2d,6,49152,98304,1572864,28839.98500000059
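The summary rows can be checked and compared programmatically. The sketch below parses an excerpt of summary.csv (the smallest and largest sweep point per topology, copied from the table above). It checks a column relationship that holds in every row: bytes_per_pe = 2 * n_elem and bytes_per_sip = 16 * bytes_per_pe. That suggests 2-byte elements and 16 PEs per SIP, but this reading is an inference from the numbers, not something the commit states. It then compares latency at the largest size. load_rows, check_size_columns and latency_at_largest are hypothetical helpers, not part of tests/sccl/.

```python
import csv
import io

# Excerpt of summary.csv: smallest and largest sweep point for each topology.
SUMMARY_EXCERPT = """\
algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8,16,256,2666.552500000015
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,38573.97999999701
lrab_hierarchical_allreduce,ring_1d,6,8,16,256,2365.255833333347
lrab_hierarchical_allreduce,ring_1d,6,49152,98304,1572864,35798.268333331536
lrab_hierarchical_allreduce,torus_2d,6,8,16,256,1700.6025000000095
lrab_hierarchical_allreduce,torus_2d,6,49152,98304,1572864,28839.98500000059
"""


def load_rows(text):
    """Parse CSV text into dicts with numeric columns converted."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "topology": rec["sip_topology"],
            "n_elem": int(rec["n_elem"]),
            "bytes_per_pe": int(rec["bytes_per_pe"]),
            "bytes_per_sip": int(rec["bytes_per_sip"]),
            "latency_ns": float(rec["latency_ns"]),
        })
    return rows


def check_size_columns(rows):
    # Inferred from the table (assumption): 2 bytes per element, 16 PEs per SIP.
    for r in rows:
        assert r["bytes_per_pe"] == 2 * r["n_elem"], r
        assert r["bytes_per_sip"] == 16 * r["bytes_per_pe"], r


def latency_at_largest(rows):
    """Map topology -> latency_ns at that topology's largest bytes_per_sip."""
    largest = {}
    for r in rows:
        cur = largest.get(r["topology"])
        if cur is None or r["bytes_per_sip"] > cur["bytes_per_sip"]:
            largest[r["topology"]] = r
    return {t: r["latency_ns"] for t, r in largest.items()}


if __name__ == "__main__":
    rows = load_rows(SUMMARY_EXCERPT)
    check_size_columns(rows)
    lat = latency_at_largest(rows)
    for topo, ns in sorted(lat.items()):
        # bytes/ns is numerically equal to GB/s (1e9 bytes per 1e9 ns).
        print(f"{topo}: {ns:.0f} ns, {1572864 / ns:.1f} bytes/ns, "
              f"{ns / lat['torus_2d']:.2f}x torus latency")
```

At the largest size in this excerpt, ring_1d and mesh_2d_no_wrap take roughly 1.24x and 1.34x the torus_2d latency. Pointing load_rows at the full file instead of the excerpt extends the same comparison to every sweep point.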