b610cb0d9a
Convert the multidevice allreduce correctness + latency/buffer-kind sweeps to run through the real PyTorch-distributed path (init_process_group(backend="ahbm") -> mp.spawn -> dist.all_reduce) instead of direct ctx.launch, and reorganize the CCL/allreduce tests into a tests/sccl/ package split one test per file.

Production change (required for the distributed path on non-square SIP grids):
- AhbmCCLBackend now reads explicit system.sips.w/h from the spec, with a square-only sqrt fallback that raises on ambiguity, instead of silently guessing round(sqrt(count)). This fixes the 2x3 / 3x2 torus + mesh cases, which previously resolved to a wrong 2x2 grid. Mirrors the test helper's _sip_topo_dims precedence (explicit w/h > square fallback > raise).

Test reorganization (tests/sccl/):
- _allreduce_helpers.py: shared plumbing (distributed driver, config writers, direct-launch run_allreduce parity reference, sweep/buffer-kind constants, plot aggregators, topology-diagram + FSIM-comparison emitters).
- test_allreduce_ring_torus_mesh.py: correctness across ring/torus/mesh.
- test_distributed_default_topology.py: full distributed path on topology.yaml.
- test_plot_latency_sweep.py / test_plot_buffer_kind_sweep.py: sweep rows.
- test_plot_topology_diagram.py / test_plot_comparison_fsim.py: plot emitters.
- test_intercube_root_center.py: moved in (ADR-0032 center-root latency guard).

Also:
- Move the FSIM comparison plot generator out of scripts/ into the sccl suite.
- Delete superseded test files (test_allreduce_multidevice, test_distributed_lrab_hierarchical_allreduce, test_allreduce_buffer_kind_sweep) and repoint conftest aggregators + the ipcq buffer-kind importers.
- Regenerate the allreduce_latency_plots derived artifacts from the full sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
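The grid-dimension precedence described in the commit message (explicit w/h > square fallback > raise) can be sketched roughly as follows. This is an illustration only, not the actual AhbmCCLBackend or _sip_topo_dims code: the function name `resolve_sip_grid`, the flat dict shape with `w`, `h`, and `count` keys, and the choice to reject a spec that gives only one of w/h are all assumptions made for the example.

```python
import math
from typing import Mapping, Optional, Tuple


def resolve_sip_grid(sips: Mapping[str, Optional[int]]) -> Tuple[int, int]:
    """Resolve (w, h) for a SIP grid. Hypothetical helper, not the real backend.

    Precedence: explicit w/h > square fallback > raise.
    """
    w = sips.get("w")
    h = sips.get("h")
    count = sips.get("count")

    # 1. Explicit dimensions win whenever both are present.
    if w is not None and h is not None:
        if count is not None and w * h != count:
            raise ValueError(f"w*h = {w * h} does not match count = {count}")
        return (w, h)

    # Assumption for this sketch: a half-specified grid is an error.
    if w is not None or h is not None:
        raise ValueError("both w and h must be given, or neither")

    if count is None:
        raise ValueError("spec provides neither w/h nor count")

    # 2. Square-only fallback: accept count only if it is a perfect square.
    side = math.isqrt(count)
    if side * side != count:
        # 3. Ambiguous (e.g. 6 could be 2x3 or 3x2): refuse to guess.
        raise ValueError(f"count = {count} is not square; specify w and h")
    return (side, side)
```

With this shape, `resolve_sip_grid({"count": 6, "w": 2, "h": 3})` yields a 2x3 grid and `resolve_sip_grid({"count": 6})` raises, whereas `round(math.sqrt(6))` evaluates to 2 and would silently produce the 2x2 grid the commit message calls out as wrong.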
2.7 KiB
| algorithm | sip_topology | n_sips | n_elem | bytes_per_pe | bytes_per_sip | latency_ns |
|---|---|---|---|---|---|---|
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 8 | 16 | 256 | 2666.552500000015 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 32 | 64 | 1024 | 2747.7400000000152 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 64 | 128 | 2048 | 2855.990000000018 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 128 | 256 | 4096 | 3072.490000000019 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 512 | 1024 | 16384 | 3337.1133333333582 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 1024 | 2048 | 32768 | 3708.0333333333692 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 2048 | 4096 | 65536 | 4449.873333333393 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 4096 | 8192 | 131072 | 5933.020000000124 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 8192 | 16384 | 262144 | 8900.379999999863 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 16384 | 32768 | 524288 | 14835.099999999224 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 32768 | 65536 | 1048576 | 26704.540000000765 |
| lrab_hierarchical_allreduce | mesh_2d_no_wrap | 6 | 49152 | 98304 | 1572864 | 38573.97999999701 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 8 | 16 | 256 | 2365.255833333347 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 32 | 64 | 1024 | 2436.9433333333473 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 64 | 128 | 2048 | 2532.526666666683 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 128 | 256 | 4096 | 2723.693333333349 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 512 | 1024 | 16384 | 3048.635000000021 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 1024 | 2048 | 32768 | 3393.4016666666957 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 2048 | 4096 | 65536 | 4082.401666666714 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 4096 | 8192 | 131072 | 5458.80166666677 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 8192 | 16384 | 262144 | 8216.934999999943 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 16384 | 32768 | 524288 | 13733.201666665835 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 32768 | 65536 | 1048576 | 24765.73500000064 |
| lrab_hierarchical_allreduce | ring_1d | 6 | 49152 | 98304 | 1572864 | 35798.268333331536 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 8 | 16 | 256 | 1700.6025000000095 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 32 | 64 | 1024 | 1753.2900000000102 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 64 | 128 | 2048 | 1823.540000000012 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 128 | 256 | 4096 | 1964.040000000012 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 512 | 1024 | 16384 | 2196.8183333333463 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 1024 | 2048 | 32768 | 2477.2783333333473 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 2048 | 4096 | 65536 | 3038.1983333333583 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 4096 | 8192 | 131072 | 4159.5050000000665 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 8192 | 16384 | 262144 | 6403.185000000109 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 16384 | 32768 | 524288 | 10890.5449999995 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 32768 | 65536 | 1048576 | 19865.265000000378 |
| lrab_hierarchical_allreduce | torus_2d | 6 | 49152 | 98304 | 1572864 | 28839.98500000059 |
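As a rough illustration of how sweep rows like these might be compared across topologies, the sketch below loads three rows (the n_elem = 49152 entries from the table above, reduced to three columns) and expresses each latency relative to torus_2d. The inline `SAMPLE` text, the reduced column set, and the helper name `latency_by_topology` are assumptions for the example; the repository's own plot aggregators may work quite differently.

```python
import csv
import io

# Three rows copied from the sweep above (n_elem = 49152), columns reduced
# for brevity. The reduced CSV shape is an assumption of this sketch.
SAMPLE = """sip_topology,n_elem,latency_ns
mesh_2d_no_wrap,49152,38573.97999999701
ring_1d,49152,35798.268333331536
torus_2d,49152,28839.98500000059
"""


def latency_by_topology(text: str) -> dict:
    """Map sip_topology -> latency_ns (float) for a single-size CSV slice."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sip_topology"]: float(row["latency_ns"]) for row in reader}


latencies = latency_by_topology(SAMPLE)
baseline = latencies["torus_2d"]

# Latency of each topology as a multiple of the torus_2d latency.
ratios = {topo: ns / baseline for topo, ns in latencies.items()}

for topo, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{topo:>16}: {ratio:.3f}x torus_2d latency")
```

At this message size the ordering matches the rest of the table: torus_2d is fastest, ring_1d is next, and mesh_2d_no_wrap is slowest.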