e9cc40f74d
mesh_2d, torus_2d, and mesh_2d_no_wrap accept optional w, h kwargs; the sqrt fallback is preserved for square layouts (back-compat tests confirm 4-SIP and 9-SIP square configs still work).

sfr_config reads system.sips.w/h from the spec and threads the dims through to the topology fn.

test_allreduce_multidevice CONFIGS switched from 4 SIPs (square) to 6 SIPs: ring_1d_6sip, torus_2d_6sip_2x3, mesh_2d_no_wrap_6sip_2x3. _write_temp_configs writes system.sips.w/h when supplied; _sip_topo_dims reads them back.

Latency sweep loop also moved to 6-SIP layouts. Linear-scale plot variants dropped -- only log-scale *.png + summary.csv are emitted. Plots in tests/allreduce_latency_plots regenerated.

New tests/test_sip_topology_rectangular.py asserts neighbor correctness for 2x3 layouts and back-compat for the square fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
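The optional w/h handling with a square fallback described in the message above might look roughly like the sketch below. The names `resolve_dims` and `grid_neighbors`, the row-major SIP numbering, and the dict-of-lists return shape are all hypothetical; the commit does not show the real signatures of mesh_2d, torus_2d, or mesh_2d_no_wrap. Whether "2x3" means w=2, h=3 or w=3, h=2 is also not stated, so the usage note picks w=3, h=2 purely for illustration.

```python
import math


def resolve_dims(n_sips, w=None, h=None):
    """Return (w, h); fall back to a square layout when both are omitted."""
    if w is None and h is None:
        side = math.isqrt(n_sips)
        if side * side != n_sips:
            raise ValueError(f"{n_sips} SIPs is not square; pass w and h")
        return side, side
    if w is None or h is None:
        raise ValueError("w and h must be supplied together")
    if w * h != n_sips:
        raise ValueError(f"w*h = {w * h} does not match n_sips = {n_sips}")
    return w, h


def grid_neighbors(n_sips, w=None, h=None, wrap=True):
    """Map each SIP id to its sorted neighbor ids on a w x h grid.

    Assumes row-major numbering (sip = y * w + x). wrap=True gives
    torus-style wrap-around links; wrap=False drops off-grid links.
    """
    w, h = resolve_dims(n_sips, w, h)
    neighbors = {}
    for sip in range(n_sips):
        x, y = sip % w, sip // w
        found = set()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap:
                nx, ny = nx % w, ny % h
            elif not (0 <= nx < w and 0 <= ny < h):
                continue
            nid = ny * w + nx
            if nid != sip:  # a dimension of size 1 would wrap onto itself
                found.add(nid)
        neighbors[sip] = sorted(found)
    return neighbors
```

Calling `grid_neighbors(6, w=3, h=2)` exercises the rectangular path, while `grid_neighbors(4)` takes the square fallback; `grid_neighbors(6)` raises because 6 is not a perfect square. Collecting neighbors in a set keeps a dimension of size 2 from listing the same neighbor twice when both directions wrap to it.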
2.2 KiB
| algorithm | sip_topology | n_sips | n_elem | bytes_per_pe | bytes_per_sip | latency_ns |
|---|---|---|---|---|---|---|
| intercube_allreduce | ring_1d | 6 | 8 | 16 | 256 | 3073.1299999999937 |
| intercube_allreduce | ring_1d | 6 | 32 | 64 | 1024 | 3079.8799999999947 |
| intercube_allreduce | ring_1d | 6 | 64 | 128 | 2048 | 3088.879999999992 |
| intercube_allreduce | ring_1d | 6 | 128 | 256 | 4096 | 3106.8799999999865 |
| intercube_allreduce | ring_1d | 6 | 512 | 1024 | 16384 | 3225.8799999999865 |
| intercube_allreduce | ring_1d | 6 | 1024 | 2048 | 32768 | 3391.8799999999865 |
| intercube_allreduce | ring_1d | 6 | 2048 | 4096 | 65536 | 3723.8799999999865 |
| intercube_allreduce | ring_1d | 6 | 4096 | 8192 | 131072 | 4387.879999999965 |
| intercube_allreduce | ring_1d | 6 | 8192 | 16384 | 262144 | 5715.879999999957 |
| intercube_allreduce | ring_1d | 6 | 16384 | 32768 | 524288 | 8371.879999999932 |
| intercube_allreduce | ring_1d | 6 | 32768 | 65536 | 1048576 | 13683.879999999903 |
| intercube_allreduce | torus_2d | 6 | 8 | 16 | 256 | 2190.4799999999923 |
| intercube_allreduce | torus_2d | 6 | 32 | 64 | 1024 | 2196.479999999993 |
| intercube_allreduce | torus_2d | 6 | 64 | 128 | 2048 | 2204.4799999999905 |
| intercube_allreduce | torus_2d | 6 | 128 | 256 | 4096 | 2220.479999999985 |
| intercube_allreduce | torus_2d | 6 | 512 | 1024 | 16384 | 2325.479999999985 |
| intercube_allreduce | torus_2d | 6 | 1024 | 2048 | 32768 | 2471.479999999985 |
| intercube_allreduce | torus_2d | 6 | 2048 | 4096 | 65536 | 2763.479999999985 |
| intercube_allreduce | torus_2d | 6 | 4096 | 8192 | 131072 | 3347.4799999999777 |
| intercube_allreduce | torus_2d | 6 | 8192 | 16384 | 262144 | 4515.4799999999705 |
| intercube_allreduce | torus_2d | 6 | 16384 | 32768 | 524288 | 6851.479999999952 |
| intercube_allreduce | torus_2d | 6 | 32768 | 65536 | 1048576 | 11523.479999999923 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8 | 16 | 256 | 3508.4249999999993 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32 | 64 | 1024 | 3515.55 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 64 | 128 | 2048 | 3525.0499999999975 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 128 | 256 | 4096 | 3544.049999999992 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 512 | 1024 | 16384 | 3667.049999999992 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 1024 | 2048 | 32768 | 3837.049999999992 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 2048 | 4096 | 65536 | 4177.049999999992 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 4096 | 8192 | 131072 | 4857.049999999959 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8192 | 16384 | 262144 | 6217.049999999945 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 16384 | 32768 | 524288 | 8937.049999999937 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32768 | 65536 | 1048576 | 14377.049999999872 |
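
For comparing topologies from the sweep output, a small reader along these lines may be handy. It is a sketch only: it assumes summary.csv is plain comma-separated text with the column names shown in the table above, and `latency_range` is a hypothetical helper, not something from the repo. The embedded sample holds the smallest and largest n_elem rows per topology, copied from the table.

```python
import csv
import io

# First and last sweep points per topology, copied from the table above.
SAMPLE = """algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
intercube_allreduce,ring_1d,6,8,16,256,3073.1299999999937
intercube_allreduce,ring_1d,6,32768,65536,1048576,13683.879999999903
intercube_allreduce,torus_2d,6,8,16,256,2190.4799999999923
intercube_allreduce,torus_2d,6,32768,65536,1048576,11523.479999999923
intercube_allreduce,mesh_2d_no_wrap,6,8,16,256,3508.4249999999993
intercube_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,14377.049999999872
"""


def latency_range(csv_text):
    """Map each sip_topology to (latency at smallest n_elem, latency at largest)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = {}
    # dict.fromkeys keeps topologies in first-seen order without duplicates.
    for topo in dict.fromkeys(r["sip_topology"] for r in rows):
        points = sorted(
            (int(r["n_elem"]), float(r["latency_ns"]))
            for r in rows
            if r["sip_topology"] == topo
        )
        result[topo] = (points[0][1], points[-1][1])
    return result


if __name__ == "__main__":
    for topo, (small, large) in latency_range(SAMPLE).items():
        print(f"{topo}: {small:.2f} ns -> {large:.2f} ns")
```

On these rows torus_2d has the lowest latency at both ends of the sweep and mesh_2d_no_wrap the highest, which matches the table.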