a44f832be5
Allreduce, pe2pe, ipcq, and pe_view results auto-regenerated by test sweeps running against the new chunk-streaming wire timing (per-flit wormhole). Absolute numbers shift upward because bottleneck-link transit is now charged once per flit, replacing the previous cut-through subtraction at HBM CTRL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2.5 KiB
| algorithm | sip_topology | n_sips | n_elem | bytes_per_pe | bytes_per_sip | latency_ns |
|---|---|---|---|---|---|---|
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8 | 16 | 256 | 2666.5524999999725 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32 | 64 | 1024 | 2747.7399999999725 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 64 | 128 | 2048 | 2855.98999999998 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 128 | 256 | 4096 | 3072.4899999999725 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 512 | 1024 | 16384 | 3336.579999999951 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 1024 | 2048 | 32768 | 3707.49999999992 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 2048 | 4096 | 65536 | 4449.339999999875 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 4096 | 8192 | 131072 | 5933.020000000055 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8192 | 16384 | 262144 | 8900.380000000157 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 16384 | 32768 | 524288 | 14835.099999997583 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32768 | 65536 | 1048576 | 26704.540000017492 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 49152 | 98304 | 1572864 | 38573.980000026335 |
| intercube_allreduce | ring_1d | 6 | 8 | 16 | 256 | 2365.2558333333036 |
| intercube_allreduce | ring_1d | 6 | 32 | 64 | 1024 | 2436.9433333333036 |
| intercube_allreduce | ring_1d | 6 | 64 | 128 | 2048 | 2532.526666666643 |
| intercube_allreduce | ring_1d | 6 | 128 | 256 | 4096 | 2723.6933333333036 |
| intercube_allreduce | ring_1d | 6 | 512 | 1024 | 16384 | 3042.0349999999544 |
| intercube_allreduce | ring_1d | 6 | 1024 | 2048 | 32768 | 3390.201666666597 |
| intercube_allreduce | ring_1d | 6 | 2048 | 4096 | 65536 | 4079.7349999998714 |
| intercube_allreduce | ring_1d | 6 | 4096 | 8192 | 131072 | 5458.801666666721 |
| intercube_allreduce | ring_1d | 6 | 8192 | 16384 | 262144 | 8216.93500000014 |
| intercube_allreduce | ring_1d | 6 | 16384 | 32768 | 524288 | 13733.201666664638 |
| intercube_allreduce | ring_1d | 6 | 32768 | 65536 | 1048576 | 24765.735000014545 |
| intercube_allreduce | ring_1d | 6 | 49152 | 98304 | 1572864 | 35798.268333355256 |
| intercube_allreduce | torus_2d | 6 | 8 | 16 | 256 | 1700.6024999999754 |
| intercube_allreduce | torus_2d | 6 | 32 | 64 | 1024 | 1753.2899999999754 |
| intercube_allreduce | torus_2d | 6 | 64 | 128 | 2048 | 1823.539999999979 |
| intercube_allreduce | torus_2d | 6 | 128 | 256 | 4096 | 1964.0399999999754 |
| intercube_allreduce | torus_2d | 6 | 512 | 1024 | 16384 | 2196.2849999999653 |
| intercube_allreduce | torus_2d | 6 | 1024 | 2048 | 32768 | 2476.74499999995 |
| intercube_allreduce | torus_2d | 6 | 2048 | 4096 | 65536 | 3037.664999999919 |
| intercube_allreduce | torus_2d | 6 | 4096 | 8192 | 131072 | 4159.50500000003 |
| intercube_allreduce | torus_2d | 6 | 8192 | 16384 | 262144 | 6403.185000000081 |
| intercube_allreduce | torus_2d | 6 | 16384 | 32768 | 524288 | 10890.544999998769 |
| intercube_allreduce | torus_2d | 6 | 32768 | 65536 | 1048576 | 19865.265000008738 |
| intercube_allreduce | torus_2d | 6 | 49152 | 98304 | 1572864 | 28839.985000013185 |
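
The regenerated numbers are easier to compare across topologies with a small helper. The following is a minimal sketch, not part of the repository: the names `ROWS`, `LATENCY`, `speedup`, and `marginal_ns_per_byte` are hypothetical, and the data is a hand-copied subset of the table above (smallest and two largest sweep points per topology). It computes the latency ratio between two topologies at one message size, and the slope of latency between the two largest sizes as a rough per-byte cost.

```python
# Subset of the table above: (sip_topology, bytes_per_sip, latency_ns).
# Hand-copied for illustration; the full sweep lives in the CSV itself.
ROWS = [
    ("mesh_2d_no_wrap", 256, 2666.5524999999725),
    ("mesh_2d_no_wrap", 1048576, 26704.540000017492),
    ("mesh_2d_no_wrap", 1572864, 38573.980000026335),
    ("ring_1d", 256, 2365.2558333333036),
    ("ring_1d", 1048576, 24765.735000014545),
    ("ring_1d", 1572864, 35798.268333355256),
    ("torus_2d", 256, 1700.6024999999754),
    ("torus_2d", 1048576, 19865.265000008738),
    ("torus_2d", 1572864, 28839.985000013185),
]

# Index latencies by (topology, bytes_per_sip) for direct lookup.
LATENCY = {(topo, size): ns for topo, size, ns in ROWS}


def speedup(baseline, candidate, bytes_per_sip):
    """Return baseline latency divided by candidate latency at one size."""
    return LATENCY[(baseline, bytes_per_sip)] / LATENCY[(candidate, bytes_per_sip)]


def marginal_ns_per_byte(topology, small=1048576, large=1572864):
    """Slope of latency between two sweep points, in ns per byte per SIP."""
    delta_ns = LATENCY[(topology, large)] - LATENCY[(topology, small)]
    return delta_ns / (large - small)


if __name__ == "__main__":
    for topo in ("mesh_2d_no_wrap", "ring_1d", "torus_2d"):
        print(topo, "marginal ns/byte:", marginal_ns_per_byte(topo))
    print("torus_2d vs mesh_2d_no_wrap at 1572864 B:",
          speedup("mesh_2d_no_wrap", "torus_2d", 1572864))
```

On this subset, `torus_2d` has the lowest latency at both ends of the sweep and the smallest slope, with `ring_1d` between it and `mesh_2d_no_wrap`. The slope is only a two-point estimate and says nothing about how the simulator attributes time internally.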