1c5752a9ec
Move the algorithmic root cube from the corner (cube_w-1, cube_h-1) to the geometric center (cube_w//2, cube_h//2) and have each phase converge bidirectionally so the intra-SIP critical path drops from ~12 hops to ~8 hops on a 4×4 mesh (left half W→E + right half E→W in row reduce; top half N→S + bottom half S→N in col reduce; mirrored on broadcast).

Result on torus_2d 6 SIPs at 96 KB / PE on TCM:

  before (corner root) : 22.0 µs
  after  (center root) : 17.2 µs (−22%)

Same shape on ring_1d (−7%) and mesh_2d_no_wrap (−12%); also holds across SRAM and HBM (~−20% each).

Phase 1 test (test_intercube_root_center.py) asserts the torus_2d 96 KB latency drops below 20.5 µs and that all 96 cubes still validate (correctness preserved).

Plot updates:
- overview.png: replace constant 10.6 µs theoretical line with user-supplied hand-derived curve (per-cube packet count = bytes_per_pe × 8 PEs ÷ 128 B; 1346 ns startup + 1.20 ns/pkt).
- All summary.csv numbers and per-topology PNGs regenerated.
- pe2pe_latency_plots and ipcq diagram emitter PNGs refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
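The hop counts quoted in the message can be checked with a small sketch. It is not the simulator's code: it assumes one hop per mesh link, no wrap-around links inside a SIP, and a critical path equal to the farthest cube's Manhattan distance to the root, counted once for the reduce and once for the mirrored broadcast. The function name `critical_path_hops` is hypothetical.

```python
def critical_path_hops(cube_w: int, cube_h: int, root: tuple[int, int]) -> int:
    """Estimate reduce + broadcast hops on a cube_w x cube_h mesh.

    Assumption: row reduce then column reduce toward `root`, one hop per
    link, so the slowest cube is the one farthest from the root in
    Manhattan distance. The broadcast mirrors the reduce, doubling it.
    """
    root_x, root_y = root
    farthest = max(
        abs(x - root_x) + abs(y - root_y)
        for x in range(cube_w)
        for y in range(cube_h)
    )
    return 2 * farthest


cube_w, cube_h = 4, 4
corner_root = (cube_w - 1, cube_h - 1)    # old root, as in the commit message
center_root = (cube_w // 2, cube_h // 2)  # new root, as in the commit message

print(critical_path_hops(cube_w, cube_h, corner_root))  # 12
print(critical_path_hops(cube_w, cube_h, center_root))  # 8
```

Under these assumptions the 4×4 case reproduces the ~12 and ~8 hop figures: the corner root is 6 links from the opposite corner, while the center root at (2, 2) is at most 4 links from any cube.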
2.4 KiB
| algorithm | sip_topology | n_sips | n_elem | bytes_per_pe | bytes_per_sip | latency_ns |
|---|---|---|---|---|---|---|
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8 | 16 | 256 | 2626.302499999998 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32 | 64 | 1024 | 2634.7399999999952 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 64 | 128 | 2048 | 2645.9899999999925 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 128 | 256 | 4096 | 2668.489999999987 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 512 | 1024 | 16384 | 2812.489999999987 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 1024 | 2048 | 32768 | 3010.489999999987 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 2048 | 4096 | 65536 | 3406.489999999987 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 4096 | 8192 | 131072 | 4198.489999999965 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 8192 | 16384 | 262144 | 5782.489999999969 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 16384 | 32768 | 524288 | 8950.489999999925 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 32768 | 65536 | 1048576 | 15286.48999999986 |
| intercube_allreduce | mesh_2d_no_wrap | 6 | 49152 | 98304 | 1572864 | 21622.489999999932 |
| intercube_allreduce | ring_1d | 6 | 8 | 16 | 256 | 2302.9849999999933 |
| intercube_allreduce | ring_1d | 6 | 32 | 64 | 1024 | 2310.8599999999906 |
| intercube_allreduce | ring_1d | 6 | 64 | 128 | 2048 | 2321.359999999988 |
| intercube_allreduce | ring_1d | 6 | 128 | 256 | 4096 | 2342.3599999999824 |
| intercube_allreduce | ring_1d | 6 | 512 | 1024 | 16384 | 2479.3599999999824 |
| intercube_allreduce | ring_1d | 6 | 1024 | 2048 | 32768 | 2669.3599999999824 |
| intercube_allreduce | ring_1d | 6 | 2048 | 4096 | 65536 | 3049.3599999999824 |
| intercube_allreduce | ring_1d | 6 | 4096 | 8192 | 131072 | 3809.3599999999715 |
| intercube_allreduce | ring_1d | 6 | 8192 | 16384 | 262144 | 5329.359999999979 |
| intercube_allreduce | ring_1d | 6 | 16384 | 32768 | 524288 | 8369.35999999992 |
| intercube_allreduce | ring_1d | 6 | 32768 | 65536 | 1048576 | 14449.359999999899 |
| intercube_allreduce | ring_1d | 6 | 49152 | 98304 | 1572864 | 20529.35999999997 |
| intercube_allreduce | torus_2d | 6 | 8 | 16 | 256 | 1644.2899999999936 |
| intercube_allreduce | torus_2d | 6 | 32 | 64 | 1024 | 1651.0399999999909 |
| intercube_allreduce | torus_2d | 6 | 64 | 128 | 2048 | 1660.0399999999881 |
| intercube_allreduce | torus_2d | 6 | 128 | 256 | 4096 | 1678.0399999999827 |
| intercube_allreduce | torus_2d | 6 | 512 | 1024 | 16384 | 1795.0399999999827 |
| intercube_allreduce | torus_2d | 6 | 1024 | 2048 | 32768 | 1957.0399999999827 |
| intercube_allreduce | torus_2d | 6 | 2048 | 4096 | 65536 | 2281.0399999999827 |
| intercube_allreduce | torus_2d | 6 | 4096 | 8192 | 131072 | 2929.039999999979 |
| intercube_allreduce | torus_2d | 6 | 8192 | 16384 | 262144 | 4225.039999999986 |
| intercube_allreduce | torus_2d | 6 | 16384 | 32768 | 524288 | 6817.039999999943 |
| intercube_allreduce | torus_2d | 6 | 32768 | 65536 | 1048576 | 12001.03999999992 |
| intercube_allreduce | torus_2d | 6 | 49152 | 98304 | 1572864 | 17185.039999999994 |
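The hand-derived theoretical curve mentioned in the commit message for overview.png can be evaluated at the same `bytes_per_pe` points as the table. The sketch below takes its constants from the message (8 PEs per cube, 128 B packets, 1346 ns startup, 1.20 ns per packet). The function names are hypothetical, and it assumes the packet count is used as-is without rounding, which makes no difference here because every `bytes_per_pe` value in the table is a multiple of 16.

```python
PES_PER_CUBE = 8        # from the commit message
PACKET_BYTES = 128      # from the commit message
STARTUP_NS = 1346.0     # from the commit message
NS_PER_PACKET = 1.20    # from the commit message


def packets_per_cube(bytes_per_pe: int) -> float:
    """Per-cube packet count = bytes_per_pe x 8 PEs / 128 B."""
    return bytes_per_pe * PES_PER_CUBE / PACKET_BYTES


def theoretical_latency_ns(bytes_per_pe: int) -> float:
    """Startup cost plus a fixed cost per packet (assumed linear model)."""
    return STARTUP_NS + NS_PER_PACKET * packets_per_cube(bytes_per_pe)


# Evaluate at the bytes_per_pe sweep used in summary.csv.
sweep = [16, 64, 128, 256, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 98304]
for bytes_per_pe in sweep:
    pkts = packets_per_cube(bytes_per_pe)
    print(f"{bytes_per_pe:>6} B/PE  {pkts:>6.0f} pkts  "
          f"{theoretical_latency_ns(bytes_per_pe):9.1f} ns")
```

At the largest point, 98304 bytes per PE, this gives 6144 packets per cube and roughly 8.72 µs, whereas the line it replaces was a constant 10.6 µs.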