1c5752a9ec
Move the algorithmic root cube from the corner (cube_w-1, cube_h-1) to the geometric center (cube_w//2, cube_h//2) and have each phase converge bidirectionally so the intra-SIP critical path drops from ~12 hops to ~8 hops on a 4×4 mesh (left half W→E + right half E→W in row reduce; top half N→S + bottom half S→N in col reduce; mirrored on broadcast).

Result on torus_2d 6 SIPs at 96 KB / PE on TCM:

  before (corner root) : 22.0 µs
  after  (center root) : 17.2 µs (−22%)

Same shape on ring_1d (−7%) and mesh_2d_no_wrap (−12%); also holds across SRAM and HBM (~−20% each).

Phase 1 test (test_intercube_root_center.py) asserts the torus_2d 96 KB latency drops below 20.5 µs and that all 96 cubes still validate (correctness preserved).

Plot updates:
- overview.png: replace constant 10.6 µs theoretical line with user-supplied hand-derived curve (per-cube packet count = bytes_per_pe × 8 PEs ÷ 128 B; 1346 ns startup + 1.20 ns/pkt).
- All summary.csv numbers and per-topology PNGs regenerated.
- pe2pe_latency_plots and ipcq diagram emitter PNGs refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
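The hop counts and the hand-derived curve quoted in the commit message can be reproduced with a small sketch. The function names, the hop model (one hop per neighboring cube, a row phase followed by a column phase, reduce then a mirrored broadcast), and the reading of "KB" as 1024 bytes are assumptions made for illustration. They are not taken from the repository's code.

```python
def critical_path_hops(cube_w, cube_h, root):
    """Hops on the reduce + broadcast critical path for a root cube at (x, y).

    Assumed model: each phase is bounded by the farthest cube along that
    axis, and the broadcast mirrors the reduce.
    """
    rx, ry = root
    row = max(rx, cube_w - 1 - rx)  # farthest cube in the row phase
    col = max(ry, cube_h - 1 - ry)  # farthest cube in the column phase
    return 2 * (row + col)          # reduce, then mirrored broadcast


def theoretical_latency_ns(bytes_per_pe, pes_per_cube=8, packet_bytes=128,
                           startup_ns=1346.0, ns_per_packet=1.20):
    """Curve from the commit message: startup cost plus a per-packet cost."""
    packets = bytes_per_pe * pes_per_cube / packet_bytes
    return startup_ns + ns_per_packet * packets


if __name__ == "__main__":
    w = h = 4
    print("corner root:", critical_path_hops(w, h, (w - 1, h - 1)), "hops")
    print("center root:", critical_path_hops(w, h, (w // 2, h // 2)), "hops")
    # 96 KB per PE, assuming 1 KB = 1024 bytes
    print("theoretical:", theoretical_latency_ns(96 * 1024), "ns")
```

Under this model a 4×4 mesh gives 12 hops for the corner root and 8 for the center root, which matches the approximate figures in the message. With 96 KB per PE the curve evaluates to about 8.7 µs.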
6.6 KiB
| line | hop | label | size_bytes | path | total_ns |
|---|---|---|---|---|---|
| 2 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | ipcq | 31.6399999999976 |
| 3 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | raw | 12.019999999996799 |
| 4 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | ipcq | 33.6399999999976 |
| 5 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | raw | 13.019999999996799 |
| 6 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | ipcq | 35.6399999999976 |
| 7 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | raw | 14.019999999996799 |
| 8 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | ipcq | 37.6399999999976 |
| 9 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | raw | 15.019999999996799 |
| 10 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | ipcq | 41.6399999999976 |
| 11 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | raw | 17.0199999999968 |
| 12 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | ipcq | 45.6399999999976 |
| 13 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | raw | 19.0199999999968 |
| 14 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | ipcq | 61.6399999999976 |
| 15 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | raw | 27.0199999999968 |
| 16 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | ipcq | 93.6399999999976 |
| 17 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | raw | 43.0199999999968 |
| 18 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | ipcq | 157.64000000000306 |
| 19 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | raw | 75.02000000000407 |
| 20 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | ipcq | 189.64000000000306 |
| 21 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | raw | 91.02000000000407 |
| 22 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | ipcq | 31.6399999999976 |
| 23 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | raw | 12.019999999996799 |
| 24 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | ipcq | 33.6399999999976 |
| 25 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | raw | 13.019999999996799 |
| 26 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | ipcq | 35.6399999999976 |
| 27 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | raw | 14.019999999996799 |
| 28 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | ipcq | 37.6399999999976 |
| 29 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | raw | 15.019999999996799 |
| 30 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | ipcq | 41.6399999999976 |
| 31 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | raw | 17.0199999999968 |
| 32 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | ipcq | 45.6399999999976 |
| 33 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | raw | 19.0199999999968 |
| 34 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | ipcq | 61.6399999999976 |
| 35 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | raw | 27.0199999999968 |
| 36 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | ipcq | 93.6399999999976 |
| 37 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | raw | 43.0199999999968 |
| 38 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | ipcq | 157.64000000000306 |
| 39 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | raw | 75.02000000000407 |
| 40 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | ipcq | 189.64000000000306 |
| 41 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | raw | 91.02000000000407 |
| 42 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | ipcq | 67.65999999999804 |
| 43 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | raw | 68.53999999999724 |
| 44 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | ipcq | 69.65999999999804 |
| 45 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | raw | 70.03999999999724 |
| 46 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | ipcq | 71.65999999999804 |
| 47 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | raw | 71.53999999999724 |
| 48 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | ipcq | 73.65999999999804 |
| 49 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | raw | 73.03999999999724 |
| 50 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | ipcq | 77.65999999999804 |
| 51 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | raw | 76.03999999999724 |
| 52 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | ipcq | 81.65999999999804 |
| 53 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | raw | 79.03999999999724 |
| 54 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | ipcq | 97.65999999999804 |
| 55 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | raw | 91.03999999999724 |
| 56 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | ipcq | 129.65999999999804 |
| 57 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | raw | 115.03999999999724 |
| 58 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | ipcq | 193.65999999999985 |
| 59 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | raw | 163.04000000000087 |
| 60 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | ipcq | 225.65999999999985 |
| 61 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | raw | 187.04000000000087 |
| 62 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | ipcq | 87.65999999999804 |
| 63 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | raw | 88.53999999999724 |
| 64 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | ipcq | 89.65999999999804 |
| 65 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | raw | 90.03999999999724 |
| 66 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | ipcq | 91.65999999999804 |
| 67 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | raw | 91.53999999999724 |
| 68 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | ipcq | 93.65999999999804 |
| 69 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | raw | 93.03999999999724 |
| 70 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | ipcq | 97.65999999999804 |
| 71 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | raw | 96.03999999999724 |
| 72 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | ipcq | 101.65999999999804 |
| 73 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | raw | 99.03999999999724 |
| 74 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | ipcq | 117.65999999999804 |
| 75 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | raw | 111.03999999999724 |
| 76 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | ipcq | 149.65999999999804 |
| 77 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | raw | 135.03999999999724 |
| 78 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | ipcq | 213.65999999999985 |
| 79 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | raw | 183.04000000000087 |
| 80 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | ipcq | 245.65999999999985 |
| 81 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | raw | 207.04000000000087 |
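The rows above look linear in payload size, so a per-packet cost can be estimated from the smallest and largest sizes of each series. The sketch below is illustrative only: the sample values are copied from the table and rounded to two decimals, the helper name `ns_per_packet` is hypothetical, and treating each 128 B step as one packet is an assumption borrowed from the packet size mentioned in the commit message.

```python
# (hop, path) -> {size_bytes: total_ns}, values copied from the table (rounded)
SAMPLES = {
    ("h1_intra_horizontal", "raw"): {128: 12.02, 10240: 91.02},
    ("h1_intra_horizontal", "ipcq"): {128: 31.64, 10240: 189.64},
    ("h3_inter_cube_horizontal", "raw"): {128: 68.54, 10240: 187.04},
    ("h3_inter_cube_horizontal", "ipcq"): {128: 67.66, 10240: 225.66},
}


def ns_per_packet(points, packet_bytes=128):
    """Slope between the smallest and largest sample, in ns per packet."""
    (s0, t0), (s1, t1) = sorted(points.items())
    extra_packets = (s1 - s0) / packet_bytes
    return (t1 - t0) / extra_packets


if __name__ == "__main__":
    for (hop, path), points in SAMPLES.items():
        print(f"{hop:26s} {path:5s} {ns_per_packet(points):.2f} ns/packet")
```

On these samples the slope comes out to roughly 1.0 ns per packet for the intra-cube raw path, 1.5 ns for the inter-cube raw path, and 2.0 ns for both ipcq paths. The vertical inter-cube rows differ from the horizontal ones by a constant 20 ns offset, so their slopes are the same.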