a563169e89
The pe2pe overview compared IPCQ (tl.send + tl.recv) against raw DMA (tl.load + tl.store), but DMA is one-sided (DST never reads), while tl.recv pays a slot-read on DST. The comparison was unfair: IPCQ looked slower partly because it does more work.

Adds tl.recv_no_consume(), a separate, diagnostic-only entry point that blocks for slot arrival but skips the slot-read (and bank-hop) charge on DST. Production tl.recv is unchanged (no `consume` kwarg on the public API), so the diagnostic flag can never accidentally leak into real workloads.

Updates test_pe_to_pe_latency to call tl.recv_no_consume so that overview.png shows IPCQ no-consume vs raw DMA on equal footing.

Also fixes PLOT_DIR back to docs/diagrams/pe2pe_latency_plots/ (was lost in a merge). Adds scripts/replot_pe2pe.py for label-only re-renders without re-measuring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
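The design choice described above (a separate diagnostic entry point instead of a `consume` kwarg on the public call) can be illustrated with a toy model. This is only a sketch: `ToyLink`, `_recv_impl`, and both cost fields are hypothetical names and made-up numbers, not the project's actual `tl` implementation.

```python
import inspect
from dataclasses import dataclass


@dataclass
class ToyLink:
    # Hypothetical costs, chosen only for illustration.
    arrival_ns: float    # time until the slot has arrived on DST
    slot_read_ns: float  # extra DST-side charge for reading the slot

    def _recv_impl(self, *, consume: bool) -> float:
        # Both paths block for slot arrival; only the consuming
        # path is charged for the slot read.
        total = self.arrival_ns
        if consume:
            total += self.slot_read_ns
        return total

    def recv(self) -> float:
        """Production path: always consumes, exposes no `consume` kwarg."""
        return self._recv_impl(consume=True)

    def recv_no_consume(self) -> float:
        """Diagnostic-only path: waits for arrival, skips the slot read."""
        return self._recv_impl(consume=False)


link = ToyLink(arrival_ns=10.0, slot_read_ns=4.0)
print(link.recv(), link.recv_no_consume())
# The public recv() signature takes no `consume` parameter.
print("consume" in inspect.signature(ToyLink.recv).parameters)
```

Because the flag lives only on the private helper in this sketch, a caller of `recv()` has no argument through which to disable the slot-read charge.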
6.6 KiB
| line | hop | label | size_bytes | path | total_ns |
|---|---|---|---|---|---|
| 2 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | ipcq | 31.3899999999976 |
| 3 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 128 | raw | 12.019999999996799 |
| 4 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | ipcq | 33.1399999999976 |
| 5 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 256 | raw | 13.019999999996799 |
| 6 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | ipcq | 34.8899999999976 |
| 7 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 384 | raw | 14.019999999996799 |
| 8 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | ipcq | 36.6399999999976 |
| 9 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 512 | raw | 15.019999999996799 |
| 10 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | ipcq | 40.1399999999976 |
| 11 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 768 | raw | 17.0199999999968 |
| 12 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | ipcq | 43.6399999999976 |
| 13 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 1024 | raw | 19.0199999999968 |
| 14 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | ipcq | 57.6399999999976 |
| 15 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 2048 | raw | 27.0199999999968 |
| 16 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | ipcq | 85.6399999999976 |
| 17 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 4096 | raw | 43.0199999999968 |
| 18 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | ipcq | 141.64000000000306 |
| 19 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 8192 | raw | 75.02000000000407 |
| 20 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | ipcq | 169.64000000000306 |
| 21 | h1_intra_horizontal | Intra-cube horizontal (pe0 to pe1) | 10240 | raw | 91.02000000000407 |
| 22 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | ipcq | 31.3899999999976 |
| 23 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 128 | raw | 12.019999999996799 |
| 24 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | ipcq | 33.1399999999976 |
| 25 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 256 | raw | 13.019999999996799 |
| 26 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | ipcq | 34.8899999999976 |
| 27 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 384 | raw | 14.019999999996799 |
| 28 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | ipcq | 36.6399999999976 |
| 29 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 512 | raw | 15.019999999996799 |
| 30 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | ipcq | 40.1399999999976 |
| 31 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 768 | raw | 17.0199999999968 |
| 32 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | ipcq | 43.6399999999976 |
| 33 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 1024 | raw | 19.0199999999968 |
| 34 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | ipcq | 57.6399999999976 |
| 35 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 2048 | raw | 27.0199999999968 |
| 36 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | ipcq | 85.6399999999976 |
| 37 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 4096 | raw | 43.0199999999968 |
| 38 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | ipcq | 141.64000000000306 |
| 39 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 8192 | raw | 75.02000000000407 |
| 40 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | ipcq | 169.64000000000306 |
| 41 | h2_intra_vertical | Intra-cube vertical (pe0 to pe4) | 10240 | raw | 91.02000000000407 |
| 42 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | ipcq | 67.40999999999804 |
| 43 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 128 | raw | 68.53999999999724 |
| 44 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | ipcq | 69.15999999999804 |
| 45 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 256 | raw | 70.03999999999724 |
| 46 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | ipcq | 70.90999999999804 |
| 47 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 384 | raw | 71.53999999999724 |
| 48 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | ipcq | 72.65999999999804 |
| 49 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 512 | raw | 73.03999999999724 |
| 50 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | ipcq | 76.15999999999804 |
| 51 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 768 | raw | 76.03999999999724 |
| 52 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | ipcq | 79.65999999999804 |
| 53 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 1024 | raw | 79.03999999999724 |
| 54 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | ipcq | 93.65999999999804 |
| 55 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 2048 | raw | 91.03999999999724 |
| 56 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | ipcq | 121.65999999999804 |
| 57 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 4096 | raw | 115.03999999999724 |
| 58 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | ipcq | 177.65999999999985 |
| 59 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 8192 | raw | 163.04000000000087 |
| 60 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | ipcq | 205.65999999999985 |
| 61 | h3_inter_cube_horizontal | Inter-cube horizontal (cube0 to cube1) | 10240 | raw | 187.04000000000087 |
| 62 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | ipcq | 87.40999999999804 |
| 63 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 128 | raw | 88.53999999999724 |
| 64 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | ipcq | 89.15999999999804 |
| 65 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 256 | raw | 90.03999999999724 |
| 66 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | ipcq | 90.90999999999804 |
| 67 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 384 | raw | 91.53999999999724 |
| 68 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | ipcq | 92.65999999999804 |
| 69 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 512 | raw | 93.03999999999724 |
| 70 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | ipcq | 96.15999999999804 |
| 71 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 768 | raw | 96.03999999999724 |
| 72 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | ipcq | 99.65999999999804 |
| 73 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 1024 | raw | 99.03999999999724 |
| 74 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | ipcq | 113.65999999999804 |
| 75 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 2048 | raw | 111.03999999999724 |
| 76 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | ipcq | 141.65999999999804 |
| 77 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 4096 | raw | 135.03999999999724 |
| 78 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | ipcq | 197.65999999999985 |
| 79 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 8192 | raw | 183.04000000000087 |
| 80 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | ipcq | 225.65999999999985 |
| 81 | h4_inter_cube_vertical | Inter-cube vertical (cube0 to cube4) | 10240 | raw | 207.04000000000087 |
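
The rows above look close to linear in `size_bytes` for each hop and path, so a two-point fit gives a quick read of the per-byte cost and the fixed overhead. The sketch below makes that assumption explicit. It uses only the 128-byte and 10240-byte rows for `h1_intra_horizontal` and `h3_inter_cube_horizontal`, rounded to two decimals. The helper names (`linear_fit`, `crossover_size`) are illustrative and not part of the repository.

```python
# (size_bytes, total_ns) endpoints copied from the table, rounded to 2 decimals.
ENDPOINTS = {
    ("h1_intra_horizontal", "ipcq"): ((128, 31.39), (10240, 169.64)),
    ("h1_intra_horizontal", "raw"): ((128, 12.02), (10240, 91.02)),
    ("h3_inter_cube_horizontal", "ipcq"): ((128, 67.41), (10240, 205.66)),
    ("h3_inter_cube_horizontal", "raw"): ((128, 68.54), (10240, 187.04)),
}


def linear_fit(p0, p1):
    """Return (slope_ns_per_byte, intercept_ns) of the line through two points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


def crossover_size(fit_a, fit_b):
    """Size in bytes where two fitted lines intersect, or None if parallel."""
    (slope_a, icpt_a), (slope_b, icpt_b) = fit_a, fit_b
    if slope_a == slope_b:
        return None
    return (icpt_b - icpt_a) / (slope_a - slope_b)


fits = {key: linear_fit(*pts) for key, pts in ENDPOINTS.items()}

for (hop, path), (slope, icpt) in fits.items():
    # Report the slope per 128 bytes, the smallest size step in the table.
    print(f"{hop:26s} {path:4s} {slope * 128:.2f} ns/128B, intercept {icpt:.2f} ns")

inter_cross = crossover_size(
    fits[("h3_inter_cube_horizontal", "ipcq")],
    fits[("h3_inter_cube_horizontal", "raw")],
)
intra_cross = crossover_size(
    fits[("h1_intra_horizontal", "ipcq")],
    fits[("h1_intra_horizontal", "raw")],
)
print(f"inter-cube crossover: {inter_cross:.0f} B; intra-cube: {intra_cross:.0f} B")
```

Under this fit, the inter-cube lines cross at roughly 707 bytes, which agrees with the table: `ipcq` is lower than `raw` at 512 bytes and higher at 768 bytes. The intra-cube crossover comes out negative, so in this data `raw` is faster at every measured size on the intra-cube hops.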