a44f832be5
Allreduce, pe2pe, ipcq, and pe_view results auto-regenerated by test sweeps running against the new chunk-streaming wire timing (per-flit wormhole). Absolute numbers shift upward because bottleneck-link transit is now charged once per flit, replacing the previous cut-through subtraction at HBM CTRL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
602 B
| buffer_kind | sip_topology | n_sips | n_elem | bytes_per_pe | latency_ns |
|---|---|---|---|---|---|
| hbm | torus_2d | 6 | 128 | 256 | 2144.0399999999754 |
| hbm | torus_2d | 6 | 1024 | 2048 | 2908.74499999995 |
| hbm | torus_2d | 6 | 8192 | 16384 | 8851.185000000081 |
| hbm | torus_2d | 6 | 32768 | 65536 | 29225.265000008752 |
| sram | torus_2d | 6 | 128 | 256 | 2060.0399999999754 |
| sram | torus_2d | 6 | 1024 | 2048 | 2908.74499999995 |
| sram | torus_2d | 6 | 8192 | 16384 | 9523.185000000081 |
| sram | torus_2d | 6 | 32768 | 65536 | 32201.265000008752 |
| tcm | torus_2d | 6 | 128 | 256 | 1964.0399999999754 |
| tcm | torus_2d | 6 | 1024 | 2048 | 2476.74499999995 |
| tcm | torus_2d | 6 | 8192 | 16384 | 6403.185000000081 |
| tcm | torus_2d | 6 | 32768 | 65536 | 19865.265000008738 |
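
A minimal sketch for sanity-checking the regenerated numbers, assuming the underlying file is plain CSV with the column names shown in the table above. The helper names (`load_rows`, `fastest_per_size`, `bytes_per_ns`) are hypothetical and not part of the repository; the data is copied from the table rather than read from disk. It checks that `bytes_per_pe` is twice `n_elem` in every row (consistent with 2-byte elements, which is an inference from the data) and reports which buffer kind has the lowest latency at each size.

```python
import csv
import io

# Rows copied verbatim from the table above; in practice this would be read
# from the CSV file itself (its path is not shown here, so it is not assumed).
DATA = """buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
hbm,torus_2d,6,128,256,2144.0399999999754
hbm,torus_2d,6,1024,2048,2908.74499999995
hbm,torus_2d,6,8192,16384,8851.185000000081
hbm,torus_2d,6,32768,65536,29225.265000008752
sram,torus_2d,6,128,256,2060.0399999999754
sram,torus_2d,6,1024,2048,2908.74499999995
sram,torus_2d,6,8192,16384,9523.185000000081
sram,torus_2d,6,32768,65536,32201.265000008752
tcm,torus_2d,6,128,256,1964.0399999999754
tcm,torus_2d,6,1024,2048,2476.74499999995
tcm,torus_2d,6,8192,16384,6403.185000000081
tcm,torus_2d,6,32768,65536,19865.265000008738
"""


def load_rows(text):
    """Parse the CSV text into dicts with numeric columns converted."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "buffer_kind": raw["buffer_kind"],
            "sip_topology": raw["sip_topology"],
            "n_sips": int(raw["n_sips"]),
            "n_elem": int(raw["n_elem"]),
            "bytes_per_pe": int(raw["bytes_per_pe"]),
            "latency_ns": float(raw["latency_ns"]),
        })
    return rows


def fastest_per_size(rows):
    """Map each n_elem to the buffer_kind with the lowest latency_ns."""
    best = {}
    for r in rows:
        current = best.get(r["n_elem"])
        if current is None or r["latency_ns"] < current["latency_ns"]:
            best[r["n_elem"]] = r
    return {n: r["buffer_kind"] for n, r in best.items()}


def bytes_per_ns(rows):
    """Map (buffer_kind, n_elem) to bytes_per_pe divided by latency_ns."""
    return {
        (r["buffer_kind"], r["n_elem"]): r["bytes_per_pe"] / r["latency_ns"]
        for r in rows
    }


if __name__ == "__main__":
    rows = load_rows(DATA)
    for n_elem, kind in sorted(fastest_per_size(rows).items()):
        print(f"n_elem={n_elem}: lowest latency with {kind}")
    for (kind, n_elem), rate in sorted(bytes_per_ns(rows).items()):
        print(f"{kind:>4} n_elem={n_elem:>5}: {rate:.3f} bytes/ns per PE")
```

The `bytes_per_ns` figure is only a convenience ratio of two table columns. It is not a link bandwidth, since `latency_ns` covers the whole collective rather than a single transfer.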