Files
kernbench2/tests/allreduce_latency_plots/summary.csv
T
ywkang fca24feac5 Fix all remaining test failures: single-cube allreduce + matplotlib dep
- intercube_allreduce: add single-cube fast path that skips intra-SIP
  mesh reduce and goes directly to inter-SIP exchange. Fixes IPCQ
  deadlock when TP launches kernel on one cube per SIP.
- distributed.py: derive effective cube dims from tensor shard placement
  instead of hardcoding topology mesh size.
- pyproject.toml: add matplotlib>=3.7 to dependencies.
- pe_dma.py (prior commit): add MMU translation in pipeline DMA path.

577 passed, 0 failed (was 529 passed, 10 failed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 21:25:31 -07:00

38 lines
2.5 KiB
CSV

algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
intercube_allreduce,mesh_2d_no_wrap,6,8,16,256,3508.4249999999993
intercube_allreduce,mesh_2d_no_wrap,6,32,64,1024,3515.55
intercube_allreduce,mesh_2d_no_wrap,6,64,128,2048,3525.0499999999975
intercube_allreduce,mesh_2d_no_wrap,6,128,256,4096,3544.049999999992
intercube_allreduce,mesh_2d_no_wrap,6,512,1024,16384,3667.049999999992
intercube_allreduce,mesh_2d_no_wrap,6,1024,2048,32768,3837.049999999992
intercube_allreduce,mesh_2d_no_wrap,6,2048,4096,65536,4177.049999999992
intercube_allreduce,mesh_2d_no_wrap,6,4096,8192,131072,4857.049999999959
intercube_allreduce,mesh_2d_no_wrap,6,8192,16384,262144,6217.049999999945
intercube_allreduce,mesh_2d_no_wrap,6,16384,32768,524288,8937.049999999937
intercube_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,14377.049999999872
intercube_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,19817.049999999872
intercube_allreduce,ring_1d,6,8,16,256,3073.1299999999937
intercube_allreduce,ring_1d,6,32,64,1024,3079.8799999999947
intercube_allreduce,ring_1d,6,64,128,2048,3088.879999999992
intercube_allreduce,ring_1d,6,128,256,4096,3106.8799999999865
intercube_allreduce,ring_1d,6,512,1024,16384,3225.8799999999865
intercube_allreduce,ring_1d,6,1024,2048,32768,3391.8799999999865
intercube_allreduce,ring_1d,6,2048,4096,65536,3723.8799999999865
intercube_allreduce,ring_1d,6,4096,8192,131072,4387.879999999965
intercube_allreduce,ring_1d,6,8192,16384,262144,5715.879999999957
intercube_allreduce,ring_1d,6,16384,32768,524288,8371.879999999932
intercube_allreduce,ring_1d,6,32768,65536,1048576,13683.879999999903
intercube_allreduce,ring_1d,6,49152,98304,1572864,18995.879999999917
intercube_allreduce,torus_2d,6,8,16,256,2190.4799999999923
intercube_allreduce,torus_2d,6,32,64,1024,2196.479999999993
intercube_allreduce,torus_2d,6,64,128,2048,2204.4799999999905
intercube_allreduce,torus_2d,6,128,256,4096,2220.479999999985
intercube_allreduce,torus_2d,6,512,1024,16384,2325.479999999985
intercube_allreduce,torus_2d,6,1024,2048,32768,2471.479999999985
intercube_allreduce,torus_2d,6,2048,4096,65536,2763.479999999985
intercube_allreduce,torus_2d,6,4096,8192,131072,3347.4799999999777
intercube_allreduce,torus_2d,6,8192,16384,262144,4515.4799999999705
intercube_allreduce,torus_2d,6,16384,32768,524288,6851.479999999952
intercube_allreduce,torus_2d,6,32768,65536,1048576,11523.479999999923
intercube_allreduce,torus_2d,6,49152,98304,1572864,16195.479999999952