kernbench2/tests/pe2pe_latency_plots/h2_intra_vertical.png at fca24feac5ab77515621bab1521f14f49182e84a

ywkang fca24feac5 Fix all remaining test failures: single-cube allreduce + matplotlib dep

- intercube_allreduce: add single-cube fast path that skips intra-SIP
  mesh reduce and goes directly to inter-SIP exchange. Fixes IPCQ
  deadlock when TP launches kernel on one cube per SIP.
- distributed.py: derive effective cube dims from tensor shard placement
  instead of hardcoding topology mesh size.
- pyproject.toml: add matplotlib>=3.7 to dependencies.
- pe_dma.py (prior commit): add MMU translation in pipeline DMA path.

577 passed, 0 failed (was 529 passed, 10 failed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-27 21:25:31 -07:00

48 KiB

960x600px

/ywkang/kernbench2/raw/commit/fca24feac5ab77515621bab1521f14f49182e84a/tests/pe2pe_latency_plots/h2_intra_vertical.png
