kernbench2/docs/diagrams/pe2pe_latency_plots/overview.png at 5fdb6f87978a2065ce9f865a7d2453f42c3fc861

mukesh a563169e89 Add tl.recv_no_consume diagnostic API for apples-to-apples pe2pe plot

The pe2pe overview compared IPCQ (tl.send + tl.recv) against raw DMA
(tl.load + tl.store), but DMA is one-sided — DST never reads — while
tl.recv pays a slot-read on DST. The comparison was unfair: IPCQ
looked slower partly because it does more work.

Adds tl.recv_no_consume() — a separate, diagnostic-only entry point
that blocks for slot arrival but skips the slot-read (and bank-hop)
charge on DST. Production tl.recv is unchanged (no `consume` kwarg
on the public API), so the diagnostic flag can never accidentally
leak into real workloads.
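
The accounting difference described above can be sketched with a toy cost model. This is an illustration only, not kernbench2's implementation: the `ToyCosts` class, its field names, and every cycle count below are hypothetical placeholders chosen to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCosts:
    """Hypothetical cycle counts for illustration; these are not measured values."""
    ipcq_transfer: int = 100  # SRC -> DST slot delivery over IPCQ
    dma_transfer: int = 90    # SRC -> DST one-sided DMA write
    slot_read: int = 20       # DST reads the arrived slot
    bank_hop: int = 5         # extra hop charged together with the slot read


def dma_latency(c: ToyCosts) -> int:
    # One-sided: DST never reads, so only the transfer is charged.
    return c.dma_transfer


def recv_latency(c: ToyCosts) -> int:
    # Models a consuming receive: wait for arrival, then pay slot-read and bank-hop.
    return c.ipcq_transfer + c.slot_read + c.bank_hop


def recv_no_consume_latency(c: ToyCosts) -> int:
    # Models the diagnostic path: wait for arrival only, skip the DST-side charges.
    return c.ipcq_transfer
```

In this model the gap between `recv_latency` and `recv_no_consume_latency` is exactly `slot_read + bank_hop`, which is the work DMA never performs and the reason the earlier comparison was skewed.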

Updates test_pe_to_pe_latency to call tl.recv_no_consume so the
overview.png shows IPCQ no-consume vs raw DMA on equal footing.
Also fixes PLOT_DIR back to docs/diagrams/pe2pe_latency_plots/
(was lost in a merge). Adds scripts/replot_pe2pe.py for label-only
re-renders without re-measuring.
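
A label-only re-render depends on keeping measurements separate from presentation. The sketch below shows that idea under stated assumptions: it is not the contents of scripts/replot_pe2pe.py, and the function names, JSON layout, and sample numbers are all hypothetical.

```python
import json


def save_measurements(series: list) -> str:
    # Serialize measured series once; a real script would likely write this to a file.
    return json.dumps(series)


def relabel(cached_json: str, new_labels: dict) -> list:
    # Load cached series and swap legend labels; the measured data is left untouched.
    series = json.loads(cached_json)
    return [
        {**s, "label": new_labels.get(s["label"], s["label"])}
        for s in series
    ]


# Placeholder numbers, not real measurements.
cached = save_measurements([
    {"label": "IPCQ", "latency": [10, 12, 15]},
    {"label": "DMA", "latency": [9, 11, 14]},
])
relabeled = relabel(cached, {"IPCQ": "IPCQ (no consume)"})
```

Only the `label` field changes between runs, so a plot can be redrawn with new legend text without repeating the benchmark.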

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 18:20:44 -07:00

109 KiB

1560x1080px

/ywkang/kernbench2/raw/commit/5fdb6f87978a2065ce9f865a7d2453f42c3fc861/docs/diagrams/pe2pe_latency_plots/overview.png
