84a1325e5c
Charge per-tier bandwidth + setup overhead at IPCQ slot WRITE
(receiver inbound DMA, in pe_dma._handle_ipcq_inbound) and slot
READ (recv consume, in pe_ipcq._handle_recv). Tier table
(common/ipcq_types.py):
    tier   bandwidth   setup overhead
    tcm    512 GB/s    0 ns
    sram   128 GB/s    2 ns
    hbm     32 GB/s    6 ns
Before this change, slot read/write was free regardless of
buffer_kind, making memory-tier choice invisible in simulated
latency. After the change, swapping buffer_kind in ccl.yaml
produces measurable per-tier separation in allreduce latency.
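
As a rough illustration of the charge described above, the following sketch assumes a simple additive model (setup overhead plus payload size divided by bandwidth). The names TIER_COST and slot_access_ns are hypothetical and are not taken from common/ipcq_types.py; only the three tier values come from this commit.

```python
# Hypothetical sketch of a per-tier cost model; names are illustrative,
# not the actual contents of common/ipcq_types.py.
TIER_COST = {
    # buffer_kind: (bandwidth in GB/s, setup overhead in ns)
    "tcm": (512.0, 0.0),
    "sram": (128.0, 2.0),
    "hbm": (32.0, 6.0),
}


def slot_access_ns(buffer_kind: str, nbytes: int) -> float:
    """Latency of one slot read or write: setup overhead plus transfer time."""
    bandwidth_gbps, setup_ns = TIER_COST[buffer_kind]
    # Assumption: GB means 10**9 bytes, so 1 GB/s is 1 byte/ns and
    # nbytes / bandwidth_gbps is already in nanoseconds.
    return setup_ns + nbytes / bandwidth_gbps


if __name__ == "__main__":
    for kind in ("tcm", "sram", "hbm"):
        print(kind, slot_access_ns(kind, 4096))
```

Under this assumed model the gap between tiers widens with payload size, which is the behaviour the micro-tests below check.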
Tests:
test_ipcq_buffer_kind_latency.py — three micro-tests asserting
tcm < sram < hbm ordering, payload-scaling, and that
buffer_kind sensitivity grows with payload (credit-only path
stays fabric-bound).
test_allreduce_buffer_kind_sweep.py — 12-config parametrized
sweep emitting buffer_kind_sweep.png (3 lines, torus_2d).
conftest sessionfinish hook generalised to dispatch multiple
sweep aggregators (allreduce + buffer-kind).
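
One way a session-finish hook can dispatch to several aggregators is sketched below. This is a minimal sketch, not the actual conftest: SWEEP_AGGREGATORS and register_aggregator are hypothetical names, and only the pytest_sessionfinish(session, exitstatus) hook signature comes from pytest itself.

```python
from typing import Callable, List

# Hypothetical registry; the real conftest may organise this differently.
SWEEP_AGGREGATORS: List[Callable[[], None]] = []


def register_aggregator(fn: Callable[[], None]) -> Callable[[], None]:
    """Decorator that adds a sweep aggregator to the registry."""
    SWEEP_AGGREGATORS.append(fn)
    return fn


def pytest_sessionfinish(session, exitstatus):
    """pytest hook: run every registered aggregator once at session end."""
    for aggregate in SWEEP_AGGREGATORS:
        aggregate()
```

Each sweep (allreduce, buffer-kind) would register its own aggregator, so adding another sweep does not require touching the hook.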
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generated Diagrams
This directory contains diagrams generated from topology compilation.
What these files are
- Derived artifacts generated from:
  - compiled topology graph
  - distance (accumulated latency) metadata
  - view/layout rules (ADR-0005)
These files are meant for quick visual inspection and review.
Default outputs
- SIP view: sip_view.mmd (and/or sip_view.dot)
- CUBE view: cube_view.mmd (and/or cube_view.dot)
- PE view: pe_view.mmd (and/or pe_view.dot)
How to preview
- In VS Code:
  - open .mmd or .md containing Mermaid blocks and use Markdown Preview
  - for .dot, use a Graphviz preview extension or dot -Tpng
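
For rendering outside an editor, the dot -Tpng route can be scripted. The sketch below assumes Graphviz is installed and on PATH; the helper names dot_png_command and render are hypothetical and not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def dot_png_command(dot_file: str) -> List[str]:
    """Build the argv for rendering a .dot file to a PNG next to it."""
    out_file = str(Path(dot_file).with_suffix(".png"))
    return ["dot", "-Tpng", dot_file, "-o", out_file]


def render(dot_file: str) -> None:
    """Run Graphviz on dot_file; raises if the dot binary is missing."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' not found on PATH")
    subprocess.run(dot_png_command(dot_file), check=True)
```

Calling render("sip_view.dot") would write sip_view.png alongside the source file.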
Notes
- Diagrams are representative and distance-aware by default.
- Instance indices are not required unless debugging asymmetry.
- Outputs should be deterministic for the same topology and rules.
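
The determinism note can be illustrated with a small emitter that sorts edges before writing them. This is only a sketch under assumed inputs: the edge tuple shape (src, dst, latency_ns) and the function name emit_mermaid are hypothetical, and the real generator may order and label things differently.

```python
from typing import Iterable, Tuple


def emit_mermaid(edges: Iterable[Tuple[str, str, int]]) -> str:
    """Return Mermaid flowchart text with edges labelled by latency.

    Edges are sorted first, so the output does not depend on the
    order in which the caller supplies them.
    """
    lines = ["graph LR"]
    for src, dst, latency_ns in sorted(edges):
        lines.append(f"    {src} -->|{latency_ns} ns| {dst}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_mermaid([("cube0", "pe0", 2), ("sip0", "cube0", 6)]))
```

Sorting at emission time keeps diffs between regenerated diagrams limited to real topology or rule changes.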