kernbench2/docs/diagrams
mukesh 84a1325e5c ADR-0023 D9.7: IPCQ slot-memory latency model (TCM/SRAM/HBM)
Charge per-tier bandwidth + setup overhead at IPCQ slot WRITE
(receiver inbound DMA, in pe_dma._handle_ipcq_inbound) and slot
READ (recv consume, in pe_ipcq._handle_recv). Tier table
(common/ipcq_types.py):
  tcm  : 512 GB/s, 0 ns
  sram : 128 GB/s, 2 ns
  hbm  :  32 GB/s, 6 ns

Before this change, slot read/write was free regardless of
buffer_kind, making memory-tier choice invisible in simulated
latency. After the change, swapping buffer_kind in ccl.yaml
produces measurable per-tier separation in allreduce latency.
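
As a rough illustration of the tier table above, the sketch below charges each slot access a fixed setup overhead plus payload size divided by bandwidth. The additive formula, the decimal reading of GB/s (1 GB/s = 1 byte/ns), and the names Tier, TIERS, and slot_access_ns are assumptions made for this example; they are not taken from common/ipcq_types.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    bandwidth_gb_s: float  # GB/s; with decimal GB this equals bytes per ns
    setup_ns: float        # fixed overhead charged once per slot access


# Values copied from the tier table in the commit message.
TIERS = {
    "tcm": Tier(bandwidth_gb_s=512.0, setup_ns=0.0),
    "sram": Tier(bandwidth_gb_s=128.0, setup_ns=2.0),
    "hbm": Tier(bandwidth_gb_s=32.0, setup_ns=6.0),
}


def slot_access_ns(buffer_kind: str, payload_bytes: int) -> float:
    """Assumed model: setup overhead plus transfer time at tier bandwidth."""
    tier = TIERS[buffer_kind]
    return tier.setup_ns + payload_bytes / tier.bandwidth_gb_s


if __name__ == "__main__":
    for kind in ("tcm", "sram", "hbm"):
        print(kind, slot_access_ns(kind, 4096))
```

Under these assumptions a 4096-byte slot costs 8 ns in tcm, 34 ns in sram, and 134 ns in hbm, and the gap between tiers widens as the payload grows, which is consistent with the tcm < sram < hbm ordering the tests assert.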

Tests:
  test_ipcq_buffer_kind_latency.py — three micro-tests asserting
    tcm < sram < hbm ordering, payload-scaling, and that
    buffer_kind sensitivity grows with payload (credit-only path
    stays fabric-bound).
  test_allreduce_buffer_kind_sweep.py — 12-config parametrized
    sweep emitting buffer_kind_sweep.png (3 lines, torus_2d).

conftest sessionfinish hook generalised to dispatch multiple
sweep aggregators (allreduce + buffer-kind).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:28:34 -07:00

Generated Diagrams

This directory contains diagrams generated from topology compilation.

What these files are

  • Derived artifacts generated from:
    • compiled topology graph
    • distance (accumulated latency) metadata
    • view/layout rules (ADR-0005)

These files are meant for quick visual inspection and review.

Default outputs

  • SIP view: sip_view.mmd (and/or sip_view.dot)
  • CUBE view: cube_view.mmd (and/or cube_view.dot)
  • PE view: pe_view.mmd (and/or pe_view.dot)

How to preview

  • In VS Code:
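    • open a .mmd file, or a .md file containing Mermaid blocks, and use Markdown Preview
    • for .dot, use a Graphviz preview extension, or render from the command line with dot -Tpng (for example, dot -Tpng pe_view.dot -o pe_view.png)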

Notes

  • Diagrams are representative and distance-aware by default.
  • Instance indices are not required unless debugging asymmetry.
  • Outputs should be deterministic for the same topology and rules.
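
One way to get deterministic, distance-aware output is to sort edges before writing them, so the emitted text does not depend on the order in which the graph was built. The sketch below is illustrative only: emit_mermaid, the edge dictionary, and the node names are hypothetical and do not describe the actual generator.

```python
def emit_mermaid(edges: dict[tuple[str, str], float]) -> str:
    """Render edges as a Mermaid flowchart with latency labels.

    Edges are visited in sorted order, so two graphs with the same
    edges produce byte-identical text regardless of insertion order.
    """
    lines = ["graph LR"]
    for (src, dst), latency_ns in sorted(edges.items()):
        lines.append(f"    {src} -->|{latency_ns:g} ns| {dst}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical nodes and accumulated latencies, for illustration only.
    example = {("router", "hbm"): 6.0, ("pe0", "router"): 2.0}
    print(emit_mermaid(example), end="")
```

Comparing the generated text of two runs (for example with a plain diff) then doubles as a cheap regression check on the determinism note above.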