kernbench2/docs/diagrams
mukesh f6d262e359 Honest measured pipeline efficiency: two timing fixes
Two related issues caused measured pipeline efficiency to look
worse than the simulator's actual behavior:

1. DMA timing recorded too early. The op-log start timestamp
   for a DMA op fired when the request entered the queue, and
   the DMA channel was released as soon as the request was
   issued. Back-to-back DMAs therefore appeared to grab the
   channel simultaneously, with per-op duration drifting
   upward as queue depth grew - an artifact, not real cost.

   Fix: defer the start timestamp until after the channel is
   acquired, and hold the channel through the full HBM
   round-trip until the response returns. Per-op duration is
   now constant and equal to the actual transfer interval;
   serialization is visible as queue wait, not as inflated
   service time.

2. Sweep timing window folded in pre-composite work. The PE
   timing window spanned every PE engine record, which
   included the upfront pinned-operand DMA issued before the
   composite GEMM begins. For large-K shapes that one-shot
   load can be nearly half of the window, conflating
   operand-staging cost with composite-pipeline behavior.

   Fix: add a second window scoped to the composite pipeline
   by filtering op_log records to those tagged with a
   tile-pipeline stage; the legacy operand-load path is
   untagged and naturally excluded. For 32x3072x32 load_ref
   the window drops from 1765 ns to 992 ns, and measured
   efficiency lines up with the steady-state DMA-bound stage
   limit instead of being penalized for the one-time load.
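
A minimal sketch of both fixes, under stated assumptions. The record type `OpRecord`, its field names, and the helpers `simulate_dma`, `window_ns`, and `composite_window_ns` are hypothetical and do not come from the simulator; all timings are made-up values, not measurements. The legacy mode only reproduces the symptom (start logged at enqueue time) and does not model the early channel release itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpRecord:
    # Hypothetical op-log record; the real schema is not shown in the commit.
    op: str
    enqueue_ns: int               # request entered the queue
    start_ns: int                 # logged start timestamp
    end_ns: int                   # response returned
    stage: Optional[str] = None   # tile-pipeline stage tag; None = untagged


def simulate_dma(enqueue_times, round_trip_ns, deferred_start=True):
    """Serialize DMA ops on a single channel and return their records."""
    records, channel_free_ns = [], 0
    for t in enqueue_times:
        acquired = max(t, channel_free_ns)   # wait until the channel is free
        end = acquired + round_trip_ns       # hold it for the full round-trip
        channel_free_ns = end
        # Fixed behavior logs the start at acquisition; legacy logs it at enqueue.
        start = acquired if deferred_start else t
        records.append(OpRecord("dma", t, start, end, stage="load"))
    return records


def window_ns(records):
    """Span from the earliest start to the latest end, in ns."""
    if not records:
        return 0
    return max(r.end_ns for r in records) - min(r.start_ns for r in records)


def composite_window_ns(op_log):
    """Window over records tagged with a tile-pipeline stage only."""
    return window_ns([r for r in op_log if r.stage is not None])


# Fix 1: four DMAs enqueued together, 100 ns round-trip each.
legacy = simulate_dma([0, 0, 0, 0], 100, deferred_start=False)
fixed = simulate_dma([0, 0, 0, 0], 100, deferred_start=True)
print([r.end_ns - r.start_ns for r in legacy])      # grows with queue depth
print([r.end_ns - r.start_ns for r in fixed])       # constant service time
print([r.start_ns - r.enqueue_ns for r in fixed])   # serialization as queue wait

# Fix 2: an untagged one-shot operand load precedes the tagged pipeline ops.
pinned_load = OpRecord("dma_pinned_operand", 0, 0, 700)   # stage stays None
pipeline = simulate_dma([700, 700, 700, 700], 100)
op_log = [pinned_load] + pipeline
print(window_ns(op_log), composite_window_ns(op_log))
```

In this toy run the full window includes the 700 ns one-shot load, while the stage-filtered window covers only the four pipelined transfers.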

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:19:17 -07:00

Generated Diagrams

This directory contains diagrams generated from topology compilation.

What these files are

  • Derived artifacts generated from:
    • compiled topology graph
    • distance (accumulated latency) metadata
    • view/layout rules (ADR-0005)

These files are meant for quick visual inspection and review.

Default outputs

  • SIP view: sip_view.mmd (and/or sip_view.dot)
  • CUBE view: cube_view.mmd (and/or cube_view.dot)
  • PE view: pe_view.mmd (and/or pe_view.dot)

How to preview

  • In VS Code:
    • open a .mmd file, or a .md file containing Mermaid blocks, and use Markdown Preview
    • for .dot files, use a Graphviz preview extension or render them with dot -Tpng
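
For batch rendering outside an editor, the `dot -Tpng` route can be scripted. The sketch below assumes Graphviz is installed and on the PATH; the helper names `build_dot_command` and `render_all` are hypothetical and not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path


def build_dot_command(dot_file):
    """Return the Graphviz command that renders one .dot file to a PNG beside it."""
    src = Path(dot_file)
    return ["dot", "-Tpng", str(src), "-o", str(src.with_suffix(".png"))]


def render_all(directory="."):
    """Render every .dot file in `directory`; return the PNG paths written."""
    if shutil.which("dot") is None:
        # Graphviz is not installed, so there is nothing to run.
        return []
    rendered = []
    for src in sorted(Path(directory).glob("*.dot")):
        subprocess.run(build_dot_command(src), check=True)
        rendered.append(src.with_suffix(".png"))
    return rendered
```

Calling `render_all()` from this directory would write, for example, pe_view.png next to pe_view.dot if that file exists.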

Notes

  • Diagrams are representative and distance-aware by default.
  • Instance indices are not required unless debugging asymmetry.
  • Outputs should be deterministic for the same topology and rules.
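
One way to get deterministic, distance-aware output is to sort edges before emitting them and to label each edge with its latency. The sketch below is an illustration only: the function `to_mermaid` and the `(src, dst, latency)` tuple shape are assumptions, not the generator's actual interface or output format.

```python
def to_mermaid(edges):
    """Emit a Mermaid flowchart from (src, dst, latency) tuples.

    Edges are sorted first, so the same topology always yields identical
    text regardless of the order in which edges were discovered.
    """
    lines = ["graph LR"]
    for src, dst, latency in sorted(edges):
        # The edge label carries the latency value for that link.
        lines.append(f"    {src} -->|{latency}| {dst}")
    return "\n".join(lines) + "\n"


# Hypothetical topology fragment; node names and latencies are made up.
edges = [("pe0", "hbm", 12), ("cube0", "pe0", 3)]
print(to_mermaid(edges))
```

Because the edges are sorted, feeding the same set in a different order produces byte-identical text, which keeps diffs of the generated files meaningful.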