a7fe785e5f
Extend tl.composite() with an ordered epilogue list. Each op carries
a `scope` flag: `output_tile` (default; runs once per (m, n) tile
before STORE), `k_tile` (runs on every K-tile right after the GEMM),
or `kernel`. The plan generator slots MATH stages by scope; pe_math
reuses pe_dma's local-loop pattern, so chained epilogues (bias->relu)
skip the port hop. op_log captures per-stage params for telemetry.
The topology gains a gemm->math edge (snapshot test updated).
The API stays backward-compatible: `epilogue=` is opt-in.
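One way to picture "slots MATH stages by scope" is a minimal sketch like the one below. It is not the real plan generator. `plan_epilogue`, `VALID_SCOPES` and the returned shape are hypothetical. It also assumes that consecutive ops sharing a scope are fused into one stage, standing in for the local-loop chaining that lets bias->relu skip the port hop.

```python
from itertools import groupby

# Scopes named in the commit message; "output_tile" is the default.
VALID_SCOPES = ("k_tile", "output_tile", "kernel")
DEFAULT_SCOPE = "output_tile"


def plan_epilogue(epilogue):
    """Hypothetical planner: bucket an ordered epilogue list into MATH stages.

    Returns {scope: [stage, ...]}, where each stage is a list of op names.
    List order is preserved, and consecutive ops that share a scope are
    fused into a single stage (an assumed stand-in for local-loop chaining).
    """
    plan = {scope: [] for scope in VALID_SCOPES}
    normalized = []
    for op in epilogue:
        scope = op.get("scope", DEFAULT_SCOPE)
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope {scope!r} for op {op['op']!r}")
        normalized.append((scope, op["op"]))
    # groupby only merges *adjacent* items, so an op that breaks a run
    # starts a new stage instead of being reordered.
    for scope, run in groupby(normalized, key=lambda pair: pair[0]):
        plan[scope].append([name for _, name in run])
    return plan
```

For the example epilogue, this yields one `k_tile` stage (`dequant`) and a single fused `output_tile` stage (`bias`, `relu`, `scale`).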
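Example:
    h = tl.composite(
        op="gemm", a=a, b=b, out_ptr=int(out),
        epilogue=[
            # k_tile scope: applied to each K-tile right after the GEMM
            {"op": "dequant", "scale": s_per_k, "scope": "k_tile"},
            # no scope given, so these default to output_tile and run
            # once per (m, n) tile before STORE, in list order
            {"op": "bias", "bias": bias_vec},
            {"op": "relu"},
            {"op": "scale", "factor": 0.5},
        ],
    )
    tl.wait(h)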
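To make the intended numerics of the example concrete, here is a pure-Python reference sketch. It does not call `tl`. `reference_composite` and its tile-size parameters are hypothetical. It assumes that `a` is M x K and `b` is K x N (lists of lists), that `s_per_k` holds one dequant scale per K-tile, and that `bias_vec` holds one entry per output column.

```python
def reference_composite(a, b, s_per_k, bias_vec, factor=0.5,
                        tile_m=2, tile_n=2, tile_k=2):
    """Hypothetical reference for the example epilogue chain.

    k_tile ops (dequant) scale each K-tile's partial product before it is
    accumulated; output_tile ops (bias -> relu -> scale) run once per
    (m, n) tile after the K loop, i.e. just before the result is stored.
    """
    M, K, N = len(a), len(b), len(b[0])
    num_k_tiles = (K + tile_k - 1) // tile_k
    assert len(s_per_k) == num_k_tiles, "expected one scale per K-tile"
    out = [[0.0] * N for _ in range(M)]
    for m0 in range(0, M, tile_m):
        for n0 in range(0, N, tile_n):
            rows = range(m0, min(m0 + tile_m, M))
            cols = range(n0, min(n0 + tile_n, N))
            acc = {(i, j): 0.0 for i in rows for j in cols}
            for kt, k0 in enumerate(range(0, K, tile_k)):
                ks = range(k0, min(k0 + tile_k, K))
                for i in rows:
                    for j in cols:
                        partial = sum(a[i][k] * b[k][j] for k in ks)
                        # k_tile scope: dequant right after this K-tile's GEMM.
                        acc[(i, j)] += partial * s_per_k[kt]
            # output_tile scope: bias -> relu -> scale, in list order.
            for (i, j), v in acc.items():
                v = v + bias_vec[j]
                v = max(v, 0.0)
                out[i][j] = v * factor
    return out
```

Because dequant is scoped to `k_tile`, each K-tile can carry its own scale. Moving it to `output_tile` would apply a single scale to the fully accumulated sum instead.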
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generated Diagrams
This directory contains diagrams generated from topology compilation.
What these files are
- Derived artifacts generated from:
  - the compiled topology graph
  - distance (accumulated latency) metadata
  - view/layout rules (ADR-0005)
These files are meant for quick visual inspection and review.
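The "distance (accumulated latency)" metadata could be derived along the lines of the minimal sketch below. It is an assumption, not the actual compiler pass. `accumulated_latency` and its `(src, dst, latency)` edge format are hypothetical. It also takes "accumulated latency" to mean the smallest summed edge latency from a chosen source, computed with Dijkstra's algorithm.

```python
import heapq


def accumulated_latency(edges, source):
    """Hypothetical helper: minimum accumulated latency from `source`.

    `edges` is an iterable of (src, dst, latency) tuples (an assumed format,
    not the real compiled-topology schema). Dijkstra's algorithm requires
    non-negative latencies. Unreachable nodes are omitted from the result.
    """
    graph = {}
    for src, dst, lat in edges:
        if lat < 0:
            raise ValueError("Dijkstra requires non-negative latencies")
        graph.setdefault(src, []).append((dst, lat))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nxt, lat in graph.get(node, ()):
            nd = d + lat
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```

For example, with edges `dma->gemm` (4), `gemm->math` (2), `dma->math` (10) and `math->store` (1), `math` gets distance 6 via `gemm`, not 10 via the direct edge.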
Default outputs
- SIP view: `sip_view.mmd` (and/or `sip_view.dot`)
- CUBE view: `cube_view.mmd` (and/or `cube_view.dot`)
- PE view: `pe_view.mmd` (and/or `pe_view.dot`)
How to preview
- In VS Code:
  - open `.mmd` or `.md` files containing Mermaid blocks and use Markdown Preview
  - for `.dot` files, use a Graphviz preview extension, or render with `dot -Tpng`
Notes
- Diagrams are representative and distance-aware by default.
- Instance indices are not required unless debugging asymmetry.
- Outputs should be deterministic for the same topology and rules.
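A minimal sketch of how a `.mmd` view can be both distance-aware and deterministic: sort nodes and edges before emitting, so the output is identical regardless of input order. `to_mermaid` and its inputs (`(src, dst, latency)` edges and a node-to-distance map) are hypothetical and not the generator's real interface. Node names are assumed to be valid Mermaid identifiers.

```python
def to_mermaid(edges, distance=None):
    """Hypothetical emitter: render (src, dst, latency) edges as a Mermaid flowchart.

    Nodes and edges are sorted, so the same edge set always produces
    byte-identical text. If `distance` (node -> accumulated latency) is
    given, each node label shows it.
    """
    distance = distance or {}
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    lines = ["flowchart LR"]
    for node in nodes:
        label = f"{node} (d={distance[node]})" if node in distance else node
        lines.append(f'    {node}["{label}"]')
    for src, dst, lat in sorted(edges):
        # Edge text carries the per-hop latency.
        lines.append(f"    {src} -->|{lat}| {dst}")
    return "\n".join(lines) + "\n"
```

Sorting is a simple way to get the determinism the notes ask for, because neither dict iteration order nor compiler traversal order can leak into the file.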