kernbench2

mukesh 14d800b0ae Kernel-launch sync (ADR-0009 D5) and IPCQ drain at inbound (ADR-0023)

- KernelLaunchMsg gains target_start_ns: IO_CPU stamps a global barrier
  (max path latency across every target PE), M_CPU passes it through,
  and PE_CPU yields until that time before recording pe_exec_start. Every
  PE in a launch begins kernel execution at the same env.now regardless
  of its dispatch path length, which eliminates the per-PE dispatch-offset
  artifact in cross-PE and cross-cube latency measurements.

- PE_DMA._handle_ipcq_inbound now pays Transaction.drain_ns at the top,
  matching the terminal-drain behavior of ComponentBase._forward_txn for
  every non-IPCQ Transaction. SRC-side tl.send stays fire-and-forget
  (sender doesn't yield on sub_done); tl.recv now blocks until bytes
  have actually drained into its inbox.

- ComponentContext: new compute_path_latency_ns helper + node_overhead_ns
  field populated by GraphEngine.

- tests/test_kernel_launch_sync.py: asserts all PEs in one launch
  produce identical pe_exec_ns for a no-op kernel (zero spread).
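
The launch-sync rule in the first bullet can be sketched as plain arithmetic. This is a minimal illustration, not the repository's code: `stamp_launch`, `pe_exec_start`, `simulate_launch`, and the latency values are hypothetical, and the barrier is assumed to be the launch time plus the largest per-PE path latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelLaunchMsg:
    # Only the field named in the commit message is modeled here;
    # kernel_id is a hypothetical placeholder for the rest of the message.
    kernel_id: int
    target_start_ns: int


def stamp_launch(now_ns, path_latency_ns, kernel_id=0):
    """Stamp the global barrier: launch time plus the slowest dispatch path.

    path_latency_ns maps a PE name to its (assumed) dispatch latency in ns.
    """
    return KernelLaunchMsg(kernel_id, now_ns + max(path_latency_ns.values()))


def pe_exec_start(msg, arrival_ns):
    """A PE waits until the barrier and never starts before the message arrives."""
    return max(arrival_ns, msg.target_start_ns)


def simulate_launch(now_ns, path_latency_ns):
    """Return the execution start time of every PE for one launch."""
    msg = stamp_launch(now_ns, path_latency_ns)
    return {
        pe: pe_exec_start(msg, now_ns + latency)
        for pe, latency in path_latency_ns.items()
    }


# Three PEs with different (made-up) dispatch path lengths.
starts = simulate_launch(1000, {"pe0": 40, "pe1": 95, "pe2": 210})
spread_ns = max(starts.values()) - min(starts.values())  # 0: all PEs start together
```

Because the barrier is the maximum path latency, every PE's message arrives at or before `target_start_ns`, so the start-time spread is zero. That is the property the new test checks for a no-op kernel.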

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 15:30:29 -07:00
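
The IPCQ change in the commit above (drain paid at inbound, send left fire-and-forget) can be sketched as a small timing model. This is an illustration under stated assumptions, not the simulator's implementation: the helper names and the numbers are hypothetical, and `drain_ns` is treated as a given value on the transaction, not derived from a bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # drain_ns is the field named in the commit message; nbytes is a
    # hypothetical extra included only for illustration.
    nbytes: int
    drain_ns: int


def send_done_ns(issue_ns):
    """Fire-and-forget: the sender continues at the time it issued the send."""
    return issue_ns


def inbox_ready_ns(arrival_ns, txn):
    """The inbound handler pays the drain cost before the bytes are visible."""
    return arrival_ns + txn.drain_ns


def recv_done_ns(recv_issue_ns, arrival_ns, txn):
    """recv completes once it has been posted AND the bytes have drained."""
    return max(recv_issue_ns, inbox_ready_ns(arrival_ns, txn))


txn = Transaction(nbytes=4096, drain_ns=32)
sender_resumes = send_done_ns(100)                      # sender does not wait
early_recv = recv_done_ns(120, arrival_ns=150, txn=txn)  # blocks until drained
late_recv = recv_done_ns(300, arrival_ns=150, txn=txn)   # data already in inbox
```

In this model a receive posted before the data arrives completes at arrival plus `drain_ns`, while a receive posted after the drain completes immediately.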

adr

Kernel-launch sync (ADR-0009 D5) and IPCQ drain at inbound (ADR-0023)

2026-04-23 15:30:29 -07:00

diagrams

Remove xbar/noc remnants, rule-based cube-view connectors

2026-04-06 23:59:12 -07:00

ccl-author-guide.en.md

ADR-0026: DPPolicy intra-device only + ShardSpec structural coords

2026-04-14 13:02:19 -07:00

ccl-author-guide.md

ADR-0026: DPPolicy intra-device only + ShardSpec structural coords

2026-04-14 13:02:19 -07:00

di-presentation.md

Add SIP-level tensor parallelism, component registry YAML, VA offset verification

2026-03-26 01:13:17 -07:00

latency-model.md

Add CHANGES.md, README, update SPEC/ADRs for release 2

2026-03-19 01:43:15 -07:00