kernbench2/docs/adr at c1a5cf3a2ae88d98777113f06e60d3cd3c144031 - kernbench2 - YWGitServer

ywkang/kernbench2


mukesh c1a5cf3a2a ADR-0009 D5: chain-aware target_start_ns + zero-byte launch fanout

The single-walk predictor (find_node_path(io_cpu, pe_cpu) +
compute_path_latency_ns) undershot the actual dispatch latency for far
cubes, for two reasons: the routing graph could pick a path that
bypasses M_CPU, and launch sub-txns with non-zero nbytes serialized on
shared first hops. Far PEs therefore arrived at _execute_kernel after
target_start_ns, silently skipped the barrier yield, and recorded a
late pe_exec_start. Their reported pe_exec_ns was under-counted by
exactly late_ns (63 ns observed at h4 cube4.pe0 in the IPCQ test, up
to 113 ns worst case for cubes 9-11), which produced the suspicious
flat region in the h4 IPCQ curve at 8192/10240 bytes.

Fix:
  - IO_CPU predictor uses the explicit two-leg chain
    (IO_CPU->M_CPU + M_CPU->PE_CPU - io.overhead - m.overhead), so
    every PE on every targeted cube has a barrier >= its real
    dispatch arrival.
  - Kernel-launch fanout sub-txns carry nbytes=0 (control-plane,
    not data-plane), removing the per-cube fanout serialization
    that pushed far M_CPUs past the predictor.
  - Legacy io_cpu mirror updated.
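
As a rough illustration of the two-leg formula above, here is a minimal
sketch. It is not the kernbench2 implementation: ChainLegs,
chain_latency_ns, and predict_target_start_ns are hypothetical names,
and taking the maximum over all targeted PEs is an assumption about how
one shared barrier could satisfy "barrier >= real dispatch arrival" for
every PE.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChainLegs:
    """Latencies of the two dispatch legs for one PE, in ns (hypothetical)."""
    io_to_m_ns: int   # IO_CPU -> M_CPU leg
    m_to_pe_ns: int   # M_CPU -> PE_CPU leg


def chain_latency_ns(legs: ChainLegs, io_overhead_ns: int, m_overhead_ns: int) -> int:
    # Two-leg formula as written in the commit message:
    # IO_CPU->M_CPU + M_CPU->PE_CPU - io.overhead - m.overhead
    return legs.io_to_m_ns + legs.m_to_pe_ns - io_overhead_ns - m_overhead_ns


def predict_target_start_ns(now_ns: int, all_legs: Iterable[ChainLegs],
                            io_overhead_ns: int, m_overhead_ns: int) -> int:
    # Assumption: one shared barrier, set to the slowest predicted arrival
    # so that no targeted PE can reach it late.
    return now_ns + max(
        chain_latency_ns(legs, io_overhead_ns, m_overhead_ns) for legs in all_legs
    )


# Illustrative numbers only: one near PE and one far PE.
example_legs = [ChainLegs(io_to_m_ns=100, m_to_pe_ns=50),
                ChainLegs(io_to_m_ns=160, m_to_pe_ns=50)]
example_barrier = predict_target_start_ns(1000, example_legs,
                                          io_overhead_ns=10, m_overhead_ns=5)
print(example_barrier)
```

With these made-up inputs the far PE dominates, so the barrier is set by
its chain latency and not by the near PE's.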

ADR-0009 D5 mechanism updated to specify the two-leg formula and
the nbytes=0 requirement. New tests/test_d5_barrier_invariant.py
asserts that (a) no PE enters _execute_kernel after target_start_ns
and (b) every PE in a multi-cube launch has an identical
pe_exec_start. Both regressions pass silently under the existing
tests/test_kernel_launch_sync.py, because that test inspects only the
post-aggregation max(pe_exec_ns).
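
The two invariants can be sketched as a standalone check. This is an
illustration only, not the contents of
tests/test_d5_barrier_invariant.py: PeTrace, its fields, and
barrier_violations are hypothetical, and the real test presumably reads
these timestamps from simulator state.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PeTrace:
    """Per-PE timestamps for one kernel launch, in ns (hypothetical record)."""
    pe: str
    enter_ns: int          # time the PE entered _execute_kernel
    pe_exec_start_ns: int  # time the PE started executing the kernel


def barrier_violations(traces: Sequence[PeTrace], target_start_ns: int) -> List[str]:
    problems = []
    # (a) No PE may enter _execute_kernel after target_start_ns.
    for t in traces:
        if t.enter_ns > target_start_ns:
            late_ns = t.enter_ns - target_start_ns
            problems.append(f"{t.pe}: entered {late_ns} ns late")
    # (b) Every PE in the launch must share one pe_exec_start.
    starts = {t.pe_exec_start_ns for t in traces}
    if len(starts) > 1:
        problems.append(f"pe_exec_start differs across PEs: {sorted(starts)}")
    return problems


# Illustrative data: the second PE arrives 63 ns after the barrier.
ok_traces = [PeTrace("cube0.pe0", 900, 1000), PeTrace("cube4.pe0", 990, 1000)]
bad_traces = [PeTrace("cube0.pe0", 900, 1000), PeTrace("cube4.pe0", 1063, 1063)]
print(barrier_violations(ok_traces, target_start_ns=1000))
print(barrier_violations(bad_traces, target_start_ns=1000))
```

A check on per-PE timestamps catches the late PE directly, whereas a
check on max(pe_exec_ns) alone sees only the aggregated duration.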

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-27 15:12:58 -07:00


| File | Last commit | Date |
| --- | --- | --- |
| ADR-0001-physaddr-layout.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0002-routing-distance.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0003-target-system-hierarchy.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0004-memory-semantics-local-hbm.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0005-diagram-views-distance-layout.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0006-topology-compilation-distance-diagram.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0007-runtime-api-boundaries.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0008-tensor-deploy-and-allocation.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0009-kernel-execution-messaging.md | ADR-0009 D5: chain-aware target_start_ns + zero-byte launch fanout | 2026-04-27 15:12:58 -07:00 |
| ADR-0010-cli-device-selection.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0011-memory-addressing-simplification.md | Add SIP-level tensor parallelism, component registry YAML, VA offset verification | 2026-03-26 01:13:17 -07:00 |
| ADR-0012-host-io-message-schema.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0013-verification_strategy.md | commit - release 1 | 2026-03-18 11:47:48 -07:00 |
| ADR-0014-pe-internal-execution-model.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0015-component-port-wire-model.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0016-iochiplet-noc-and-memory-path.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0017-cube-noc-2d-mesh.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0018-Logical Address.en.md | Add English translations for ADR-0018, 0019, 0020, 0021 | 2026-04-13 16:31:32 -07:00 |
| ADR-0018-Logical Address.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0019-NOC-Local HBM.en.md | Add English translations for ADR-0018, 0019, 0020, 0021 | 2026-04-13 16:31:32 -07:00 |
| ADR-0019-NOC-Local HBM.md | Replace xbar/bridge/single-NOC with explicit router mesh (ADR-0019) | 2026-04-04 17:51:28 -07:00 |
| ADR-0020-data-execution-two-pass.en.md | Add English translations for ADR-0018, 0019, 0020, 0021 | 2026-04-13 16:31:32 -07:00 |
| ADR-0020-data-execution-two-pass.md | ADR-0020: 2-Pass data execution model with greenlet kernel runner | 2026-04-07 23:53:49 -07:00 |
| ADR-0021-pe-pipeline-refactor.en.md | Add English translations for ADR-0018, 0019, 0020, 0021 | 2026-04-13 16:31:32 -07:00 |
| ADR-0021-pe-pipeline-refactor.md | ADR-0021: PE pipeline refactor — component separation + token self-routing | 2026-04-08 23:21:40 -07:00 |
| ADR-0022-program-id-2d-grid.md | Add 2D grid program_id semantics (ADR-0022) | 2026-04-09 16:49:56 -07:00 |
| ADR-0023-ipcq-pe-collective.en.md | ADR-0023 D9: blocking credit-emit with full-path latency | 2026-04-27 15:12:38 -07:00 |
| ADR-0023-ipcq-pe-collective.md | ADR-0023 D9: blocking credit-emit with full-path latency | 2026-04-27 15:12:38 -07:00 |
| ADR-0024-sip-tp-launcher.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |
| ADR-0025-ipcq-direction-addressing.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |
| ADR-0026-dppolicy-intra-device.md | ADR-0027 rev7 (Megatron TP + worker-wait generalization) + ADR-0026 typo fix | 2026-04-14 14:13:26 -07:00 |
| ADR-0027-megatron-tp.md | ADR-0027 rev7 (Megatron TP + worker-wait generalization) + ADR-0026 typo fix | 2026-04-14 14:13:26 -07:00 |
| ADR-0028-dtensor-support.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |
| ADR-0029-hierarchical-allreduce.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |
| ADR-0030-ipcq-physaddr.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |
| ADR-0031-physaddr-pe-resource-extension.md | docs: add ADRs 0024–0031 for SIP-TP launcher stack | 2026-04-14 00:38:27 -07:00 |