kernbench2/docs/adr/ADR-0009-kernel-execution-messaging.md
2026-03-18 11:47:48 -07:00


ADR-0009: Kernel Execution Messaging and Completion Semantics

Status

Accepted

Context

Kernel execution is initiated by the host and proceeds through device control components:

Host → IO_CPU → M_CPU → PE_CPU → schedulers → engines

Completion propagates in reverse order.

To keep benchmarks simple and topology-agnostic, kernel execution must be endpoint-driven: the host addresses a single endpoint, and fan-out and deterministic aggregation of results happen inside the device.


Decision

D1. Kernel launch is an endpoint request

A kernel launch is initiated by submitting a single KernelLaunch request to the IO_CPU endpoint.

The runtime API MUST:

  • construct the kernel launch request,
  • submit it to IO_CPU,
  • await a single completion result.

The runtime API MUST NOT orchestrate internal fan-out.
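
The three obligations in D1 can be sketched as follows. This is a minimal illustration, not the project's API: `KernelLaunch` is named in this ADR, but its fields, along with `KernelResult`, `FakeIoCpu`, and `launch_kernel`, are hypothetical names introduced here. The stand-in endpoint resolves its future immediately; a real IO_CPU would resolve it only after internal fan-out and aggregation.

```python
from concurrent.futures import Future
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelLaunch:
    """Hypothetical request shape; the field names are illustrative."""
    correlation_id: int
    kernel_name: str


@dataclass(frozen=True)
class KernelResult:
    """Hypothetical single completion result returned to the host."""
    correlation_id: int
    ok: bool


class FakeIoCpu:
    """Stand-in for the IO_CPU endpoint (assumption, not the real component)."""

    def submit(self, request: KernelLaunch) -> Future:
        future: Future = Future()
        # A real endpoint would fan out to M_CPUs here and resolve later.
        future.set_result(KernelResult(request.correlation_id, ok=True))
        return future


def launch_kernel(io_cpu: FakeIoCpu, kernel_name: str, correlation_id: int) -> KernelResult:
    request = KernelLaunch(correlation_id, kernel_name)  # 1. construct the request
    future = io_cpu.submit(request)                      # 2. submit it to IO_CPU
    return future.result()                               # 3. await a single result


result = launch_kernel(FakeIoCpu(), "gemm", correlation_id=1)
```

Note that `launch_kernel` never references M_CPUs or PE_CPUs, which is the point of the MUST NOT clause: topology stays behind the endpoint.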


D2. Tensor arguments are passed by metadata

KernelLaunch requests MUST reference tensor arguments via:

  • host-owned tensor handles, or
  • resolved device address maps derived from those handles.

Bulk tensor data MUST NOT be embedded in kernel launch messages.
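
One way to make D2 hard to violate by accident is to validate argument types when the request is built. The sketch below assumes hypothetical `TensorHandle` and `DeviceAddressMap` types and field names; none of them are taken from the codebase.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TensorHandle:
    """Hypothetical host-owned handle: metadata only, no payload."""
    handle_id: int
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class DeviceAddressMap:
    """Hypothetical resolved form: (device id, base address, byte length) segments."""
    handle_id: int
    segments: Tuple[Tuple[str, int, int], ...]


TensorRef = Union[TensorHandle, DeviceAddressMap]


@dataclass(frozen=True)
class KernelLaunch:
    correlation_id: int
    kernel_name: str
    tensor_args: Tuple[TensorRef, ...]

    def __post_init__(self) -> None:
        # Reject anything that is not metadata (raw bytes, arrays, lists, ...).
        for arg in self.tensor_args:
            if not isinstance(arg, (TensorHandle, DeviceAddressMap)):
                raise TypeError(
                    f"tensor argument must be a handle or address map, got {type(arg).__name__}"
                )


a = TensorHandle(handle_id=7, shape=(128, 128), dtype="f16")
b = DeviceAddressMap(handle_id=8, segments=(("PE0", 0x1000, 32768),))
launch = KernelLaunch(correlation_id=1, kernel_name="gemm", tensor_args=(a, b))
```

Passing `bytes` or any other payload object as a tensor argument raises `TypeError` at construction time, so the message stays small regardless of tensor size.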


D3. Fan-out and aggregation are component responsibilities

  • IO_CPU fans out work to M_CPUs.
  • M_CPU fans out work to PE_CPUs.
  • PE_CPU manages kernel execution and engine dispatch.

Completion semantics:

  • M_CPU completes when all targeted PEs complete or a failure policy triggers.
  • IO_CPU completes when all targeted CUBEs complete or a failure policy triggers.
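
The fan-out and aggregation rules above can be illustrated with a small synchronous model. Everything here is an assumption made for illustration: the function names, the dictionary topology, and in particular the failure policy (a parent fails if any child fails, after waiting for all children). The ADR does not fix a specific policy, and a real one could trigger before all children report.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Completion:
    source: str
    ok: bool


def aggregate(source: str, children: List[Completion]) -> Completion:
    # Assumed policy: the parent succeeds only if every child succeeded.
    # all() does not depend on arrival order, so the outcome is deterministic.
    return Completion(source, all(child.ok for child in children))


def run_pe_cpu(pe_id: str, failing: FrozenSet[str]) -> Completion:
    # Stand-in for kernel execution and engine dispatch on one PE.
    return Completion(pe_id, pe_id not in failing)


def run_m_cpu(m_id: str, pe_ids: List[str], failing: FrozenSet[str]) -> Completion:
    # M_CPU fans out to its PE_CPUs and aggregates their completions.
    return aggregate(m_id, [run_pe_cpu(pe, failing) for pe in pe_ids])


def run_io_cpu(topology: Dict[str, List[str]], failing: FrozenSet[str] = frozenset()) -> Completion:
    # IO_CPU fans out to M_CPUs in a fixed (sorted) order and aggregates.
    children = [run_m_cpu(m_id, pes, failing) for m_id, pes in sorted(topology.items())]
    return aggregate("IO_CPU", children)


topology = {"M_CPU0": ["PE0", "PE1"], "M_CPU1": ["PE2", "PE3"]}
all_ok = run_io_cpu(topology)
one_failed = run_io_cpu(topology, failing=frozenset({"PE2"}))
```

The host only ever sees the single `Completion` returned by `run_io_cpu`; a failure on `PE2` surfaces there without the host knowing which M_CPU owned it.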

D4. Completion and failure propagation

  • All messages MUST carry correlation identifiers.
  • Completion and failure MUST propagate deterministically to the host.
  • The simulation engine provides futures/handles to observe completion.
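
A common way to combine correlation identifiers with futures is a pending-request table keyed by identifier. The sketch below is one possible shape under stated assumptions: `CompletionTracker`, `register`, and `deliver` are hypothetical names, and failure is modeled as a normal result with `ok=False` so that success and failure reach the host along the same path.

```python
from concurrent.futures import Future
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Completion:
    correlation_id: int
    ok: bool
    error: Optional[str] = None


class CompletionTracker:
    """Maps correlation ids to futures the host can wait on (illustrative)."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._pending: Dict[int, Future] = {}

    def register(self) -> Tuple[int, Future]:
        # Allocate a fresh correlation id and a future tied to it.
        correlation_id = next(self._next_id)
        future: Future = Future()
        self._pending[correlation_id] = future
        return correlation_id, future

    def deliver(self, completion: Completion) -> None:
        # An unknown or already-completed id raises KeyError instead of
        # being silently dropped.
        future = self._pending.pop(completion.correlation_id)
        future.set_result(completion)


tracker = CompletionTracker()
cid_a, fut_a = tracker.register()
cid_b, fut_b = tracker.register()

# Completions may arrive in any order; each resolves only its own future.
tracker.deliver(Completion(cid_b, ok=False, error="PE fault"))
tracker.deliver(Completion(cid_a, ok=True))
```

Because each completion carries its identifier, out-of-order arrival does not change which launch a result is attributed to.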

References

  • SPEC R1, R2, R7, R8
  • ADR-0007 (Runtime API boundaries)
  • ADR-0008 (Tensor deployment)