ywkang/kernbench2

ywkang 6f43807900 commit - release 1

2026-03-18 11:47:48 -07:00

ADR-0009: Kernel Execution Messaging and Completion Semantics

Status

Accepted

Context

Kernel execution is initiated by the host and proceeds through device control components:

Host → IO_CPU → M_CPU → PE_CPU → schedulers → engines

Completion propagates back to the host along the same path, in reverse order.

To keep benchmarks simple and topology-agnostic, kernel execution must be endpoint-driven with deterministic aggregation.

Decision

D1. Kernel launch is an endpoint request

A kernel launch is initiated by submitting a single KernelLaunch request to the IO_CPU endpoint.

The runtime API MUST:

construct the kernel launch request,
submit it to IO_CPU,
await a single completion result.

The runtime API MUST NOT orchestrate internal fan-out.
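
The three runtime obligations in D1 can be sketched as follows. This is a minimal illustration, not the project's actual API: `KernelLaunch` is named in this ADR, but its fields, along with `KernelResult`, `FakeIoCpu`, and `launch_kernel`, are hypothetical names, and the use of `asyncio` is an assumption.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KernelLaunch:
    """Launch request; the field names here are assumptions."""
    kernel_name: str
    tensor_handles: tuple[str, ...]
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class KernelResult:
    """Single completion result returned to the host (hypothetical shape)."""
    correlation_id: str
    ok: bool


class FakeIoCpu:
    """Stand-in endpoint. A real IO_CPU would fan out internally."""

    async def submit(self, request: KernelLaunch) -> KernelResult:
        await asyncio.sleep(0)  # placeholder for device-side work
        return KernelResult(correlation_id=request.correlation_id, ok=True)


async def launch_kernel(io_cpu: FakeIoCpu, kernel_name: str,
                        tensor_handles: list[str]) -> KernelResult:
    # 1. Construct the kernel launch request.
    request = KernelLaunch(kernel_name, tuple(tensor_handles))
    # 2. Submit it to IO_CPU, and 3. await a single completion result.
    # The runtime never addresses M_CPUs or PE_CPUs directly.
    return await io_cpu.submit(request)
```

The runtime function holds a reference only to the IO_CPU endpoint, so there is no way for it to orchestrate fan-out, whatever the topology behind that endpoint looks like.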

D2. Tensor arguments are passed by metadata

KernelLaunch requests MUST reference tensor arguments via:

host-owned tensor handles, or
resolved device address maps derived from those handles.

Bulk tensor data MUST NOT be embedded in kernel launch messages.
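
One way to enforce D2 is to accept only metadata types as kernel arguments. The sketch below assumes two hypothetical types, `TensorRef` and `DeviceAddressMap`, whose fields (including the per-shard address layout) are illustrative and not taken from the SPEC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorRef:
    """Host-owned tensor handle plus descriptive metadata (assumed fields)."""
    handle: str
    shape: tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class DeviceAddressMap:
    """Resolved placement derived from a handle (assumed layout)."""
    handle: str
    # One (pe_id, base_address, nbytes) entry per shard.
    shards: tuple[tuple[int, int, int], ...]


def validate_args(args):
    """Reject anything that is not tensor metadata, e.g. raw bytes or arrays."""
    for arg in args:
        if not isinstance(arg, (TensorRef, DeviceAddressMap)):
            raise TypeError(
                f"kernel argument must be tensor metadata, got {type(arg).__name__}"
            )
    return tuple(args)
```

With a check of this kind, passing a raw buffer such as `b"\x00" * 1024` as an argument raises `TypeError` before the message is built, so launch messages stay small regardless of tensor size.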

D3. Fan-out and aggregation are component responsibilities

IO_CPU fans out work to M_CPUs.
M_CPU fans out work to PE_CPUs.
PE_CPU manages kernel execution and engine dispatch.

Completion semantics:

M_CPU completes when all targeted PEs complete or a failure policy triggers.
IO_CPU completes when all targeted CUBEs complete or a failure policy triggers.
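
The two-level aggregation rule can be illustrated with a small sketch. It assumes that each CUBE is fronted by one M_CPU (the ADR does not state this explicitly) and that the failure policy is fail-fast; `run_pe`, `aggregate`, `run_m_cpu`, and `run_io_cpu` are hypothetical names.

```python
import asyncio


async def run_pe(pe_id: int, fail: bool) -> tuple[int, bool]:
    """Stand-in for PE_CPU kernel execution; returns (pe_id, ok)."""
    await asyncio.sleep(0)
    return pe_id, not fail


async def aggregate(children, fail_fast: bool = True) -> tuple[bool, list[int]]:
    """Complete when all children complete, or early once a failure is seen
    if the (assumed) fail-fast policy is enabled."""
    pending = {asyncio.ensure_future(child) for child in children}
    failed: list[int] = []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            child_id, ok = task.result()
            if not ok:
                failed.append(child_id)
        if failed and fail_fast:
            for task in pending:
                task.cancel()  # failure policy triggered: stop waiting
            break
    # Sort so the reported failures do not depend on task iteration order.
    return (not failed, sorted(failed))


async def run_m_cpu(cube_id: int, pe_failures: list[bool]) -> tuple[int, bool]:
    """M_CPU completes when all targeted PEs complete or the policy triggers."""
    ok, _ = await aggregate(run_pe(i, f) for i, f in enumerate(pe_failures))
    return cube_id, ok


async def run_io_cpu(cubes: list[list[bool]]) -> tuple[bool, list[int]]:
    """IO_CPU completes when all targeted CUBEs complete or the policy triggers."""
    return await aggregate(run_m_cpu(c, pes) for c, pes in enumerate(cubes))
```

The same `aggregate` helper serves both levels, which mirrors the symmetry between the two completion rules. Under fail-fast with several failing children, which failures are observed before cancellation depends on completion order, so a deterministic simulation would need to fix that order or wait for all children.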

D4. Completion and failure propagation

All messages MUST carry correlation identifiers.
Completion and failure MUST propagate deterministically to the host.
The simulation engine provides futures/handles to observe completion.
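
A minimal sketch of how correlation identifiers and futures might fit together: each outstanding request registers a future under its correlation ID, and the matching completion or failure message resolves it. `CompletionTable` and its methods are hypothetical, and `concurrent.futures.Future` stands in for whatever handle type the simulation engine actually provides.

```python
from concurrent.futures import Future


class CompletionTable:
    """Maps correlation IDs to futures that the host can observe."""

    def __init__(self) -> None:
        self._pending: dict[str, Future] = {}

    def register(self, correlation_id: str) -> Future:
        """Create a future for an outgoing request."""
        if correlation_id in self._pending:
            raise ValueError(f"duplicate correlation id: {correlation_id}")
        future: Future = Future()
        self._pending[correlation_id] = future
        return future

    def on_message(self, correlation_id: str, ok: bool, detail: str = "") -> None:
        """Resolve the future matching an incoming completion or failure.

        Unknown correlation IDs raise KeyError rather than being dropped.
        """
        future = self._pending.pop(correlation_id)
        if ok:
            future.set_result(detail)
        else:
            future.set_exception(RuntimeError(detail))
```

Because every message is matched by ID rather than by arrival order, completions that arrive out of order still resolve the correct request, and a failure surfaces to the host as an exception on the corresponding future.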

Links

SPEC R1, R2, R7, R8
ADR-0007 (Runtime API boundaries)
ADR-0008 (Tensor deployment)