# ADR-0009: Kernel Execution Messaging and Completion Semantics

## Status

Accepted

## Context

Kernel execution is initiated by the host and proceeds through device control components:

Host → IO_CPU → M_CPU → PE_CPU → schedulers → engines

Completion propagates in reverse order.

To keep benchmarks simple and topology-agnostic, kernel execution must be endpoint-driven with deterministic aggregation.

---

## Decision

### D1. Kernel launch is an endpoint request

A kernel launch is initiated by submitting a single KernelLaunch request to the IO_CPU endpoint.

The runtime API MUST:

- construct the kernel launch request,
- submit it to IO_CPU,
- await a single completion result.

The runtime API MUST NOT orchestrate internal fan-out.

---

### D2. Tensor arguments are passed by metadata

KernelLaunch requests MUST reference tensor arguments via:

- host-owned tensor handles, or
- resolved device address maps derived from those handles.

Bulk tensor data MUST NOT be embedded in kernel launch messages.

---

### D3. Fan-out and aggregation are component responsibilities

- IO_CPU fans out work to M_CPUs.
- M_CPU fans out work to PE_CPUs.
- PE_CPU manages kernel execution and engine dispatch.

Completion semantics:

- M_CPU completes when all targeted PEs complete or a failure policy triggers.
- IO_CPU completes when all targeted CUBEs complete or a failure policy triggers.

---

### D4. Completion and failure propagation

- All messages MUST carry correlation identifiers.
- Completion and failure MUST propagate deterministically to the host.
- The simulation engine provides futures/handles to observe completion.

---

## Links

- SPEC R1, R2, R7, R8
- ADR-0007 (Runtime API boundaries)
- ADR-0008 (Tensor deployment)
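
The decisions D1 through D4 can be illustrated with a small sketch. This is not the simulator's actual API: the `KernelLaunch` and `Completion` field names, the `targets` map, the `launch` helper, and the use of `asyncio` as a stand-in for the simulation engine's futures are all assumptions made for illustration. It also assumes one M_CPU per CUBE and a "wait for all children, then report failure" policy, neither of which this ADR specifies.

```python
import asyncio
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelLaunch:
    """Hypothetical request shape: metadata only, no bulk tensor data (D2)."""
    correlation_id: str
    kernel: str
    tensor_handles: tuple   # host-owned handles, not tensor contents
    targets: dict           # assumed layout: {cube_id: [pe_id, ...]}


@dataclass(frozen=True)
class Completion:
    """Hypothetical completion record carrying the correlation id (D4)."""
    correlation_id: str
    source: str
    ok: bool
    children: tuple = ()


async def pe_cpu(req, cube_id, pe_id, failing):
    # Stand-in for scheduler and engine dispatch inside one PE.
    await asyncio.sleep(0)
    name = f"cube{cube_id}.pe{pe_id}"
    return Completion(req.correlation_id, name, ok=name not in failing)


async def m_cpu(req, cube_id, failing):
    # D3: M_CPU fans out to its PEs and completes when all of them complete.
    # Sorted targets plus gather's input-ordered results keep aggregation
    # deterministic regardless of which PE finishes first.
    pes = sorted(req.targets[cube_id])
    results = await asyncio.gather(*(pe_cpu(req, cube_id, p, failing) for p in pes))
    ok = all(r.ok for r in results)
    return Completion(req.correlation_id, f"cube{cube_id}", ok, tuple(results))


async def io_cpu(req, failing=frozenset()):
    # D3: IO_CPU fans out to M_CPUs (assumed one per CUBE) and aggregates.
    cubes = sorted(req.targets)
    results = await asyncio.gather(*(m_cpu(req, c, failing) for c in cubes))
    ok = all(r.ok for r in results)
    return Completion(req.correlation_id, "io_cpu", ok, tuple(results))


def launch(kernel, tensor_handles, targets, failing=frozenset()):
    # D1: the runtime builds one request, submits it to IO_CPU, and awaits
    # a single completion. It does not orchestrate the fan-out itself.
    req = KernelLaunch(str(uuid.uuid4()), kernel, tuple(tensor_handles), targets)
    return asyncio.run(io_cpu(req, failing))


if __name__ == "__main__":
    result = launch("gemm", ["t0", "t1"], {0: [0, 1], 1: [0, 1]})
    print(result.ok, [c.source for c in result.children])
```

The `failing` parameter exists only to exercise the failure path in this sketch. A fail-fast policy would instead cancel outstanding children on the first failure; which policy applies is left to the component's failure policy as stated in D3.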