1.7 KiB
1.7 KiB
ADR-0009: Kernel Execution Messaging and Completion Semantics
Status
Accepted
Context
Kernel execution is initiated by the host and proceeds through device control components:
Host → IO_CPU → M_CPU → PE_CPU → schedulers → engines
Completion propagates in reverse order.
To keep benchmarks simple and topology-agnostic, kernel execution must be endpoint-driven with deterministic aggregation.
Decision
D1. Kernel launch is an endpoint request
A kernel launch is initiated by submitting a single KernelLaunch request to the IO_CPU endpoint.
The runtime API MUST:
- construct the kernel launch request,
- submit it to IO_CPU,
- await a single completion result.
The runtime API MUST NOT orchestrate internal fan-out.
D2. Tensor arguments are passed by metadata
KernelLaunch requests MUST reference tensor arguments via:
- host-owned tensor handles, or
- resolved device address maps derived from those handles.
Bulk tensor data MUST NOT be embedded in kernel launch messages.
D3. Fan-out and aggregation are component responsibilities
- IO_CPU fans out work to M_CPUs.
- M_CPU fans out work to PE_CPUs.
- PE_CPU manages kernel execution and engine dispatch.
Completion semantics:
- M_CPU completes when all targeted PEs complete or a failure policy triggers.
- IO_CPU completes when all targeted CUBEs complete or a failure policy triggers.
D4. Completion and failure propagation
- All messages MUST carry correlation identifiers.
- Completion and failure MUST propagate deterministically to the host.
- The simulation engine provides futures/handles to observe completion.
Links
- SPEC R1, R2, R7, R8
- ADR-0007 (Runtime API boundaries)
- ADR-0008 (Tensor deployment)