# ADR-0012: Host ↔ IO_CPU Message Schema (PA-first, PE-tagged)

## Status

Accepted

## Context

Phase 0 uses a PA-first memory model (ADR-0011):
- memory operations use device physical addresses (PA) only,
- VA/MMU/IOMMU is not modeled.

The host-facing runtime API interacts with the device via the IO_CPU endpoint.
We define stable, minimal message schemas for Host ↔ IO_CPU so that:
- benchmarks remain stable,
- IO_CPU-internal fan-out/aggregation can evolve independently,
- completion and failure propagation is deterministic.

We also require PE-tagging (Scheme A): each shard explicitly carries (sip,cube,pe) so IO_CPU can deterministically route/fan-out without relying on PA decoding.

---

## Decision

### D1. Contract scope

This schema is the stable contract ONLY for Host ↔ IO_CPU.
Messages beyond IO_CPU (to M_CPU, PE_CPU, schedulers, engines) are component-internal and are NOT part of this host contract in Phase 0.

---

### D2. Required message set

The runtime API MUST use only these message types for Host ↔ IO_CPU:
- MemoryWrite
- MemoryRead
- KernelLaunch

All operations required by benchmarks (tensor init/copy, kernel run) MUST be expressible with these messages.

---

### D3. Common envelope (mandatory for all requests)

All Host ↔ IO_CPU requests MUST include:
- `msg_type: str`
- `correlation_id: str`
  - generated by the host
  - used to match responses deterministically
- `request_id: str`
  - unique within a correlation_id
- `target_device: str`
  - device identifier (e.g., "sip:0")
- `timestamp_tag: str | None` (optional)
  - debug tag only; MUST NOT affect determinism

All Host ↔ IO_CPU responses MUST include:
- `correlation_id: str`
- `request_id: str`
- `completion: Completion`

---

### D4. Completion schema (mandatory)

`Completion` MUST have:
- `ok: bool`
- `error_code: str | None`
- `error_message: str | None`

Rules:
- If `ok == true` then `error_code` and `error_message` MUST be null.
- If `ok == false` then `error_code` MUST be non-null.
- Completion semantics MUST be deterministic.

---

### D5. MemoryWrite schema (PA-first, PE-tagged)

`MemoryWrite` represents a host-initiated write/initialize operation to device memory.

Mandatory fields:
- common envelope fields (D3)
- destination placement tags (Scheme A):
  - `dst_sip: int`
  - `dst_cube: int`
  - `dst_pe: int`
- `dst_pa: int`
  - destination physical address in the destination PE's address space
- `nbytes: int`
- `src_kind: "pattern" | "host_buffer_ref"`
  - Phase 0 MUST support "pattern"
- `pattern: Pattern | None`
  - required if `src_kind == "pattern"`

`Pattern` (Phase 0 mandatory support):
- `pattern_kind: "zero" | "fill_u8" | "fill_u16" | "fill_u32" | "fill_fp16" | "fill_fp32"`
- `value: number | None`
  - required for fill_*; ignored for zero

Optional fields:
- `dst_mem_kind: "HBM" | "TCM" | "AUTO"` (default "AUTO")
- `debug_label: str | None`

Notes:
- This message MUST NOT embed bulk tensor data in Phase 0.
- All latency MUST come from explicit graph traversal and modeled components.

---

### D6. MemoryRead schema (PA-first, PE-tagged)

`MemoryRead` represents a host-initiated read from device memory.

Mandatory fields:
- common envelope fields (D3)
- source placement tags (Scheme A):
  - `src_sip: int`
  - `src_cube: int`
  - `src_pe: int`
- `src_pa: int`
- `nbytes: int`

Optional fields:
- `dst_kind: "host_sink" | "discard"` (default "host_sink")
- `debug_label: str | None`

Response payload:
- actual bytes are NOT required in Phase 0 (latency/traces focus)
- implementations MAY return lightweight stats or hashes later via a new ADR

---

### D7. KernelLaunch schema (PA-first, PE-tagged shards)

`KernelLaunch` represents launching a kernel on a target device via IO_CPU.


Mandatory fields:
- common envelope fields (D3)
- `kernel_ref: KernelRef`
- `args: list[KernelArg]`

`KernelRef` MUST have:
- `name: str`
- `kind: "deployed" | "builtin"`
- `deploy_pa: int | None`
  - PA where kernel binary was deployed (required for "deployed")
- `deploy_sip: int`
  - SIP where binary resides
- `deploy_cube: int`
  - cube where binary resides
- `deploy_pe: int`
  - PE where binary resides
- `nbytes_code: int`
  - kernel binary size (for BW modeling)

Kernel binaries MUST be pre-deployed to device memory via MemoryWrite.
KernelLaunch MUST NOT embed kernel source code or IR in the launch message.

`KernelArg` supports tensor args by PA mapping and scalars by value.

Tensor arg (mandatory):
- `arg_kind: "tensor"`
- `tensor_pa_map: TensorPAMap`

`TensorPAMap` MUST have:
- `shards: list[TensorShard]`

`TensorShard` MUST have (Scheme A enforced):
- `sip: int`
- `cube: int`
- `pe: int`
- `pa: int`
- `nbytes: int`
- `offset_bytes: int`

Scalar arg (mandatory):
- `arg_kind: "scalar"`
- `dtype: "i32" | "i64" | "fp16" | "fp32" | "bool"`
- `value: number | bool`

Optional KernelLaunch fields:
- `grid: dict | None`
- `meta: dict | None`
- `failure_policy: "fail_fast" | "collect_all"` (default "fail_fast")
- `debug_label: str | None`

Notes:
- KernelLaunch MUST NOT embed bulk tensor data.
- KernelLaunch MUST be submitted only to the IO_CPU endpoint.
- IO_CPU MUST fan-out work internally using the shard (sip,cube,pe) tags.

---

## Verification Notes

Tests SHOULD validate:
- schema validation rejects missing mandatory fields,
- deterministic correlation/response matching,
- MemoryWrite/Read/KernelLaunch produce explicit hop traces,
- all routed requests incur latency > 0.

---

## Links

- ADR-0011 (Memory Addressing — PA / VA / LA)
- ADR-0007 (runtime_api vs sim_engine boundaries)
- ADR-0009 (kernel execution fan-out/aggregation)
- ADR-0013 (Verification strategy — V1 message schema validation)
- SPEC R2, R7, R8
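
## Example (non-normative)

The D4 completion rules and the Scheme A shard tags from D7 can be illustrated with a minimal Python sketch. This ADR does not prescribe an implementation language or class layout, so everything here is an assumption made for illustration: the dataclass representation, the error text, the sample `error_code` string, and the hypothetical helper `group_shards_by_pe`. Sorting the (sip,cube,pe) keys is one possible way to keep fan-out order deterministic, not a requirement stated above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    """Completion per D4 (dataclass layout is an assumption)."""

    ok: bool
    error_code: str | None = None
    error_message: str | None = None

    def __post_init__(self) -> None:
        # D4: ok == true requires both error fields to be null.
        if self.ok and (self.error_code is not None or self.error_message is not None):
            raise ValueError("ok == true requires null error_code and error_message")
        # D4: ok == false requires a non-null error_code.
        if not self.ok and self.error_code is None:
            raise ValueError("ok == false requires a non-null error_code")


@dataclass(frozen=True)
class TensorShard:
    """TensorShard per D7; every shard carries explicit (sip, cube, pe) tags."""

    sip: int
    cube: int
    pe: int
    pa: int
    nbytes: int
    offset_bytes: int


def group_shards_by_pe(
    shards: list[TensorShard],
) -> dict[tuple[int, int, int], list[TensorShard]]:
    """Hypothetical helper: bucket shards by their (sip, cube, pe) tag.

    Uses only the explicit tags, never the PA, and returns the buckets in
    sorted key order so that iteration order does not depend on input order.
    """
    groups: dict[tuple[int, int, int], list[TensorShard]] = defaultdict(list)
    for shard in shards:
        groups[(shard.sip, shard.cube, shard.pe)].append(shard)
    return {key: groups[key] for key in sorted(groups)}


# Usage: two shards on PE (0, 1, 2) and one on PE (0, 0, 1).
example_shards = [
    TensorShard(sip=0, cube=1, pe=2, pa=0x1000, nbytes=64, offset_bytes=0),
    TensorShard(sip=0, cube=0, pe=1, pa=0x2000, nbytes=64, offset_bytes=64),
    TensorShard(sip=0, cube=1, pe=2, pa=0x3000, nbytes=64, offset_bytes=128),
]
routed = group_shards_by_pe(example_shards)
success = Completion(ok=True)
failure = Completion(ok=False, error_code="EXAMPLE_ERROR", error_message="illustrative only")
```

Validating in `__post_init__` means an inconsistent `Completion` cannot be constructed at all, which is one way to satisfy the "schema validation rejects" item in the Verification Notes.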