kernbench2/docs/adr-ko/ADR-0012-api-host-io-message-schema.md
ywkang a796c1d2f7 ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/
Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00


ADR-0012: Host ↔ IO_CPU Message Schema (PA-first, PE-tagged)

Status

Accepted

Context

Phase 0 uses a PA-first memory model (ADR-0011):

  • memory operations use device physical addresses (PA) only,
  • VA/MMU/IOMMU is not modeled.

The host-facing runtime API interacts with the device via the IO_CPU endpoint. We define stable, minimal message schemas for Host ↔ IO_CPU so that:

  • benchmarks remain stable,
  • IO_CPU-internal fan-out/aggregation can evolve independently,
  • completion and failure propagation is deterministic.

We also require PE-tagging (Method A): each shard explicitly carries (sip, cube, pe) so that IO_CPU can route and fan out deterministically without relying on PA decoding.


Decision

D1. Contract scope

This schema is the stable contract ONLY for Host ↔ IO_CPU.

Messages beyond IO_CPU (to M_CPU, PE_CPU, schedulers, engines) are component-internal and are NOT part of this host contract in Phase 0.


D2. Required message set

The runtime API MUST use only these message types for Host ↔ IO_CPU:

  • MemoryWrite
  • MemoryRead
  • KernelLaunch

All operations required by benchmarks (tensor init/copy, kernel run) MUST be expressible with these messages.


D3. Common envelope (mandatory for all requests)

All Host ↔ IO_CPU requests MUST include:

  • msg_type: str
  • correlation_id: str
    • generated by the host
    • used to match responses deterministically
  • request_id: str
    • unique within a correlation_id
  • target_device: str
    • device identifier (e.g., "sip:0")
  • timestamp_tag: str | None (optional)
    • debug tag only; MUST NOT affect determinism

All Host ↔ IO_CPU responses MUST include:

  • correlation_id: str
  • request_id: str
  • completion: Completion
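
As a rough illustration of D3, the request envelope and the response-matching rule could be modeled as below. This is a sketch under assumptions, not the project's implementation: the class name `RequestEnvelope`, the helper `match_responses`, and the choice to represent responses as plain dicts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RequestEnvelope:
    """Common request envelope fields from D3 (class name is hypothetical)."""
    msg_type: str
    correlation_id: str                  # generated by the host
    request_id: str                      # unique within a correlation_id
    target_device: str                   # e.g. "sip:0"
    timestamp_tag: Optional[str] = None  # debug only; never used for matching

    def key(self) -> Tuple[str, str]:
        return (self.correlation_id, self.request_id)


def match_responses(requests: List[RequestEnvelope],
                    responses: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Pair each response with its request by (correlation_id, request_id)."""
    pending = {}
    for req in requests:
        if req.key() in pending:
            raise ValueError(f"duplicate request key: {req.key()}")
        pending[req.key()] = req
    matched = {}
    for resp in responses:
        k = (resp["correlation_id"], resp["request_id"])
        if k not in pending or k in matched:
            raise ValueError(f"unexpected or duplicate response: {k}")
        matched[k] = resp
    return matched
```

Because the match key ignores timestamp_tag and arrival order, the pairing is the same no matter how responses are interleaved.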

D4. Completion schema (mandatory)

Completion MUST have:

  • ok: bool
  • error_code: str | None
  • error_message: str | None

Rules:

  • If ok == true then error_code and error_message MUST be null.
  • If ok == false then error_code MUST be non-null.
  • Completion semantics MUST be deterministic.
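
The two ok/error rules above can be enforced at construction time. The following is a minimal sketch, assuming a Python dataclass representation; the error code string used in the usage note is a made-up placeholder, since this ADR does not define an error code vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Completion:
    """Completion record from D4; invalid combinations raise ValueError."""
    ok: bool
    error_code: Optional[str] = None
    error_message: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ok and (self.error_code is not None
                        or self.error_message is not None):
            raise ValueError("ok == true requires error_code and "
                             "error_message to be null")
        if not self.ok and self.error_code is None:
            raise ValueError("ok == false requires a non-null error_code")
```

For example, `Completion(ok=True)` and `Completion(ok=False, error_code="E_EXAMPLE")` are accepted, while `Completion(ok=False)` is rejected.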

D5. MemoryWrite schema (PA-first, PE-tagged)

MemoryWrite represents a host-initiated write/initialize operation to device memory.

Mandatory fields:

  • common envelope fields (D3)
  • destination placement tags (Method A):
    • dst_sip: int
    • dst_cube: int
    • dst_pe: int
  • dst_pa: int
    • destination physical address in the destination PE's address space
  • nbytes: int
  • src_kind: "pattern" | "host_buffer_ref"
    • Phase 0 MUST support "pattern"
  • pattern: Pattern | None
    • required if src_kind == "pattern"

Pattern (Phase 0 mandatory support):

  • pattern_kind: "zero" | "fill_u8" | "fill_u16" | "fill_u32" | "fill_fp16" | "fill_fp32"
  • value: number | None
    • required for fill_*; ignored for zero

Optional fields:

  • dst_mem_kind: "HBM" | "TCM" | "AUTO" (default "AUTO")
  • debug_label: str | None

Notes:

  • This message MUST NOT embed bulk tensor data in Phase 0.
  • All latency MUST come from explicit graph traversal and modeled components.
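
The conditional rules in D5 (pattern required for src_kind == "pattern", value required for fill_*) are easy to get wrong, so a small checker may help. This is a sketch only: the function name `check_memory_write` is hypothetical, messages are assumed to be plain dicts, and the common envelope fields from D3 are deliberately not checked here.

```python
from typing import List

PATTERN_KINDS = {"zero", "fill_u8", "fill_u16", "fill_u32",
                 "fill_fp16", "fill_fp32"}
# Unconditionally mandatory MemoryWrite fields (envelope fields excluded).
WRITE_FIELDS = ("dst_sip", "dst_cube", "dst_pe", "dst_pa", "nbytes", "src_kind")


def check_memory_write(msg: dict) -> List[str]:
    """Return a list of D5 violations; an empty list means the checks passed."""
    errors = [f"missing field: {name}" for name in WRITE_FIELDS
              if name not in msg]
    src_kind = msg.get("src_kind")
    if src_kind not in (None, "pattern", "host_buffer_ref"):
        errors.append(f"unknown src_kind: {src_kind!r}")
    if src_kind == "pattern":
        pattern = msg.get("pattern")
        if pattern is None:
            errors.append("pattern is required when src_kind == 'pattern'")
        else:
            kind = pattern.get("pattern_kind")
            if kind not in PATTERN_KINDS:
                errors.append(f"unknown pattern_kind: {kind!r}")
            elif kind != "zero" and pattern.get("value") is None:
                errors.append(f"value is required for {kind}")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every violation in one pass; that is a design choice of this sketch, not a requirement of the ADR.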

D6. MemoryRead schema (PA-first, PE-tagged)

MemoryRead represents a host-initiated read from device memory.

Mandatory fields:

  • common envelope fields (D3)
  • source placement tags (Method A):
    • src_sip: int
    • src_cube: int
    • src_pe: int
  • src_pa: int
  • nbytes: int

Optional fields:

  • dst_kind: "host_sink" | "discard" (default "host_sink")
  • debug_label: str | None

Response payload:

  • actual bytes are NOT required in Phase 0, which focuses on latency and traces
  • implementations MAY return lightweight stats or hashes later via a new ADR
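
To make the D6 field list concrete, a host-side builder might look like the sketch below. The helper name `make_memory_read` is hypothetical, the request is assumed to be a plain dict, and the optional timestamp_tag from D3 is left out for brevity.

```python
from typing import Optional

DST_KINDS = ("host_sink", "discard")


def make_memory_read(correlation_id: str, request_id: str, target_device: str,
                     src_sip: int, src_cube: int, src_pe: int,
                     src_pa: int, nbytes: int,
                     dst_kind: str = "host_sink",
                     debug_label: Optional[str] = None) -> dict:
    """Build a MemoryRead request dict carrying the D3 and D6 fields."""
    if dst_kind not in DST_KINDS:
        raise ValueError(f"dst_kind must be one of {DST_KINDS}, got {dst_kind!r}")
    return {
        "msg_type": "MemoryRead",
        "correlation_id": correlation_id,
        "request_id": request_id,
        "target_device": target_device,
        # Source placement tags: routing never depends on decoding src_pa.
        "src_sip": src_sip,
        "src_cube": src_cube,
        "src_pe": src_pe,
        "src_pa": src_pa,
        "nbytes": nbytes,
        "dst_kind": dst_kind,
        "debug_label": debug_label,
    }


req = make_memory_read("corr-0", "req-0", "sip:0",
                       src_sip=0, src_cube=1, src_pe=3,
                       src_pa=0x4000, nbytes=256)
```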

D7. KernelLaunch schema (PA-first, PE-tagged shards)

KernelLaunch represents launching a kernel on a target device via IO_CPU.

Mandatory fields:

  • common envelope fields (D3)
  • kernel_ref: KernelRef
  • args: list[KernelArg]

KernelRef MUST have:

  • name: str
  • kind: "deployed" | "builtin"
  • deploy_pa: int | None — PA where kernel binary was deployed (required for "deployed")
  • deploy_sip: int — SIP where binary resides
  • deploy_cube: int — cube where binary resides
  • deploy_pe: int — PE where binary resides
  • nbytes_code: int — kernel binary size (for BW modeling)

Kernel binaries MUST be pre-deployed to device memory via MemoryWrite. KernelLaunch MUST NOT embed kernel source code or IR in the launch message.
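
A minimal sketch of KernelRef with its one conditional rule (deploy_pa is required when kind is "deployed") is shown below. It assumes a Python dataclass representation; the kernel names in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KernelRef:
    """KernelRef fields from D7, in the order listed above."""
    name: str
    kind: str                  # "deployed" | "builtin"
    deploy_pa: Optional[int]   # PA of the pre-deployed binary
    deploy_sip: int
    deploy_cube: int
    deploy_pe: int
    nbytes_code: int           # binary size, used for bandwidth modeling

    def __post_init__(self) -> None:
        if self.kind not in ("deployed", "builtin"):
            raise ValueError(f"unknown kind: {self.kind!r}")
        if self.kind == "deployed" and self.deploy_pa is None:
            raise ValueError("deploy_pa is required when kind == 'deployed'")
```

For example, `KernelRef("example_gemm", "deployed", 0x2000, 0, 0, 0, 4096)` is accepted, while the same call with `deploy_pa=None` is rejected.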

KernelArg supports tensor args by PA mapping and scalars by value.

Tensor arg (mandatory):

  • arg_kind: "tensor"
  • tensor_pa_map: TensorPAMap

TensorPAMap MUST have:

  • shards: list[TensorShard]

TensorShard MUST have (Method A, enforced):

  • sip: int
  • cube: int
  • pe: int
  • pa: int
  • nbytes: int
  • offset_bytes: int

Scalar arg (mandatory):

  • arg_kind: "scalar"
  • dtype: "i32" | "i64" | "fp16" | "fp32" | "bool"
  • value: number | bool

Optional KernelLaunch fields:

  • grid: dict | None
  • meta: dict | None
  • failure_policy: "fail_fast" | "collect_all" (default "fail_fast")
  • debug_label: str | None

Notes:

  • KernelLaunch MUST NOT embed bulk tensor data.
  • KernelLaunch MUST be submitted only to the IO_CPU endpoint.
  • IO_CPU MUST fan-out work internally using the shard (sip,cube,pe) tags.
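
The fan-out note above can be illustrated by grouping shards on their (sip, cube, pe) tags. IO_CPU internals are outside this contract (D1), so the following is only a sketch of how tag-based routing avoids PA decoding; the function name `group_shards_by_pe` is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TensorShard:
    """TensorShard fields from D7."""
    sip: int
    cube: int
    pe: int
    pa: int
    nbytes: int
    offset_bytes: int


def group_shards_by_pe(
        shards: List[TensorShard]) -> Dict[Tuple[int, int, int], List[TensorShard]]:
    """Group shards by (sip, cube, pe), with keys in sorted order."""
    groups = defaultdict(list)
    for shard in shards:
        groups[(shard.sip, shard.cube, shard.pe)].append(shard)
    # Sorting the keys makes the fan-out order independent of input order.
    return {key: groups[key] for key in sorted(groups)}
```

Only the tags are read when choosing a destination; the pa field is carried through untouched.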

Verification Notes

Tests SHOULD validate:

  • schema validation rejects missing mandatory fields,
  • deterministic correlation/response matching,
  • MemoryWrite, MemoryRead, and KernelLaunch each produce explicit hop traces,
  • all routed requests incur latency > 0.
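
The first check in the list could be written as a table-driven test along these lines. This is a sketch, not the project's test suite: `missing_fields` and the `REQUIRED` table are hypothetical, messages are assumed to be plain dicts, and conditionally required fields such as pattern are not covered.

```python
from typing import List

ENVELOPE = ("msg_type", "correlation_id", "request_id", "target_device")
# Unconditionally mandatory fields per message type (D3, D5, D6, D7).
REQUIRED = {
    "MemoryWrite": ENVELOPE + ("dst_sip", "dst_cube", "dst_pe",
                               "dst_pa", "nbytes", "src_kind"),
    "MemoryRead": ENVELOPE + ("src_sip", "src_cube", "src_pe",
                              "src_pa", "nbytes"),
    "KernelLaunch": ENVELOPE + ("kernel_ref", "args"),
}


def missing_fields(msg: dict) -> List[str]:
    """List mandatory fields absent from msg; reject unknown message types."""
    msg_type = msg.get("msg_type")
    if msg_type not in REQUIRED:
        raise ValueError(f"unsupported msg_type: {msg_type!r}")
    return [name for name in REQUIRED[msg_type] if name not in msg]


def test_rejects_missing_mandatory_fields() -> None:
    msg = {"msg_type": "MemoryRead", "correlation_id": "c0",
           "request_id": "r0", "target_device": "sip:0",
           "src_sip": 0, "src_cube": 0, "src_pe": 0, "nbytes": 16}
    assert missing_fields(msg) == ["src_pa"]
```

The function is named so that a runner such as pytest would collect it, but it can also be called directly.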

Related

  • ADR-0011 (Memory Addressing — PA / VA / LA)
  • ADR-0007 (runtime_api vs sim_engine boundaries)
  • ADR-0009 (kernel execution fan-out/aggregation)
  • ADR-0013 (Verification strategy — V1 message schema validation)
  • SPEC R2, R7, R8