Filename + lifecycle:
- ADRs renamed to ADR-NNNN-<cat>-title.md with 8 short category prefixes
(dev / mem / lat / prog / algo / par / api / ver). Numbers stay immutable.
- ADR Lifecycle split into 3 folders, documented in CLAUDE.md Part 2:
docs/adr/ (Accepted), docs/adr-proposed/ (Proposed/Stub/Draft),
docs/adr-history/ (Superseded/Merged). Status field gains "Draft" for
retroactive docs pending verification.
Merges (one ADR per topic, no change-history annotations):
- ADR-0017 absorbs ADR-0019 (Cube NOC + per-PE HBM connectivity, 10 D-items)
- ADR-0014 absorbs ADR-0021 (PE pipeline execution model, 8 D-items incl.
TileToken self-routing and multi-op composite epilogue scope)
- ADR-0023 absorbs docs/ipcq-dma-codesign-hw.md as new "HW Realization
Notes (Informative)" section (D16-D23 + Open HW Questions). codesign-hw.md
deleted; ADR-0019/0021 moved to adr-history with one-line stub status
Retroactive documentation (G4 closures, code-verified):
- ADR-0037 forwarding component (TransitComponent: first-flit overhead,
serial worker, path-based routing, single impl/multiple names)
- ADR-0036 IO_CPU component (target_start_ns global barrier stamping,
per-cube fan-out, response aggregation)
- ADR-0035 M_CPU & M_CPU.DMA component (3 fan-out paths, DMA Resources,
target_start_ns passthrough)
- ADR-0034 HBM controller internal design (per-PC state, address-based
selection, flit-aware per-flit commit, async finalize, command-only
fallback path)
Content updates:
- ADR-0010 expanded to full CLI surface (run/probe/web), retitled
"Command Line Interface and Execution Semantics"
- ADR-0007 D2 rewritten to current state; ADR-0015 supersession notes pruned
- ADR-0005 wrapped in Decision header with D1-D5; ADR-0022 metadata
block replaced with standard Status header
- ADR-0024 trimmed to rank=SIP launcher essentials (D1-D4);
ADR-0027 cleaned of supersession history
- ADR-0033 D6 cleanup: address-based PC selection moved out of future-work
(now documented in ADR-0034 D3); related D1/D3 wording realigned
- Cross-references back-filled in 5 ADRs (G3 gaps closed)
Onboarding docs split:
- docs/onboarding/ created
- moved: hw-architecture-overview.md, latency-model.md, di-presentation.md,
ccl-author-guide{,.en}.md
- references updated in README, ADR-0023{,.en}, src/kernbench/ccl/__init__.py
Source / test / yaml: ADR-NNNN cross-references in docstrings and YAML
comments updated after the merges (ADR-0021->0014 D6, ADR-0019->0017 D8).
No behavior change.
Tooling:
- tools/verify_adr_lang_pairs.py + tests/test_verify_adr_lang_pairs.py
(ADR EN/KO pair invariant checker)
- .claude/commands/report.md tracked (/report slash command)
- .gitignore: allow .claude/commands/*.md while keeping settings files ignored
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0012: Host ↔ IO_CPU Message Schema (PA-first, PE-tagged)
Status
Accepted
Context
Phase 0 uses a PA-first memory model (ADR-0011):
- memory operations use device physical addresses (PA) only,
- VA/MMU/IOMMU is not modeled.
The host-facing runtime API interacts with the device via the IO_CPU endpoint. We define stable, minimal message schemas for Host ↔ IO_CPU so that:
- benchmarks remain stable,
- IO_CPU-internal fan-out/aggregation can evolve independently,
- completion and failure propagation is deterministic.
We also require PE-tagging (Option A): each shard explicitly carries (sip,cube,pe) so that IO_CPU can route and fan out deterministically without relying on PA decoding.
Decision
D1. Contract scope
This schema is the stable contract ONLY for Host ↔ IO_CPU.
Messages beyond IO_CPU (to M_CPU, PE_CPU, schedulers, engines) are component-internal and are NOT part of this host contract in Phase 0.
D2. Required message set
The runtime API MUST use only these message types for Host ↔ IO_CPU:
- MemoryWrite
- MemoryRead
- KernelLaunch
All operations required by benchmarks (tensor init/copy, kernel run) MUST be expressible with these messages.
D3. Common envelope (mandatory for all requests)
All Host ↔ IO_CPU requests MUST include:
- msg_type: str
- correlation_id: str
  - generated by the host
  - used to match responses deterministically
- request_id: str
  - unique within a correlation_id
- target_device: str
  - device identifier (e.g., "sip:0")
- timestamp_tag: str | None (optional)
  - debug tag only; MUST NOT affect determinism
All Host ↔ IO_CPU responses MUST include:
- correlation_id: str
- request_id: str
- completion: Completion
D4. Completion schema (mandatory)
Completion MUST have:
- ok: bool
- error_code: str | None
- error_message: str | None
Rules:
- If ok == true then error_code and error_message MUST be null.
- If ok == false then error_code MUST be non-null.
- Completion semantics MUST be deterministic.
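The envelope and Completion rules in D3 and D4 can be sketched as plain Python dataclasses. This is a minimal illustration, not the project's actual implementation: the class names Envelope and Response, the helper match_response, and the choice of a dict keyed by (correlation_id, request_id) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Completion:
    """Completion record that enforces the D4 rules at construction time."""
    ok: bool
    error_code: Optional[str] = None
    error_message: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ok and (self.error_code is not None or self.error_message is not None):
            raise ValueError("ok == true requires error_code and error_message to be null")
        if not self.ok and self.error_code is None:
            raise ValueError("ok == false requires a non-null error_code")


@dataclass(frozen=True)
class Envelope:
    """Common request envelope (D3). The class name is hypothetical."""
    msg_type: str
    correlation_id: str
    request_id: str
    target_device: str
    timestamp_tag: Optional[str] = None  # debug tag only; never used for matching


@dataclass(frozen=True)
class Response:
    """Common response fields (D3). The class name is hypothetical."""
    correlation_id: str
    request_id: str
    completion: Completion


def match_response(pending: dict, resp: Response) -> Envelope:
    """Pop the pending request keyed by (correlation_id, request_id).

    Raises KeyError if no such request is outstanding, so a stray or
    duplicate response cannot be silently accepted.
    """
    return pending.pop((resp.correlation_id, resp.request_id))
```

Keying on the pair rather than on request_id alone reflects the D3 wording that request_id is only unique within a correlation_id.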
D5. MemoryWrite schema (PA-first, PE-tagged)
MemoryWrite represents a host-initiated write/initialize operation to device memory.
Mandatory fields:
- common envelope fields (D3)
- destination placement tags (Option A):
  - dst_sip: int
  - dst_cube: int
  - dst_pe: int
- dst_pa: int
  - destination physical address in the destination PE's address space
- nbytes: int
- src_kind: "pattern" | "host_buffer_ref"
  - Phase 0 MUST support "pattern"
- pattern: Pattern | None
  - required if src_kind == "pattern"
Pattern (Phase 0 mandatory support):
- pattern_kind: "zero" | "fill_u8" | "fill_u16" | "fill_u32" | "fill_fp16" | "fill_fp32"
- value: number | None
  - required for fill_*; ignored for zero
Optional fields:
- dst_mem_kind: "HBM" | "TCM" | "AUTO" (default "AUTO")
- debug_label: str | None
Notes:
- This message MUST NOT embed bulk tensor data in Phase 0.
- All latency MUST come from explicit graph traversal and modeled components.
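The mandatory-field rules for MemoryWrite can be checked with a small validator. The sketch below assumes messages arrive as plain dicts and that msg_type carries the literal string "MemoryWrite". Neither is stated by this ADR, and the function name validate_memory_write is hypothetical.

```python
from typing import Any

# Field names are taken from D3 and D5; the tuples themselves are illustrative.
ENVELOPE_FIELDS = ("msg_type", "correlation_id", "request_id", "target_device")
WRITE_FIELDS = ("dst_sip", "dst_cube", "dst_pe", "dst_pa", "nbytes", "src_kind")
PATTERN_KINDS = {"zero", "fill_u8", "fill_u16", "fill_u32", "fill_fp16", "fill_fp32"}


def validate_memory_write(msg: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the message passes."""
    errors = [f"missing field: {name}"
              for name in ENVELOPE_FIELDS + WRITE_FIELDS if name not in msg]
    if errors:
        return errors  # stop early: later checks index into mandatory fields
    if msg["msg_type"] != "MemoryWrite":  # assumed literal value
        errors.append("msg_type must be 'MemoryWrite'")
    if msg["src_kind"] == "pattern":
        pattern = msg.get("pattern")
        if pattern is None:
            errors.append("pattern is required when src_kind == 'pattern'")
        elif pattern.get("pattern_kind") not in PATTERN_KINDS:
            errors.append("unknown pattern_kind")
        elif pattern["pattern_kind"] != "zero" and pattern.get("value") is None:
            errors.append("value is required for fill_* patterns")
    elif msg["src_kind"] != "host_buffer_ref":
        errors.append("unknown src_kind")
    return errors
```

Returning a list of violations instead of raising on the first one keeps the output stable for a given input, which is convenient when asserting on it in tests.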
D6. MemoryRead schema (PA-first, PE-tagged)
MemoryRead represents a host-initiated read from device memory.
Mandatory fields:
- common envelope fields (D3)
- source placement tags (Option A):
  - src_sip: int
  - src_cube: int
  - src_pe: int
- src_pa: int
- nbytes: int
Optional fields:
- dst_kind: "host_sink" | "discard" (default "host_sink")
- debug_label: str | None
Response payload:
- actual bytes are NOT required in Phase 0 (latency/traces focus)
- implementations MAY return lightweight stats or hashes later via a new ADR
D7. KernelLaunch schema (PA-first, PE-tagged shards)
KernelLaunch represents launching a kernel on a target device via IO_CPU.
Mandatory fields:
- common envelope fields (D3)
- kernel_ref: KernelRef
- args: list[KernelArg]
KernelRef MUST have:
- name: str
- kind: "deployed" | "builtin"
- deploy_pa: int | None
  - PA where the kernel binary was deployed (required for "deployed")
- deploy_sip: int
  - SIP where the binary resides
- deploy_cube: int
  - cube where the binary resides
- deploy_pe: int
  - PE where the binary resides
- nbytes_code: int
  - kernel binary size (for BW modeling)
Kernel binaries MUST be pre-deployed to device memory via MemoryWrite. KernelLaunch MUST NOT embed kernel source code or IR in the launch message.
KernelArg supports tensor args by PA mapping and scalars by value.
Tensor arg (mandatory):
- arg_kind: "tensor"
- tensor_pa_map: TensorPAMap
TensorPAMap MUST have:
- shards: list[TensorShard]
TensorShard MUST have (Option A, mandatory):
- sip: int
- cube: int
- pe: int
- pa: int
- nbytes: int
- offset_bytes: int
Scalar arg (mandatory):
- arg_kind: "scalar"
- dtype: "i32" | "i64" | "fp16" | "fp32" | "bool"
- value: number | bool
Optional KernelLaunch fields:
- grid: dict | None
- meta: dict | None
- failure_policy: "fail_fast" | "collect_all" (default "fail_fast")
- debug_label: str | None
Notes:
- KernelLaunch MUST NOT embed bulk tensor data.
- KernelLaunch MUST be submitted only to the IO_CPU endpoint.
- IO_CPU MUST fan-out work internally using the shard (sip,cube,pe) tags.
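Because every TensorShard carries explicit (sip,cube,pe) tags, fan-out can be derived from the tags alone, with no PA decoding. The sketch below shows one way to bucket shards per PE in a deterministic order. How IO_CPU actually does this is component-internal (D1), so the helper name group_shards_by_pe and the sort order are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorShard:
    """Shard record with the mandatory D7 fields."""
    sip: int
    cube: int
    pe: int
    pa: int
    nbytes: int
    offset_bytes: int


def group_shards_by_pe(shards):
    """Bucket shards by their (sip, cube, pe) tag.

    Keys are emitted in sorted order and each bucket is ordered by
    offset_bytes, so the result does not depend on input order.
    """
    buckets = defaultdict(list)
    for shard in shards:
        buckets[(shard.sip, shard.cube, shard.pe)].append(shard)
    return {key: sorted(buckets[key], key=lambda s: s.offset_bytes)
            for key in sorted(buckets)}
```

Sorting both the keys and each bucket is one simple way to keep fan-out order reproducible across runs; any other fixed ordering would serve the same purpose.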
Verification Notes
Tests SHOULD validate:
- schema validation rejects missing mandatory fields,
- deterministic correlation/response matching,
- MemoryWrite/Read/KernelLaunch produce explicit hop traces,
- all routed requests incur latency > 0.
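The last two checks can be expressed as a small trace assertion helper. This ADR does not define the hop trace format, so the Hop record, its fields, and check_trace are hypothetical and used only to illustrate the shape of such a test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """Hypothetical trace record: one traversal of a modeled component."""
    component: str
    latency_ns: float


def check_trace(hops: list[Hop]) -> list[str]:
    """Return violations for the hop trace of a single routed request."""
    if not hops:
        return ["trace is empty: routed requests must produce explicit hops"]
    problems = [f"hop {i} ({hop.component}) has negative latency"
                for i, hop in enumerate(hops) if hop.latency_ns < 0]
    if sum(hop.latency_ns for hop in hops) <= 0:
        problems.append("total latency must be > 0")
    return problems
```

A test would run a MemoryWrite, MemoryRead, or KernelLaunch through the simulator, collect its trace, and assert that check_trace returns an empty list.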
Links
- ADR-0011 (Memory Addressing — PA / VA / LA)
- ADR-0007 (runtime_api vs sim_engine boundaries)
- ADR-0009 (kernel execution fan-out/aggregation)
- ADR-0013 (Verification strategy — V1 message schema validation)
- SPEC R2, R7, R8