kernbench2/docs/adr/ADR-0013-verification_strategy.md
ywkang 22fd0d2b9d ADR: introduce docs/history/, merge 0011+0018, prune migration cruft
- CLAUDE.md: add ADR Lifecycle subsection (superseded → docs/history/,
  immutable numbering, no renumber)
- ADR-0011: merge ADR-0018 content as "Address Model: LA" section
  alongside PA / VA; status notes VA model is currently implemented
- ADR-0018 / 0029 / 0031: moved to docs/history/ with status updates
  (0018 merged into 0011, 0029 superseded by 0032, 0031 absorbed
  into 0001 rev 2)
- ADR-0019: rewrite Context as PE-HBM connectivity decision
  (self-contained, no LA model framing)
- ADR-0019/0020/0021/0023/0025/0027: Status Proposed → Accepted
  (code verified) and prune Implementation Notes / Affected files /
  Test strategy / "현재 상태" (current state) sub-sections describing pre-impl state
- ADR-0024/0026: same migration-flavor cleanup; 0026 also drops D6
  Migration and D8 docs-update sub-decisions
- ADR-0030: status simplified (blocker ADR-0031 now superseded)
- SPEC.md: R10 + §0.2 reflect PA / VA / LA model names
- ADR-0008/0012/0013: refresh ADR-0011 subtitle in Links

21 files changed, 553 insertions(+), 1290 deletions(-).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:42:45 -07:00


ADR-0013: Verification Strategy and Phase 1 Test Plan

Status

Accepted

Context

KernBench is a system-level simulator whose correctness is defined by:

  • adherence to SPEC-defined invariants,
  • determinism and debuggability,
  • explicit modeling of routing and latency.

Given the evolving implementation, we need a stable verification strategy that prevents architectural drift while allowing incremental development.

This ADR defines the Phase 1 verification plan and what constitutes "correct behavior" for early implementations.


Decision

D1. Verification is contract-based

Verification MUST be derived from:

  • SPEC requirements,
  • accepted ADRs.

Tests MUST validate architectural contracts, not incidental implementation details.
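One way to keep every test traceable to a contract is to tag it with the SPEC or ADR clause it verifies. The sketch below is only an illustration: the `contract` decorator, the `REGISTRY` dict, and the reference format it accepts are hypothetical and not part of KernBench as described here.

```python
import re

# Hypothetical reference format: "SPEC <section-or-requirement>" or "ADR-NNNN".
CONTRACT_ID = re.compile(r"^(SPEC [\w.]+|ADR-\d{4})$")

# Hypothetical registry: test function name -> contract references it verifies.
REGISTRY: dict = {}


def contract(*ids: str):
    """Record which SPEC/ADR clauses a test verifies; reject untraceable tests."""
    if not ids:
        raise ValueError("a test must cite at least one SPEC or ADR clause")
    for cid in ids:
        if not CONTRACT_ID.match(cid):
            raise ValueError(f"not a SPEC/ADR reference: {cid!r}")

    def wrap(fn):
        REGISTRY[fn.__name__] = ids
        return fn

    return wrap


@contract("ADR-0012")
def test_kernel_launch_requires_placement():
    pass  # the body would exercise the V1 rejection path
```

A test that cannot name a SPEC requirement or accepted ADR is, by D1, probably testing an implementation detail, so the decorator refuses to register it.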


D2. Phase 1 verification scope

Phase 1 verification focuses on:

  • message contract validity (ADR-0012),
  • routing and fan-out semantics at the IO_CPU boundary (ADR-0009),
  • PA-first memory addressing and shard tagging (ADR-0011),
  • core latency and trace invariants (SPEC 0.1, R2).

Microarchitectural accuracy, bandwidth contention, and cycle-level behavior are explicitly out of scope in Phase 1.


D3. Required Phase 1 verification cases

The following verification cases MUST be supported by the implementation:

V1. Message schema validation

  • KernelLaunch requests in which any tensor shard lacks one or more of its (sip, cube, pe) placement fields MUST be rejected.
  • MemoryWrite requests missing destination placement tags, and MemoryRead requests missing source placement tags, MUST be rejected.
  • Completion results MUST follow the ok / error_code / error_message contract.
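A minimal sketch of the KernelLaunch and completion checks in V1. The dataclasses, the `INVALID_PLACEMENT` error code, and the rule that a successful completion carries no error fields while a failed one carries both are assumptions made for illustration; the authoritative field names and semantics are in ADR-0012. MemoryWrite/MemoryRead checks would follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical message shapes; the real ADR-0012 schema may differ.
@dataclass
class TensorShard:
    sip: Optional[int] = None
    cube: Optional[int] = None
    pe: Optional[int] = None


@dataclass
class KernelLaunch:
    shards: list = field(default_factory=list)


@dataclass
class Completion:
    ok: bool
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def validate_kernel_launch(req: KernelLaunch) -> Completion:
    """Reject the request if any shard is missing sip, cube, or pe."""
    for i, shard in enumerate(req.shards):
        missing = [name for name in ("sip", "cube", "pe") if getattr(shard, name) is None]
        if missing:
            return Completion(
                ok=False,
                error_code="INVALID_PLACEMENT",  # hypothetical error code
                error_message=f"shard {i} missing {', '.join(missing)}",
            )
    return Completion(ok=True)


def completion_follows_contract(c: Completion) -> bool:
    """Assumed contract: ok=True has no error fields; ok=False has both."""
    if c.ok:
        return c.error_code is None and c.error_message is None
    return bool(c.error_code) and bool(c.error_message)
```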

V2. IO_CPU fan-out and aggregation

Given:

  • a topology with one SIP, one CUBE, and two PEs,
  • a KernelLaunch request containing two tensor shards targeting different PEs,

The system MUST:

  • accept the request as a single KernelLaunch submitted to IO_CPU,
  • fan out the work internally to both PEs,
  • aggregate the per-PE completions and return a single, deterministic completion to the host.
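The V2 behavior can be sketched as follows. All names are hypothetical, and `run_on_pe` is a stand-in with a made-up latency formula. The sketch assumes the PEs execute their shards concurrently, so the aggregated latency is that of the slowest PE, and it orders shards by (sip, cube, pe) so the completion does not depend on the order of shards in the request.

```python
from dataclasses import dataclass


# Hypothetical types; IO_CPU internals in KernBench may differ.
@dataclass(frozen=True)
class Shard:
    sip: int
    cube: int
    pe: int


@dataclass(frozen=True)
class PeResult:
    pe: int
    ok: bool
    latency_ns: int


def run_on_pe(shard: Shard) -> PeResult:
    # Stand-in for per-PE execution with a fixed, deterministic latency.
    return PeResult(pe=shard.pe, ok=True, latency_ns=100 + 10 * shard.pe)


def io_cpu_kernel_launch(shards: list) -> dict:
    """Accept one KernelLaunch, fan out one sub-task per shard, return one completion."""
    if not shards:
        raise ValueError("KernelLaunch has no shards")
    # Fan-out in a stable order so results never depend on request ordering.
    ordered = sorted(shards, key=lambda s: (s.sip, s.cube, s.pe))
    results = [run_on_pe(s) for s in ordered]
    # Aggregate: success only if every PE succeeded; latency is the slowest PE
    # under the assumption that PEs run concurrently.
    return {
        "ok": all(r.ok for r in results),
        "pe_count": len(results),
        "latency_ns": max(r.latency_ns for r in results),
        "per_pe": [r.pe for r in results],
    }
```

Calling it twice with the same shards in different orders yields equal completions, which is the determinism V2 asks for.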

V3. Latency and trace invariants

For any valid request:

  • the hop-by-hop trace MUST be non-empty,
  • total latency MUST be greater than zero,
  • repeated runs with identical inputs MUST produce identical traces.
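The three V3 invariants can be packaged as one reusable check. The `simulate` function and its per-link latency table below are stand-ins invented for this example; a real test would call the simulator's own entry point and pass it in as the zero-argument `run` callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    latency_ns: int


# Hypothetical fixed per-link latencies; not KernBench's latency model.
LINK_LATENCY_NS = {
    ("HOST", "IO_CPU"): 500,
    ("IO_CPU", "PE0"): 40,
    ("IO_CPU", "PE1"): 40,
}


def simulate(path: List[str]) -> List[Hop]:
    """Stand-in simulator: one Hop per consecutive pair of nodes in `path`."""
    return [Hop(a, b, LINK_LATENCY_NS[(a, b)]) for a, b in zip(path, path[1:])]


def check_trace_invariants(run: Callable[[], List[Hop]], repeats: int = 3) -> None:
    """Assert the V3 invariants for a callable that runs one request."""
    first = run()
    assert len(first) > 0, "hop-by-hop trace must be non-empty"
    assert sum(h.latency_ns for h in first) > 0, "total latency must be > 0"
    for _ in range(repeats - 1):
        assert run() == first, "identical inputs must produce identical traces"
```

For example, `check_trace_invariants(lambda: simulate(["HOST", "IO_CPU", "PE0"]))` passes with a two-hop trace totalling 540 ns, while a degenerate path `["HOST"]` fails the non-empty check.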

V4. Topology independence and cross-domain coverage

Verification cases MUST pass for multiple topology shapes, including:

  • minimal: (1 SIP, 1 CUBE, 1 PE)
  • multi-PE: (1 SIP, 1 CUBE, N PEs)
  • multi-CUBE within a SIP: (1 SIP, M CUBEs, ≥1 PE per CUBE)
  • multi-SIP tray: (K SIPs, ≥1 CUBE per SIP, ≥1 PE per CUBE)

For multi-CUBE and multi-SIP topologies, Phase 1 verification focuses on:

  • explicit connectivity (required links exist),
  • deterministic routing and control-path traversal,
  • non-empty traces and latency > 0 for representative cross-domain requests (inter-CUBE and inter-SIP paths).
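A sketch of topology fixtures for the V4 shapes, with a required-link check and a deterministic router. The node naming, the chained inter-CUBE and inter-SIP links, the assumption that cross-SIP traffic passes through each SIP's IO_CPU, and the use of BFS (fewest hops) over sorted neighbor lists are all illustrative choices, not KernBench's actual topology model. The concrete values for N, M, and K in `SHAPES` are arbitrary.

```python
from collections import deque


def build_topology(sips: int, cubes_per_sip: int, pes_per_cube: int) -> dict:
    """Adjacency map: one IO_CPU per SIP, CUBEs under it, PEs under each CUBE."""
    adj: dict = {}

    def link(a: str, b: str) -> None:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    for s in range(sips):
        io = f"S{s}.IO_CPU"
        adj.setdefault(io, [])
        if s > 0:
            link(f"S{s - 1}.IO_CPU", io)  # assumed inter-SIP link
        for c in range(cubes_per_sip):
            cube = f"S{s}.C{c}"
            link(io, cube)
            if c > 0:
                link(f"S{s}.C{c - 1}", cube)  # assumed inter-CUBE link
            for p in range(pes_per_cube):
                link(cube, f"{cube}.PE{p}")
    # Sorted neighbor lists make traversal order independent of insertion order.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def required_links(sips: int, cubes_per_sip: int, pes_per_cube: int):
    """Links every shape must contain: IO_CPU-CUBE and CUBE-PE."""
    for s in range(sips):
        for c in range(cubes_per_sip):
            yield f"S{s}.IO_CPU", f"S{s}.C{c}"
            for p in range(pes_per_cube):
                yield f"S{s}.C{c}", f"S{s}.C{c}.PE{p}"


def route(adj: dict, src: str, dst: str) -> list:
    """Fewest-hop path by BFS; returns [] if dst is unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# V4 shapes as (SIPs, CUBEs per SIP, PEs per CUBE).
SHAPES = {
    "minimal": (1, 1, 1),
    "multi-PE": (1, 1, 4),
    "multi-CUBE": (1, 3, 2),
    "multi-SIP": (2, 2, 1),
}
```

A parametrized test can loop over `SHAPES`, assert every `required_links` entry is present, and route from the first PE to the last one to cover the inter-CUBE and inter-SIP paths.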

D4. Phase 1 artifacts

Phase 1 MAY include:

  • verification-only test code,
  • topology fixtures,
  • trace inspection utilities.

Phase 1 MUST NOT require:

  • production code changes solely to satisfy tests,
  • weakening or removing tests to allow progress.

D5. Phase 2 enforcement

Phase 2 (Apply) MUST:

  • run the Phase 1 verification cases,
  • roll back all changes if any verification fails,
  • preserve tests as authoritative contracts.
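A minimal sketch of the D5 apply-then-verify loop, assuming Phase 2 operates on some state that can be copied. `apply_with_rollback` and the `(name, check)` pairs are hypothetical. The checks are passed in and only read, never edited, which matches the requirement that tests stay authoritative.

```python
import copy
from typing import Callable, List, Tuple

# A named verification case: (case name, predicate over the candidate state).
Check = Tuple[str, Callable[[dict], bool]]


def apply_with_rollback(
    state: dict, change: Callable[[dict], None], checks: List[Check]
) -> Tuple[dict, List[str]]:
    """Apply `change` to a copy of `state`; keep the copy only if all checks pass.

    Returns (resulting_state, failed_check_names). On any failure the original
    `state` object is returned unchanged, i.e. all changes are rolled back.
    """
    candidate = copy.deepcopy(state)
    change(candidate)
    failures = [name for name, check in checks if not check(candidate)]
    if failures:
        return state, failures
    return candidate, []
```

Working on a copy makes rollback a matter of discarding the candidate; for a real source tree, a scratch checkout or temporary directory could play the same role.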

Consequences

  • Architectural correctness is enforced early.
  • Tests serve as executable documentation of system behavior.
  • Implementations can change freely as long as they continue to satisfy the contract tests.

Links

  • SPEC 0.1, R2, R6
  • ADR-0011 (Memory Addressing — PA / VA / LA)
  • ADR-0012 (Host ↔ IO_CPU message schema)
  • ADR-0009 (Kernel execution semantics)