kernbench2/CLAUDE.md

# Claude Code Instructions (Repo)

This repository uses Claude Code with strict architectural and verification rules.
SPEC.md and ADRs are the source of truth.

---

# Part 1 — General Behavior

> Reusable across repos. Describes *how* Claude Code interacts with the user
> and constructs changes, independent of this project's domain.

## Design Questions

- Design / architecture questions are ALWAYS allowed.
- Design questions MUST NOT modify:
  - production code
  - test code
  - SPEC.md
  - ADRs
- If a design question implies a change, default to Phase 1.

## Surfacing Choices

Applies to both design discussions and Phase 1 proposals.

- If multiple valid interpretations of the request exist, present them.
  Do NOT pick one silently.
- If a simpler approach exists, say so. Push back when warranted —
  do NOT just implement the more complex path the user proposed.
- State required assumptions explicitly. If uncertain, ask before assuming.

## Change & Test Protocol (Mandatory)

All non-trivial changes MUST follow a two-phase process.
Design discussion is always allowed.
Production code changes require Phase 1 approval before Phase 2 applies them.

### Phase 1 — Proposal + Verification

(No Production Code Changes)

#### Purpose

- Decide *what* to change and *how it will be validated*
- Establish verification coverage BEFORE touching production code

#### Phase 1 MUST include

1) **Design Proposal**

- Explain the design change.
- Explain why the change is needed.
- Explain consistency with SPEC.md and relevant ADRs.

2) **Verification Plan**

- SPEC requirement(s) / ADR(s) affected.
- Tests that validate the change:
  - existing tests to run, and/or
  - new tests to add.
- Concrete input cases used by the tests.
- Expected observable assertions.
- Expected changes (or no changes) in generated artifacts, if applicable.

(Project-specific expectations for what these inputs/assertions look like:
see Part 2 → *Verification Plan — Project Expectations*.)

If the Verification Plan is missing or vague, STOP.

#### Allowed in Phase 1

- Creating or modifying **test code only**
- Running tests and reporting results

#### Forbidden in Phase 1

- Any production code changes
- Any SPEC.md or ADR modifications
- Final, ready-to-apply unified diffs (Phase 2 only)

#### Permitted for design discussion

- Pseudocode, interface sketches, type signatures
- Small illustrative snippets to clarify a design point
- "Before / after" excerpts (not full diffs)

#### Phase 1 Output

- Proposal + Verification Plan
- Tests added/modified (if any)
- Test execution results (PASS / FAIL)
- Clear recommendation:
  - "No Phase 2 needed" OR
  - "Await approval for Phase 2"

### Phase 2 — Apply + Verify + Rollback

#### Trigger

Phase 2 is triggered ONLY by the exact user approval phrase:

**"ok"**

#### Phase 2 Rules

- Keep changes minimal and scoped to the approved Phase 1 proposal.
- Modify only production files declared in Phase 1.
- Avoid unrelated edits, cleanup, or formatting churn.
- Automatically apply approved changes to the working tree.

#### Mandatory Verification

- Run the tests defined in the Phase 1 Verification Plan

#### Success Path

If ALL tests PASS:

- Keep the applied changes
- Ensure generated artifacts (if affected) are consistent
- Report success concisely

#### Failure Path (Mandatory)

If ANY test FAILS:

- Immediately roll back ALL Phase 2 changes
- Do NOT keep partial changes
- Report:
  - failing test names
  - error messages / assertions
  - brief hypothesis of the root cause
- Return to Phase 1 state

Tests must NEVER be weakened, removed, or altered to force Phase 2 to pass.

Failing tests may indicate:
- invalid assumptions,
- architectural violations,
- or incomplete modeling.

Do not assume the test is wrong without explicit evidence.
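
The apply / verify / rollback flow can be sketched as a small Python function. The hooks `apply`, `run_tests`, and `rollback` are hypothetical stand-ins (for example, editing files, running the Phase 1 tests, restoring the working tree); nothing in this repo defines them. The sketch never touches the tests themselves.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Phase2Result:
    kept: bool                                          # True only if ALL tests passed
    failures: List[str] = field(default_factory=list)   # failing test names, if any


def run_phase2(
    apply: Callable[[], None],
    run_tests: Callable[[], List[str]],
    rollback: Callable[[], None],
) -> Phase2Result:
    """Apply approved changes, run the Phase 1 tests, roll back on any failure.

    `run_tests` returns the names of failing tests; an empty list means all PASS.
    """
    try:
        apply()
        failures = run_tests()
    except Exception:
        # A crash while applying or testing also counts as a failure:
        # revert everything rather than leave partial changes behind.
        rollback()
        raise
    if failures:
        rollback()  # revert ALL Phase 2 changes, never a subset
        return Phase2Result(kept=False, failures=failures)
    return Phase2Result(kept=True)
```

Reporting (failing test names, error messages, root-cause hypothesis) stays with the caller; the function only decides between keeping and rolling back.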

## Allowed Exceptions

(Protocol Still Required)

- comments or docstrings
- formatting-only changes
- type annotation changes with no runtime behavior change

For these exceptions, Phase 1 MUST still explicitly state:
**"No behavior change; tests unchanged."**

## Coding Style

Applies to all production code changes (Phase 2) and test code (Phase 1).
The Phase 1/2 protocol decides *whether* and *what* to change;
this section decides *how* the resulting diff should look.

### Simplicity First

**Minimum code that solves the problem. Nothing speculative.**

- Write the minimum code that satisfies the Phase 1 proposal.
- No abstractions for single-use code.
- No "flexibility"/"configurability" not declared in Phase 1.
- No error handling for impossible scenarios.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

### Surgical Changes

**Touch only what you must. Clean up only your own mess.**

- Touch only files declared in the Phase 1 proposal.
- Don't "improve" adjacent code, comments, or formatting.
- Match existing style in the file, even if you'd do it differently.
- If your changes orphan imports/variables/functions, remove them.
- If you notice pre-existing dead code, do NOT delete it silently.
  Mention it and present options: (a) delete it (with approval),
  (b) keep it as-is, or (c) refactor it so it becomes reachable or is
  repurposed. Let the user choose before acting.
- Every changed line must trace to the Phase 1 proposal.

## Enforcement Defaults

General fallbacks. Apply to anything not explicitly covered above.

- If unsure whether a change is non-trivial → treat it as non-trivial.
- If unsure whether Phase 2 is allowed → STOP and ask.

---

# Part 2 — Project-Specific (kernbench)

> Specific to this repo's domain (SIP/CUBE/PE topology, runtime API, sim_engine).
> Replace this entire Part when adapting the framework to another repo.
>
> Contains **foundations** (Authority & Scope → Terminology → Terminology
> Discipline → Mental Model → Common Failure Modes) followed by **rules**
> (Non-Trivial, Verification Plan, CLI, Derived Artifacts, ADR Translation
> Discipline, runtime API / sim_engine Boundaries).

## Authority & Scope

- SPEC.md defines the architectural contract.
- ADRs (docs/adr/ADR-*.md) define non-trivial architectural decisions.
- If a change conflicts with SPEC.md or an ADR:
  - STOP.
  - Explain the conflict.
  - Propose options (keep spec, update ADR, or narrow scope).
- Do NOT silently change architecture.
- The repository structure reflects architectural intent; Claude Code MUST respect existing module boundaries and file locations.

### ADR Lifecycle

ADRs live in one of four folders. Three hold ADRs according to lifecycle
state (only `docs/adr/` must be canonical English); the fourth holds
Korean translations:

- `docs/adr/` — **Accepted** (canonical English; current
  implementation reflected).
- `docs/adr-proposed/` — **Proposed**, **Stub**, or **Draft** (design
  only / future-work exploration / retroactive documentation pending
  verification). **Authoring language is free** (any language); the
  promotion step (below) translates to English.
- `docs/adr-history/` — **Superseded** or **Merged** (no longer the
  authoritative source; kept as historical record). Frozen — language
  policy not applied retroactively.
- `docs/adr-ko/` — Korean translations of accepted ADRs (derived
  artifact, 1:1 mirror of `docs/adr/`). English in `docs/adr/` is the
  canonical source of truth; when KO and EN disagree, EN wins. See
  *ADR Translation Discipline* below.

Status field values:

- `Accepted` — design is in current implementation.
- `Proposed` — design is concrete but not yet implemented.
- `Stub (Future Work)` — design space exploration; no commitment yet.
- `Draft` — retroactive documentation drafted but not yet verified
  against the implementation it describes.
- `Superseded by ADR-NNNN` — replaced by another ADR.
- `Merged into ADR-NNNN` — content absorbed by another ADR.

Transitions:

- **Proposed/Stub → Accepted**: when the ADR's decisions are
  reflected in production code AND covered by tests. If the proposed
  ADR is in Korean, translate to English and place the English in
  `docs/adr/`; move the Korean original to `docs/adr-ko/`. If the
  proposed ADR is in English, `git mv` it to `docs/adr/` and create
  the Korean translation in `docs/adr-ko/`. Change Status to
  `Accepted` in both files.
- **Draft → Accepted**: when the ADR's text has been verified to
  accurately describe the existing implementation. Same English /
  Korean placement rule as above.
- **Accepted → Superseded**: set Status to `Superseded by ADR-MMMM`
  in both the EN and KO files, then `git mv` the English file to
  `docs/adr-history/`. There is no separate KO history folder: the KO
  copy stays in `docs/adr-ko/` if it was already mirrored, and no new
  KO copy is created (see *ADR Translation Discipline* for the
  frozen-history exception).
- **Accepted → Merged**: set Status to `Merged into ADR-MMMM`
  (single-line stub) in both files and apply the same `git mv` rule
  as the Superseded transition.
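
The folder and transition rules above form a small state machine. A minimal Python sketch, assuming four-digit ADR IDs and Status values written exactly as listed; `check_transition` and the table names are hypothetical, not an existing repo tool.

```python
import re

# Status values without a successor ID, as listed under "Status field values".
PLAIN_STATUSES = {"Accepted", "Proposed", "Stub (Future Work)", "Draft"}
SUCCESSOR_RE = re.compile(r"^(Superseded by|Merged into) ADR-\d{4}$")

# Folder holding the primary (non-KO-mirror) file in each lifecycle state.
FOLDER_BY_STATE = {
    "Accepted": "docs/adr/",
    "Proposed": "docs/adr-proposed/",
    "Stub (Future Work)": "docs/adr-proposed/",
    "Draft": "docs/adr-proposed/",
    "Superseded": "docs/adr-history/",
    "Merged": "docs/adr-history/",
}

# The only (from, to) moves listed under "Transitions".
ALLOWED = {
    ("Proposed", "Accepted"),
    ("Stub (Future Work)", "Accepted"),
    ("Draft", "Accepted"),
    ("Accepted", "Superseded"),
    ("Accepted", "Merged"),
}


def state_of(status: str) -> str:
    """Reduce a Status value to its lifecycle state (drops the ADR-MMMM target)."""
    if status in PLAIN_STATUSES:
        return status
    if SUCCESSOR_RE.match(status):
        return status.split()[0]  # "Superseded" or "Merged"
    raise ValueError(f"unknown Status value: {status!r}")


def check_transition(old: str, new: str) -> str:
    """Validate a Status change; return the folder the EN file must end up in."""
    move = (state_of(old), state_of(new))
    if move not in ALLOWED:
        raise ValueError(f"transition not allowed: {old!r} -> {new!r}")
    return FOLDER_BY_STATE[move[1]]
```

A bare `Superseded` (without `by ADR-MMMM`) is rejected, since it is not one of the listed Status values.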

Cross-references between ADRs use the `ADR-NNNN` ID and remain valid
regardless of folder location. ADR numbers are **immutable**; never
renumber. Numbering holes from moved ADRs are expected.

## Terminology

- runtime API:
  Host-facing public API used by benchmarks and user code (e.g., tensor deployment, kernel launch).
- simulation engine (sim_engine):
  Discrete-event engine responsible for request injection, scheduling, and completion tracking.
- components:
  Device-side nodes modeling hardware behavior (IO_CPU, M_CPU, PE_CPU, routers, engines, etc.).

## Terminology Discipline

Use only terms established in SPEC.md, ADRs, existing notes, or code.
Do not coin new terms (status labels, tiers, classifications, role names)
without explicit user approval. When a needed term is missing or ambiguous,
ask before introducing one. When proposing a rename, show the existing
term and the proposed change side-by-side and wait for approval.

## Mental Model

The simulator is layered along **request flow**:

    runtime API           (host-facing: tensor ops, kernel launch;
                           topology-agnostic, no routing — ADR-0007)
         ↓
    sim_engine            (schedules events, routes requests,
                           tracks completion via correlation IDs)
         ↓
    components            (device-side nodes: IO_CPU, M_CPU, PE_CPU,
                           routers, engines — model HW behavior
                           including interconnect)

Configuration & decisions (orthogonal to request flow):
- **topology**  — compiled at config time (ADR-0006); defines which
  components exist and how they connect. Authoritative graph for sim_engine.
- **policy** (routing / address / placement) — consulted by sim_engine
  during request handling.

Invariant: all latency arises from **explicit scheduled events on modeled
components and links** (SPEC §0.1, R8). No implicit waits, no magic delays.

Stay within layer boundaries; do not collapse or bypass for convenience.
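
The layering can be made concrete with a toy discrete-event loop. Everything here is a hypothetical illustration (`SimEngine`, `copy_tensor`, the dict-based topology, nanosecond units), not kernbench's real classes; it only shows where routing, scheduling, and latency live relative to each other.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable

# Routing policy: (src, dst) -> ordered list of nodes. Consulted by sim_engine only.
Route = Callable[[str, str], list[str]]


@dataclass
class SimEngine:
    link_latency_ns: dict[tuple[str, str], int]  # topology, fixed at config time
    route: Route
    now: int = 0
    trace: list[tuple[int, str]] = field(default_factory=list, init=False)
    _queue: list = field(default_factory=list, init=False)
    _seq: itertools.count = field(default_factory=itertools.count, init=False)

    def _schedule(self, delay_ns: int, node: str, rest: list[str]) -> None:
        # Key is (time, seq): events at equal times fire in scheduling order,
        # so a given config always replays identically.
        heapq.heappush(self._queue, (self.now + delay_ns, next(self._seq), node, rest))

    def submit(self, src: str, dst: str) -> None:
        path = self.route(src, dst)
        self._schedule(0, path[0], path[1:])

    def run(self) -> int:
        """Drain the event queue and return the final simulated time (ns)."""
        while self._queue:
            self.now, _, node, rest = heapq.heappop(self._queue)
            self.trace.append((self.now, node))
            if rest:
                # The only source of delay: an explicit event on a modeled link.
                delay = self.link_latency_ns[(node, rest[0])]
                self._schedule(delay, rest[0], rest[1:])
        return self.now


def copy_tensor(engine: SimEngine, src: str, dst: str) -> None:
    """Runtime-API-level call: names endpoints only; knows no hops or topology."""
    engine.submit(src, dst)
```

With links `IO_CPU → router0` at 5 ns and `router0 → PE_CPU` at 3 ns and a route through `router0`, `run()` returns 8, and every nanosecond of it is attributable to a scheduled link event.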

## Common Failure Modes

Anti-patterns that violate the Mental Model or Golden Invariants (SPEC §0.1).
If your change does any of these, STOP and reconsider.

- **runtime topology mutation** — topology is compiled at config time; do not
  add/remove nodes or edges during simulation (ADR-0006).
- **nondeterministic iteration order** — never iterate sets, unordered dicts,
  or anything else with implementation-defined order on the critical path.
  Determinism is required (SPEC §0.1).
- **routing policy inside runtime API** — runtime API is topology-agnostic;
  routing/fan-out belongs in policy + sim_engine (ADR-0007).
- **latency modeled outside sim_engine scheduling** — every delay must come
  from an explicit scheduled event on a modeled component or link
  (SPEC §0.1, R8). No magic sleeps, no hardcoded constants smuggled in.
- **hidden cross-layer coupling** — do not skip layer interfaces.
  e.g., runtime API must not call into components directly, bypassing sim_engine.
- **silent ADR/SPEC reinterpretation** — surface conflicts; do not paper over them.
  See *Authority & Scope* above.
- **weakening tests to make Phase 2 pass** — fix the code, not the test.
  See *Part 1 → Phase 2 → Failure Path*.
- **asserting from memory without source check** — quantitative
  architectural facts (topology counts, sizes, latencies, address widths,
  port arities) must be sourced from SPEC.md or a specific ADR before
  assertion. Memory is unreliable. If the source is silent, surface the
  gap rather than guessing.
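
Assuming the simulator is Python (as the repo's `tools/` scripts suggest), the usual fix for nondeterministic iteration order is to sort before iterating. A minimal sketch; `enqueue_fanout` and `init_counters` are hypothetical helpers, and the `(time, seq, node)` event tuple is an assumed queue layout.

```python
import heapq
import itertools


def enqueue_fanout(queue: list, now_ns: int, dsts: set[str], seq: itertools.count) -> None:
    """Push one same-time event per destination in a reproducible order.

    Equal-time events pop in push order (the seq tie-breaker), so push order
    decides event order. Iterating `dsts` directly would follow hash order,
    which for str changes between processes unless PYTHONHASHSEED is fixed;
    sorting ties the order to the data alone.
    """
    for dst in sorted(dsts):
        heapq.heappush(queue, (now_ns, next(seq), dst))


def init_counters(dsts: set[str]) -> dict[str, int]:
    # dicts keep insertion order, so a dict filled from raw set iteration
    # inherits the set's unstable order; sort here too.
    return {dst: 0 for dst in sorted(dsts)}
```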

## What Counts as "Non-Trivial"

(Protocol Required)

Any of the following:

- routing policy or ordering changes
- topology builder changes (nodes, links, parameters)
- address decoding / PhysAddr behavior
- latency composition rules
- changes affecting determinism or connectivity
- changes touching two or more production files

## Verification Plan — Project Expectations

Concrete forms that Part 1's *Verification Plan* MUST take in this repo:

- SPEC requirement(s) / ADR(s) affected (e.g., R1/R2/R5, ADR-0002).
- Concrete input cases:
  - topology (SIP / CUBE / PE layout)
  - request parameters (src, dst, size_bytes).
- Expected observable assertions, such as:
  - hop trace contains key waypoints,
  - latency invariants (e.g., > 0, monotonic increase),
  - deterministic route selection.
- **Expected changes (or no changes) in generated diagrams**, if applicable.
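
A hypothetical pytest-style sketch of these assertion forms. `fake_simulate` stands in for the real simulation entry point, whose name and signature this document does not specify; the node names and the `(time_ns, node)` trace shape are assumptions.

```python
def contains_in_order(nodes: list[str], waypoints: list[str]) -> bool:
    """True if `waypoints` occur in `nodes` as an ordered subsequence."""
    it = iter(nodes)
    return all(w in it for w in waypoints)  # `in` consumes the iterator


def check_trace(trace: list[tuple[int, str]], waypoints: list[str]) -> None:
    """Assert the observable properties on a (time_ns, node) hop trace."""
    times = [t for t, _ in trace]
    nodes = [n for _, n in trace]
    assert contains_in_order(nodes, waypoints), f"waypoints {waypoints} not in {nodes}"
    assert times[-1] - times[0] > 0, "end-to-end latency must be > 0"
    assert all(a <= b for a, b in zip(times, times[1:])), "hop times must not decrease"


def fake_simulate(src: str, dst: str, size_bytes: int) -> list[tuple[int, str]]:
    """Stand-in for the real engine: fixed route via router0, cost grows with size."""
    per_hop_ns = max(1, size_bytes // 64)
    return [(i * per_hop_ns, node) for i, node in enumerate([src, "router0", dst])]


def test_io_cpu_to_pe_cpu() -> None:
    case = ("IO_CPU", "PE_CPU", 4096)  # concrete input: src, dst, size_bytes
    trace = fake_simulate(*case)
    check_trace(trace, waypoints=["IO_CPU", "router0", "PE_CPU"])
    assert fake_simulate(*case) == trace  # same input -> same route and timing
```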

## CLI Semantics

- `kernbench run --device <id>` runs the benchmark on a single device.
- Omitting `--device` runs the benchmark on all devices discovered in the topology (logically parallel).
- Device enumeration is handled by the CLI only; benchmarks MUST remain single-device.
- **Eval-bench exception (ADR-0054)**: a *milestone / eval bench*
  (`milestone-1h-*`) may drive many configurations and build its own
  per-config engines to regenerate a domain's full result + figure set; it
  ignores `--device` and submits a sentinel tensor to satisfy the
  "must submit ≥1 request" contract (ADR-0045 D4). This is the eval-harness
  carve-out to the single-device rule, alongside the ADR-0024 multi-SIP CCL
  exception.
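
The split of responsibilities (the CLI enumerates devices, each benchmark stays single-device) can be sketched with `argparse`. Only the `kernbench run --device <id>` shape comes from this document; `main`, `devices`, and `run_one` are hypothetical, and the eval-bench exception is not modeled.

```python
import argparse
from typing import Callable


def main(argv: list[str], devices: list[str], run_one: Callable[[str], None]) -> None:
    """Parse `kernbench run [--device <id>]` and dispatch per device.

    `devices` comes from the topology; `run_one` runs a single-device
    benchmark. Both are hypothetical hooks for this sketch.
    """
    parser = argparse.ArgumentParser(prog="kernbench")
    sub = parser.add_subparsers(dest="cmd", required=True)
    run = sub.add_parser("run")
    run.add_argument("--device", help="run on this device only")
    args = parser.parse_args(argv)

    # Enumeration lives here; each benchmark invocation sees exactly one device.
    # Sequential for simplicity; the real CLI treats devices as logically parallel.
    for dev in ([args.device] if args.device is not None else devices):
        run_one(dev)
```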

## Derived Artifacts (Clarification)

- Generated diagrams under `docs/diagrams/` are **derived artifacts**, not production code.
- Korean ADR translations under `docs/adr-ko/` are **derived artifacts**
  (mirror of the canonical English in `docs/adr/`); see *ADR Translation
  Discipline*.
- Creating or updating files in `docs/diagrams/` or `docs/adr-ko/`:
  - does NOT count as a production code change,
  - does NOT require Phase 2 approval,
  - MUST be consistent with SPEC.md and ADRs.

## ADR Translation Discipline

English in `docs/adr/` is the canonical source of truth. Korean in
`docs/adr-ko/` mirrors it 1:1 as a derived artifact.

**Bidirectional sync rule (MUST)**: any edit to a file in `docs/adr/`
must be accompanied, in the same change, by a mirroring edit to
`docs/adr-ko/<same-filename>.md`. The reverse also applies: edits to
`docs/adr-ko/` must mirror back into `docs/adr/`. The two files must
always describe the same architectural content.

Mechanics:

- When editing an EN ADR, propagate the change to its KO counterpart
  by translating just the diff (preserve unaffected KO prose); do not
  regenerate the whole KO file from scratch.
- When editing a KO ADR, propagate to EN the same way.
- Filename mirror: `docs/adr/X.md` ↔ `docs/adr-ko/X.md` (no language
  suffix in either path).
- The `## Status` *lifecycle keyword* (`Accepted`, `Proposed`,
  `Stub (Future Work)`, `Draft`, `Superseded by ADR-NNNN`,
  `Merged into ADR-NNNN`) must match between EN and KO. Parenthetical
  commentary and any list items that follow the keyword may be
  translated naturally (the verify tool ignores them when comparing).
- Conflict policy: if the two diverge despite the rule, treat EN as
  authoritative and overwrite KO. Surface the divergence to the user
  before reconciling.
- `docs/adr-proposed/` is exempt — single language only, no mirror
  required until promotion.
- `docs/adr-history/` is frozen — pre-existing mixed-language state
  there is not migrated.

Verification: `python tools/verify_adr_lang_pairs.py` checks that
every EN ADR has a matching KO file, that the title's ADR-NNNN matches
the filename, and that the `## Status` lifecycle keywords are byte-equal
(trailing commentary is ignored, per *Mechanics* above). Run it on demand
or wire it into CI. Exit code: 0 = OK, 1 = mismatch.
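
For illustration only, here is a hypothetical sketch of those three checks; it is not the contents of `tools/verify_adr_lang_pairs.py`. It assumes ADR files are named `ADR-NNNN*.md`, that the title is the first line, and that the Status keyword is the first non-blank line under `## Status`. Following the Mechanics rule, it compares only the lifecycle keyword and ignores trailing commentary.

```python
import re
from pathlib import Path
from typing import Optional

ID_RE = re.compile(r"ADR-\d{4}")
# Lifecycle keywords from "Status field values"; anything after them is commentary.
STATUS_RE = re.compile(
    r"(Accepted|Proposed|Stub \(Future Work\)|Draft"
    r"|Superseded by ADR-\d{4}|Merged into ADR-\d{4})"
)


def status_keyword(text: str) -> Optional[str]:
    """Lifecycle keyword on the first non-blank line after '## Status', if any."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "## Status":
            body = next((ln.strip() for ln in lines[i + 1:] if ln.strip()), "")
            m = STATUS_RE.match(body)
            return m.group(1) if m else None
    return None


def title_matches(path: Path, text: str) -> bool:
    """True if the ADR-NNNN on the first line is the filename's prefix."""
    first = text.splitlines()[0] if text else ""
    m = ID_RE.search(first)
    return m is not None and path.name.startswith(m.group())


def check_pairs(en_dir: Path, ko_dir: Path) -> list[str]:
    problems = []
    for en in sorted(en_dir.glob("ADR-*.md")):  # sorted: stable report order
        ko = ko_dir / en.name
        if not ko.exists():
            problems.append(f"{en.name}: missing KO mirror")
            continue
        en_text = en.read_text(encoding="utf-8")
        ko_text = ko.read_text(encoding="utf-8")
        for path, text in ((en, en_text), (ko, ko_text)):
            if not title_matches(path, text):
                problems.append(f"{path}: title ADR ID does not match filename")
        en_kw, ko_kw = status_keyword(en_text), status_keyword(ko_text)
        if en_kw is None or en_kw != ko_kw:
            problems.append(f"{en.name}: Status keyword {en_kw!r} vs {ko_kw!r}")
    return problems


def main(en_dir: str = "docs/adr", ko_dir: str = "docs/adr-ko") -> int:
    problems = check_pairs(Path(en_dir), Path(ko_dir))
    for p in problems:
        print(p)
    return 1 if problems else 0  # exit code: 0 = OK, 1 = mismatch
```

Wired into CI, `sys.exit(main())` would turn the return value into the process exit code.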

## runtime API / sim_engine Boundaries

- runtime API MUST NOT hardcode topology/routing or internal hop sequences.
- sim_engine MUST remain independent of runtime API semantics (no tensor/kernel policy logic).