687c98086d
Filename + lifecycle:
- ADRs renamed to ADR-NNNN-<cat>-title.md with 8 short (3-4 letter) category prefixes
(dev / mem / lat / prog / algo / par / api / ver). Numbers stay immutable.
- ADR Lifecycle split into 3 folders, documented in CLAUDE.md Part 2:
docs/adr/ (Accepted), docs/adr-proposed/ (Proposed/Stub/Draft),
docs/adr-history/ (Superseded/Merged). Status field gains "Draft" for
retroactive docs pending verification.
Merges (one ADR per topic, no change-history annotations):
- ADR-0017 absorbs ADR-0019 (Cube NOC + per-PE HBM connectivity, 10 D-items)
- ADR-0014 absorbs ADR-0021 (PE pipeline execution model, 8 D-items incl.
TileToken self-routing and multi-op composite epilogue scope)
- ADR-0023 absorbs docs/ipcq-dma-codesign-hw.md as new "HW Realization
Notes (Informative)" section (D16-D23 + Open HW Questions). codesign-hw.md
deleted; ADR-0019/0021 moved to adr-history with one-line stub status
Retroactive documentation (G4 closures, code-verified):
- ADR-0037 forwarding component (TransitComponent: first-flit overhead,
serial worker, path-based routing, single impl/multiple names)
- ADR-0036 IO_CPU component (target_start_ns global barrier stamping,
per-cube fan-out, response aggregation)
- ADR-0035 M_CPU & M_CPU.DMA component (3 fan-out paths, DMA Resources,
target_start_ns passthrough)
- ADR-0034 HBM controller internal design (per-PC state, address-based
selection, flit-aware per-flit commit, async finalize, command-only
fallback path)
Content updates:
- ADR-0010 expanded to full CLI surface (run/probe/web), retitled
"Command Line Interface and Execution Semantics"
- ADR-0007 D2 rewritten to current state; ADR-0015 supersession notes pruned
- ADR-0005 wrapped in Decision header with D1-D5; ADR-0022 metadata
block replaced with standard Status header
- ADR-0024 trimmed to rank=SIP launcher essentials (D1-D4);
ADR-0027 cleaned of supersession history
- ADR-0033 D6 cleanup: address-based PC selection moved out of future-work
(now documented in ADR-0034 D3); related D1/D3 wording realigned
- Cross-references back-filled in 5 ADRs (G3 gaps closed)
Onboarding docs split:
- docs/onboarding/ created
- moved: hw-architecture-overview.md, latency-model.md, di-presentation.md,
ccl-author-guide{,.en}.md
- references updated in README, ADR-0023{,.en}, src/kernbench/ccl/__init__.py
Source / test / yaml: ADR-NNNN cross-references in docstrings and YAML
comments updated after the merges (ADR-0021->0014 D6, ADR-0019->0017 D8).
No behavior change.
Tooling:
- tools/verify_adr_lang_pairs.py + tests/test_verify_adr_lang_pairs.py
(ADR EN/KO pair invariant checker)
- .claude/commands/report.md tracked (/report slash command)
- .gitignore: allow .claude/commands/*.md while keeping settings files ignored
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
373 lines
13 KiB
Markdown
# Claude Code Instructions (Repo)

This repository uses Claude Code with strict architectural and verification rules.
SPEC.md and ADRs are the source of truth.

---

# Part 1 — General Behavior

> Reusable across repos. Describes *how* Claude Code interacts with the user
> and constructs changes, independent of this project's domain.

## Design Questions

- Design / architecture questions are ALWAYS allowed.
- Design questions MUST NOT modify:
  - production code
  - test code
  - SPEC.md
  - ADRs
- If a design question implies a change, default to Phase 1.

## Surfacing Choices

Applies to both design discussions and Phase 1 proposals.

- If multiple valid interpretations of the request exist, present them.
  Do NOT pick one silently.
- If a simpler approach exists, say so. Push back when warranted —
  do NOT just implement the more complex path the user proposed.
- State required assumptions explicitly. If uncertain, ask before assuming.

## Change & Test Protocol (Mandatory)

All non-trivial changes MUST follow a two-phase process.
Design discussion is always allowed.
Production code changes require Phase 1 approval before Phase 2 applies them.

### Phase 1 — Proposal + Verification

(No Production Code Changes)

#### Purpose

- Decide *what* to change and *how it will be validated*
- Establish verification coverage BEFORE touching production code

#### Phase 1 MUST include

1) **Design Proposal**

- Explain the design change.
- Explain why the change is needed.
- Explain consistency with SPEC.md and relevant ADRs.

2) **Verification Plan**

- SPEC requirement(s) / ADR(s) affected.
- Tests that validate the change:
  - existing tests to run, and/or
  - new tests to add.
- Concrete input cases used by the tests.
- Expected observable assertions.
- Expected changes (or no changes) in generated artifacts, if applicable.

(Project-specific expectations for what these inputs/assertions look like:
see Part 2 → *Verification Plan — Project Expectations*.)

If the Verification Plan is missing or vague, STOP.

#### Allowed in Phase 1

- Creating or modifying **test code only**
- Running tests and reporting results

#### Forbidden in Phase 1

- Any production code changes
- Any SPEC.md or ADR modifications
- Final, ready-to-apply unified diffs (Phase 2 only)

#### Permitted for design discussion

- Pseudocode, interface sketches, type signatures
- Small illustrative snippets to clarify a design point
- "Before / after" excerpts (not full diffs)

#### Phase 1 Output

- Proposal + Verification Plan
- Tests added/modified (if any)
- Test execution results (PASS / FAIL)
- Clear recommendation:
  - "No Phase 2 needed" OR
  - "Await approval for Phase 2"

### Phase 2 — Apply + Verify + Rollback

#### Trigger

Phase 2 is triggered ONLY by the exact user approval phrase:

**"ok"**

#### Phase 2 Rules

- Keep changes minimal and scoped to the approved Phase 1 proposal.
- Modify only production files declared in Phase 1.
- Avoid unrelated edits, cleanup, or formatting churn.
- Automatically apply approved changes to the working tree.

#### Mandatory Verification

- Run the tests defined in the Phase 1 Verification Plan

#### Success Path

If ALL tests PASS:

- Keep the applied changes
- Ensure generated artifacts (if affected) are consistent
- Report success concisely

#### Failure Path (Mandatory)

If ANY test FAILS:

- Immediately rollback ALL Phase 2 changes
- Do NOT keep partial changes
- Report:
  - failing test names
  - error messages / assertions
  - brief hypothesis of the root cause
- Return to Phase 1 state

Tests must NEVER be weakened, removed, or altered to force Phase 2 to pass.

Failing tests may indicate:
- invalid assumptions,
- architectural violations,
- or incomplete modeling.

Do not assume the test is wrong without explicit evidence.

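
The Phase 2 control flow above (apply, verify, roll back everything on any failure) can be sketched as a small function. This is only an illustration under stated assumptions: `run_phase2`, `Phase2Report`, and the three callables are hypothetical names, not part of this repo, and the working tree is modeled as an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Phase2Report:
    kept: bool                                          # True if applied changes were kept
    failures: List[str] = field(default_factory=list)   # names of failing tests


def run_phase2(
    apply_changes: Callable[[], None],
    run_tests: Callable[[], List[str]],   # hypothetical: returns failing test names
    rollback: Callable[[], None],
) -> Phase2Report:
    """Apply approved changes, run the planned tests, roll back on any failure."""
    apply_changes()
    failures = run_tests()
    if failures:
        # Failure Path: no partial changes survive.
        rollback()
        return Phase2Report(kept=False, failures=failures)
    # Success Path: keep the applied changes.
    return Phase2Report(kept=True)


# Usage with an in-memory stand-in for the working tree (hypothetical test name).
tree = {"applied": False}
report = run_phase2(
    apply_changes=lambda: tree.update(applied=True),
    run_tests=lambda: ["test_route_is_deterministic"],
    rollback=lambda: tree.update(applied=False),
)
```

After this run the tree is back in its Phase 1 state and `report.failures` carries the names needed for the failure report.
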
## Allowed Exceptions

(Protocol Still Required)

- comments or docstrings
- formatting-only changes
- type annotation changes with no runtime behavior change

In exceptions, Phase 1 MUST explicitly state:
**"No behavior change; tests unchanged."**

## Coding Style

Applies to all production code changes (Phase 2) and test code (Phase 1).
The Phase 1/2 protocol decides *whether* and *what* to change;
this section decides *how* the resulting diff should look.

### Simplicity First

**Minimum code that solves the problem. Nothing speculative.**

- Write the minimum code that satisfies the Phase 1 proposal.
- No abstractions for single-use code.
- No "flexibility"/"configurability" not declared in Phase 1.
- No error handling for impossible scenarios.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

### Surgical Changes

**Touch only what you must. Clean up only your own mess.**

- Touch only files declared in the Phase 1 proposal.
- Don't "improve" adjacent code, comments, or formatting.
- Match existing style in the file, even if you'd do it differently.
- If your changes orphan imports/variables/functions, remove them.
- If you notice pre-existing dead code, do NOT delete it silently.
  Mention it, and present options:
  (a) delete (with approval),
  (b) keep as-is,
  (c) refactor to make it reachable / repurposed.
  Let the user choose before acting.
- Every changed line must trace to the Phase 1 proposal.

## Enforcement Defaults

General fallbacks. Apply to anything not explicitly covered above.

- If unsure whether a change is non-trivial → treat it as non-trivial.
- If unsure whether Phase 2 is allowed → STOP and ask.

---

# Part 2 — Project-Specific (kernbench)

> Specific to this repo's domain (SIP/CUBE/PE topology, runtime API, sim_engine).
> Replace this entire Part when adapting the framework to another repo.
>
> Contains **foundations** (Authority & Scope → Terminology → Terminology
> Discipline → Mental Model → Common Failure Modes) followed by **rules**
> (Non-Trivial, Verification Plan, CLI, Derived Artifacts, runtime API /
> sim_engine Boundaries).

## Authority & Scope

- SPEC.md defines the architectural contract.
- ADRs (docs/adr/ADR-*.md) define non-trivial architectural decisions.
- If a change conflicts with SPEC.md or an ADR:
  - STOP.
  - Explain the conflict.
  - Propose options (keep spec, update ADR, or narrow scope).
  - Do NOT silently change architecture.
- The repository structure reflects architectural intent; Claude Code MUST respect existing module boundaries and file locations.

### ADR Lifecycle

ADRs live in one of three folders based on lifecycle state:

- `docs/adr/` — **Accepted** (current implementation reflected).
- `docs/adr-proposed/` — **Proposed**, **Stub**, or **Draft** (design
  only / future-work exploration / retroactive documentation pending
  verification).
- `docs/adr-history/` — **Superseded** or **Merged** (no longer the
  authoritative source; kept as historical record).

Status field values:

- `Accepted` — design is in current implementation.
- `Proposed` — design is concrete but not yet implemented.
- `Stub (Future Work)` — design space exploration; no commitment yet.
- `Draft` — retroactive documentation drafted but not yet verified
  against the implementation it describes.
- `Superseded by ADR-NNNN` — replaced by another ADR.
- `Merged into ADR-NNNN` — content absorbed by another ADR.

Transitions:

- **Proposed/Stub → Accepted**: when the ADR's decisions are
  reflected in production code AND covered by tests. `git mv` from
  `docs/adr-proposed/` to `docs/adr/`, change Status to `Accepted`.
- **Draft → Accepted**: when the ADR's text has been verified to
  accurately describe the existing implementation. `git mv` from
  `docs/adr-proposed/` to `docs/adr/`, change Status to `Accepted`.
- **Accepted → Superseded**: set Status to `Superseded by ADR-MMMM`
  and `git mv` to `docs/adr-history/`. The superseding ADR includes
  a "Supersedes ADR-NNNN" reference (or, for partial supersession of
  clauses, documents this in its own body).
- **Accepted → Merged**: set Status to `Merged into ADR-MMMM`
  (single-line stub) and `git mv` to `docs/adr-history/`.

Cross-references between ADRs use the `ADR-NNNN` ID and remain valid
regardless of folder location. ADR numbers are **immutable**; never
renumber. Numbering holes from moved ADRs are expected.

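
The status-to-folder mapping above lends itself to a mechanical check. The following is a minimal sketch, assuming the Status value has already been extracted from the ADR header as a string; `folder_for_status`, `is_misfiled`, and the example filename are hypothetical and not part of the repo's tooling.

```python
from pathlib import PurePosixPath

# Folder for each non-historical Status value, as listed in this section.
_FOLDERS = {
    "Accepted": "docs/adr",
    "Proposed": "docs/adr-proposed",
    "Stub (Future Work)": "docs/adr-proposed",
    "Draft": "docs/adr-proposed",
}


def folder_for_status(status: str) -> PurePosixPath:
    """Return the folder an ADR with this Status value should live in."""
    status = status.strip()
    # "Superseded by ADR-NNNN" and "Merged into ADR-NNNN" carry a target ID.
    if status.startswith(("Superseded by ADR-", "Merged into ADR-")):
        return PurePosixPath("docs/adr-history")
    try:
        return PurePosixPath(_FOLDERS[status])
    except KeyError:
        raise ValueError(f"unknown ADR status: {status!r}") from None


def is_misfiled(path: str, status: str) -> bool:
    """True if the ADR file sits in a folder that contradicts its Status."""
    return PurePosixPath(path).parent != folder_for_status(status)


# Hypothetical file: a Draft ADR must not sit in docs/adr/.
misfiled = is_misfiled("docs/adr/ADR-9999-example.md", "Draft")
```

A check like this only compares folder and Status; it does not decide whether a transition is justified, which still requires the conditions listed under Transitions.
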
## Terminology

- runtime API:
  Host-facing public API used by benchmarks and user code (e.g., tensor deployment, kernel launch).
- simulation engine (sim_engine):
  Discrete-event engine responsible for request injection, scheduling, and completion tracking.
- components:
  Device-side nodes modeling hardware behavior (IO_CPU, M_CPU, PE_CPU, routers, engines, etc.).

## Terminology Discipline

Use only terms established in SPEC.md, ADRs, existing notes, or code.
Do not coin new terms (status labels, tiers, classifications, role names)
without explicit user approval. When a needed term is missing or ambiguous,
ask before introducing one. When proposing a rename, show the existing
term and the proposed change side-by-side and wait for approval.

## Mental Model

The simulator is layered along **request flow**:

    runtime API   (host-facing: tensor ops, kernel launch;
                   topology-agnostic, no routing — ADR-0007)
         ↓
    sim_engine    (schedules events, routes requests,
                   tracks completion via correlation IDs)
         ↓
    components    (device-side nodes: IO_CPU, M_CPU, PE_CPU,
                   routers, engines — model HW behavior
                   including interconnect)

Configuration & decisions (orthogonal to request flow):
- **topology** — compiled at config time (ADR-0006); defines which
  components exist and how they connect. Authoritative graph for sim_engine.
- **policy** (routing / address / placement) — consulted by sim_engine
  during request handling.

Invariant: all latency arises from **explicit scheduled events on modeled
components and links** (SPEC §0.1, R8). No implicit waits, no magic delays.

Stay within layer boundaries; do not collapse or bypass for convenience.

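
To make the latency invariant concrete, here is a toy discrete-event queue in which simulated time advances only when a scheduled event is popped. It is a sketch under stated assumptions, not kernbench's sim_engine: `EventQueue`, the two handlers, and the delays of 5 ns and 7 ns are arbitrary illustrations.

```python
import heapq
from typing import Callable, List, Tuple


class EventQueue:
    """Toy discrete-event queue: time moves only by popping scheduled events."""

    def __init__(self) -> None:
        self.now_ns = 0
        self._seq = 0  # tie-breaker: same-time events run in insertion order
        self._heap: List[Tuple[int, int, Callable[[], None]]] = []

    def schedule(self, delay_ns: int, action: Callable[[], None]) -> None:
        if delay_ns < 0:
            raise ValueError("delay must be non-negative")
        heapq.heappush(self._heap, (self.now_ns + delay_ns, self._seq, action))
        self._seq += 1

    def run(self) -> None:
        while self._heap:
            self.now_ns, _, action = heapq.heappop(self._heap)
            action()


trace = []
q = EventQueue()


def arrive_at_router() -> None:
    trace.append(("router", q.now_ns))
    q.schedule(7, arrive_at_pe)      # arbitrary delay for the second link


def arrive_at_pe() -> None:
    trace.append(("PE_CPU", q.now_ns))


q.schedule(5, arrive_at_router)      # arbitrary delay for the first link
q.run()
# trace is now [("router", 5), ("PE_CPU", 12)]
```

The end-to-end latency of 12 ns is nothing more than the sum of the two scheduled delays; there is no sleep or constant added outside the queue. The monotonic sequence number also keeps same-timestamp events in a fixed order, which matters for determinism.
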
## Common Failure Modes

Anti-patterns that violate the Mental Model or Golden Invariants (SPEC §0.1).
If your change does any of these, STOP and reconsider.

- **runtime topology mutation** — topology is compiled at config time; do not
  add/remove nodes or edges during simulation (ADR-0006).
- **nondeterministic iteration order** — never iterate sets, unordered dicts,
  or anything else with implementation-defined order on the critical path.
  Determinism is required (SPEC §0.1).
- **routing policy inside runtime API** — runtime API is topology-agnostic;
  routing/fan-out belongs in policy + sim_engine (ADR-0007).
- **latency modeled outside sim_engine scheduling** — every delay must come
  from an explicit scheduled event on a modeled component or link
  (SPEC §0.1, R8). No magic sleeps, no hardcoded constants smuggled in.
- **hidden cross-layer coupling** — do not skip layer interfaces.
  e.g., runtime API must not call into components directly, bypassing sim_engine.
- **silent ADR/SPEC reinterpretation** — surface conflicts; do not paper over them.
  See *Authority & Scope* above.
- **weakening tests to make Phase 2 pass** — fix the code, not the test.
  See *Part 1 → Phase 2 → Failure Path*.
- **asserting from memory without source check** — quantitative
  architectural facts (topology counts, sizes, latencies, address widths,
  port arities) must be sourced from SPEC.md or a specific ADR before
  assertion. Memory is unreliable. If the source is silent, surface the
  gap rather than guessing.

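
As an illustration of the nondeterministic-iteration failure mode: in CPython, string hashing is randomized per process unless `PYTHONHASHSEED` is fixed, so the iteration order of a set of strings can differ between runs. The sketch below shows one way to impose an explicit order; the function names and node labels are hypothetical.

```python
from typing import List, Set


def ordered_neighbors(neighbors: Set[str]) -> List[str]:
    """Return neighbors in a stable order, independent of set iteration order."""
    return sorted(neighbors)


def pick_next_hop(candidates: Set[str]) -> str:
    """Deterministic choice: smallest name, never 'first element of the set'."""
    if not candidates:
        raise ValueError("no candidates")
    return min(candidates)


hops = ordered_neighbors({"r2", "r0", "r1"})
choice = pick_next_hop({"r2", "r0", "r1"})
```

Sorting by name is only one possible tie-break; whatever rule is used should be explicit in the code.
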
## What Counts as "Non-Trivial"

(Protocol Required)

Any of the following:

- routing policy or ordering changes
- topology builder changes (nodes, links, parameters)
- address decoding / PhysAddr behavior
- latency composition rules
- changes affecting determinism or connectivity
- changes touching two or more production files

## Verification Plan — Project Expectations

Concrete forms that Part 1's *Verification Plan* MUST take in this repo:

- SPEC requirement(s) / ADR(s) affected (e.g., R1/R2/R5, ADR-0002).
- Concrete input cases:
  - topology (SIP / CUBE / PE layout)
  - request parameters (src, dst, size_bytes).
- Expected observable assertions, such as:
  - hop trace contains key waypoints,
  - latency invariants (e.g., > 0, monotonic increase),
  - deterministic route selection.
- **expected changes (or no changes) in generated diagrams**, if applicable.

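
The assertion styles listed above might look roughly like the tests below. Everything here is a stand-in: `simulate`, `Result`, the fixed `"router"` waypoint, and the toy latency formula are assumptions made so the example runs on its own. They are not kernbench's API or its latency model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Result:
    hops: Tuple[str, ...]
    latency_ns: int


def simulate(src: str, dst: str, size_bytes: int) -> Result:
    """Hypothetical stand-in for a simulator call (NOT the real API)."""
    hops = (src, "router", dst)
    # Toy model: fixed per-link cost plus a size-proportional term.
    return Result(hops=hops, latency_ns=10 * (len(hops) - 1) + size_bytes // 64)


def test_hop_trace_contains_waypoints() -> None:
    result = simulate("IO_CPU", "PE_CPU", 256)
    assert "router" in result.hops


def test_latency_invariants() -> None:
    small = simulate("IO_CPU", "PE_CPU", 64)
    large = simulate("IO_CPU", "PE_CPU", 4096)
    assert small.latency_ns > 0
    assert large.latency_ns > small.latency_ns   # grows with size_bytes


def test_route_is_deterministic() -> None:
    first = simulate("IO_CPU", "PE_CPU", 256)
    second = simulate("IO_CPU", "PE_CPU", 256)
    assert first == second
```

Each test names its concrete inputs (src, dst, size_bytes) and asserts on something observable, which is the shape a Phase 1 Verification Plan asks for.
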
## CLI Semantics

- `kernbench run --device <id>` runs the benchmark on a single device.
- Omitting `--device` runs the benchmark on all devices discovered in the topology (logically parallel).
- Device enumeration is handled by the CLI only; benchmarks MUST remain single-device.

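
The device-selection rule above can be sketched with `argparse`. This models only the `run` subcommand and the `--device` flag named in this section; `build_parser`, `select_devices`, the string-typed device ID, and the device names are assumptions, and the real CLI may take further arguments.

```python
import argparse
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kernbench")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    # Assumption: device IDs are treated as opaque strings.
    run.add_argument("--device", default=None)
    return parser


def select_devices(requested: Optional[str], discovered: Sequence[str]) -> List[str]:
    """One device if --device was given, otherwise every discovered device."""
    if requested is None:
        return list(discovered)
    if requested not in discovered:
        raise ValueError(f"unknown device: {requested}")
    return [requested]


discovered = ["dev0", "dev1"]   # hypothetical device names
single = select_devices(build_parser().parse_args(["run", "--device", "dev1"]).device, discovered)
everything = select_devices(build_parser().parse_args(["run"]).device, discovered)
```

Enumeration stays in the CLI layer here: a benchmark would be invoked once per entry in the returned list and never sees more than one device.
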
## Derived Artifacts (Clarification)

- Generated diagrams under `docs/diagrams/` are **derived artifacts**, not production code.
- Creating or updating files in `docs/diagrams/`:
  - does NOT count as a production code change,
  - does NOT require Phase 2 approval,
  - MUST be consistent with SPEC.md and ADRs.

## runtime API / sim_engine Boundaries

- runtime API MUST NOT hardcode topology/routing or internal hop sequences.
- sim_engine MUST remain independent of runtime API semantics (no tensor/kernel policy logic).