ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

Filename + lifecycle:
- ADRs renamed to ADR-NNNN-<cat>-title.md with 8 short category prefixes
  (dev / mem / lat / prog / algo / par / api / ver). Numbers stay immutable.
- ADR Lifecycle split into 3 folders, documented in CLAUDE.md Part 2:
  docs/adr/ (Accepted), docs/adr-proposed/ (Proposed/Stub/Draft),
  docs/adr-history/ (Superseded/Merged). Status field gains "Draft" for
  retroactive docs pending verification.
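
Illustration only, not part of the change set: a minimal sketch of how the
naming rule and the three lifecycle folders could be checked in code. The
regex, the helper names (parse_adr_filename, expected_path) and the strict
category check are assumptions made for this example; the repository's own
tooling may be looser (its tests use placeholder names such as
ADR-0001-foo-bar.md).

```python
import re
from pathlib import PurePosixPath

# Category prefixes as listed in the commit message.
CATEGORIES = {"dev", "mem", "lat", "prog", "algo", "par", "api", "ver"}

# Status -> folder, following the three-folder lifecycle described above.
STATUS_FOLDER = {
    "Accepted": "docs/adr",
    "Proposed": "docs/adr-proposed",
    "Stub": "docs/adr-proposed",
    "Draft": "docs/adr-proposed",
    "Superseded": "docs/adr-history",
    "Merged": "docs/adr-history",
}

# Assumed pattern: ADR-NNNN-<cat>-<slug>.md, where the slug may contain
# hyphens and underscores.
ADR_NAME = re.compile(
    r"^ADR-(?P<num>\d{4})-(?P<cat>[a-z]+)-(?P<slug>[A-Za-z0-9_-]+)\.md$"
)


def parse_adr_filename(name):
    """Return (number, category, slug), or None if the name does not conform."""
    m = ADR_NAME.match(name)
    if m is None or m["cat"] not in CATEGORIES:
        return None
    return m["num"], m["cat"], m["slug"]


def expected_path(name, status):
    """Folder an ADR file should live in, given its Status value."""
    return str(PurePosixPath(STATUS_FOLDER[status]) / name)
```

With these assumptions, parse_adr_filename("ADR-0013-ver-verification_strategy.md")
yields ("0013", "ver", "verification_strategy"), and a Draft ADR resolves to
docs/adr-proposed/.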

Merges (one ADR per topic, no change-history annotations):
- ADR-0017 absorbs ADR-0019 (Cube NOC + per-PE HBM connectivity, 10 D-items)
- ADR-0014 absorbs ADR-0021 (PE pipeline execution model, 8 D-items incl.
  TileToken self-routing and multi-op composite epilogue scope)
- ADR-0023 absorbs docs/ipcq-dma-codesign-hw.md as new "HW Realization
  Notes (Informative)" section (D16-D23 + Open HW Questions);
  docs/ipcq-dma-codesign-hw.md deleted
- ADR-0019/0021 moved to adr-history with one-line stub status

Retroactive documentation (G4 closures, code-verified):
- ADR-0037 forwarding component (TransitComponent: first-flit overhead,
  serial worker, path-based routing, single impl/multiple names)
- ADR-0036 IO_CPU component (target_start_ns global barrier stamping,
  per-cube fan-out, response aggregation)
- ADR-0035 M_CPU & M_CPU.DMA component (3 fan-out paths, DMA Resources,
  target_start_ns passthrough)
- ADR-0034 HBM controller internal design (per-PC state, address-based
  selection, flit-aware per-flit commit, async finalize, command-only
  fallback path)

Content updates:
- ADR-0010 expanded to full CLI surface (run/probe/web), retitled
  "Command Line Interface and Execution Semantics"
- ADR-0007 D2 rewritten to current state; ADR-0015 supersession notes pruned
- ADR-0005 wrapped in Decision header with D1-D5; ADR-0022 metadata
  block replaced with standard Status header
- ADR-0024 trimmed to rank=SIP launcher essentials (D1-D4);
  ADR-0027 cleaned of supersession history
- ADR-0033 D6 cleanup: address-based PC selection moved out of future-work
  (now documented in ADR-0034 D3); related D1/D3 wording realigned
- Cross-references back-filled in 5 ADRs (G3 gaps closed)

Onboarding docs split:
- docs/onboarding/ created
- moved: hw-architecture-overview.md, latency-model.md, di-presentation.md,
  ccl-author-guide{,.en}.md
- references updated in README, ADR-0023{,.en}, src/kernbench/ccl/__init__.py

Source / test / yaml: ADR-NNNN cross-references in docstrings and YAML
comments updated after the merges (ADR-0021->0014 D6, ADR-0019->0017 D8).
No behavior change.
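
Illustration only: a sketch of the kind of reference remapping these updates
amount to. The table holds only pairs that can be read off this commit's
diff and is not a complete ADR-0019 -> ADR-0017 mapping; some edits in the
diff were context-dependent (for example, one bare ADR-0021 reference
became ADR-0014 D6), so a mechanical pass like this would still need
review. The names ADR_MAP, ITEM_MAP and remap are assumptions for the
example.

```python
import re

# Merged ADR -> absorbing ADR (from the "Merges" section).
ADR_MAP = {"0019": "0017", "0021": "0014"}

# (old ADR, old D-item) -> new D-item; partial, read off this commit's diff.
ITEM_MAP = {
    ("0019", "D1"): "D4",
    ("0019", "D2"): "D1",
    ("0019", "D4"): "D7",
    ("0019", "D9"): "D8",
}

# Matches "ADR-0019", "ADR-0019 D1" and slash lists such as "ADR-0019 D1/D4".
REF = re.compile(r"ADR-(\d{4})(?: (D\d+(?:/D\d+)*))?")


def remap(text):
    """Rewrite references to merged ADRs; leave unknown D-items untouched."""

    def sub(m):
        num, items = m.group(1), m.group(2)
        if num not in ADR_MAP:
            return m.group(0)
        if items is None:
            return f"ADR-{ADR_MAP[num]}"
        mapped = [ITEM_MAP.get((num, d)) for d in items.split("/")]
        if any(d is None for d in mapped):
            # No known target for at least one D-item: keep for manual review.
            return m.group(0)
        return f"ADR-{ADR_MAP[num]} " + "/".join(mapped)

    return REF.sub(sub, text)
```

References with an unmapped D-item are deliberately left as-is so that a
stale item number is never carried over silently.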

Tooling:
- tools/verify_adr_lang_pairs.py + tests/test_verify_adr_lang_pairs.py
  (ADR EN/KO pair invariant checker)
- .claude/commands/report.md tracked (/report slash command)
- .gitignore: allow .claude/commands/*.md while keeping settings files ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:15:55 -07:00
parent 22fd0d2b9d
commit 687c98086d
97 changed files with 3286 additions and 3766 deletions
+1 -1
@@ -1,4 +1,4 @@
"""End-to-end pipeline tests (ADR-0020 + ADR-0021).
"""End-to-end pipeline tests (ADR-0020 + ADR-0014).
Verifies:
1. Actual benchmark kernel → greenlet mode → op_log → DataExecutor → accuracy
+1 -1
@@ -68,7 +68,7 @@ def _path_drain_for_write(eng: GraphEngine, msg: MemoryWriteMsg) -> float:
def test_builder_derives_pc_bw_gbs():
"""Topology builder must inject `pc_bw_gbs = hbm_to_router_bw_gbs / num_pcs`
as an attr on every hbm_ctrl node. Enforces ADR-0019 D9 invariant
as an attr on every hbm_ctrl node. Enforces ADR-0017 D8 invariant
(channels_per_PE × per-PC BW = aggregated link BW) at build time.
"""
handle = resolve_topology(str(TOPOLOGY_PATH))
+4 -7
@@ -192,13 +192,10 @@ def test_hbm_pe_hop_charged_at_large_payload(tmp_path):
chunk of latency from the PE↔HBM hop on send and recv, so the
total HBM/TCM gap should clearly clear the threshold below.
Threshold history: the gap was 4 µs under the over-consolidated
single-hbm_ctrl model (commit 5917b34), inflated by serialization
on the shared HBM controller. With ADR-0019 D1 per-PE HBM CTRL
restored, each PE's slice runs on its own controller with no
cross-PE contention, so the IPCQ pattern (each PE writes its own
slice) drops the gap to ≈ 1.7 µs — still well above the bare
slot-IO term, confirming the PE↔HBM hop is being charged.
Under ADR-0017 D4 per-PE HBM CTRL, each PE's slice runs on its own
controller with no cross-PE contention, so the IPCQ pattern (each
PE writes its own slice) yields a gap of ≈ 1.7 µs — well above the
bare slot-IO term, confirming the PE↔HBM hop is being charged.
"""
n_elem = 16384 # 32 KB / PE
lat_tcm = _run_allreduce_with_buffer_kind(
+11 -14
@@ -1,4 +1,4 @@
"""Tests for CUBE NOC Explicit Router Mesh (ADR-0019).
"""Tests for CUBE NOC Explicit Router Mesh (ADR-0017).
Key changes verified:
- Explicit router nodes per cube from cube_mesh.yaml (6×6 grid)
@@ -125,14 +125,14 @@ def test_mesh_file_pe_corner_positions():
def test_mesh_file_no_xbar_section():
"""mesh output must not contain xbar section (ADR-0019 D2)."""
"""mesh output must not contain xbar section (ADR-0017 D1)."""
_graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
assert "xbar" not in mesh, "xbar section should be removed from cube_mesh.yaml"
def test_mesh_file_pe_hbm_attached():
"""PE routers must have pe{idx}.hbm in attach list (ADR-0019 D1)."""
"""PE routers must have pe{idx}.hbm in attach list (ADR-0017 D4)."""
_graph()
mesh = yaml.safe_load(MESH_PATH.read_text())
for rid, rdata in mesh["routers"].items():
@@ -235,7 +235,7 @@ def test_mesh_ucie_all_four_directions():
# ══════════════════════════════════════════════════════════════════
# 2. Topology Graph: Explicit Router Mesh (ADR-0019)
# 2. Topology Graph: Explicit Router Mesh (ADR-0017)
# ══════════════════════════════════════════════════════════════════
@@ -247,7 +247,7 @@ def test_router_nodes_exist():
def test_no_xbar_or_bridge_nodes():
"""xbar/bridge nodes must not exist (ADR-0019 D2)."""
"""xbar/bridge nodes must not exist (ADR-0017 D1)."""
graph = _graph()
bad = [n for n in graph.nodes if "xbar" in n or "bridge" in n]
assert len(bad) == 0, f"Old xbar/bridge nodes found: {bad[:5]}"
@@ -260,11 +260,10 @@ def test_no_single_noc_node():
def test_per_pe_hbm_ctrl_nodes():
"""Each cube has 8 per-PE HBM CTRL instances (ADR-0019 D1).
"""Each cube has 8 per-PE HBM CTRL instances (ADR-0017 D4).
Restored from over-consolidation in commit 5917b34. The legacy
single ``sip0.cube0.hbm_ctrl`` is gone; each PE owns its own
``hbm_ctrl.pe{X}`` reachable through that PE's attaching router.
Each PE owns its own ``hbm_ctrl.pe{X}`` reachable through that PE's
attaching router. No cube-wide single ``hbm_ctrl`` node exists.
"""
graph = _graph()
for pe in range(8):
@@ -272,7 +271,7 @@ def test_per_pe_hbm_ctrl_nodes():
# Legacy single hbm_ctrl must not exist
legacy_id = "sip0.cube0.hbm_ctrl"
assert legacy_id not in graph.nodes, (
f"legacy {legacy_id} must be removed (per-PE partitioning, ADR-0019 D1)"
f"legacy {legacy_id} must not exist (per-PE partitioning, ADR-0017 D4)"
)
@@ -297,9 +296,7 @@ def test_pe_dma_connects_to_router():
def test_each_hbm_ctrl_connects_only_to_owning_router():
"""Each ``hbm_ctrl.pe{X}`` must have exactly one router edge
(router_to_hbm + hbm_to_router) to its owning PE's attaching
router (ADR-0019 D4). Replaces a prior test that asserted the
single hbm_ctrl was connected to all routers — that asserted the
spec-violating consolidation introduced in commit 5917b34.
router (ADR-0017 D7).
"""
graph = _graph()
pe_router = {0: "r0c0", 1: "r0c1", 2: "r1c4", 3: "r1c5",
@@ -513,7 +510,7 @@ def test_null_routers_excluded():
# ══════════════════════════════════════════════════════════════════
# 7. Router Mesh Latency (ADR-0019)
# 7. Router Mesh Latency (ADR-0017)
# ══════════════════════════════════════════════════════════════════
+1 -1
@@ -1,4 +1,4 @@
"""Tests for ADR-0021 PE pipeline: TileToken self-routing, pipeline overlap, e2e accuracy.
"""Tests for ADR-0014 D6 PE pipeline: TileToken self-routing, pipeline overlap, e2e accuracy.
Test plan items:
3. Phase 1 → Phase 2 end-to-end (op_log → DataExecutor → verify)
+9 -16
@@ -1,18 +1,13 @@
"""Tests for ADR-0019 D1/D4 per-PE HBM partitioning.
"""Tests for ADR-0017 D4/D7 per-PE HBM partitioning.
Restores the architectural property that was lost in commit 5917b34
(2026-04-04 "Replace xbar/bridge/single-NOC with explicit router mesh"),
which over-consolidated 8 per-slice HBM CTRL nodes into one cube-wide
HBM CTRL connected to every router. ADR-0019 D1/D4 specifies:
ADR-0017 D4/D7 specifies:
- Each PE owns 8 of the cube's 64 pseudo-channels (PE_X → PCs 8X..8X+7).
- HBM CTRL is split per-PE: ``hbm_ctrl.pe{X}`` is reachable ONLY through
PE_X's attaching router. Accessing PE_Y's slice from PE_X requires
mesh routing to r_Y_attach before entering hbm_ctrl.pe{Y}.
These tests are written BEFORE the production change and are expected
to FAIL on current code (HBM CTRL is a single ``hbm_ctrl`` node attached
to all routers). Phase 2 must make them PASS without weakening
These tests enforce that property without weakening
assertions.
"""
from __future__ import annotations
@@ -66,16 +61,16 @@ def test_topology_has_8_hbm_ctrl_per_cube():
for pe in range(8):
nid = f"sip0.cube0.hbm_ctrl.pe{pe}"
assert nid in graph.nodes, (
f"Expected per-PE HBM CTRL node {nid!r} (ADR-0019 D1)"
f"Expected per-PE HBM CTRL node {nid!r} (ADR-0017 D4)"
)
node = graph.nodes[nid]
assert int(node.attrs.get("num_pcs", 0)) == 8, (
f"{nid} must have num_pcs=8; got {node.attrs.get('num_pcs')}"
)
# Legacy single hbm_ctrl must not exist
# Cube-wide single hbm_ctrl must not exist
assert "sip0.cube0.hbm_ctrl" not in graph.nodes, (
"Legacy single sip0.cube0.hbm_ctrl must be removed in favor of "
"per-PE hbm_ctrl.pe{X} (ADR-0019 D1)"
"Cube-wide single sip0.cube0.hbm_ctrl must not exist; only "
"per-PE hbm_ctrl.pe{X} (ADR-0017 D4)"
)
@@ -199,10 +194,8 @@ def test_probe_cli_intra_cube_cases_are_monotonic():
"""Probe CLI cases must show monotonic latency:
pe-local-hbm < pe-same-half-hbm < pe-cross-half-hbm.
Prior to per-PE partitioning these three return identical latency
because all roads lead to the same hbm_ctrl. With ADR-0019 D4
restored, same-half (pe0→pe1) is 1 mesh hop further than local,
and cross-half (pe0→pe4) is several hops further.
Per ADR-0017 D7, same-half (pe0→pe1) is 1 mesh hop further than
local, and cross-half (pe0→pe4) is several hops further.
"""
graph = _graph()
spec = graph.spec
+5 -8
@@ -17,7 +17,7 @@ def _graph():
def test_resolve_hbm_addr():
"""HBM address -> sip{S}.cube{C}.hbm_ctrl.pe{X} (per-PE controller, ADR-0019 D1)."""
"""HBM address -> sip{S}.cube{C}.hbm_ctrl.pe{X} (per-PE controller, ADR-0017 D9)."""
g = _graph()
resolver = AddressResolver(g)
# offset 0x1000 falls inside PE0's slice (slice_size = 6 GB)
@@ -102,16 +102,13 @@ def test_path_remote_pe_hbm():
assert not any("xbar" in n or "bridge" in n for n in path)
# ── PathRouter: cross-PE HBM distance reflects mesh hops (ADR-0019 D4) ─
# ── PathRouter: cross-PE HBM distance reflects mesh hops (ADR-0017 D7) ─
def test_cross_pe_hbm_distance_increases_with_mesh_hops():
"""Restored ADR-0019 D4 behavior: accessing another PE's HBM slice
must take more routing distance than accessing one's own slice,
because each per-PE hbm_ctrl is reachable only via its PE's router.
Replaces a previous ``test_all_pe_hbm_equidistant`` that asserted the
over-consolidated (spec-violating) behavior introduced in 5917b34.
"""ADR-0017 D7: accessing another PE's HBM slice must take more
routing distance than accessing one's own slice, because each
per-PE hbm_ctrl is reachable only via its PE's router.
"""
g = _graph()
router = PathRouter(g)
+7 -8
@@ -21,7 +21,7 @@ def test_full_graph_node_count():
# + 20 ucie (4 ports x (1 port + 4 conn))
# + 8 PEs x 9 pe_comps)) (ADR-0023: +pe_ipcq)
# IO: pcie_ep + io_cpu + noc + 4 io_ucie_ports + 4*4 io_ucie_conn = 23
# cube: 32 + 10 + 20 + 72 = 134 (was 127; ADR-0019 D1 per-PE HBM CTRL)
# cube: 32 + 10 + 20 + 72 = 134 (per-PE HBM CTRL, ADR-0017 D4)
# = 1 + 2*(23 + 16*134) = 1 + 2*(23+2144) = 1 + 4334 = 4335
assert len(g.nodes) == 4335
@@ -29,9 +29,9 @@ def test_full_graph_node_count():
def test_full_graph_edge_count():
g = _graph()
# ADR-0023: +3 IPCQ edges per PE
# ADR-0019 D1 (restored): HBM↔router edges drop from 32 routers × 2
# to 8 PE-routers × 2 per cube. 32 cubes × (16-64) = -1536 edges.
# Multi-op composite (ADR-0021): +1 gemm→math edge per PE for
# ADR-0017 D4: HBM↔router edges = 8 PE-routers × 2 per cube
# (per-PE partition; not all 32 routers).
# Multi-op composite (ADR-0014 D3.3): +1 gemm→math edge per PE for
# epilogue chaining = 2 SIPs × 16 cubes × 8 PEs = +256 edges.
assert len(g.edges) == 12412
@@ -73,7 +73,7 @@ def test_cube_component_nodes_exist():
# Null holes must not exist
for null_rc in ("r2c2", "r2c3", "r3c2", "r3c3"):
assert f"{cp}.{null_rc}" not in g.nodes
# Per-PE HBM CTRL (ADR-0019 D1) — 8 instances, no legacy single node
# Per-PE HBM CTRL (ADR-0017 D4) — 8 instances; no cube-wide single node
for pe in range(8):
nid = f"{cp}.hbm_ctrl.pe{pe}"
assert g.nodes[nid].kind == "hbm_ctrl"
@@ -94,7 +94,7 @@ def test_pe_component_nodes_exist():
def test_hbm_ctrl_at_cube_center():
g = _graph()
# Per-PE hbm_ctrl nodes share the cube's HBM placement (ADR-0019 D1)
# Per-PE hbm_ctrl nodes share the cube's HBM placement (ADR-0017 D4)
# cube0 origin = (0, 0), hbm at (6.5, 7.0)
for pe in range(8):
node = g.nodes[f"sip0.cube0.hbm_ctrl.pe{pe}"]
@@ -190,8 +190,7 @@ def test_pe_internal_edges():
def test_per_pe_hbm_ctrl_connects_only_to_owning_router():
"""Each hbm_ctrl.pe{X} connects ONLY to PE_X's attaching router
(ADR-0019 D4). Replaces a prior test that asserted the
spec-violating all-routers consolidation (commit 5917b34)."""
(ADR-0017 D7)."""
g = _graph()
es = _edge_set(g)
cp = "sip0.cube0"
+1 -1
@@ -56,7 +56,7 @@ def test_initialize_mismatched_ws_raises(topology):
def test_get_tp_rank_is_greenlet_local(topology):
"""D3: get_tensor_model_parallel_rank returns greenlet-local rank
(delegates to torch.distributed.get_rank, ADR-0024 D9)."""
(delegates to torch.distributed.get_rank, ADR-0024 D2)."""
import kernbench.tp as tp
with _make_ctx(topology) as ctx:
+107
@@ -0,0 +1,107 @@
"""Tests for tools/verify_adr_lang_pairs.py."""
from __future__ import annotations
import sys
from pathlib import Path
_REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(_REPO_ROOT / "tools"))
import verify_adr_lang_pairs as v # noqa: E402
def _make_adr(
path: Path,
title_id: str,
title_text: str = "Some Title",
status: str = "Accepted",
) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
f"# ADR-{title_id}: {title_text}\n\n"
f"## Status\n\n{status}\n\n"
f"## Context\n\nbody\n",
encoding="utf-8",
)
def test_complete_pairs_pass(tmp_path: Path) -> None:
_make_adr(tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0001", "Foo EN")
_make_adr(tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md", "0001", "Foo KO")
assert v.verify(tmp_path) == []
def test_empty_dirs_pass(tmp_path: Path) -> None:
assert v.verify(tmp_path) == []
def test_missing_ko_fails(tmp_path: Path) -> None:
_make_adr(tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0001")
errs = v.verify(tmp_path)
assert any("missing KO" in e and "ADR-0001-foo-bar.md" in e for e in errs)
def test_orphan_ko_fails(tmp_path: Path) -> None:
_make_adr(tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md", "0001")
errs = v.verify(tmp_path)
assert any("orphan KO" in e and "ADR-0001-foo-bar.md" in e for e in errs)
def test_status_mismatch_fails(tmp_path: Path) -> None:
_make_adr(tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0001", status="Accepted")
_make_adr(tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md", "0001", status="Proposed")
errs = v.verify(tmp_path)
assert any("Status block mismatch" in e for e in errs)
def test_title_id_mismatch_fails(tmp_path: Path) -> None:
_make_adr(tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0002")
_make_adr(tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md", "0001")
errs = v.verify(tmp_path)
assert any("EN title ADR-ID" in e for e in errs)
def test_multiline_status_with_parenthetical_passes(tmp_path: Path) -> None:
"""Real ADRs like ADR-0001 have multi-line Status with revision notes."""
multiline_status = (
"Accepted (Revision 2 - 2026-04-27: concrete bit layout,\n"
"Supersedes ADR-0031.)"
)
_make_adr(
tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0001", status=multiline_status
)
_make_adr(
tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md", "0001", status=multiline_status
)
assert v.verify(tmp_path) == []
def test_crlf_normalization(tmp_path: Path) -> None:
"""KO has CRLF, EN has LF; Status content is otherwise identical -> pass."""
en = tmp_path / "docs/adr/ADR-0001-foo-bar.md"
ko = tmp_path / "docs/adr-ko/ADR-0001-foo-bar.md"
en.parent.mkdir(parents=True, exist_ok=True)
ko.parent.mkdir(parents=True, exist_ok=True)
en.write_bytes(
b"# ADR-0001: Foo\n\n## Status\n\nAccepted\n\n## Context\n\nbody\n"
)
ko.write_bytes(
b"# ADR-0001: Foo\r\n\r\n## Status\r\n\r\nAccepted\r\n\r\n## Context\r\n\r\nbody\r\n"
)
assert v.verify(tmp_path) == []
def test_underscore_in_slug_recognized(tmp_path: Path) -> None:
"""ADR-0013 uses an underscore in its slug; the regex must accept it."""
_make_adr(tmp_path / "docs/adr/ADR-0013-ver-verification_strategy.md", "0013")
_make_adr(tmp_path / "docs/adr-ko/ADR-0013-ver-verification_strategy.md", "0013")
assert v.verify(tmp_path) == []
def test_main_exit_codes(tmp_path: Path, capsys) -> None:
assert v.main(["--root", str(tmp_path)]) == 0
_make_adr(tmp_path / "docs/adr/ADR-0001-foo-bar.md", "0001")
assert v.main(["--root", str(tmp_path)]) == 1
out = capsys.readouterr().out
assert "FAILED" in out
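
For orientation, a minimal sketch of a checker that would satisfy the
behaviours exercised by the tests above. This is not the contents of
tools/verify_adr_lang_pairs.py, which is not shown in this view: the
regexes, the helper names and the exact error wording are assumptions
chosen to match the substrings the tests look for ("missing KO",
"orphan KO", "Status block mismatch", "EN title ADR-ID", "FAILED"). It
only looks at docs/adr and docs/adr-ko, as the tests do.

```python
import argparse
import re
from pathlib import Path

# Filename: ADR-NNNN-<rest>.md; the rest may contain hyphens and underscores.
NAME_RE = re.compile(r"^ADR-(\d{4})-[A-Za-z0-9_-]+\.md$")
# Title line: "# ADR-NNNN: ..."
TITLE_RE = re.compile(r"^# ADR-(\d{4}):", re.MULTILINE)
# Everything between "## Status" and the next "## " heading (or end of file).
STATUS_RE = re.compile(r"^## Status\s*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def _read(path):
    # Normalize CRLF so line endings alone never cause a mismatch.
    return path.read_bytes().decode("utf-8").replace("\r\n", "\n")


def _status(text):
    m = STATUS_RE.search(text)
    return m.group(1).strip() if m else None


def _names(folder):
    if not folder.is_dir():
        return set()
    return {p.name for p in folder.glob("ADR-*.md") if NAME_RE.match(p.name)}


def verify(root):
    """Return a list of human-readable errors; an empty list means all pairs are consistent."""
    en_dir, ko_dir = Path(root) / "docs/adr", Path(root) / "docs/adr-ko"
    en, ko = _names(en_dir), _names(ko_dir)
    errors = [f"missing KO for {n}" for n in sorted(en - ko)]
    errors += [f"orphan KO without EN: {n}" for n in sorted(ko - en)]
    for name in sorted(en & ko):
        file_id = NAME_RE.match(name).group(1)
        en_text, ko_text = _read(en_dir / name), _read(ko_dir / name)
        for lang, text in (("EN", en_text), ("KO", ko_text)):
            m = TITLE_RE.search(text)
            if m is None or m.group(1) != file_id:
                errors.append(f"{lang} title ADR-ID does not match filename {name}")
        if _status(en_text) != _status(ko_text):
            errors.append(f"Status block mismatch in {name}")
    return errors


def main(argv=None):
    parser = argparse.ArgumentParser(description="Check ADR EN/KO pair invariants")
    parser.add_argument("--root", default=".")
    args = parser.parse_args(argv)
    errors = verify(args.root)
    for err in errors:
        print(err)
    print("FAILED" if errors else "OK")
    return 1 if errors else 0
```

Comparing the whole Status block after CRLF normalization, rather than only
its first word, is what lets multi-line statuses with revision notes pass
when both languages carry the same text.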