# ADR-0010: Command Line Interface and Execution Semantics ## Status Accepted ## Context The `kernbench` CLI is the user-facing entry point of the simulator. It exposes three subcommands: - `run` — execute a benchmark against a topology. - `probe` — diagnostic utility for latency / BW measurement. - `web` — interactive topology viewer. Device enumeration is centralized in the CLI; neither the runtime API nor the simulation engine enumerates devices. Benchmarks remain single-device by design and accept a device identifier as input. ## Decision ### D1. Benchmark contract — single-device by design - A benchmark MUST define behavior for a single device only. - A benchmark MUST accept a device identifier as input. - Benchmarks MUST NOT enumerate or loop over multiple devices. Multi-device execution is the CLI's concern (D3), not the benchmark's. ### D2. `kernbench run` — benchmark execution Required arguments: - `--topology `: topology YAML file path. Loaded via `resolve_topology()`. - `--bench `: benchmark name. Resolved via `benches.loader.resolve_bench()`. Optional arguments: - `--device ` (default: `all`): - `all` — run once per discovered SIP (see D3). - `sip:` — run only on SIP N. - Parsed via `resolve_device()`. - `--verify-data` (default: off) — enable Phase 2 data verification (see ADR-0020). When set, `engine_factory` constructs the engine with `enable_data=True`. After the benchmark runs, a diagnostic summary of recorded ops is printed. Each invocation runs the benchmark once within a single simulation instance. ### D3. Multi-device execution is logically parallel When `--device all` (or omitted) and the topology has multiple SIPs: - Benchmark executions are submitted to a single simulation engine instance. - Executions are logically parallel in simulation time. - Inter-device contention is naturally modeled (shared fabric bandwidth, cross-SIP traffic, etc.). The CLI does NOT spawn multiple OS processes or independent simulation runs — parallelism is internal to one simulation instance. ### D4. `kernbench probe` — latency / BW diagnostic utility Required argument: - `--topology `: topology YAML file path. Optional argument: - `--case ` (default: `all`) — run a predefined traffic pattern, or `all` to run every defined case. Probe runs each pattern through the simulation engine and reports per case: - End-to-end latency (ns). - Effective bandwidth (nbytes / total_ns). - Bottleneck bandwidth (min edge BW along the chosen path). - Utilization (effective / bottleneck). Probe additionally validates monotonicity invariants — for example that local-HBM access ≤ cross-PE-within-cube ≤ cross-cube ≤ cross-SIP — and reports violations. Probe is a developer tool for verifying the latency / BW model; it is not a benchmark. ### D5. `kernbench web` — topology viewer Optional arguments: - `--port ` (default: `8765`) — HTTP port. - `--no-open` — do not auto-open the browser. Launches a local HTTP server that renders the compiled topology in the browser. Distinct from the static `docs/diagrams/` artifacts: - `docs/diagrams/` files are derived at topology-compile time (ADR-0006). - `kernbench web` is interactive — pan/zoom, hover for component attributes, switch between SIP / CUBE / PE views. ### D6. Runtime API and simulation engine remain device-scoped - Runtime API calls operate on one device per invocation. - The simulation engine schedules all requests deterministically. - Neither layer enumerates devices. This invariant keeps each layer testable in isolation; device enumeration and multi-device fan-out live only in the CLI's `run` command (D3). ## Consequences - Benchmark authors write single-device logic; multi-device behavior emerges from the CLI dispatching across SIPs. - Adding a new subcommand (e.g., trace export, replay) does not require benchmark or runtime-API changes — the CLI is the extension point. - `probe` and `web` are diagnostic / visualization tools, not benchmarks; they bypass the benchmark loader path. ## Links - SPEC R7, R8, R9 - ADR-0007 (Runtime API and Simulation Engine Boundaries) - ADR-0020 (Two-pass data execution — `--verify-data`) - ADR-0006 (Topology compilation and diagram generation — background for `kernbench web`)