kernbench2/docs/adr/ADR-0033-latency-model-assumptions.md

# ADR-0033 — Latency Model: Assumptions and Known Simplifications

## Status

Accepted

## Context

The simulator is an analytical, event-driven performance model, not a
cycle-accurate or RTL-level simulator. Many real-hardware effects are
approximated or omitted by design. To keep the model auditable as a whole,
this ADR consolidates those assumptions in one place. Individual component ADRs
(ADR-0015, ADR-0019, ADR-0004) define the *mechanisms*; this document defines
the *limits of fidelity*.

## Decisions

### D1. Modeled precisely

- **Per-directed-edge BW occupancy** (FIFO serialization via `available_at`) —
  ADR-0015 D2.
- **Per-component switching/overhead latency** (`overhead_ns` attr).
- **HBM per-pseudo-channel parallelism** via stateless `pc_avail[N]` array
  with global round-robin chunking. Burst granularity is tunable
  (`burst_bytes`, default 256 B). Reads and writes share each PC's
  `available_at`, because on real HW the command bus is shared per PC.
- **HBM direction switching penalty mechanism**: per-PC last-direction
  tracking + configurable `switch_penalty_ns`. Default 0 — see D2.
- **Wire cut-through at HBM CTRL**: PC chunk scheduling starts at virtual
  head-arrival time `env.now - txn.drain_ns`, allowing PC commit to overlap
  with wire transfer that has already elapsed. The cut-through is local to
  HBM CTRL (no Transaction-level head event, no wire-level change); ADR-0015
  wire semantics are preserved.
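
The HBM items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's code: the class `HbmCtrlSketch`, its `commit` method, and the 32 GB/s per-PC bandwidth are hypothetical, 1 GB/s is taken as 1 byte/ns (decimal GB), and clamping the completion time to `now_ns` (a transaction cannot finish before its tail arrives) is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class HbmCtrlSketch:
    """Hypothetical stand-in for the HBM CTRL chunk scheduler."""
    num_pcs: int = 8
    pc_bw_gbs: float = 32.0          # illustrative value; GB/s == bytes/ns
    burst_bytes: int = 256           # default from this ADR
    switch_penalty_ns: float = 0.0   # default from this ADR
    pc_avail: list = field(init=False)
    last_dir: list = field(init=False)
    next_pc: int = field(init=False, default=0)  # global round-robin pointer

    def __post_init__(self):
        self.pc_avail = [0.0] * self.num_pcs
        self.last_dir = [None] * self.num_pcs

    def commit(self, now_ns, drain_ns, nbytes, direction):
        """Schedule one transaction; return its completion time in ns."""
        # Cut-through: chunks may start at the virtual head-arrival time.
        head_ns = now_ns - drain_ns
        done = head_ns
        remaining = nbytes
        while remaining > 0:
            chunk = min(self.burst_bytes, remaining)
            pc = self.next_pc
            self.next_pc = (pc + 1) % self.num_pcs
            # Reads and writes serialize on the same per-PC availability.
            start = max(head_ns, self.pc_avail[pc])
            if self.last_dir[pc] not in (None, direction):
                start += self.switch_penalty_ns
            end = start + chunk / self.pc_bw_gbs
            self.pc_avail[pc] = end
            self.last_dir[pc] = direction
            done = max(done, end)
            remaining -= chunk
        # Assumption: completion cannot precede tail arrival at now_ns.
        return max(done, now_ns)


ctrl = HbmCtrlSketch()
# 2048 B over 8 PCs, 8 ns per burst, 8 ns of wire time already elapsed:
print(ctrl.commit(now_ns=100.0, drain_ns=8.0, nbytes=2048, direction="R"))  # 100.0
```

With `drain_ns=0.0` the same transaction completes at 108.0 ns, so in this sketch the cut-through hides exactly the wire time that has already elapsed.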

### D2. Approximated (with known directional error)

| Effect | Real HW | Our model | Error direction |
|--------|---------|-----------|----------------|
| Router output port arbitration | Round-robin / weighted | Wire edge FIFO | HoL blocking exaggerated; fairness not modeled |
| Multi-flow BW sharing | Per-flow fair share | FIFO atomic occupancy | Per-transaction latency distribution differs; makespan correct |
| HBM scheduler / write buffer | FR-FCFS + watermark drain | FIFO, no reordering | Switching penalty over-charged when alternations are dense; the default `switch_penalty_ns = 0` instead assumes an ideal scheduler amortizes it (Tier 0) |
| Flit/cycle granularity | Discrete flits @ cycle rate | Continuous nbytes | Sub-flit small-message noise |
| Wire cut-through scope | Wormhole at every hop | Cut-through absorbed at HBM CTRL only | Intermediate hops keep store-and-forward semantics; acceptable because component overheads at intermediate nodes are size-independent |
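
The "Multi-flow BW sharing" row can be illustrated with a small comparison. The sketch below is hypothetical (function names and the 64 bytes/ns link are illustrative), assumes all flows arrive at `t = 0`, and uses idealized equal-share processor sharing as a stand-in for per-flow fair share. Both disciplines are work-conserving, so the last completion time agrees while individual completion times differ.

```python
def fifo_completions(sizes, bw_bytes_per_ns):
    """Each transaction occupies the whole link atomically, in order."""
    t = 0.0
    out = []
    for s in sizes:
        t += s / bw_bytes_per_ns
        out.append(t)
    return out


def fair_share_completions(sizes, bw_bytes_per_ns):
    """Idealized equal share among active flows, all arriving at t = 0."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    out = [0.0] * len(sizes)
    t = 0.0
    served = 0.0          # bytes every still-active flow has received
    active = len(sizes)
    for i in order:
        # The smallest active flow finishes next; each flow gets bw / active.
        t += (sizes[i] - served) * active / bw_bytes_per_ns
        served = sizes[i]
        out[i] = t
        active -= 1
    return out


sizes = [512, 1536]
print(fifo_completions(sizes, 64.0))        # [8.0, 32.0]
print(fair_share_completions(sizes, 64.0))  # [16.0, 32.0]
```

The first flow finishes at 8 ns under FIFO and 16 ns under fair share, while the makespan is 32 ns in both cases.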

### D3. Ignored (out of scope)

- Bank-level row buffer conflict penalty (assume no conflicts — best case;
  round-robin chunk assignment is address-blind so we cannot detect same-bank
  reuse).
- HBM tRP / tRCD / tFAW / tRC timing constraints (absorbed into the steady-state
  `burst_time = burst_bytes / pc_bw_gbs`).
- Refresh, ECC, thermal throttling, power gating.
- Clock domain crossings, PLL lock time.
- Flit-level discrete interleaving on links.
- Upstream backpressure due to downstream buffer occupancy (input ports use
  unbounded `simpy.Store`).
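
As a worked instance of the steady-state burst time mentioned above: assuming `pc_bw_gbs` is in decimal GB/s, 1 GB/s equals 1 byte/ns, so the quotient is directly in nanoseconds. The helper name and the 32 GB/s figure are illustrative, not taken from the codebase.

```python
def burst_time_ns(burst_bytes: int, pc_bw_gbs: float) -> float:
    """Steady-state time for one burst on one pseudo-channel, in ns."""
    if pc_bw_gbs <= 0:
        raise ValueError("pc_bw_gbs must be positive")
    # bytes / (bytes per ns) -> ns; row timing (tRP, tRCD, ...) is not added.
    return burst_bytes / pc_bw_gbs


print(burst_time_ns(256, 32.0))  # 8.0
```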

### D4. Workload sensitivity

Workloads where the above simplifications meaningfully affect results:

- **Random scatter/gather**: bank conflict ignored → model optimistic.
- **Heavily mixed R/W traffic** (e.g., GEMM bias accumulation): the HBM
  scheduler is not modeled. The default `switch_penalty_ns = 0` assumes ideal
  amortization; a non-zero value models a pessimistic per-alternation cost.
- **High concurrency (>10 active flows on one link)**: HoL blocking and VC
  limits not modeled → model optimistic.
- **Very small (sub-flit) transactions**: flit quantization noise.

### D5. Verification policy

For workloads in D4, cross-check against real HW or a cycle-accurate
simulator before drawing absolute-magnitude conclusions. The model remains
accurate for **relative comparisons** within the modeled regime.

### D6. Future work

- [ ] Bank-level conflict modeling (opt-in via `track_banks: true`).
- [ ] HBM scheduler with write buffer + watermark drain (Tier 2 from the
  design discussion).
- [ ] Fluid wire model for multi-flow router contention.
- [ ] Wire-level cut-through at intermediate routers (currently at the
  destination HBM CTRL only).
- [ ] Backpressure modeling for finite component buffers.

## Consequences

- Single review point for all model fidelity questions. Each future PR
  touching latency must update the relevant section here.
- Workload-specific magnitude error envelopes are explicit.
- Builder-side derivation of `pc_bw_gbs = hbm_to_router_bw_gbs / num_pcs`
  enforces the ADR-0019 D9 invariant in code rather than relying on the YAML
  being kept consistent by hand.
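
The builder-side derivation might look like the sketch below. The function names and the dict-based config are hypothetical, and rejecting a hand-written `pc_bw_gbs` is an assumption about how the invariant could be enforced, not a description of the actual builder.

```python
def derive_pc_bw_gbs(hbm_to_router_bw_gbs: float, num_pcs: int) -> float:
    """Split the HBM-to-router link bandwidth evenly across pseudo-channels."""
    if num_pcs <= 0:
        raise ValueError("num_pcs must be positive")
    return hbm_to_router_bw_gbs / num_pcs


def build_hbm_attrs(cfg: dict) -> dict:
    """Return a copy of cfg with pc_bw_gbs filled in by derivation."""
    # Assumption: a manually supplied value is an error, so the two
    # bandwidths can never drift apart in the config.
    if "pc_bw_gbs" in cfg:
        raise ValueError("pc_bw_gbs is derived by the builder; remove it")
    attrs = dict(cfg)
    attrs["pc_bw_gbs"] = derive_pc_bw_gbs(
        cfg["hbm_to_router_bw_gbs"], cfg["num_pcs"]
    )
    return attrs


attrs = build_hbm_attrs({"hbm_to_router_bw_gbs": 256.0, "num_pcs": 8})
print(attrs["pc_bw_gbs"])  # 32.0
```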

## Cross-references

- ADR-0015 — component / port / wire model.
- ADR-0019 — NoC and local HBM topology.
- ADR-0004 — memory semantics, local HBM.