# ADR-0033 — Latency Model: Assumptions and Known Simplifications

## Status

Accepted

## Context

The simulator is an analytical, event-driven performance model, not a cycle-accurate or RTL-level simulator. Many real-hardware effects are approximated or omitted by design. To keep the model auditable and reviewable as a whole, this ADR consolidates the assumptions in one place. Individual component ADRs (ADR-0015, ADR-0019, ADR-0004) define the *mechanisms*; this document defines the *limits of fidelity*.

## Decisions

### D1. Modeled precisely

- **Per-directed-edge BW occupancy** (FIFO serialization via `available_at`); see ADR-0015 D2.
- **Per-component switching/overhead latency** (`overhead_ns` attr).
- **HBM per-pseudo-channel parallelism** via stateless `pc_avail[N]` array with global round-robin chunking. Burst granularity is tunable (`burst_bytes`, default 256B). Reads and writes share each PC's `available_at` (on real HW the command bus is shared per PC).
- **HBM direction switching penalty mechanism**: per-PC last-direction tracking plus a configurable `switch_penalty_ns`. Default 0; see D2.
- **Wire cut-through at HBM CTRL**: PC chunk scheduling starts at the virtual head-arrival time `env.now - txn.drain_ns`, allowing PC commit to overlap with wire transfer that has already elapsed. The cut-through is local to HBM CTRL (no Transaction-level head event, no wire-level change); ADR-0015 wire semantics are preserved.

### D2. Approximated (with known directional error)

| Effect | Real HW | Our model | Error direction |
|--------|---------|-----------|-----------------|
| Router output port arbitration | Round-robin / weighted | Wire edge FIFO | HoL blocking exaggerated; fairness not modeled |
| Multi-flow BW sharing | Per-flow fair share | FIFO atomic occupancy | Per-transaction latency distribution differs; makespan correct |
| HBM scheduler / write buffer | FR-FCFS + watermark drain | FIFO, no reordering | Switching penalty over-charged when alternations are dense, but the default `switch_penalty_ns = 0` assumes an ideal scheduler amortizes it (Tier 0) |
| Flit/cycle granularity | Discrete flits @ cycle rate | Continuous nbytes | Sub-flit small-message noise |
| Wire cut-through scope | Wormhole at every hop | Cut-through absorbed at HBM CTRL only | Intermediate hops keep store-and-forward semantics; acceptable because component overheads at intermediate nodes are size-independent |

### D3. Ignored (out of scope)

- Bank-level row buffer conflict penalty (assume no conflicts, i.e. the best case; round-robin chunk assignment is address-blind, so we cannot detect same-bank reuse).
- HBM tRP / tRCD / tFAW / tRC timing constraints (absorbed into the steady-state `burst_time = burst_bytes / pc_bw_gbs`).
- Refresh, ECC, thermal throttling, power gating.
- Clock domain crossings, PLL lock time.
- Flit-level discrete interleaving on links.
- Upstream backpressure due to downstream buffer occupancy (input ports use unbounded `simpy.Store`).

### D4. Workload sensitivity

Workloads where the above simplifications meaningfully affect results:

- **Random scatter/gather**: bank conflicts ignored → model optimistic.
- **Heavy mixed R/W traffic** (e.g., GEMM bias accumulation): HBM scheduler absent. With the default `switch_penalty_ns = 0` we assume ideal amortization; setting it non-zero models a pessimistic per-alternation cost.
- **High concurrency (>10 active flows on one link)**: HoL blocking and VC limits not modeled → model optimistic.
- **Very small (sub-flit) transactions**: flit quantization noise.

### D5. Verification policy

For workloads in D4, cross-check against real HW or a cycle-accurate simulator before drawing absolute-magnitude conclusions. The model remains accurate for **relative comparisons** within the modeled regime.

### D6. Future work

- [ ] Bank-level conflict modeling (opt-in via `track_banks: true`).
- [ ] HBM scheduler with write buffer + watermark drain (Tier 2 from the design discussion).
- [ ] Fluid wire model for multi-flow router contention.
- [ ] Wire-level cut-through at intermediate routers (currently destination HBM CTRL only).
- [ ] Backpressure modeling for finite component buffers.

## Consequences

- Single review point for all model fidelity questions. Each future PR touching latency must update the relevant section here.
- Workload-specific magnitude error envelopes are explicit.
- Builder-side derivation of `pc_bw_gbs = hbm_to_router_bw_gbs / num_pcs` enforces the ADR-0019 D9 invariant in code rather than relying on manual consistency in the YAML.

## Cross-references

- ADR-0015: component / port / wire model.
- ADR-0019: NoC and local HBM topology.
- ADR-0004: memory semantics, local HBM.
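
## Appendix: illustrative sketch of the D1 HBM mechanisms

The HBM items in D1 (per-PC `available_at`, global round-robin chunking at `burst_bytes` granularity, the direction `switch_penalty_ns`, and cut-through from the virtual head-arrival time) can be read together as the minimal Python sketch below. It is not the simulator's code. The class `HbmCtrlSketch`, the method `schedule`, the `"R"`/`"W"` direction tags, and the example bandwidth are hypothetical names and values chosen for illustration. The sketch also assumes that the switch penalty is added to a chunk's start time, that a partial last chunk costs `chunk / pc_bw_gbs`, and that completion is clamped to the tail-arrival time `now_ns`; the ADR does not specify these details.

```python
class HbmCtrlSketch:
    """Toy per-pseudo-channel (PC) occupancy model. All names are illustrative."""

    def __init__(self, num_pcs=4, hbm_to_router_bw_gbs=64.0,
                 burst_bytes=256, switch_penalty_ns=0.0):
        self.num_pcs = num_pcs
        # Builder-side derivation; 1 GB/s is treated as 1 byte/ns.
        self.pc_bw_gbs = hbm_to_router_bw_gbs / num_pcs
        self.burst_bytes = burst_bytes
        self.switch_penalty_ns = switch_penalty_ns
        self.pc_avail = [0.0] * num_pcs       # per-PC available_at, shared by R and W
        self.pc_last_dir = [None] * num_pcs   # per-PC last direction ("R" or "W")
        self.next_pc = 0                      # global round-robin cursor

    def schedule(self, now_ns, nbytes, direction, drain_ns=0.0):
        """Return the completion time (ns) of a transaction whose tail arrives at now_ns."""
        # Cut-through: chunk scheduling starts at the virtual head-arrival time.
        head_ns = now_ns - drain_ns
        done_ns = head_ns
        remaining = nbytes
        while remaining > 0:
            pc = self.next_pc
            self.next_pc = (pc + 1) % self.num_pcs
            chunk = min(remaining, self.burst_bytes)
            start_ns = max(head_ns, self.pc_avail[pc])
            last = self.pc_last_dir[pc]
            if last is not None and last != direction:
                # Assumption: the penalty delays the start of the first
                # chunk after a direction change on this PC.
                start_ns += self.switch_penalty_ns
            end_ns = start_ns + chunk / self.pc_bw_gbs
            self.pc_avail[pc] = end_ns
            self.pc_last_dir[pc] = direction
            done_ns = max(done_ns, end_ns)
            remaining -= chunk
        # Assumption: a transaction cannot complete before its tail has arrived.
        return max(done_ns, now_ns)


ctrl = HbmCtrlSketch()
print(ctrl.schedule(now_ns=100.0, nbytes=1024, direction="W"))  # 116.0
```

With the example values, each PC moves 16 bytes/ns, so a 256B burst occupies a PC for 16 ns and a 1024B write spread over four idle PCs finishes 16 ns after its head arrives. Passing `drain_ns=16.0` for the same transaction would let the PC work overlap fully with the wire transfer, which is the behaviour D1 describes for cut-through at HBM CTRL.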