ADR-0033 D6: clarify what multi-flow merging actually models

The future-work list previously included a "fluid wire model for
multi-flow fairness on a single shared link", which was misleading:
each wire has a single source, so this isn't a real gap. The actual
modeling story:

- Multi-stream merging at routers IS handled via per-in_port fan_in +
  shared inbox + FIFO worker forwarding. Flits from different
  upstream streams interleave at flit granularity naturally.
- What's NOT modeled: cycle-accurate arbitration policies (priority,
  iSLIP), address-based PC selection at HBM CTRL (round-robin is
  address-blind, so size-aligned concurrent transactions hit full
  PC contention even when real-HW address striping would diverge),
  sub-flit (32B) granularity, finite buffer backpressure, and bank
  conflict modeling.
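
A minimal sketch of the merging behavior described above, under
stated assumptions: `fan_in`, `router`, and `FLIT_NS` are
hypothetical names (not the simulator's code), every wire serializes
one flit per `FLIT_NS`, and the shared inbox is a deque filled in
arrival order. Two upstream streams offset by 0.1 ns leave the router
interleaved flit by flit, even though the worker applies no
arbitration policy at all.

```python
import heapq
from collections import deque

FLIT_NS = 1.0  # hypothetical serialization time of one flit on any wire

def fan_in(port, first_arrival_ns, n_flits):
    """One in_port's flits, arriving back to back on that port's own wire."""
    return [(first_arrival_ns + i * FLIT_NS, port, i) for i in range(n_flits)]

def router(*ports):
    # Every fan_in pushes into the same inbox; arrival time fixes the FIFO order.
    inbox = deque(heapq.merge(*ports))
    out_free_ns = 0.0
    forwarded = []
    while inbox:
        arrival_ns, port, idx = inbox.popleft()  # worker: strict FIFO, no policy
        # Serialize on the output wire: wait for arrival and for the wire to free up.
        out_free_ns = max(arrival_ns, out_free_ns) + FLIT_NS
        forwarded.append((port, idx, out_free_ns))
    return forwarded

# Two upstream streams, B offset by 0.1 ns from A.
out = router(fan_in("A", 0.0, 3), fan_in("B", 0.1, 3))
print([(p, i) for p, i, _ in out])
# -> [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('A', 2), ('B', 2)]
```

Because the worker only pops in FIFO order, nothing in this sketch
can express a priority or QoS class; that is the gap the
cycle-accurate arbitration item targets.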

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:18:19 -07:00
parent c6788788a4
commit 9beb140eaa
+33 -12
@@ -102,18 +102,39 @@ accurate for **relative comparisons** within the modeled regime.
### D6. Future work
- [ ] Bank-level conflict modeling (opt-in via `track_banks: true`).
- [ ] HBM scheduler with write buffer + watermark drain (Tier 2 from the
design discussion).
- [ ] Fluid wire model for multi-flow fairness on a single shared link
(currently FIFO serial).
- [ ] Sub-flit (32B) granularity for cycle-accurate wire arbitration.
- [ ] Backpressure modeling for finite component buffers.
- [ ] Op_log integration with chunk-streaming (currently op_log fires on
PE-internal command messages — DmaReadCmd, DmaWriteCmd, GemmCmd,
MathCmd — which are not chunkified; integration would require
flit-aware components to also emit op_log start/end hooks per
transaction).
Note: multi-stream merging at routers IS modeled correctly. Each
in_port has its own fan_in process, all of them push to a shared
inbox, and the router worker forwards flits in inbox FIFO order.
Flits from different upstream streams therefore interleave at flit
granularity. The items below are separate concerns.
- [ ] **Cycle-accurate router arbitration policies** (RR with
priorities, age-based, iSLIP). Currently inbox FIFO order serves as
a proxy for fair RR. This works when flit arrival times differ
slightly between streams, but it cannot express intentional
priority/QoS.
- [ ] **Sub-flit (32B) granularity** for finer wire arbitration
cycles. Our `flit_bytes` equals the burst size (256B), while real HW
arbitrates per 32B flit, i.e., 8 arbitration points per modeled
flit. The effect is small for most workloads (sub-flit timing
noise).
- [ ] **Address-based PC selection at HBM CTRL** (replace the
address-blind global round-robin). A transaction of size
`num_pcs × burst_bytes` (e.g., 2KB at 8 PCs × 256B) advances the
global RR pointer by exactly `num_pcs`, so when two such
transactions arrive concurrently, both claim PCs 0..7 and every PC
sees full contention. Real HW uses address bits to select PCs, so
different-address transactions hit different PC patterns. Address
modeling would let the simulator reflect cache-line/page-aware
layouts.
- [ ] **Bank-level conflict modeling** within a PC (opt-in via
`track_banks: true`). Currently we assume accesses never conflict
on the same bank.
- [ ] **HBM scheduler** with write buffer + watermark drain (Tier 2
from the design discussion). Until then, the default
`switch_penalty_ns=0` stands in for ideal amortization, i.e., the
switch penalty is assumed to be fully hidden.
- [ ] **Backpressure** modeling for finite component buffers.
- [ ] **Op_log integration with chunk-streaming**: currently op_log
fires on PE-internal command messages (DmaReadCmd, DmaWriteCmd,
GemmCmd, MathCmd), which are not chunkified. Integration would
require flit-aware components to also emit op_log start/end hooks
per transaction (start on the first flit, end on `is_last`).
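
To make the address-based PC selection item concrete, here is a small
sketch. `GlobalRR`, `address_pcs`, and the `interleave_bytes`
parameter are hypothetical, and the address mapping
(PC = address / interleave, modulo `num_pcs`) is an assumed simple
striping scheme, not a claim about any particular HBM controller.
With 8 PCs and 256B bursts, two concurrent 2KB transactions both land
on PCs 0..7 under global RR, whereas an address-based mapping can
separate them.

```python
NUM_PCS = 8        # pseudo-channels (from the 2KB example)
BURST_BYTES = 256  # bytes per burst

class GlobalRR:
    """Address-blind PC selection: one pointer shared by all transactions."""

    def __init__(self, num_pcs=NUM_PCS):
        self.num_pcs = num_pcs
        self.next_pc = 0

    def assign(self, addr, size, burst_bytes=BURST_BYTES):
        # addr is deliberately ignored; each burst takes the next PC in turn.
        pcs = []
        for _ in range(size // burst_bytes):
            pcs.append(self.next_pc)
            self.next_pc = (self.next_pc + 1) % self.num_pcs
        return pcs

def address_pcs(addr, size, burst_bytes=BURST_BYTES, num_pcs=NUM_PCS,
                interleave_bytes=BURST_BYTES):
    """Hypothetical address-based selection: PC = (addr // interleave) % num_pcs."""
    return [((addr + off) // interleave_bytes) % num_pcs
            for off in range(0, size, burst_bytes)]

size = NUM_PCS * BURST_BYTES  # 2048 bytes = 2KB
rr = GlobalRR()
a_rr = rr.assign(0x0000, size)  # PCs 0..7
b_rr = rr.assign(0x0800, size)  # pointer wrapped back to 0: PCs 0..7 again
a_addr = address_pcs(0x0000, size, interleave_bytes=2048)  # all bursts on PC 0
b_addr = address_pcs(0x0800, size, interleave_bytes=2048)  # all bursts on PC 1
```

The divergence depends on the mapping: with the default 256B
interleave, `address_pcs` still spreads both 2KB transactions over
all eight PCs, so address modeling only changes the contention
picture when the interleave (or a hashed mapping) splits them apart.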
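
A sketch of the per-transaction hooks the op_log item describes
(start on the first flit, end on `is_last`), assuming each flit
carries a hypothetical `txn_id` alongside `is_last`; `Flit` and
`op_log_hooks` are illustrative names, not existing simulator APIs.
Tracking open transactions by id keeps the hooks correct even when
flits of different transactions interleave, as they do at routers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flit:
    txn_id: int    # hypothetical: identifies the transaction this flit belongs to
    is_last: bool  # True on the final flit of the transaction

def op_log_hooks(events):
    """events: iterable of (time_ns, Flit) in arrival order.
    Returns op_log-style (kind, txn_id, time_ns) records."""
    open_txns = set()
    records = []
    for time_ns, flit in events:
        if flit.txn_id not in open_txns:  # first flit seen for this transaction
            open_txns.add(flit.txn_id)
            records.append(("start", flit.txn_id, time_ns))
        if flit.is_last:                  # transaction completes on this flit
            open_txns.discard(flit.txn_id)
            records.append(("end", flit.txn_id, time_ns))
    return records

# Two interleaved transactions, plus a single-flit one.
log = op_log_hooks([
    (0, Flit(1, False)), (1, Flit(2, False)),
    (2, Flit(1, True)),  (3, Flit(2, True)),
    (4, Flit(3, True)),
])
```

A single-flit transaction produces a start and an end at the same
timestamp, which matches the "start on first flit, end on `is_last`"
rule when both refer to the same flit.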
## Consequences