Files
kernbench2/docs/adr-ko/ADR-0037-dev-forwarding-component.md
T
ywkang a796c1d2f7 ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/
Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00

201 lines
7.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-0037: Forwarding Component (forwarding_v1)
## Status
Accepted
## Context
The simulation graph has many node positions that exist purely to model
fabric traversal — NOC mesh routers, switches, UCIe protocol endpoints,
IO chiplet io_noc, transit cubes. These share a common pattern: receive
a message, apply per-component overhead (modeling header decode +
routing decision time), forward to the next hop along the pre-computed
path.
This ADR defines the contract for these transit nodes: a single
component type (`TransitComponent`) that handles flit-aware forwarding
with wormhole cut-through semantics, used under multiple impl names
according to the conceptual role each instance plays.
## Decision
### D1. Role
The Forwarding component (`TransitComponent` class) is a **stateless
transit node** in the simulation graph. It models any fabric position
where a message physically traverses but no semantic processing
happens.
Per traversal, the component:
1. Reads an incoming Transaction or Flit from an `in_port`.
2. Applies the configured per-component overhead (`overhead_ns`),
applied **once per Transaction** even across multi-flit payloads
(see D2).
3. Looks up the next hop along the Transaction's pre-computed `path`.
4. Forwards to the corresponding `out_port`; at the terminal node
(no next hop), signals `txn.done` once the `is_last` flit arrives.
The component **does NOT**:
- Decide routing — paths are pre-computed by the router (ADR-0002 /
ADR-0017 D2). Forwarding only executes the per-hop step.
- Model wire propagation or bandwidth occupancy — separate wire
processes between components handle that (ADR-0015 D2).
- Resolve addresses — the AddressResolver does that (ADR-0017 D9).
- Aggregate completion — terminal endpoints (IO_CPU, M_CPU, HBM_CTRL)
handle that.
### D2. First-flit overhead model (header decode)
Per-Transaction `overhead_ns` is applied **exactly once**, at first
flit arrival:
- `_txn_decoded: set[int]` tracks which Transactions have already
paid the overhead at this node.
- On first-flit arrival for a Transaction: `yield self.run(env,
msg.txn.nbytes)` — pays the overhead.
- Subsequent flits of the same Transaction skip the overhead — they
pipeline through with no extra delay.
- On `is_last` flit: remove the Transaction from `_txn_decoded`.
This models the real-HW behavior where header decode and routing
decision happen once on first flit; payload flits then stream through
the same path (wormhole cut-through). Multi-hop pipelining emerges
naturally — each hop adds its own first-flit overhead, but flits
after the first do not re-pay overhead at any hop they have already
passed first.
### D3. Serial worker forwarding (preserves order)
The component's worker is a single SimPy process that consumes flits
from `_inbox` and forwards them serially in arrival order. The
component does NOT spawn `env.process(...)` per flit.
Rationale: if the first flit yields on `overhead_ns` while subsequent
flits run in parallel processes, the later flits can overtake the
first. This produces out-of-order delivery and lets the `is_last`
flit arrive at the destination before the first flit — corrupting
both the transaction's completion semantics and any flit-index-based
processing downstream.
### D4. Path-based next-hop routing
Routing is **not** a Forwarding-component concern. The Transaction
arrives with a pre-computed `path` (built by the router; ADR-0002 /
ADR-0017 D2). The component just looks up its own position in the
path and forwards to `path[index + 1]`:
```python
def _next_hop_in_path(self, txn):
my_id = self.node.id
path = txn.path
for i, n in enumerate(path):
if n == my_id and i + 1 < len(path):
return path[i + 1]
return None
```
If `next_hop` is found and present in `out_ports`, the flit is
forwarded. Otherwise (terminal node), `txn.done.succeed()` is
invoked when the `is_last` flit arrives.
### D5. Flit-aware mode with Non-Flit fallback
`_FLIT_AWARE = True` opts this component out of the base class's
flit-reassembly logic in `_fan_in`. Flits are placed directly on
`_inbox` (no reassembly), enabling per-flit handling in the worker
loop (D2, D3).
Non-Flit messages — zero-byte control Transactions and other
non-chunkified payloads — fall through to the base class's legacy
`_forward_txn` path via `env.process`. This preserves backward
compatibility for control-plane traffic that does not benefit from
flit-level processing.
### D6. Multi-stream merging at the base class
Multi-stream FIFO merging at routers is the base class's
responsibility, not Forwarding's. The base class's `_fan_in` spawns
one process per `in_port`; all push to a single shared `_inbox`.
Flits from different upstream streams therefore interleave at
flit granularity in `_inbox`'s FIFO order.
The Forwarding worker simply consumes `_inbox` in arrival order —
correctly modeling per-router multi-flow arbitration as
fair-FIFO over the shared inbox.
### D7. Single implementation under multiple impl names
A single `TransitComponent` class is registered under four impl names
in `components.yaml`:
- `builtin.forwarding` — generic forwarding (e.g., `io_noc`,
`noc_router`, UCIe conn bridges)
- `builtin.switch` — tray-level switch
- `builtin.noc` — cube-level NOC fabric (legacy singleton; current
NOC routers use `builtin.forwarding`)
- `builtin.ucie` — UCIe protocol endpoint
All four aliases instantiate the same class with the same behavior.
Per-instance differentiation lives only in `attrs.overhead_ns`.
Separate impl names exist as intent tags for readability and to
allow future divergence without backward-incompatible config
changes.
### D8. Configurable `overhead_ns`
A single attribute drives per-instance latency:
| Usage site | impl name | overhead_ns |
| --- | --- | --- |
| Tray-level switch | `builtin.switch` | 5.0 |
| Cube NOC router | `builtin.forwarding` | 2.0 |
| IO chiplet io_noc | `builtin.forwarding` | 0.0 |
| UCIe protocol endpoint (`ucie-{N,S,E,W}`) | `builtin.ucie` | 8.0 |
| UCIe conn bridge (`ucie-{PORT}.conn{N}`) | `builtin.forwarding` | 0.0 |
Default is 0.0. The attribute is read at each `run()` invocation, so
dynamic reconfiguration is possible but not currently used.
## Consequences
### Positive
- A single class handles all transit-node roles in the simulation
graph — minimal code surface for a high-population component type.
- Flit-aware processing + serial worker preserves wormhole semantics
across multi-hop paths without per-flit process overhead.
- `overhead_ns` is the only per-instance tunable; routing, BW, and
address resolution stay cleanly separated in their own components /
modules.
- Multi-stream merging emerges from the base-class structure; no
router-specific logic duplicates fair-FIFO arbitration.
- Non-Flit fallback path keeps control-plane traffic working without
forcing every message into the flit framework.
### Negative
- The single class hides usage-site intent inside `attrs.overhead_ns`
configuration; readers must consult `topology.yaml` +
`components.yaml` to see which impl name maps to which behavior
class.
- Per-flit serial worker is a bottleneck if `overhead_ns` is large
and many concurrent transactions arrive at the same router; current
values (08 ns) make this negligible.
## Links
- ADR-0002 (Routing distance — path computation)
- ADR-0015 D1 (Component port model)
- ADR-0015 D2 (Wire process — BW + propagation, separate from this
component)
- ADR-0015 D6 (Transit cube forwarding pattern)
- ADR-0016 D1 (IO chiplet io_noc — uses this component)
- ADR-0017 D1 (Cube NOC routers — use this component)
- ADR-0017 D6 (UCIe decomposition — `ucie-{PORT}` instances use this
component)
- ADR-0033 D1 (Flit-aware pass-through, first-flit overhead,
multi-stream merge semantics)