# ADR-0038: PCIE_EP Component Model

## Status

Accepted (2026-05-20). Companion to ADR-0035 (M_CPU), ADR-0036 (IO_CPU), and ADR-0037 (Forwarding) at the same component-model level.

## First action

Pull one Transaction from `_inbox` and let `_forward_txn` invoke `run()`, which applies a single `env.timeout(node.attrs["overhead_ns"])` for PCIe protocol handling. After that, the standard `ComponentBase` worker rules take over: if `next_hop` exists, put the advanced Transaction on `out_ports[next_hop]`; otherwise, consume `drain_ns` and call `txn.done.succeed()`.

In other words, **PCIE_EP's first (and only) act is to spend the configured overhead as simulator time**: no routing decisions, no payload transformation, no MMIO decoding.

## Context

PCIE_EP is the **host ↔ device boundary** in the topology graph. The builder (`topology/builder.py`) creates one IO chiplet instance per SIP containing `pcie_ep`, `io_cpu`, and `io_noc`, and lays bidirectional edges between the external `fabric.switch0` and each `pcie_ep`:

- `switch → pcie_ep`: host → device traffic (MemoryWrite, MemoryRead, KernelLaunch).
- `pcie_ep → switch`: device-side outbound traffic (e.g., cross-SIP IPCQ tokens).

Inside the IO chiplet there are bidirectional `pcie_ep ↔ io_noc` edges; from `io_noc`, traffic branches to `io_cpu` or to the cube-side `hbm_ctrl` path (see ADR-0036, IO_CPU model).

The router and resolver already know, per SPEC R7, that PCIE_EP is the endpoint for memory operations, so helpers such as `find_pcie_ep(sip)` and `find_memory_path(pcie_ep, dst_node)` treat PCIE_EP as the start (or end) of the memory path.

The problem is that all of this knowledge lives in the builder, router, and resolver, while **PCIE_EP's own internal model has no ADR**. Consequences:

- Answering "What latency does PCIE_EP model?" requires reading the source.
- The asymmetry with peer components (M_CPU = ADR-0035, IO_CPU = ADR-0036) is awkward.
- Future decisions about a more detailed PCIe link-layer model (TLP credits, retry, MPS chunking) lack a documented baseline.

This ADR pins down the current **thin PCIE_EP model** and records that this thinness is intentional (aligned with ADR-0033's latency-model simplification policy).

## Decision

### D1. PCIE_EP uses ComponentBase's generic forwarding worker as-is

`PcieEpComponent` extends `ComponentBase` and does **not** override `_worker` or `_forward_txn`. Every Transaction flows through the standard sequence:

1. `_fan_in` accumulates inbound messages into `_inbox`, reassembling Flits first (per ADR-0033 Phase 2c).
2. `_worker` pulls one message off `_inbox` and spawns `env.process(self._forward_txn(env, txn))`, so each message runs in its own process (per-message pipelining).
3. `_forward_txn` calls the op_log start hook, then `run()` (which supplies the latency), then the op_log end hook.
4. `run()` is a single line: `yield env.timeout(overhead_ns)`.
5. If a next hop exists, `out_ports[next_hop].put(txn.advance())`. Otherwise (terminal arrival), consume `drain_ns` and call `txn.done.succeed()`.

### D2. The only timing parameter is `overhead_ns`

Only `node.attrs["overhead_ns"]` is accepted as a latency parameter. The code default is `0.0`; the IOChiplet entry's `components.pcie_ep.attrs` in `topology.yaml` supplies the real value (current topology: `overhead_ns: 5.0`, i.e., 5 ns). No dedicated BW-serialization resource (`simpy.Resource`), queue depth, or retry model is introduced.

Link-level BW serialization is handled on the wire side: inside the IOChiplet by `pcie_ep_to_noc_bw_gbs = 256.0` (256 GB/s), and externally by the system's `io_ep_to_switch` link BW (ADR-0015 port/wire model). PCIE_EP itself takes no part in that accounting.

### D3. PCIE_EP is direction-aware in topology but direction-blind in code

The builder lays both `switch ↔ pcie_ep` and `pcie_ep ↔ io_noc` edges, so PCIE_EP serves:

- inbound (host → device): forwards Transactions arriving from the switch to the next hop on the io_noc side.
- outbound (device → host): forwards Transactions arriving from io_noc/io_cpu back to the switch.

Both directions are handled by D1's generic forwarding worker; the component code never distinguishes direction (it just follows `txn.next_hop`).

### D4. PCIE_EP is not Flit-aware (legacy reassembly path)

`_FLIT_AWARE` is left at the inherited `False`, so `_fan_in` reassembles upstream-chunkified Flits into the parent Transaction before delivering it to `_inbox` (aligned with the ADR-0033 Phase 2c incremental rollout). A future PCIe TLP-level credit model would revisit D4.

### D5. PCIE_EP is a **named node** for routing helpers

`policy/routing/router.py` provides `find_pcie_ep(sip, io_id="io0")`, `find_all_pcie_eps()`, and `find_memory_path(pcie_ep, dst_node)`, all of which treat PCIE_EP as the start (or end) of the memory path. The component itself supplies no information to these helpers; the naming convention (`sip{S}.{io_id}.pcie_ep`) is guaranteed by the topology builder.

## Alternatives Considered

### A1. Full PCIe TLP-level model (credits, retry, MPS chunking)

Rejected. It violates ADR-0033's simplification that the current latency model is "abstract overhead + BW serialization". Host ↔ device protocol fidelity is explicitly out of scope per SPEC §5 "Non-Goals".

### A2. Per-PCIE_EP `simpy.Resource` for an in-flight cap

Rejected. Host traffic is not a contention bottleneck in current workloads. If it becomes one, a separate ADR will address it (in which case D1 stays and D2 is extended).

### A3. Merge PCIE_EP into IO_CPU

Rejected. PCIE_EP is the protocol-boundary node that host-side traffic hits first; IO_CPU is the device-side control-plane processing node (ADR-0036). Traffic fan-out and command-decoding costs concentrate in IO_CPU, while PCIE_EP only expresses link-edge overhead. Merging them would mix two responsibilities and violate the spirit of ADR-0007 (runtime API/sim_engine boundaries).
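For readers who want the D1 to D3 behaviour in one place, the forwarding step can be condensed into a simulator-free sketch. It is a minimal illustration under stated assumptions, not the project's code: `Txn`, its `path`/`pos` fields, `out_ports` as a dict of lists, the float clock, and the `drain_ns` keyword are hypothetical stand-ins for the real Transaction, SimPy stores, and `env.timeout`. It shows that the step spends only `overhead_ns` (default `0.0`), never looks at where a Transaction came from, and leaves BW serialization to the wire (assuming decimal GB, so 1 GB/s moves 1 byte per ns).

```python
"""Simulator-free sketch of one PCIE_EP forwarding step (ADR-0038 D1 to D3).

Hypothetical stand-ins: Txn, its path/pos fields, out_ports as a dict of lists,
and a float clock replace the real Transaction, SimPy stores, and env.timeout.
"""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Txn:
    """Hypothetical Transaction: full hop list plus index of the current hop."""
    path: List[str]
    pos: int = 0
    done_at_ns: Optional[float] = None  # set on terminal arrival

    @property
    def next_hop(self) -> Optional[str]:
        # None means the current node is the last hop on the path.
        return self.path[self.pos + 1] if self.pos + 1 < len(self.path) else None

    def advance(self) -> "Txn":
        self.pos += 1
        return self


def pcie_ep_forward(txn: Txn, now_ns: float, attrs: Dict[str, float],
                    out_ports: Dict[str, list], drain_ns: float = 0.0) -> float:
    """Spend overhead_ns, then forward to next_hop or complete the Transaction.

    Direction-blind (D3): only txn.next_hop is consulted, never where txn came from.
    Returns the simulated time at which this step finishes.
    """
    now_ns += attrs.get("overhead_ns", 0.0)  # D2: code default is 0.0
    nxt = txn.next_hop
    if nxt is not None:
        out_ports[nxt].append((now_ns, txn.advance()))
    else:
        now_ns += drain_ns            # terminal arrival consumes drain time
        txn.done_at_ns = now_ns       # stands in for txn.done.succeed()
    return now_ns


def wire_serialization_ns(nbytes: int, bw_gbs: float) -> float:
    """Wire-side BW accounting, outside PCIE_EP: 1 GB/s moves 1 byte per ns."""
    return nbytes / bw_gbs


if __name__ == "__main__":
    ports = defaultdict(list)
    attrs = {"overhead_ns": 5.0}  # current topology value
    inbound = Txn(["fabric.switch0", "sip0.io0.pcie_ep", "sip0.io0.io_noc"], pos=1)
    outbound = Txn(list(reversed(inbound.path)), pos=1)
    print(pcie_ep_forward(inbound, 0.0, attrs, ports))   # 5.0, queued toward io_noc
    print(pcie_ep_forward(outbound, 0.0, attrs, ports))  # 5.0, queued toward the switch
    print(wire_serialization_ns(4096, 256.0))            # 16.0 on the 256 GB/s link
```

The same function handles both directions, which is the point of D3: a 4 KiB inbound write costs 5 ns at PCIE_EP plus whatever the wire charges (16 ns on the 256 GB/s `pcie_ep ↔ io_noc` link under the decimal-GB assumption), and the two costs are accounted in different places.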
## Consequences

- PCIE_EP gets an explicit model ADR despite having near-zero code, which keeps it consistent with peer component ADRs and lowers maintenance friction.
- Future PCIe-level refinement will be recorded in a new ADR that supersedes this one by extending D2/D4.
- D5 makes the named-node dependency explicit, so any future renaming of component IDs has a clearly bounded blast radius.
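The bounded blast radius depends on the builder's `sip{S}.{io_id}.pcie_ep` convention living in as few places as possible. A minimal sketch, assuming `S` is an integer SIP index, of keeping that convention in a single format/parse pair so a rename touches one spot. `pcie_ep_node_id`, `parse_pcie_ep_node_id`, and `all_pcie_ep_ids` are hypothetical illustrations, not the actual `policy/routing/router.py` helpers (`find_pcie_ep`, `find_all_pcie_eps`), whose internals this ADR does not describe.

```python
"""Hypothetical helpers pinning the D5 naming convention in one place.

Not the real router.py API; assumes S in sip{S} is an integer SIP index.
"""
import re
from typing import Iterable, List, Optional, Tuple

PCIE_EP_SUFFIX = "pcie_ep"
_PCIE_EP_ID = re.compile(
    r"^sip(\d+)\.([A-Za-z0-9_]+)\." + re.escape(PCIE_EP_SUFFIX) + r"$"
)


def pcie_ep_node_id(sip: int, io_id: str = "io0") -> str:
    """Build the node ID the builder guarantees: sip{S}.{io_id}.pcie_ep."""
    return f"sip{sip}.{io_id}.{PCIE_EP_SUFFIX}"


def parse_pcie_ep_node_id(node_id: str) -> Optional[Tuple[int, str]]:
    """Return (sip, io_id) if node_id follows the convention, else None."""
    m = _PCIE_EP_ID.match(node_id)
    return (int(m.group(1)), m.group(2)) if m else None


def all_pcie_ep_ids(node_ids: Iterable[str]) -> List[str]:
    """Filter node IDs down to PCIE_EPs, ordered numerically by (sip, io_id)."""
    hits = [n for n in node_ids if parse_pcie_ep_node_id(n) is not None]
    return sorted(hits, key=parse_pcie_ep_node_id)


if __name__ == "__main__":
    nodes = ["fabric.switch0", "sip10.io0.pcie_ep", "sip2.io0.io_cpu", "sip2.io0.pcie_ep"]
    print(all_pcie_ep_ids(nodes))  # ['sip2.io0.pcie_ep', 'sip10.io0.pcie_ep']
```

Parsing the SIP index as an integer keeps `sip10` after `sip2`, which plain string sorting would not.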