# ADR-0040: PE_TCM Component Model — Dual-Channel BW Serialization

## Status

Accepted (2026-05-20).

ADR-0014 (PE Pipeline Execution Model, D1) references PE_TCM as a "BW-based serialized scratchpad memory" but does not pin down the component's own model. This ADR fills that gap.

## First action

When `start()` is invoked, it immediately creates two `simpy.Resource(env, capacity=1)` instances and stores them in `self._read_res` / `self._write_res`. These two resources are the single decision points that serialize the **read channel** and the **write channel** to one in-flight request each.

The runtime first action: `_worker` pulls a message off `_inbox` and branches by type:

- `TcmRequest` (from `pe_fetch_store`): spawn `env.process(self._handle_tcm_request)`. Hence **TCM's first act is "acquire the lock matching the direction (read/write)"**. After lock acquisition, if `bw > 0 and nbytes > 0`, yield `env.timeout(delay_ns)` with `delay_ns = nbytes / bw`, then call `req.done.succeed()`.
- Anything else (Transaction): spawn `env.process(self._forward_txn)` (legacy fabric pass-through).

At construction, `node.attrs["read_bw_gbs"]` and `node.attrs["write_bw_gbs"]` (default `512.0 GB/s` each) are captured and held.

## Context

In the PE pipeline (ADR-0014 D1, D6), PE_TCM receives two kinds of traffic:

1. **`TcmRequest` from PE_FETCH_STORE**: when moving data between TCM and the register file, PE_FETCH_STORE sends a short sideband request to obtain BW-serialized access latency (`direction = "read"` or `"write"`, `nbytes`, `done` event).
2. **Legacy Transaction forwarding**: a fallback in case TCM ends up as a pass-through node on the fabric graph (not used by the current critical path, but preserved).

The problem: ADR-0014 only says "BW-based serialization" without specifying that:

- Read and write are **independent channels** running in parallel; only same-direction concurrency serializes at `capacity=1`.
- BW is split into two configurable values (`read_bw_gbs` / `write_bw_gbs`).
- The formula is `delay_ns = nbytes / bw_gbs` (loose unit convention: GB/s × ns ≈ B, exact when 1 GB = 10^9 B).
- `nbytes == 0` still acquires the lock but skips the BW term.
- `run()`'s `overhead_ns` (default `0.0`) is only used in the legacy fabric forwarding path.

Each of these needs to be recorded in an ADR. In particular, "why are read and write separate channels" and "who owns the BW values" must be documented so that future changes (e.g., `capacity=2`) have a clear basis.

## Decision

### D1. Dual channel: read and write are independent resources

`_read_res = simpy.Resource(env, capacity=1)`, `_write_res = simpy.Resource(env, capacity=1)`.

Same-direction concurrent requests queue on the resource and serialize; opposite-direction requests proceed in parallel. This matches the hardware model in which TCM has a dual-port (read + write) configuration. It also lets the simulator express the GEMM-pipeline case where fetch (read) and store (write) overlap in time: traffic is BW-serialized inside each direction but independent across directions.

### D2. Per-channel BW model: `nbytes / bw_gbs`

After lock acquisition, if `nbytes > 0 and bw > 0`, yield `env.timeout(nbytes / bw_gbs)`. The unit convention is GB/s × ns ≈ B, consistent with the simulator-wide loose convention (see ADR-0033).

- `nbytes == 0`: the BW term is zero, but the lock is still acquired and released. This is intentional: when a plan generator emits an empty fetch/store on the PE_FETCH_STORE side, the op_log / channel accounting on the TCM side still records one consumption.
- `bw == 0` (config error): the timeout call is skipped (0-time pass). This should not occur with normal settings.

### D3. BW values come from `node.attrs.read_bw_gbs` / `write_bw_gbs`

Both default to `512.0 GB/s`. The topology builder (`topology/builder.py`) passes these attrs when instantiating TCM from `pe_template`. Changes to the defaults should be made together with the related decisions in ADR-0014 D1 or ADR-0033.

### D4. TcmRequest schema is owned by PE_TCM

`@dataclass TcmRequest(direction: str, nbytes: int, done: simpy.Event, tag: str = "")` lives in `components/builtin/pe_tcm.py`. PE_FETCH_STORE imports the dataclass and only constructs/sends it. The caller does not define the schema because:

- The meaning of BW serialization is TCM's responsibility: TCM decides which fields drive serialization.
- The valid-value check for `direction` (must be `"read"` or `"write"`) lives in `_handle_tcm_request`'s if/else branch.

### D5. Legacy Transaction forwarding path is preserved

When `_worker` receives a non-`TcmRequest` message, it dispatches to `_forward_txn`, applying `run()`'s `overhead_ns`. The current standard PE pipeline does not route Transactions through TCM, but the path is kept to avoid breakage if the fabric topology changes. This path is accounted for via the standard Transaction op_log; the BW channel locks are **not** acquired (orthogonal to D1's usage).

### D6. PE_TCM is not a data store (timing only)

TCM models **time only**. The actual data payload is held by sim_engine's `memory_store` (when present); the TCM component never updates it. PE_FETCH_STORE obtains the BW delay through `TcmRequest`, and register contents are handled separately in the data path (ADR-0020 2-pass data execution, Phase 2).

## Alternatives Considered

### A1. Single channel (one `capacity=1` resource shared by read and write)

Rejected. It would artificially serialize the normal-case overlap of fetch (read) and store (write) and yield an incorrect BW upper bound for the PE pipeline.

### A2. `capacity > 1` (e.g., 2-banked TCM)

Rejected. The current hardware model assumes a single bank. A multi-bank extension needs its own ADR that would supersede D1. Bumping capacity now would loosen the nominal serialization without raising the BW upper bound, producing less accurate modeling.

### A3. Generalize BW formula to `nbytes / bw + overhead_ns`

Rejected. `overhead_ns` is reserved for the legacy forwarding path (D5).
Additional fetch/store-path overhead, if needed, belongs in PE_FETCH_STORE's `run()` or in a register-file access model, which is closer to the responsibility boundary.

## Consequences

- TCM's BW accounting is locked at ADR level. Questions arising from op_log in GEMM/Math sweeps, such as "why did fetch and store overlap?" or "why do only same-direction requests serialize?", resolve quickly to D1.
- Future multi-bank TCM models or asymmetric read/write BW changes have a clear blast radius (one of D1, D2, or D3).
- D6 ("TCM is not a data store") sharpens the responsibility boundary with ADR-0020 2-pass execution.
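
As an illustration of the timing rules in D1 and D2, the following is a minimal sketch that reproduces the per-channel arithmetic without SimPy. It is not the `pe_tcm.py` implementation: `Req` and `completion_times` are hypothetical names introduced here, requests are assumed to be listed in arrival order, and an unknown `direction` simply raises `KeyError` rather than following the if/else branch described in D4.

```python
from dataclasses import dataclass


@dataclass
class Req:
    """Hypothetical stand-in for TcmRequest, reduced to its timing fields."""
    arrival_ns: float
    direction: str  # "read" or "write"
    nbytes: int


def completion_times(reqs, read_bw_gbs=512.0, write_bw_gbs=512.0):
    """Return each request's completion time in ns, in input order.

    Assumes reqs is sorted by arrival time, so the FIFO order on each
    channel matches the list order.
    """
    bw = {"read": read_bw_gbs, "write": write_bw_gbs}
    # Time at which each capacity-1 channel next becomes free (D1).
    free_at = {"read": 0.0, "write": 0.0}
    done = []
    for r in reqs:
        # Wait for the channel matching this request's direction only.
        start = max(r.arrival_ns, free_at[r.direction])
        channel_bw = bw[r.direction]
        # D2: the BW term is skipped when nbytes == 0 or bw == 0,
        # but the request still takes its turn on the channel.
        if r.nbytes > 0 and channel_bw > 0:
            delay_ns = r.nbytes / channel_bw
        else:
            delay_ns = 0.0
        free_at[r.direction] = start + delay_ns
        done.append(start + delay_ns)
    return done


reqs = [
    Req(0.0, "read", 1024),   # first read: 1024 / 512 = 2 ns
    Req(0.0, "read", 1024),   # queues behind the first read
    Req(0.0, "write", 512),   # independent channel, overlaps the reads
    Req(0.0, "read", 0),      # zero bytes: waits its turn, adds no delay
]
times = completion_times(reqs)
print(times)  # [2.0, 4.0, 1.0, 4.0]
```

The two reads serialize (2 ns, then 4 ns), while the write finishes at 1 ns because it never contends with the read channel. The zero-byte read completes at 4 ns: it still waits for the read channel but contributes no BW delay.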