# ADR-0041: Cube SRAM Component Model (terminal scratchpad on cube NoC)

## Status

Accepted (2026-05-20).

ADR-0017 (Cube NOC and HBM Connectivity) describes SRAM as a cube-NoC attachment but does not specify the SRAM component's own latency / response model. This ADR fills that gap.

## First action

Inside `_worker`, immediately after pulling a Transaction off `_inbox`, the very first action is `yield from self.run(env, txn.nbytes)`. Inside `run()`, the component applies `env.timeout(node.attrs["overhead_ns"])` (default `0.0`). In short, **SRAM's first act is "express access overhead as simulator time"**.

After overhead, the worker yields `drain_ns` (the terminal BW-serialization cost stamped on the Transaction) and then constructs and dispatches a `ResponseMsg` on the reverse path.

This differs from a generic `ComponentBase._worker`: SRAM knows it is a **terminal node**, so it does not go through `_forward_txn`. Its own worker explicitly performs `run → drain → _send_response`.

## Context

The cube topology (`topology/builder.py`) creates the following named nodes per cube:

- `sip{S}.cube{C}.m_cpu`
- `sip{S}.cube{C}.sram`
- `sip{S}.cube{C}.hbm_ctrl` (per-PE partitions)
- `sip{S}.cube{C}.pe{P}` (and its PE-internal sub-components)

SRAM is one of the cube-NoC attachments: `topology/mesh_gen.py` assigns it to the nearest router by placement coordinates and adds `"sram"` to that router's `attach` list. The builder lays bidirectional `sram ↔ router` edges (BW: `sram_to_router_bw_gbs`, default `128.0 GB/s`).

SRAM has two intertwined roles:

1. **Fabric terminal**: the endpoint for cube-NoC memory-access Transactions destined for SRAM. SRAM consumes access overhead + drain, then sends a response back on the reverse path.
2. **One of the IPCQ slot tiers**: ADR-0023 D9.7 defines `buffer_kind ∈ {tcm, sram, hbm}`; the `sram` tier's per-access cost is `(512.0 GB/s, 2.0 ns)` in `common/ipcq_types._BUFFER_KIND_BW`.
This is separate from the SRAM node's `overhead_ns` attr; PE_DMA accounts for it directly at the IPCQ slot-write moment.

Without an ADR covering both roles, the following questions are ambiguous:

- What latency does SRAM model: fabric drain + overhead, or the IPCQ tier slot latency? Today the answers are scattered.
- What does the `size_mb` (`32`) attr mean in the future? Currently it is not used; SRAM only models timing.
- Which cube router does SRAM attach to? (placement-based; lives in topology code only.)

## Decision

### D1. SRAM is a terminal scratchpad node on the cube NoC

`SramComponent` extends `ComponentBase` but overrides `_worker` to express terminal semantics directly:

```
while True:
    txn = yield self._inbox.get()
    # Component-side access overhead (overhead_ns)
    yield from self.run(env, txn.nbytes)
    # drain_ns: BW-serialization cost stamped on the Transaction
    if drain_ns > 0:
        yield env.timeout(drain_ns)
    # Terminal node: respond on the reverse path instead of forwarding
    yield from self._send_response(env, txn)
```

This pattern is necessary because SRAM must know the reverse path; the generic `_forward_txn` (which forwards to the next hop) does not fit a terminal.

#### D1.1. Currently dormant: the `_worker` override is an unused path

At the time of writing, **no component actually sends a Transaction to the SRAM node**. The verified references to the SRAM node ID are:

- `policy/routing/router.py` and friends: guarantee path lookups.
- `components/builtin/pe_dma.py::_handle_ipcq_inbound`: for `buffer_kind == "sram"`, computes the *path* to `bank_node = f"{cube_prefix}.sram"` via `compute_drain_ns(path, ...)` and yields a **local** timeout. The Transaction itself does not flow to the SRAM node (see D4).
- `tests/test_routing.py`: checks connectivity via `find_path("sip0.cube0.pe0", "sip0.cube0.sram")`.

So the `_worker` / `_send_response` override is currently a **dormant code path**. It is preserved deliberately:

- Topology changes that route fabric Transactions to SRAM terminally (e.g., explicit M_CPU → SRAM accesses) would activate it immediately.
- ADR-0017's "cube-attached scratchpad" semantics naturally imply terminal behavior; the override is an intentional placeholder.

A future ADR (or a revision to this one) will mark dormancy resolved when an actual sender is added.

### D2. ResponseMsg construction and reverse-path dispatch

`_send_response`:

1. `reverse_path = list(reversed(txn.path))`: derive the reverse path.
2. Construct `ResponseMsg(correlation_id=txn.request.correlation_id, request_id=..., src_cube=, src_pe=-1, success=True)`.
3. Wrap in `Transaction(request=resp_msg, path=reverse_path, step=0, nbytes=0, done=env.event(), is_response=True)` and put on `out_ports[reverse_path[1]]`.
4. If the reverse path is too short (`< 2 hops`) or `ctx` is absent, fall back to calling the original `txn.done.succeed()`.

`src_pe = -1` means "SRAM is not PE-localized". `src_cube` is parsed from the node ID (`sip{S}.cube{C}.sram`).

### D3. Timing parameters: `overhead_ns` and wire-side `drain_ns`

- **Component-side latency**: `node.attrs["overhead_ns"]`. Default topology uses `2.0 ns`.
- **Link-side serialization**: `drain_ns` arrives stamped on the Transaction (the wire-side BW serialization result from ADR-0015). SRAM only yields it.
- The `size_mb` (default `32 MiB`) attr is currently timing-neutral. If a capacity-aware model is added in the future, a separate ADR will give it meaning.

### D4. IPCQ slot accounting is not modeled by the SRAM component

Per ADR-0023 D9.7, the IPCQ slot-write latency for the SRAM tier is incurred inside PE_DMA's `_handle_ipcq_inbound`, which calls `slot_io_latency_ns("sram", nbytes)` using `_BUFFER_KIND_BW["sram"]`. That is:

- When SRAM receives a fabric Transaction (D1, D2, D3 apply), it processes normally.
- When an IPCQ slot lives on SRAM, PE_DMA pays the slot-write time directly, independent of the SRAM component.
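
To make the second case concrete, here is a minimal sketch of what a per-tier slot cost could look like. Only the names `slot_io_latency_ns` and `_BUFFER_KIND_BW` and the `sram` entry `(512.0 GB/s, 2.0 ns)` come from this ADR. The formula (fixed latency plus `nbytes / BW`) and the function body are assumptions for illustration, not the actual code in `common/ipcq_types`.

```python
# Illustrative sketch only: the real table and function live in common/ipcq_types.
# Assumption: cost = fixed per-access latency + nbytes / bandwidth.
# 1 GB/s == 1 byte/ns, so dividing bytes by GB/s yields nanoseconds.

# Only the "sram" entry is taken from this ADR; other tiers are omitted.
_BUFFER_KIND_BW: dict[str, tuple[float, float]] = {
    "sram": (512.0, 2.0),  # (bandwidth in GB/s, fixed latency in ns)
}


def slot_io_latency_ns(buffer_kind: str, nbytes: int) -> float:
    """Return the assumed slot-write cost in ns for one IPCQ slot access."""
    bw_gbs, fixed_ns = _BUFFER_KIND_BW[buffer_kind]
    return fixed_ns + nbytes / bw_gbs


# PE_DMA would charge this locally; no Transaction reaches the SRAM node.
cost_ns = slot_io_latency_ns("sram", 4096)  # 2.0 + 4096 / 512.0 = 10.0
```

Under this assumed formula, the slot cost depends only on the tier table and the payload size, which is why it can be paid inside PE_DMA without involving the SRAM node's `overhead_ns`.
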

This separation is intentional: IPCQ is a fast path (sub-cycle slot bookkeeping) and does not traverse fabric Transactions, so SRAM does not need to know about IPCQ.

### D5. SRAM's cube-NoC attachment is placement-driven

`topology/mesh_gen.py` reads `placement.sram.pos_mm` (default `[1.5, 9.0]` in `topology.yaml`) and adds `"sram"` to the nearest router's `attach`. The builder (`topology/builder.py`'s attachment loop) then lays bidirectional edges between the `sram` node and that router.

This decision lives outside the SRAM component (mesh_gen / builder); the component does not know which router it sits on. It only relies on `txn.path` / `reverse_path` to reach it via a router.

### D6. SRAM is not a data store (timing only)

Same context as ADR-0040 D6: the SRAM component models time only; the data payload (if any) lives in sim_engine's `memory_store`.

## Alternatives Considered

### A1. Use `_forward_txn` and route responses via separate nodes (à la IO_CPU / HBM_CTRL)

Rejected. SRAM is a terminal on the cube NoC; adding a response node would introduce meaningless hops and violate ADR-0017's simplification spirit.

### A2. Model BW serialization inside SRAM with its own resource

Rejected. Wire-side BW serialization (`drain_ns`) already captures it. An internal `simpy.Resource` would double-count against ADR-0015 (port/wire model).

### A3. Handle IPCQ slot accounting in the SRAM component

Rejected. As D4 makes explicit, IPCQ is a fast path that does not traverse fabric Transactions. If SRAM knew about IPCQ, the responsibility would split across two places and obscure reasoning.

### A4. Capacity-aware latency from `size_mb`

Rejected for now. The capacity is currently a visualizer label; introducing a capacity-aware timing model requires a dedicated ADR.

## Consequences

- SRAM's timing model is pinned at ADR level as `overhead_ns + drain_ns + ResponseMsg(reverse_path)`. Any proposal to push IPCQ slot latency into the SRAM component can be refused with D4.
- D3 records that `size_mb` is timing-neutral today, so a future capacity-aware model has a narrow compatibility scope.
- D5 documents the placement-driven attachment, so changes to the SRAM coordinate have a clearly bounded impact (`mesh_gen` only).
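
As a compact restatement of the pinned timing model (`overhead_ns + drain_ns`, then a response on the reverse path), the following pure-Python sketch may help. `SketchTxn`, its `drain_ns` field, both helper functions, and the router name `sip0.cube0.r0` are hypothetical stand-ins; the real component runs inside the simulator and uses `Transaction`, `ResponseMsg`, and `out_ports` as described in D1 to D3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchTxn:
    """Hypothetical stand-in for Transaction; only the fields used here."""
    path: List[str]
    drain_ns: float = 0.0  # assumed field name for the stamped drain cost


def terminal_latency_ns(overhead_ns: float, drain_ns: float) -> float:
    """Time spent at the terminal before responding: overhead, then drain."""
    # Mirrors D1: the drain timeout is only yielded when drain_ns > 0.
    return overhead_ns + (drain_ns if drain_ns > 0 else 0.0)


def response_first_hop(txn: SketchTxn) -> Optional[str]:
    """First hop of the reverse path, or None in the D2 fallback case."""
    reverse_path = list(reversed(txn.path))
    if len(reverse_path) < 2:
        return None  # too short: the real code falls back to txn.done.succeed()
    return reverse_path[1]


# Hypothetical three-node path: PE -> router -> SRAM.
txn = SketchTxn(
    path=["sip0.cube0.pe0", "sip0.cube0.r0", "sip0.cube0.sram"],
    drain_ns=0.5,
)
latency = terminal_latency_ns(overhead_ns=2.0, drain_ns=txn.drain_ns)
next_hop = response_first_hop(txn)
```

With the default-topology `overhead_ns` of `2.0` and an assumed `drain_ns` of `0.5`, the terminal spends 2.5 ns before the response is put on the port toward the router it arrived from.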