Files
kernbench2/docs/adr/ADR-0025-algo-ipcq-direction-addressing.md
ywkang a796c1d2f7 ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/
Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00

310 lines
11 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-0025: IPCQ Direction Addressing — address-based matching
## Status
Accepted (Revision 2 — Address-based matching; peer_direction field dropped)
## Context
### Goal
In the IPCQ protocol of ADR-0023, make the **identification of "which
direction pair this transfer belongs to"** consistent and **address-based**,
without depending on topology / dict-order. It must work correctly in a
2-rank bidirectional ring (and more generally in any topology where
multiple directions point to the same peer).
### The bug surfaced — 2-rank bidirectional ring
`ring_1d(rank, world_size=2)``{"E": 1, "W": 1}` (rank 0). Both directions
point to the same peer.
**Bug 1 (install)**:
- `reverse_direction(0, 1)` → returns "E" by dict order (wrong; "W" is the
correct answer — opposite-direction convention)
- rank 0's E entry is set with `peer.rx_base_pa = rx_base(sip1, cube0, pe0, d="E")`
- tl.send(E) → data lands in sip1's E-rx buffer (should be W-rx)
**Bug 2 (runtime)**:
- Even if install set up the correct address, the receiver's
`_handle_meta_arrival` matches direction by sender coordinates only → the
first direction (E) wins
- peer_head_cache[E] is incremented; peer_head_cache[W] is unchanged
- The kernel's tl.recv(W) waits on peer_head_cache[W] → blocks forever →
IpcqDeadlock
### Root cause
The same issue along two axes:
1. **Install-time pairing**: deciding "which of my directions pairs with
which direction of the peer" depends on dict-iteration-order → fragile
when multiple directions point to the same peer
2. **Runtime identification**: deciding "which qp should be updated" is
based on sender coordinates alone → ambiguous when directions are
duplicated
### Solution direction — address-based matching
Each PE's rx buffer sits at a **unique address range per direction**
(rx_base_pa + direction_idx × bytes_per_direction). Therefore:
- **Runtime**: match by **dst_addr range** instead of sender coord →
unambiguous
- **Install**: prefer the opposite direction as a heuristic (the natural
symmetry of ring / mesh)
- No need for redundant metadata like `peer_direction` — **address is the
single source of truth**
This design works **independently of the PhysAddr transition (ADR-0030)**.
Whether the current addresses are synthetic or PhysAddr, the same approach
applies as long as the per-direction range uniqueness is preserved.
---
## Decision
### D1. Install — `reverse_direction` opposite-preference
`src/kernbench/ccl/install.py`:
```python
# Extended in ADR-0032 with global_* pairs for inter-SIP directions,
# which were introduced by configure_sfr_intercube_multisip to keep
# intercube (N/S/E/W) and inter-SIP (global_N/S/E/W) namespaces disjoint.
_OPPOSITE_DIR = {
"E": "W", "W": "E", "N": "S", "S": "N",
"global_E": "global_W", "global_W": "global_E",
"global_N": "global_S", "global_S": "global_N",
}
def reverse_direction(my_rank: int, peer_rank: int, my_dir: str) -> str | None:
"""Find peer's direction that reciprocates my_dir→peer_rank.
Prefer the OPPOSITE direction (E↔W, N↔S) when the peer has it
pointing back to us. This matters in 2-rank bidirectional rings
where both E and W on one side point to the same peer — without
the preference, the first-match-wins iteration would route data
into the wrong rx slot. Falls back to any direction pointing back
for topologies without an opposite convention (tree_binary's
parent/child).
"""
nt = neighbor_table[peer_rank]
opp = _OPPOSITE_DIR.get(my_dir)
if opp is not None and nt.get(opp) == my_rank:
return opp
for d, target in nt.items():
if target == my_rank:
return d
return None
```
Call site:
```python
for d, peer_rank in nbrs.items():
peer_dir = reverse_direction(r, peer_rank, d) # pass my_dir
if peer_dir is None:
continue
...
```
### D2. Runtime — `_handle_meta_arrival` dst_addr matching
`src/kernbench/components/builtin/pe_ipcq.py`:
```python
def _handle_meta_arrival(self, msg: IpcqMetaArrival) -> None:
"""Match incoming token to the receiver-side direction by dst_addr range.
Each direction has a unique rx buffer address range
(my_rx_base_pa + n_slots * slot_size). The token's dst_addr (set by
the sender's IPCQ when computing peer's slot address) falls within
exactly one such range. This address-based matching is unambiguous
even when multiple directions have the same peer (2-rank ring).
"""
token = msg.token
dst_addr = token.dst_addr
for d, qp in self._queue_pairs.items():
base = qp["my_rx_base_pa"]
size = qp["n_slots"] * qp["slot_size"]
if base <= dst_addr < base + size:
qp["peer_head_cache"] = max(qp["peer_head_cache"],
token.sender_seq + 1)
self._arrived_tokens.setdefault(d, []).append(token)
waiters = self._recv_waiters.get(d, [])
self._recv_waiters[d] = []
for ev in waiters:
if not ev.triggered:
ev.succeed()
any_waiters = self._any_recv_waiters
self._any_recv_waiters = []
for ev in any_waiters:
if not ev.triggered:
ev.succeed()
return
# Unknown dst_addr — diagnostic log (should not happen under correct install)
```
The sender-coordinate check is **removed**. `dst_addr` already determines
the direction.
### D3. Credit — add `dst_rx_base_pa` field
`src/kernbench/common/ipcq_types.py`:
```python
@dataclass(frozen=True)
class IpcqCreditMetadata:
consumer_seq: int
dst_rx_base_pa: int # NEW: matches the original sender's peer.rx_base_pa
# Existing fields (kept for diagnostic / logging purposes)
src_sip: int
src_cube: int
src_pe: int
src_direction: str
```
When the credit is generated (`_delayed_credit_send`): it carries this
direction's `my_rx_base_pa` as `dst_rx_base_pa` (this is the
`peer.rx_base_pa` the other side used when it was the sender).
Receiver side (`_credit_worker`):
```python
def _credit_worker(self, env):
while True:
credit = yield self._credit_inbox.get()
for d, qp in self._queue_pairs.items():
# Find the qp whose peer rx_base_pa matches the credit's dst_rx_base_pa
if qp["peer"].rx_base_pa == credit.dst_rx_base_pa:
qp["peer_tail_cache"] = max(qp["peer_tail_cache"],
credit.consumer_seq)
waiters = self._send_waiters.get(d, [])
self._send_waiters[d] = []
for ev in waiters:
if not ev.triggered:
ev.succeed()
break
```
Sender-coordinate check removed. Matching by `dst_rx_base_pa` is
unambiguous.
### D4. Do **not** add a `peer_direction` field to `IpcqInitEntry`
The `IpcqInitEntry.peer_direction` proposed in ADR-0025 rev 1 is
**unnecessary**. Reasons:
- Meta arrivals are matched by dst_addr (D2)
- Credits are matched by dst_rx_base_pa (D3)
- No need to store peer_direction on qp
- Install only uses peer_dir internally when computing rx_base_pa
(`reverse_direction`)
No change to the IpcqInitEntry schema. **Simpler** than rev 1.
### D5. Keep `IpcqDmaToken.src_direction` (diagnostic only)
The existing `src_direction` field is not removed. It is retained for:
- Logging / trace: the `(rank, t, dir, nbytes)` output of
`KERNBENCH_CCL_TRACE=1`
- Diagnostics: showing direction in pointer_dump, etc.
- Room for future extension
Runtime matching uses only `dst_addr`.
### D6. Invariants (strengthens ADR-0023 I3)
**I3 (strict)**: For each direction pair `(my_direction, peer_direction)`,
my rx_base and peer rx_base must point to **distinct direction slots**.
Install must guarantee this (reverse_direction opposite-preference).
**I3.1 (new)**: For every qp, `qp["my_rx_base_pa"]` and
`qp["peer"].rx_base_pa` occupy mutually disjoint address ranges (buffers
of different directions never overlap). This is the prerequisite for the
address-based matching of D2/D3.
Verifiable at install time:
```python
# ccl/install_plan.py: assertion at the end of build_install_plans
all_rx_ranges = set()
for plan in plans:
for pe_install in plan.pe_installs:
for entry in pe_install.neighbors:
r = (entry.my_rx_base_pa,
entry.my_rx_base_pa + plan.n_slots * plan.slot_size)
overlap = any(_ranges_overlap(r, e) for e in all_rx_ranges)
assert not overlap
all_rx_ranges.add(r)
```
---
## Dependencies
- **ADR-0023** (IPCQ protocol): this ADR modifies ADR-0023's runtime
matching logic (D2, D3) and improves the install heuristic (D1). No
change to the IPCQ protocol's semantic layer.
- **ADR-0024** (launcher): the case where a 2-rank bidirectional ring is
actually used is the ws=SIP_count model of ADR-0024. This ADR makes that
case work.
- **ADR-0030** (PhysAddr transition, stub): **independent** — ADR-0025's
address-based matching works identically whether the current addresses
are synthetic or PhysAddr.
---
## Non-goals
- **Migrating IPCQ addressing to PhysAddr**: ADR-0030 scope. This ADR is
agnostic to how addresses are encoded.
- **Multi-hop routing**: the single-hop DMA write assumption of ADR-0023
D5 still holds.
- **Unidir ring specialization**: `ring_1d_unidir` only has a single
direction, so the bug does not apply.
---
## Open questions
- **Address-matching performance**: `_handle_meta_arrival` and
`_credit_worker` iterate qp linearly (max 4 directions). The performance
impact is negligible. If it becomes an issue, this can be switched to a
dict lookup (`_qp_by_rx_base`).
- **Re-evaluating the need for `IpcqDmaToken.src_direction`**: whether to
keep this field, which is only kept for diagnostics, or to split it out
of logging. Currently retained.
- **Cost of install-time invariant verification**: the I3.1 verification
of D6 is O(N_PE × N_direction)^2. It could be slow on large topologies
→ improvable via data structures such as interval trees. Simple
implementation first.
---
## Consequences
### Positive
- **Simplicity**: redundant `peer_direction` metadata removed. Address is
the single source of truth.
- **Unambiguous matching**: works on every topology (including duplicate
directions).
- **Minimal schema changes**: `IpcqInitEntry` unchanged, one field added
to `IpcqCreditMetadata`.
- **Independent of PhysAddr transition (ADR-0030)**: address-based matching
is agnostic to the address encoding.
- **Diagnostics retained**: `IpcqDmaToken.src_direction` is kept for
logging.
### Negative
- Runtime matching is now by address comparison, so when debugging
questions like "why did peer_head_cache[W] update rather than [E]" one
has to follow the address range (previously the direction name was
enough). Mitigation: include a "direction ↔ rx_base_pa" mapping in
pointer_dump.
### Neutral
- The semantic layer of the IPCQ protocol (sender computes dst_addr,
receiver receives) is unchanged.