a796c1d2f7
Establish English as the canonical ADR language with Korean translations held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror). Promotion from adr-proposed/ to adr/ now writes English to adr/ and the Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md. - Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English, 2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix dropped). ADR-0023 EN regenerated against KO source which had newer HW Realization Notes (D16-D23) section. - docs/adr-history/ left frozen by design (transitional state). - CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline section covering bidirectional sync, conflict resolution (EN wins), and proposed-language freedom. - tools/verify_adr_lang_pairs.py: new verification tool checking pair completeness, filename mirroring, ADR-ID match, Status byte-equality. Pre-commit hook intentionally not added; run on demand or in CI. - tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF normalization, em-dash title separator, underscore-slug edge case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
310 lines
11 KiB
Markdown
310 lines
11 KiB
Markdown
# ADR-0025: IPCQ Direction Addressing — address-based matching
|
||
|
||
## Status
|
||
|
||
Accepted (Revision 2 — Address-based matching; peer_direction field dropped)
|
||
|
||
## Context
|
||
|
||
### Goal
|
||
|
||
In the IPCQ protocol of ADR-0023, make the **identification of "which
|
||
direction pair this transfer belongs to"** consistent and **address-based**,
|
||
without depending on topology / dict-order. It must work correctly in a
|
||
2-rank bidirectional ring (and more generally in any topology where
|
||
multiple directions point to the same peer).
|
||
|
||
### The bug surfaced — 2-rank bidirectional ring
|
||
|
||
`ring_1d(rank, world_size=2)` → `{"E": 1, "W": 1}` (rank 0). Both directions
|
||
point to the same peer.
|
||
|
||
**Bug 1 (install)**:
|
||
- `reverse_direction(0, 1)` → returns "E" by dict order (wrong; "W" is the
|
||
correct answer — opposite-direction convention)
|
||
- rank 0's E entry is set with `peer.rx_base_pa = rx_base(sip1, cube0, pe0, d="E")`
|
||
- tl.send(E) → data lands in sip1's E-rx buffer (should be W-rx)
|
||
|
||
**Bug 2 (runtime)**:
|
||
- Even if install set up the correct address, the receiver's
|
||
`_handle_meta_arrival` matches direction by sender coordinates only → the
|
||
first direction (E) wins
|
||
- peer_head_cache[E] is incremented; peer_head_cache[W] is unchanged
|
||
- The kernel's tl.recv(W) waits on peer_head_cache[W] → blocks forever →
|
||
IpcqDeadlock
|
||
|
||
### Root cause
|
||
|
||
The same issue along two axes:
|
||
1. **Install-time pairing**: deciding "which of my directions pairs with
|
||
which direction of the peer" depends on dict-iteration-order → fragile
|
||
when multiple directions point to the same peer
|
||
2. **Runtime identification**: deciding "which qp should be updated" is
|
||
based on sender coordinates alone → ambiguous when directions are
|
||
duplicated
|
||
|
||
### Solution direction — address-based matching
|
||
|
||
Each PE's rx buffer sits at a **unique address range per direction**
|
||
(rx_base_pa + direction_idx × bytes_per_direction). Therefore:
|
||
|
||
- **Runtime**: match by **dst_addr range** instead of sender coord →
|
||
unambiguous
|
||
- **Install**: prefer the opposite direction as a heuristic (the natural
|
||
symmetry of ring / mesh)
|
||
- No need for redundant metadata like `peer_direction` — **address is the
|
||
single source of truth**
|
||
|
||
This design works **independently of the PhysAddr transition (ADR-0030)**.
|
||
Whether the current addresses are synthetic or PhysAddr, the same approach
|
||
applies as long as the per-direction range uniqueness is preserved.
|
||
|
||
---
|
||
|
||
## Decision
|
||
|
||
### D1. Install — `reverse_direction` opposite-preference
|
||
|
||
`src/kernbench/ccl/install.py`:
|
||
|
||
```python
|
||
# Extended in ADR-0032 with global_* pairs for inter-SIP directions,
|
||
# which were introduced by configure_sfr_intercube_multisip to keep
|
||
# intercube (N/S/E/W) and inter-SIP (global_N/S/E/W) namespaces disjoint.
|
||
_OPPOSITE_DIR = {
|
||
"E": "W", "W": "E", "N": "S", "S": "N",
|
||
"global_E": "global_W", "global_W": "global_E",
|
||
"global_N": "global_S", "global_S": "global_N",
|
||
}
|
||
|
||
def reverse_direction(my_rank: int, peer_rank: int, my_dir: str) -> str | None:
|
||
"""Find peer's direction that reciprocates my_dir→peer_rank.
|
||
|
||
Prefer the OPPOSITE direction (E↔W, N↔S) when the peer has it
|
||
pointing back to us. This matters in 2-rank bidirectional rings
|
||
where both E and W on one side point to the same peer — without
|
||
the preference, the first-match-wins iteration would route data
|
||
into the wrong rx slot. Falls back to any direction pointing back
|
||
for topologies without an opposite convention (tree_binary's
|
||
parent/child).
|
||
"""
|
||
nt = neighbor_table[peer_rank]
|
||
opp = _OPPOSITE_DIR.get(my_dir)
|
||
if opp is not None and nt.get(opp) == my_rank:
|
||
return opp
|
||
for d, target in nt.items():
|
||
if target == my_rank:
|
||
return d
|
||
return None
|
||
```
|
||
|
||
Call site:
|
||
|
||
```python
|
||
for d, peer_rank in nbrs.items():
|
||
peer_dir = reverse_direction(r, peer_rank, d) # pass my_dir
|
||
if peer_dir is None:
|
||
continue
|
||
...
|
||
```
|
||
|
||
### D2. Runtime — `_handle_meta_arrival` dst_addr matching
|
||
|
||
`src/kernbench/components/builtin/pe_ipcq.py`:
|
||
|
||
```python
|
||
def _handle_meta_arrival(self, msg: IpcqMetaArrival) -> None:
|
||
"""Match incoming token to the receiver-side direction by dst_addr range.
|
||
|
||
Each direction has a unique rx buffer address range
|
||
(my_rx_base_pa + n_slots * slot_size). The token's dst_addr (set by
|
||
the sender's IPCQ when computing peer's slot address) falls within
|
||
exactly one such range. This address-based matching is unambiguous
|
||
even when multiple directions have the same peer (2-rank ring).
|
||
"""
|
||
token = msg.token
|
||
dst_addr = token.dst_addr
|
||
for d, qp in self._queue_pairs.items():
|
||
base = qp["my_rx_base_pa"]
|
||
size = qp["n_slots"] * qp["slot_size"]
|
||
if base <= dst_addr < base + size:
|
||
qp["peer_head_cache"] = max(qp["peer_head_cache"],
|
||
token.sender_seq + 1)
|
||
self._arrived_tokens.setdefault(d, []).append(token)
|
||
waiters = self._recv_waiters.get(d, [])
|
||
self._recv_waiters[d] = []
|
||
for ev in waiters:
|
||
if not ev.triggered:
|
||
ev.succeed()
|
||
any_waiters = self._any_recv_waiters
|
||
self._any_recv_waiters = []
|
||
for ev in any_waiters:
|
||
if not ev.triggered:
|
||
ev.succeed()
|
||
return
|
||
# Unknown dst_addr — diagnostic log (should not happen under correct install)
|
||
```
|
||
|
||
The sender-coordinate check is **removed**. `dst_addr` already determines
|
||
the direction.
|
||
|
||
### D3. Credit — add `dst_rx_base_pa` field
|
||
|
||
`src/kernbench/common/ipcq_types.py`:
|
||
|
||
```python
|
||
@dataclass(frozen=True)
|
||
class IpcqCreditMetadata:
|
||
consumer_seq: int
|
||
dst_rx_base_pa: int # NEW: matches the original sender's peer.rx_base_pa
|
||
# Existing fields (kept for diagnostic / logging purposes)
|
||
src_sip: int
|
||
src_cube: int
|
||
src_pe: int
|
||
src_direction: str
|
||
```
|
||
|
||
When the credit is generated (`_delayed_credit_send`): it carries this
|
||
direction's `my_rx_base_pa` as `dst_rx_base_pa` (this is the
|
||
`peer.rx_base_pa` the other side used when it was the sender).
|
||
|
||
Receiver side (`_credit_worker`):
|
||
|
||
```python
|
||
def _credit_worker(self, env):
|
||
while True:
|
||
credit = yield self._credit_inbox.get()
|
||
for d, qp in self._queue_pairs.items():
|
||
# Find the qp whose peer rx_base_pa matches the credit's dst_rx_base_pa
|
||
if qp["peer"].rx_base_pa == credit.dst_rx_base_pa:
|
||
qp["peer_tail_cache"] = max(qp["peer_tail_cache"],
|
||
credit.consumer_seq)
|
||
waiters = self._send_waiters.get(d, [])
|
||
self._send_waiters[d] = []
|
||
for ev in waiters:
|
||
if not ev.triggered:
|
||
ev.succeed()
|
||
break
|
||
```
|
||
|
||
Sender-coordinate check removed. Matching by `dst_rx_base_pa` is
|
||
unambiguous.
|
||
|
||
### D4. Do **not** add a `peer_direction` field to `IpcqInitEntry`
|
||
|
||
The `IpcqInitEntry.peer_direction` proposed in ADR-0025 rev 1 is
|
||
**unnecessary**. Reasons:
|
||
- Meta arrivals are matched by dst_addr (D2)
|
||
- Credits are matched by dst_rx_base_pa (D3)
|
||
- No need to store peer_direction on qp
|
||
- Install only uses peer_dir internally when computing rx_base_pa
|
||
(`reverse_direction`)
|
||
|
||
No change to the IpcqInitEntry schema. **Simpler** than rev 1.
|
||
|
||
### D5. Keep `IpcqDmaToken.src_direction` (diagnostic only)
|
||
|
||
The existing `src_direction` field is not removed. It is retained for:
|
||
- Logging / trace: the `(rank, t, dir, nbytes)` output of
|
||
`KERNBENCH_CCL_TRACE=1`
|
||
- Diagnostics: showing direction in pointer_dump, etc.
|
||
- Room for future extension
|
||
|
||
Runtime matching uses only `dst_addr`.
|
||
|
||
### D6. Invariants (strengthens ADR-0023 I3)
|
||
|
||
**I3 (strict)**: For each direction pair `(my_direction, peer_direction)`,
|
||
my rx_base and peer rx_base must point to **distinct direction slots**.
|
||
Install must guarantee this (reverse_direction opposite-preference).
|
||
|
||
**I3.1 (new)**: For every qp, `qp["my_rx_base_pa"]` and
|
||
`qp["peer"].rx_base_pa` occupy mutually disjoint address ranges (buffers
|
||
of different directions never overlap). This is the prerequisite for the
|
||
address-based matching of D2/D3.
|
||
|
||
Verifiable at install time:
|
||
```python
|
||
# ccl/install_plan.py: assertion at the end of build_install_plans
|
||
all_rx_ranges = set()
|
||
for plan in plans:
|
||
for pe_install in plan.pe_installs:
|
||
for entry in pe_install.neighbors:
|
||
r = (entry.my_rx_base_pa,
|
||
entry.my_rx_base_pa + plan.n_slots * plan.slot_size)
|
||
overlap = any(_ranges_overlap(r, e) for e in all_rx_ranges)
|
||
assert not overlap
|
||
all_rx_ranges.add(r)
|
||
```
|
||
|
||
---
|
||
|
||
## Dependencies
|
||
|
||
- **ADR-0023** (IPCQ protocol): this ADR modifies ADR-0023's runtime
|
||
matching logic (D2, D3) and improves the install heuristic (D1). No
|
||
change to the IPCQ protocol's semantic layer.
|
||
- **ADR-0024** (launcher): the case where a 2-rank bidirectional ring is
|
||
actually used is the ws=SIP_count model of ADR-0024. This ADR makes that
|
||
case work.
|
||
- **ADR-0030** (PhysAddr transition, stub): **independent** — ADR-0025's
|
||
address-based matching works identically whether the current addresses
|
||
are synthetic or PhysAddr.
|
||
|
||
---
|
||
|
||
## Non-goals
|
||
|
||
- **Migrating IPCQ addressing to PhysAddr**: ADR-0030 scope. This ADR is
|
||
agnostic to how addresses are encoded.
|
||
- **Multi-hop routing**: the single-hop DMA write assumption of ADR-0023
|
||
D5 still holds.
|
||
- **Unidir ring specialization**: `ring_1d_unidir` only has a single
|
||
direction, so the bug does not apply.
|
||
|
||
---
|
||
|
||
## Open questions
|
||
|
||
- **Address-matching performance**: `_handle_meta_arrival` and
|
||
`_credit_worker` iterate qp linearly (max 4 directions). The performance
|
||
impact is negligible. If it becomes an issue, this can be switched to a
|
||
dict lookup (`_qp_by_rx_base`).
|
||
- **Re-evaluating the need for `IpcqDmaToken.src_direction`**: whether to
|
||
keep this field, which is only kept for diagnostics, or to split it out
|
||
of logging. Currently retained.
|
||
- **Cost of install-time invariant verification**: the I3.1 verification
|
||
of D6 is O(N_PE × N_direction)^2. It could be slow on large topologies
|
||
→ improvable via data structures such as interval trees. Simple
|
||
implementation first.
|
||
|
||
---
|
||
|
||
## Consequences
|
||
|
||
### Positive
|
||
|
||
- **Simplicity**: redundant `peer_direction` metadata removed. Address is
|
||
the single source of truth.
|
||
- **Unambiguous matching**: works on every topology (including duplicate
|
||
directions).
|
||
- **Minimal schema changes**: `IpcqInitEntry` unchanged, one field added
|
||
to `IpcqCreditMetadata`.
|
||
- **Independent of PhysAddr transition (ADR-0030)**: address-based matching
|
||
is agnostic to the address encoding.
|
||
- **Diagnostics retained**: `IpcqDmaToken.src_direction` is kept for
|
||
logging.
|
||
|
||
### Negative
|
||
|
||
- Runtime matching is now by address comparison, so when debugging
|
||
questions like "why did peer_head_cache[W] update rather than [E]" one
|
||
has to follow the address range (previously the direction name was
|
||
enough). Mitigation: include a "direction ↔ rx_base_pa" mapping in
|
||
pointer_dump.
|
||
|
||
### Neutral
|
||
|
||
- The semantic layer of the IPCQ protocol (sender computes dst_addr,
|
||
receiver receives) is unchanged.
|