Files
kernbench2/docs/adr/ADR-0025-algo-ipcq-direction-addressing.md
ywkang a796c1d2f7 ADR: bilingual structure — EN canonical in adr/, KO mirror in adr-ko/
Establish English as the canonical ADR language with Korean translations
held in a parallel docs/adr-ko/ tree as derived artifacts (1:1 mirror).
Promotion from adr-proposed/ to adr/ now writes English to adr/ and the
Korean to adr-ko/; bidirectional sync rule documented in CLAUDE.md.

- Migrate 30 ADRs in docs/adr/: 28 Korean-only translated to English,
  2 bilingual pairs (ADR-0020, ADR-0023) consolidated (.en.md suffix
  dropped). ADR-0023 EN regenerated against KO source which had newer
  HW Realization Notes (D16-D23) section.
- docs/adr-history/ left frozen by design (transitional state).
- CLAUDE.md (Part 2): update ADR Lifecycle for 4-folder layout, mark
  docs/adr-ko/ as a Derived Artifact, add ADR Translation Discipline
  section covering bidirectional sync, conflict resolution (EN wins),
  and proposed-language freedom.
- tools/verify_adr_lang_pairs.py: new verification tool checking pair
  completeness, filename mirroring, ADR-ID match, Status byte-equality.
  Pre-commit hook intentionally not added; run on demand or in CI.
- tests/test_verify_adr_lang_pairs.py: 11 cases including CRLF/LF
  normalization, em-dash title separator, underscore-slug edge case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:38:44 -07:00

11 KiB
Raw Permalink Blame History

ADR-0025: IPCQ Direction Addressing — address-based matching

Status

Accepted (Revision 2 — Address-based matching; peer_direction field dropped)

Context

Goal

In the IPCQ protocol of ADR-0023, make the identification of "which direction pair this transfer belongs to" consistent and address-based, without depending on topology / dict-order. It must work correctly in a 2-rank bidirectional ring (and more generally in any topology where multiple directions point to the same peer).

The bug surfaced — 2-rank bidirectional ring

ring_1d(rank, world_size=2){"E": 1, "W": 1} (rank 0). Both directions point to the same peer.

Bug 1 (install):

  • reverse_direction(0, 1) → returns "E" by dict order (wrong; "W" is the correct answer — opposite-direction convention)
  • rank 0's E entry is set with peer.rx_base_pa = rx_base(sip1, cube0, pe0, d="E")
  • tl.send(E) → data lands in sip1's E-rx buffer (should be W-rx)

Bug 2 (runtime):

  • Even if install set up the correct address, the receiver's _handle_meta_arrival matches direction by sender coordinates only → the first direction (E) wins
  • peer_head_cache[E] is incremented; peer_head_cache[W] is unchanged
  • The kernel's tl.recv(W) waits on peer_head_cache[W] → blocks forever → IpcqDeadlock

Root cause

The same issue along two axes:

  1. Install-time pairing: deciding "which of my directions pairs with which direction of the peer" depends on dict-iteration-order → fragile when multiple directions point to the same peer
  2. Runtime identification: deciding "which qp should be updated" is based on sender coordinates alone → ambiguous when directions are duplicated

Solution direction — address-based matching

Each PE's rx buffer sits at a unique address range per direction (rx_base_pa + direction_idx × bytes_per_direction). Therefore:

  • Runtime: match by dst_addr range instead of sender coord → unambiguous
  • Install: prefer the opposite direction as a heuristic (the natural symmetry of ring / mesh)
  • No need for redundant metadata like peer_directionaddress is the single source of truth

This design works independently of the PhysAddr transition (ADR-0030). Whether the current addresses are synthetic or PhysAddr, the same approach applies as long as the per-direction range uniqueness is preserved.


Decision

D1. Install — reverse_direction opposite-preference

src/kernbench/ccl/install.py:

# Extended in ADR-0032 with global_* pairs for inter-SIP directions,
# which were introduced by configure_sfr_intercube_multisip to keep
# intercube (N/S/E/W) and inter-SIP (global_N/S/E/W) namespaces disjoint.
_OPPOSITE_DIR = {
    "E": "W", "W": "E", "N": "S", "S": "N",
    "global_E": "global_W", "global_W": "global_E",
    "global_N": "global_S", "global_S": "global_N",
}

def reverse_direction(my_rank: int, peer_rank: int, my_dir: str) -> str | None:
    """Find peer's direction that reciprocates my_dir→peer_rank.

    Prefer the OPPOSITE direction (E↔W, N↔S) when the peer has it
    pointing back to us. This matters in 2-rank bidirectional rings
    where both E and W on one side point to the same peer — without
    the preference, the first-match-wins iteration would route data
    into the wrong rx slot. Falls back to any direction pointing back
    for topologies without an opposite convention (tree_binary's
    parent/child).
    """
    nt = neighbor_table[peer_rank]
    opp = _OPPOSITE_DIR.get(my_dir)
    if opp is not None and nt.get(opp) == my_rank:
        return opp
    for d, target in nt.items():
        if target == my_rank:
            return d
    return None

Call site:

for d, peer_rank in nbrs.items():
    peer_dir = reverse_direction(r, peer_rank, d)  # pass my_dir
    if peer_dir is None:
        continue
    ...

D2. Runtime — _handle_meta_arrival dst_addr matching

src/kernbench/components/builtin/pe_ipcq.py:

def _handle_meta_arrival(self, msg: IpcqMetaArrival) -> None:
    """Match incoming token to the receiver-side direction by dst_addr range.

    Each direction has a unique rx buffer address range
    (my_rx_base_pa + n_slots * slot_size). The token's dst_addr (set by
    the sender's IPCQ when computing peer's slot address) falls within
    exactly one such range. This address-based matching is unambiguous
    even when multiple directions have the same peer (2-rank ring).
    """
    token = msg.token
    dst_addr = token.dst_addr
    for d, qp in self._queue_pairs.items():
        base = qp["my_rx_base_pa"]
        size = qp["n_slots"] * qp["slot_size"]
        if base <= dst_addr < base + size:
            qp["peer_head_cache"] = max(qp["peer_head_cache"],
                                         token.sender_seq + 1)
            self._arrived_tokens.setdefault(d, []).append(token)
            waiters = self._recv_waiters.get(d, [])
            self._recv_waiters[d] = []
            for ev in waiters:
                if not ev.triggered:
                    ev.succeed()
            any_waiters = self._any_recv_waiters
            self._any_recv_waiters = []
            for ev in any_waiters:
                if not ev.triggered:
                    ev.succeed()
            return
    # Unknown dst_addr — diagnostic log (should not happen under correct install)

The sender-coordinate check is removed. dst_addr already determines the direction.

D3. Credit — add dst_rx_base_pa field

src/kernbench/common/ipcq_types.py:

@dataclass(frozen=True)
class IpcqCreditMetadata:
    consumer_seq: int
    dst_rx_base_pa: int       # NEW: matches the original sender's peer.rx_base_pa
    # Existing fields (kept for diagnostic / logging purposes)
    src_sip: int
    src_cube: int
    src_pe: int
    src_direction: str

When the credit is generated (_delayed_credit_send): it carries this direction's my_rx_base_pa as dst_rx_base_pa (this is the peer.rx_base_pa the other side used when it was the sender).

Receiver side (_credit_worker):

def _credit_worker(self, env):
    while True:
        credit = yield self._credit_inbox.get()
        for d, qp in self._queue_pairs.items():
            # Find the qp whose peer rx_base_pa matches the credit's dst_rx_base_pa
            if qp["peer"].rx_base_pa == credit.dst_rx_base_pa:
                qp["peer_tail_cache"] = max(qp["peer_tail_cache"],
                                              credit.consumer_seq)
                waiters = self._send_waiters.get(d, [])
                self._send_waiters[d] = []
                for ev in waiters:
                    if not ev.triggered:
                        ev.succeed()
                break

Sender-coordinate check removed. Matching by dst_rx_base_pa is unambiguous.

D4. Do not add a peer_direction field to IpcqInitEntry

The IpcqInitEntry.peer_direction proposed in ADR-0025 rev 1 is unnecessary. Reasons:

  • Meta arrivals are matched by dst_addr (D2)
  • Credits are matched by dst_rx_base_pa (D3)
  • No need to store peer_direction on qp
  • Install only uses peer_dir internally when computing rx_base_pa (reverse_direction)

No change to the IpcqInitEntry schema. Simpler than rev 1.

D5. Keep IpcqDmaToken.src_direction (diagnostic only)

The existing src_direction field is not removed. It is retained for:

  • Logging / trace: the (rank, t, dir, nbytes) output of KERNBENCH_CCL_TRACE=1
  • Diagnostics: showing direction in pointer_dump, etc.
  • Room for future extension

Runtime matching uses only dst_addr.

D6. Invariants (strengthens ADR-0023 I3)

I3 (strict): For each direction pair (my_direction, peer_direction), my rx_base and peer rx_base must point to distinct direction slots. Install must guarantee this (reverse_direction opposite-preference).

I3.1 (new): For every qp, qp["my_rx_base_pa"] and qp["peer"].rx_base_pa occupy mutually disjoint address ranges (buffers of different directions never overlap). This is the prerequisite for the address-based matching of D2/D3.

Verifiable at install time:

# ccl/install_plan.py: assertion at the end of build_install_plans
all_rx_ranges = set()
for plan in plans:
    for pe_install in plan.pe_installs:
        for entry in pe_install.neighbors:
            r = (entry.my_rx_base_pa,
                 entry.my_rx_base_pa + plan.n_slots * plan.slot_size)
            overlap = any(_ranges_overlap(r, e) for e in all_rx_ranges)
            assert not overlap
            all_rx_ranges.add(r)

Dependencies

  • ADR-0023 (IPCQ protocol): this ADR modifies ADR-0023's runtime matching logic (D2, D3) and improves the install heuristic (D1). No change to the IPCQ protocol's semantic layer.
  • ADR-0024 (launcher): the case where a 2-rank bidirectional ring is actually used is the ws=SIP_count model of ADR-0024. This ADR makes that case work.
  • ADR-0030 (PhysAddr transition, stub): independent — ADR-0025's address-based matching works identically whether the current addresses are synthetic or PhysAddr.

Non-goals

  • Migrating IPCQ addressing to PhysAddr: ADR-0030 scope. This ADR is agnostic to how addresses are encoded.
  • Multi-hop routing: the single-hop DMA write assumption of ADR-0023 D5 still holds.
  • Unidir ring specialization: ring_1d_unidir only has a single direction, so the bug does not apply.

Open questions

  • Address-matching performance: _handle_meta_arrival and _credit_worker iterate qp linearly (max 4 directions). The performance impact is negligible. If it becomes an issue, this can be switched to a dict lookup (_qp_by_rx_base).
  • Re-evaluating the need for IpcqDmaToken.src_direction: whether to keep this field, which is only kept for diagnostics, or to split it out of logging. Currently retained.
  • Cost of install-time invariant verification: the I3.1 verification of D6 is O(N_PE × N_direction)^2. It could be slow on large topologies → improvable via data structures such as interval trees. Simple implementation first.

Consequences

Positive

  • Simplicity: redundant peer_direction metadata removed. Address is the single source of truth.
  • Unambiguous matching: works on every topology (including duplicate directions).
  • Minimal schema changes: IpcqInitEntry unchanged, one field added to IpcqCreditMetadata.
  • Independent of PhysAddr transition (ADR-0030): address-based matching is agnostic to the address encoding.
  • Diagnostics retained: IpcqDmaToken.src_direction is kept for logging.

Negative

  • Runtime matching is now by address comparison, so when debugging questions like "why did peer_head_cache[W] update rather than [E]" one has to follow the address range (previously the direction name was enough). Mitigation: include a "direction ↔ rx_base_pa" mapping in pointer_dump.

Neutral

  • The semantic layer of the IPCQ protocol (sender computes dst_addr, receiver receives) is unchanged.