Fix ADR-0025: IPCQ direction addressing via address-based matching

2-rank bidirectional ring deadlock: when the E and W neighbors point to
the same peer, sender-coordinate matching in _handle_meta_arrival /
_credit_worker picked the first matching direction in dict order, so data
landed in a different rx slot than the one the kernel's recv(W) was
waiting on.
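
The ambiguity can be reproduced in isolation. This is a minimal sketch, not the project's code: `match_by_sender` and the plain direction-to-peer dict are hypothetical stand-ins for the real queue-pair lookup.

```python
from __future__ import annotations


# Hypothetical stand-in for sender-based matching (not the real handler).
def match_by_sender(qp_peers: dict[str, int], sender_rank: int) -> str | None:
    """Return the first direction whose peer equals sender_rank."""
    for direction, peer in qp_peers.items():
        if peer == sender_rank:
            return direction
    return None


# Rank 0 in a 2-rank bidirectional ring: both E and W lead to rank 1.
rank0_qps = {"E": 1, "W": 1}

# Every arrival from rank 1 resolves to the first key, regardless of which
# direction the data was actually addressed to.
matched = match_by_sender(rank0_qps, sender_rank=1)
```

Because Python dicts preserve insertion order, the result depends only on which direction was registered first, never on the message itself.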

Fix (ADR-0025 D1/D2/D3):
- install.reverse_direction: prefer the OPPOSITE direction (E↔W, N↔S)
  when the peer has it pointing back to us; fall back to any matching
  direction for topologies without an opposite convention (tree_binary
  parent/child).
- _handle_meta_arrival: match token.dst_addr against each qp's rx window,
  which starts at my_rx_base_pa and spans n_slots × slot_size
  (unambiguous).
- _credit_worker: match by credit.dst_rx_base_pa == qp.peer.rx_base_pa.
- IpcqCreditMetadata: new dst_rx_base_pa field carrying receiver-side
  rx base; _delayed_credit_send fills it from the consuming qp.
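
The two address-based lookups can be sketched as follows. This is an illustration under assumptions, not the committed code: `QueuePair`, `match_by_dst_addr`, and `match_credit` are hypothetical names, the window is assumed to be half-open, and `peer_rx_base_pa` flattens what the commit calls `qp.peer.rx_base_pa`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueuePair:
    """Hypothetical, simplified queue pair; field names follow the commit text."""
    direction: str
    my_rx_base_pa: int    # base address of this side's rx slots
    peer_rx_base_pa: int  # base address of the peer's rx slots
    n_slots: int
    slot_size: int


def match_by_dst_addr(qps: list[QueuePair], dst_addr: int) -> QueuePair | None:
    """Pick the qp whose rx window contains dst_addr (half-open, an assumption)."""
    for qp in qps:
        start = qp.my_rx_base_pa
        end = start + qp.n_slots * qp.slot_size
        if start <= dst_addr < end:
            return qp
    return None


def match_credit(qps: list[QueuePair], dst_rx_base_pa: int) -> QueuePair | None:
    """Pick the qp whose peer rx base equals the base carried by the credit."""
    for qp in qps:
        if qp.peer_rx_base_pa == dst_rx_base_pa:
            return qp
    return None


# Two qps toward the same peer, distinguished only by their address windows.
qps = [
    QueuePair("E", my_rx_base_pa=0x1000, peer_rx_base_pa=0x9000, n_slots=4, slot_size=0x100),
    QueuePair("W", my_rx_base_pa=0x1400, peer_rx_base_pa=0x9400, n_slots=4, slot_size=0x100),
]
to_w = match_by_dst_addr(qps, 0x1500)
credit_w = match_credit(qps, 0x9400)
```

As long as rx windows do not overlap, each address identifies exactly one qp, so the peer's rank no longer enters the decision.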

Tests (Phase 1 → Phase 2):
- test_reverse_direction_opposite_preference_2rank_ring
- test_reverse_direction_opposite_preference_4rank_ring_sanity
- test_meta_arrival_matches_by_dst_addr_same_peer
- test_credit_matches_by_dst_rx_base_pa_same_peer
- Existing credit-return test updated with dst_rx_base_pa.

508 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:38:41 -07:00
parent e1084800ab
commit 32536daf2e
6 changed files with 277 additions and 17 deletions
+19 -4
@@ -219,9 +219,24 @@ def install_ipcq(
"neighbor_table": neighbor_table,
     }

-    def reverse_direction(my_rank: int, peer_rank: int) -> str | None:
-        """Find which direction in peer's neighbor table points back to my_rank."""
-        for d, target in neighbor_table[peer_rank].items():
+    _OPPOSITE_DIR = {"E": "W", "W": "E", "N": "S", "S": "N"}
+
+    def reverse_direction(my_rank: int, peer_rank: int, my_dir: str) -> str | None:
+        """Find peer's direction that reciprocates my_dir→peer_rank.
+
+        Prefer the OPPOSITE direction (E↔W, N↔S) when the peer has it
+        pointing back to us (ADR-0025 D1). This matters in 2-rank
+        bidirectional rings where both E and W on one side point to the
+        same peer — without the preference, dict-order first-match would
+        route data into the wrong rx slot. Falls back to any direction
+        pointing back for topologies without an opposite convention
+        (e.g. tree_binary's parent/child).
+        """
+        nt = neighbor_table[peer_rank]
+        opp = _OPPOSITE_DIR.get(my_dir)
+        if opp is not None and nt.get(opp) == my_rank:
+            return opp
+        for d, target in nt.items():
             if target == my_rank:
                 return d
         return None
@@ -234,7 +249,7 @@ def install_ipcq(
             if peer_rank is None:
                 continue
             peer_s, peer_c, peer_p = rank_pe[peer_rank]
-            peer_dir = reverse_direction(r, peer_rank)
+            peer_dir = reverse_direction(r, peer_rank, d)
             if peer_dir is None:
                 # Peer doesn't have a reverse entry — skip (asymmetric topology)
                 continue
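
The effect of the opposite-direction preference can be checked outside the installer. The sketch below mirrors the new lookup but takes `neighbor_table` as an explicit argument instead of closing over it. The `ring2` and `tree` tables, and the `child_l` / `parent` direction names, are made up for illustration and are not taken from the repository.

```python
from __future__ import annotations

_OPPOSITE_DIR = {"E": "W", "W": "E", "N": "S", "S": "N"}


def reverse_direction(
    neighbor_table: dict[int, dict[str, int]],
    my_rank: int,
    peer_rank: int,
    my_dir: str,
) -> str | None:
    """Standalone variant of the lookup in the diff above."""
    nt = neighbor_table[peer_rank]
    opp = _OPPOSITE_DIR.get(my_dir)
    # Preferred: the peer's opposite direction points back to us.
    if opp is not None and nt.get(opp) == my_rank:
        return opp
    # Fallback: any direction of the peer that points back to us.
    for d, target in nt.items():
        if target == my_rank:
            return d
    return None


# 2-rank bidirectional ring: E and W on each rank lead to the other rank.
ring2 = {0: {"E": 1, "W": 1}, 1: {"E": 0, "W": 0}}
east = reverse_direction(ring2, 0, 1, "E")
west = reverse_direction(ring2, 0, 1, "W")

# Hypothetical tree naming with no opposite convention: fallback path.
tree = {0: {"child_l": 1}, 1: {"parent": 0}}
up = reverse_direction(tree, 0, 1, "child_l")
```

With the old first-match lookup, both `east` and `west` would resolve to `"E"`. With the preference they resolve to `"W"` and `"E"` respectively, so each send direction pairs with a distinct rx side on the peer.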