Fix ADR-0025: IPCQ direction addressing via address-based matching
2-rank bidirectional ring deadlock: when the E and W neighbors point to the
same peer, sender-coord matching in _handle_meta_arrival / _credit_worker
picked the first direction in dict order, landing data in the wrong rx slot
relative to what the kernel recv(W) was waiting on.

Fix (ADR-0025 D1/D2/D3):
- install.reverse_direction: prefer the OPPOSITE direction (E↔W, N↔S) when the
  peer has it pointing back to us; fall back to any matching direction for
  topologies without an opposite convention (tree_binary parent/child).
- _handle_meta_arrival: match by token.dst_addr range against each qp's
  my_rx_base_pa + n_slots × slot_size window (unambiguous).
- _credit_worker: match by credit.dst_rx_base_pa == qp.peer.rx_base_pa.
- IpcqCreditMetadata: new dst_rx_base_pa field carrying the receiver-side rx
  base; _delayed_credit_send fills it from the consuming qp.

Tests (Phase 1 → Phase 2):
- test_reverse_direction_opposite_preference_2rank_ring
- test_reverse_direction_opposite_preference_4rank_ring_sanity
- test_meta_arrival_matches_by_dst_addr_same_peer
- test_credit_matches_by_dst_rx_base_pa_same_peer
- Existing credit-return test updated with dst_rx_base_pa.

508 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
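The two lookup rules described in the message (D1 opposite-direction preference, D2 address-window matching) can be illustrated with a minimal sketch. This is not the project's code: `RxWindowQp`, `OPPOSITE`, the shape of the neighbor maps, the tree direction names, and the example addresses are all assumptions made for illustration. Only `my_rx_base_pa`, `n_slots`, `slot_size`, and the preference order come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed opposite-direction convention for ring/mesh topologies.
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def reverse_direction(my_id: int, direction: str,
                      peer_neighbors: Dict[str, int]) -> Optional[str]:
    """Pick the peer-side direction that pairs with our `direction`.

    `peer_neighbors` maps the peer's directions to neighbor ids
    (hypothetical shape). The opposite direction is preferred when the
    peer has it pointing back to us; otherwise any direction pointing
    back to us is used (e.g. parent/child in a tree).
    """
    opposite = OPPOSITE.get(direction)
    if opposite is not None and peer_neighbors.get(opposite) == my_id:
        return opposite
    for peer_dir, neighbor in peer_neighbors.items():
        if neighbor == my_id:
            return peer_dir
    return None


@dataclass
class RxWindowQp:
    """Hypothetical per-direction queue pair with its receive window."""
    my_rx_base_pa: int
    n_slots: int
    slot_size: int

    def owns(self, addr: int) -> bool:
        # The window is [base, base + n_slots * slot_size).
        end = self.my_rx_base_pa + self.n_slots * self.slot_size
        return self.my_rx_base_pa <= addr < end


def match_direction_by_dst_addr(qps: Dict[str, RxWindowQp],
                                dst_addr: int) -> Optional[str]:
    """Return the direction whose rx window contains dst_addr, else None."""
    for direction, qp in qps.items():
        if qp.owns(dst_addr):
            return direction
    return None


# 2-rank ring: rank 0 sees rank 1 on both E and W, and vice versa.
rank1_neighbors = {"E": 0, "W": 0}
qps = {
    "E": RxWindowQp(my_rx_base_pa=0x1000, n_slots=4, slot_size=0x100),
    "W": RxWindowQp(my_rx_base_pa=0x2000, n_slots=4, slot_size=0x100),
}
```

Because the E and W receive windows are disjoint, the destination address identifies the direction even though both directions share one peer, which a match on sender coordinates alone cannot do.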
@@ -196,10 +196,17 @@ class IpcqCreditMetadata:
     Sent by ``PeIpcqComponent._delayed_credit_send`` after a
     bottleneck-BW based latency, putting the metadata directly into
     the peer's pre-wired credit store (no fabric routing).
+
+    ``dst_rx_base_pa`` is the receiver's ``my_rx_base_pa`` for the direction
+    whose slot was consumed. The original sender matches this against
+    ``qp.peer.rx_base_pa`` to find the correct direction (ADR-0025 D3) —
+    unambiguous even when multiple directions share the same peer (e.g.
+    2-rank bidirectional ring).
     """
 
     consumer_seq: int  # my_tail at recv side (new tail value)
-    src_sip: int  # which peer is sending the credit
+    dst_rx_base_pa: int  # receiver-side my_rx_base_pa (ADR-0025 D3)
+    src_sip: int  # which peer is sending the credit (diag)
     src_cube: int
     src_pe: int
     src_direction: str  # sender-side direction (peer maps to its own)
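The credit-side match from the docstring above (D3) could look roughly like the following sketch. The `IpcqCreditMetadata` fields mirror the diff; everything else (`PeerInfo`, `SenderQp`, `peer_tail`, `apply_credit`, and the example addresses) is a hypothetical stand-in, not the actual `_credit_worker` implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IpcqCreditMetadata:
    consumer_seq: int      # my_tail at recv side (new tail value)
    dst_rx_base_pa: int    # receiver-side my_rx_base_pa
    src_sip: int           # diagnostic only in this sketch
    src_cube: int
    src_pe: int
    src_direction: str     # direction as seen by the credit's sender


@dataclass
class PeerInfo:
    """Hypothetical: what the data sender knows about the peer's rx buffer."""
    rx_base_pa: int


@dataclass
class SenderQp:
    """Hypothetical sender-side queue pair for one direction."""
    peer: PeerInfo
    peer_tail: int = 0     # assumed name for the last acknowledged tail


def apply_credit(qps: Dict[str, SenderQp],
                 credit: IpcqCreditMetadata) -> Optional[str]:
    """Route a credit to the qp whose peer rx base equals dst_rx_base_pa."""
    for direction, qp in qps.items():
        if qp.peer.rx_base_pa == credit.dst_rx_base_pa:
            qp.peer_tail = credit.consumer_seq
            return direction
    return None  # no direction targets that rx buffer


# 2-rank ring from the sender's view: our E sends into the peer's W buffer
# (assumed at 0x2000) and our W sends into the peer's E buffer (0x1000).
qps = {"E": SenderQp(PeerInfo(rx_base_pa=0x2000)),
       "W": SenderQp(PeerInfo(rx_base_pa=0x1000))}
credit = IpcqCreditMetadata(consumer_seq=3, dst_rx_base_pa=0x1000,
                            src_sip=1, src_cube=0, src_pe=0,
                            src_direction="E")
matched = apply_credit(qps, credit)
```

In this sketch `src_sip` plays no role in the lookup, which matches its demotion to a diagnostic field in the diff: with both directions pointing at the same peer, only the rx base address distinguishes them.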