ADR-0023 D9: blocking credit-emit with full-path latency

PE_IPCQ._handle_recv now yields from _delayed_credit_send instead of
forking it as a separate process, so the receiver's pe_exec_ns
includes the credit-return cost. _credit_latency_ns switches from
compute_drain_ns(path, 16) to compute_path_latency_ns(path, 16) and
fixes a latent find_path bug: the destination lacked the ".pe_dma"
suffix, so the lookup failed and the broad "except Exception" silently
returned 0 ns.

Net effect on h3/h4 inter-cube pe-to-pe latency: IPCQ >= raw DMA at
every size, matching real-HW posted-write semantics. tl.send remains
fire-and-forget. ADR-0023 D9 amended; new diagnostic test
tests/test_pe_to_pe_diagnostic.py captures per-PE pe_exec_ns, paths,
drain, and meta-arrival timing.
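
The timing difference between forking the credit send and yielding from it can be sketched with a toy event loop. This is a minimal illustration, not the simulator's code: ToyEnv, CREDIT_NS, RECV_NS, and the recv_* helpers are hypothetical names, and the latencies are made-up numbers.

```python
import heapq


class ToyEnv:
    """Hypothetical event loop: a process is a generator that yields delays in ns."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # entries are (wake_time, seq, generator)
        self._seq = 0

    def process(self, gen):
        # "Fork": schedule the generator to start at the current time.
        heapq.heappush(self._queue, (self.now, self._seq, gen))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, gen = heapq.heappop(self._queue)
            try:
                delay = next(gen)
            except StopIteration:
                continue
            heapq.heappush(self._queue, (self.now + delay, self._seq, gen))
            self._seq += 1


CREDIT_NS = 40.0  # assumed credit-return latency
RECV_NS = 100.0   # assumed cost of the receive itself


def credit_send(env, log):
    yield CREDIT_NS
    log["credit_done"] = env.now


def recv_forked(env, log):
    yield RECV_NS
    env.process(credit_send(env, log))  # old behaviour: recv does not wait
    log["recv_done"] = env.now


def recv_blocking(env, log):
    yield RECV_NS
    yield from credit_send(env, log)    # new behaviour: recv pays the credit cost
    log["recv_done"] = env.now


def simulate(recv):
    env, log = ToyEnv(), {}
    env.process(recv(env, log))
    env.run()
    return log


forked = simulate(recv_forked)
blocking = simulate(recv_blocking)
print(forked["recv_done"], blocking["recv_done"])
```

In both variants the credit lands at the same simulated time; only the moment the receive is considered finished moves, which is why the change shows up in the receiver's pe_exec_ns.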

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:12:38 -07:00
parent 19dfc86dc3
commit 90874abbfe
11 changed files with 901 additions and 25 deletions
+18 -7
@@ -338,9 +338,13 @@ class PeIpcqComponent(ComponentBase):
             nbytes=req.result_data.get("nbytes", 0),
         )
-        # Fast path credit return — bottleneck BW based latency
-        env.process(
-            self._delayed_credit_send(env, direction, qp["peer_credit_store"], qp["my_tail"])
+        # Credit return: recv blocks on credit-emit so the protocol cost
+        # (full path latency to deliver the credit metadata back to the
+        # sender) is reflected in the recv's pe_exec_ns. Models the IPCQ
+        # control-plane completing the consume-acknowledgement before
+        # recv returns to the kernel.
+        yield from self._delayed_credit_send(
+            env, direction, qp["peer_credit_store"], qp["my_tail"],
         )
         if not req.done.triggered:
@@ -455,7 +459,12 @@ class PeIpcqComponent(ComponentBase):
         yield peer_credit_store.put(meta)
     def _credit_latency_ns(self, direction: str) -> float:
-        """Compute credit fast path latency = credit_size / bottleneck_bw.
+        """Full path latency for the credit-return packet.
+        Pays per-node overhead + edge prop + drain along the same fabric
+        the data took. PathRouter.find_path() auto-appends ".pe_dma" to
+        the source only, so the destination MUST be spelled with the
+        explicit ".pe_dma" suffix.
         Falls back to 0 when ctx/router is unavailable (unit-test mode).
         """
@@ -463,10 +472,12 @@ class PeIpcqComponent(ComponentBase):
             return 0.0
         qp = self._queue_pairs[direction]
         peer = qp["peer"]
-        peer_pe_prefix = f"sip{peer.sip}.cube{peer.cube}.pe{peer.pe}"
+        peer_pe_dma = f"sip{peer.sip}.cube{peer.cube}.pe{peer.pe}.pe_dma"
         try:
-            path = self.ctx.router.find_path(self._pe_prefix, peer_pe_prefix)
-            return self.ctx.compute_drain_ns(path, self._credit_size_bytes)
+            path = self.ctx.router.find_path(self._pe_prefix, peer_pe_dma)
+            return self.ctx.compute_path_latency_ns(
+                path, self._credit_size_bytes,
+            )
         except Exception:
             return 0.0
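
The two latency models and the suffix bug fixed above can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not the project's implementation: Hop, ToyRouter, the free functions, and every number are hypothetical. It only mirrors what the docstrings state (drain = size / bottleneck bandwidth; full path latency = per-node overhead + edge propagation + drain) and the source-only ".pe_dma" auto-append.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    node_overhead_ns: float  # assumed per-node overhead
    edge_prop_ns: float      # assumed propagation delay on the edge
    bw_bytes_per_ns: float   # assumed link bandwidth


def compute_drain_ns(path, nbytes):
    """Drain only: payload size over the bottleneck bandwidth."""
    return nbytes / min(h.bw_bytes_per_ns for h in path)


def compute_path_latency_ns(path, nbytes):
    """Per-node overhead + edge propagation + drain."""
    fixed = sum(h.node_overhead_ns + h.edge_prop_ns for h in path)
    return fixed + compute_drain_ns(path, nbytes)


class ToyRouter:
    """Hypothetical router that appends ".pe_dma" to the source only."""

    def __init__(self, routes):
        self._routes = routes

    def find_path(self, src, dst):
        # A bare destination (no ".pe_dma") is not a known endpoint: KeyError.
        return self._routes[(src + ".pe_dma", dst)]


def credit_latency_ns(router, src, dst, nbytes=16):
    try:
        return compute_path_latency_ns(router.find_path(src, dst), nbytes)
    except Exception:  # same broad fallback as in the diff
        return 0.0


path = [Hop(5.0, 2.0, 32.0), Hop(5.0, 2.0, 8.0)]
router = ToyRouter({("sip0.cube0.pe0.pe_dma", "sip0.cube1.pe0.pe_dma"): path})

drain = compute_drain_ns(path, 16)
bare = credit_latency_ns(router, "sip0.cube0.pe0", "sip0.cube1.pe0")
full = credit_latency_ns(router, "sip0.cube0.pe0", "sip0.cube1.pe0.pe_dma")
print(drain, bare, full)
```

With a bare destination the lookup fails and the fallback hides it behind a 0 ns result, which is indistinguishable from the unit-test fallback. A narrower except clause, or a log line in the fallback, would have surfaced the bug earlier; that is a design observation, not something this commit changes.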