ADR-0023 D9: blocking credit-emit with full-path latency
PE_IPCQ._handle_recv now uses yield from on _delayed_credit_send instead of spawning it as a forked process, so the receiver's pe_exec_ns includes the credit-return cost.

_credit_latency_ns switches from compute_drain_ns(path, 16) to compute_path_latency_ns(path, 16). It also fixes a latent find_path bug: the destination lacked the ".pe_dma" suffix, so the lookup failed and the broad except Exception handler silently returned 0 ns.

Net effect on h3/h4 inter-cube pe-to-pe latency: IPCQ >= raw DMA at every size, matching real-HW posted-write semantics. tl.send remains fire-and-forget.

ADR-0023 D9 amended. A new diagnostic test, tests/test_pe_to_pe_diagnostic.py, captures per-PE pe_exec_ns, paths, drain, and meta-arrival timing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
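The difference between forking the credit send and blocking on it can be illustrated with a small stand-alone sketch. It does not use SimPy or the project's code: `Sim`, `credit_send`, `recv_forked`, `recv_blocking`, and the 100 ns / 40 ns figures are all hypothetical, chosen only to show how the receiver's completion time moves when the credit send is awaited with `yield from`.

```python
import heapq
from itertools import count


class Sim:
    """Tiny discrete-event loop; a process is a generator that yields delays in ns."""

    def __init__(self):
        self.now = 0.0
        self._queue = []      # entries are (wake_time, sequence, generator)
        self._seq = count()   # tie-breaker so generators are never compared

    def spawn(self, gen):
        # Schedule the generator to start at the current simulated time.
        heapq.heappush(self._queue, (self.now, next(self._seq), gen))

    def run(self):
        while self._queue:
            self.now, _, gen = heapq.heappop(self._queue)
            try:
                delay = next(gen)
            except StopIteration:
                continue
            heapq.heappush(self._queue, (self.now + delay, next(self._seq), gen))


def credit_send(sim, credit_ns, log):
    yield credit_ns                      # credit travels back to the sender
    log["credit_arrival_ns"] = sim.now


def recv_forked(sim, consume_ns, credit_ns, log):
    yield consume_ns                     # consume the payload
    sim.spawn(credit_send(sim, credit_ns, log))  # fork: recv does not wait
    log["recv_done_ns"] = sim.now


def recv_blocking(sim, consume_ns, credit_ns, log):
    yield consume_ns
    yield from credit_send(sim, credit_ns, log)  # recv waits for credit delivery
    log["recv_done_ns"] = sim.now


def simulate(recv, consume_ns=100.0, credit_ns=40.0):
    sim, log = Sim(), {}
    sim.spawn(recv(sim, consume_ns, credit_ns, log))
    sim.run()
    return log


print(simulate(recv_forked))
print(simulate(recv_blocking))
```

In both variants the credit arrives at the same simulated time. Only the receiver's own completion time changes, which is the effect the commit message describes for pe_exec_ns.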
@@ -338,9 +338,13 @@ class PeIpcqComponent(ComponentBase):
                 nbytes=req.result_data.get("nbytes", 0),
             )

-        # Fast path credit return — bottleneck BW based latency
-        env.process(
-            self._delayed_credit_send(env, direction, qp["peer_credit_store"], qp["my_tail"])
+        # Credit return: recv blocks on credit-emit so the protocol cost
+        # (full path latency to deliver the credit metadata back to the
+        # sender) is reflected in the recv's pe_exec_ns. Models the IPCQ
+        # control-plane completing the consume-acknowledgement before
+        # recv returns to the kernel.
+        yield from self._delayed_credit_send(
+            env, direction, qp["peer_credit_store"], qp["my_tail"],
         )

         if not req.done.triggered:
@@ -455,7 +459,12 @@ class PeIpcqComponent(ComponentBase):
         yield peer_credit_store.put(meta)

     def _credit_latency_ns(self, direction: str) -> float:
-        """Compute credit fast path latency = credit_size / bottleneck_bw.
+        """Full path latency for the credit-return packet.
+
+        Pays per-node overhead + edge prop + drain along the same fabric
+        the data took. PathRouter.find_path() auto-appends ".pe_dma" to
+        the source only, so the destination MUST be spelled with the
+        explicit ".pe_dma" suffix.

         Falls back to 0 when ctx/router is unavailable (unit-test mode).
         """
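To make the two docstrings concrete, here is a minimal sketch of how a drain-only latency differs from a full path latency. The real compute_drain_ns and compute_path_latency_ns are not shown in this diff, so the formulas are assumptions read off the docstrings (drain = size / bottleneck bandwidth; full path = per-node overhead + edge propagation + drain). `Hop` and its fields and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One hop of a path (hypothetical model, not the project's type)."""
    node_overhead_ns: float   # fixed cost paid at the node
    edge_prop_ns: float       # propagation delay on the outgoing edge
    bw_bytes_per_ns: float    # bandwidth of the outgoing edge


def drain_ns(path, nbytes):
    """Serialization time through the slowest edge only."""
    if not path:
        return 0.0
    return nbytes / min(hop.bw_bytes_per_ns for hop in path)


def path_latency_ns(path, nbytes):
    """Fixed per-hop costs plus the drain time."""
    fixed = sum(hop.node_overhead_ns + hop.edge_prop_ns for hop in path)
    return fixed + drain_ns(path, nbytes)


# A 16-byte credit over three hops; the middle hop is the bottleneck.
example_path = [
    Hop(node_overhead_ns=2.0, edge_prop_ns=1.0, bw_bytes_per_ns=64.0),
    Hop(node_overhead_ns=4.0, edge_prop_ns=3.0, bw_bytes_per_ns=16.0),
    Hop(node_overhead_ns=2.0, edge_prop_ns=1.0, bw_bytes_per_ns=32.0),
]
print(drain_ns(example_path, 16))         # 1.0
print(path_latency_ns(example_path, 16))  # 14.0
```

For a payload as small as a credit, the fixed per-hop terms dominate in this toy model, so a drain-only estimate understates the return cost by a wide margin.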
@@ -463,10 +472,12 @@ class PeIpcqComponent(ComponentBase):
             return 0.0
         qp = self._queue_pairs[direction]
         peer = qp["peer"]
-        peer_pe_prefix = f"sip{peer.sip}.cube{peer.cube}.pe{peer.pe}"
+        peer_pe_dma = f"sip{peer.sip}.cube{peer.cube}.pe{peer.pe}.pe_dma"
         try:
-            path = self.ctx.router.find_path(self._pe_prefix, peer_pe_prefix)
-            return self.ctx.compute_drain_ns(path, self._credit_size_bytes)
+            path = self.ctx.router.find_path(self._pe_prefix, peer_pe_dma)
+            return self.ctx.compute_path_latency_ns(
+                path, self._credit_size_bytes,
+            )
         except Exception:
             return 0.0

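The latent bug fixed by this hunk follows a common pattern: a lookup fails on a misspelled key and a broad exception handler turns the failure into a plausible-looking 0. The sketch below reproduces that pattern with a hypothetical `ToyRouter`. It models only the rule stated in the docstring (the suffix is appended to the source, never to the destination) and is not the real PathRouter API.

```python
class ToyRouter:
    """Hypothetical stand-in for a path router; only the suffix rule is modeled."""

    SUFFIX = ".pe_dma"

    def __init__(self, latencies_ns):
        # Keys are (source_node, destination_node) with full node names.
        self._latencies_ns = latencies_ns

    def find_latency_ns(self, src, dst):
        # The source gets the suffix appended automatically; the destination does not.
        if not src.endswith(self.SUFFIX):
            src = src + self.SUFFIX
        return self._latencies_ns[(src, dst)]  # KeyError for a bare destination prefix


def credit_latency_ns(router, src_prefix, dst):
    """Mirrors the fallback style in the diff: any failure becomes 0.0."""
    try:
        return router.find_latency_ns(src_prefix, dst)
    except Exception:
        return 0.0


router = ToyRouter({("sip0.cube0.pe0.pe_dma", "sip0.cube1.pe0.pe_dma"): 14.0})

# Bare destination prefix: the lookup raises and the handler hides it.
print(credit_latency_ns(router, "sip0.cube0.pe0", "sip0.cube1.pe0"))         # 0.0
# Explicit suffix on the destination: the lookup succeeds.
print(credit_latency_ns(router, "sip0.cube0.pe0", "sip0.cube1.pe0.pe_dma"))  # 14.0
```

The committed code keeps the broad handler so unit tests without a router still work. One option that might surface this class of bug earlier, offered only as a suggestion, is to log the swallowed exception or catch a narrower exception type.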