Rename the intercube all-reduce identifiers (module, config key, distributed
test) to lrab_hierarchical_allreduce so the name reflects both levels
it implements: LRAB intra-SIP (local reduce to center root + broadcast)
and the hierarchical inter-SIP topology exchange (ring/torus/mesh).
ADR-0032 slug kept as the stable decision id; pure rename, no logic change.
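
A minimal Python sketch of the two levels the new name refers to. It assumes
each SIP is a 1-D line of cubes holding scalar payloads and that the inter-SIP
step uses a ring (torus/mesh omitted). All function names here are
hypothetical illustrations, not the lrab_hierarchical_allreduce module:

```python
from typing import List


def center_root_reduce(values: List[float]) -> float:
    """Bidirectional reduce of one SIP's cubes toward the center cube.

    Cubes left of the root accumulate rightward and cubes right of it
    accumulate leftward, so in this 1-D model a partial sum travels at most
    about n/2 hops instead of n-1 as with a corner root.
    """
    n = len(values)
    root = n // 2  # assumption: for even n, pick the right-of-center cube
    left = 0.0
    for i in range(root):  # cubes 0 .. root-1 forward toward the root
        left += values[i]
    right = 0.0
    for i in range(n - 1, root, -1):  # cubes n-1 .. root+1 forward toward the root
        right += values[i]
    return left + values[root] + right


def ring_exchange(root_sums: List[float]) -> List[float]:
    """Inter-SIP all-reduce among the per-SIP roots over a ring."""
    p = len(root_sums)
    totals = list(root_sums)
    in_flight = list(root_sums)
    for _ in range(p - 1):
        # every root forwards what it last received to its ring successor
        in_flight = [in_flight[(k - 1) % p] for k in range(p)]
        totals = [totals[k] + in_flight[k] for k in range(p)]
    return totals


def hierarchical_allreduce_sketch(sips: List[List[float]]) -> List[List[float]]:
    roots = [center_root_reduce(cubes) for cubes in sips]  # intra-SIP local reduce
    totals = ring_exchange(roots)                          # inter-SIP exchange
    # intra-SIP broadcast: each root pushes the global sum back to its cubes
    return [[totals[s]] * len(cubes) for s, cubes in enumerate(sips)]


print(hierarchical_allreduce_sketch([[1, 2, 3], [4, 5], [6]]))
# [[21.0, 21.0, 21.0], [21.0, 21.0], [21.0]]
```

Every cube ends with the global sum; the two levels stay separable, which is
what the combined name is meant to convey.
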
Also in this batch:
- ADR-0032 (EN+KO): document the shipped center-root bidirectional reduce
(the doc still described the stale corner-root scheme); annotate ccl.yaml
root_cube as a placeholder.
- Rename allreduce + pe2pe latency plots to descriptive, title-matching
filenames and retitle the in-plot headings; drop overview/overview_log.
- Point the PPTX image refs at the new plot names.

Doc + derived-artifact + rename only; no simulation behavior changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PE_IPCQ._handle_recv now runs _delayed_credit_send inline via yield from
instead of spawning it as a fork, so the receiver's pe_exec_ns includes the
credit-return cost. _credit_latency_ns switches from
compute_drain_ns(path, 16) to compute_path_latency_ns(path, 16) and also
fixes a latent find_path bug: the destination lacked the ".pe_dma" suffix,
so the lookup failed and the bare except silently returned 0 ns.
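
The find_path pitfall can be reproduced in a few lines of Python. This is a
sketch only: the LINKS graph, HOP_NS cost, and both credit_latency_ns_*
helpers are hypothetical, and the BFS find_path below merely stands in for
the simulator's own. It shows how a bare except turns a misnamed destination
into a silent 0 ns instead of an error:

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical topology: DMA endpoints are registered with a ".pe_dma" suffix.
LINKS: Dict[str, List[str]] = {
    "cube0.pe0.pe_dma": ["cube1.pe0.pe_dma"],
    "cube1.pe0.pe_dma": ["cube0.pe0.pe_dma"],
}
HOP_NS = 25  # hypothetical per-hop latency


def find_path(src: str, dst: str) -> List[str]:
    """Breadth-first search; raises KeyError for unknown node names."""
    for node in (src, dst):
        if node not in LINKS:
            raise KeyError(f"unknown node: {node}")
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in LINKS[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError(f"no path from {src} to {dst}")


def credit_latency_ns_buggy(src_pe: str, dst_pe: str) -> int:
    try:
        # bug: destination is missing the ".pe_dma" suffix
        path = find_path(src_pe + ".pe_dma", dst_pe)
        return HOP_NS * (len(path) - 1)
    except:  # noqa: E722 (bare except swallows the KeyError, so the cost becomes 0)
        return 0


def credit_latency_ns_fixed(src_pe: str, dst_pe: str) -> int:
    # both endpoints use the suffixed name; lookup errors propagate
    path = find_path(src_pe + ".pe_dma", dst_pe + ".pe_dma")
    return HOP_NS * (len(path) - 1)


print(credit_latency_ns_buggy("cube0.pe0", "cube1.pe0"))  # 0
print(credit_latency_ns_fixed("cube0.pe0", "cube1.pe0"))  # 25
```

In this sketch the fixed helper omits the try/except, so a misnamed node
raises instead of costing 0 ns.
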
Net effect on h3/h4 inter-cube pe-to-pe latency: IPCQ >= raw DMA at
every size, matching real-HW posted-write semantics. tl.send remains
fire-and-forget. ADR-0023 D9 amended; new diagnostic test
tests/test_pe_to_pe_diagnostic.py captures per-PE pe_exec_ns, paths,
drain, and meta-arrival timing.
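
The fork vs. yield from difference can be illustrated with plain Python
generators, assuming a process model where each yield is a delay in ns and a
process's busy time is the sum of its delays. The names below are
hypothetical stand-ins, not the PE_IPCQ code:

```python
from typing import Generator, List

# Assumed process model: a process is a generator that yields delays in ns.
Proc = Generator[int, None, None]


def delayed_credit_send(credit_ns: int) -> Proc:
    # hypothetical stand-in for _delayed_credit_send: time to return the credit
    yield credit_ns


def handle_recv_forked(recv_ns: int, credit_ns: int, spawned: List[Proc]) -> Proc:
    yield recv_ns
    # old behavior: hand the credit return to a separate process and move on
    spawned.append(delayed_credit_send(credit_ns))


def handle_recv_inline(recv_ns: int, credit_ns: int) -> Proc:
    yield recv_ns
    # new behavior: the receiver itself waits for the credit return
    yield from delayed_credit_send(credit_ns)


def exec_ns(proc: Proc) -> int:
    """Run a process to completion and return its accumulated busy time."""
    return sum(proc)


spawned: List[Proc] = []
print(exec_ns(handle_recv_forked(100, 40, spawned)))  # 100: credit cost not charged
print(exec_ns(spawned[0]))                            # 40: paid by the forked process
print(exec_ns(handle_recv_inline(100, 40)))           # 140: credit cost charged to receiver
```

With the fork, the credit return still takes time but is billed to a side
process; inlining it charges the receiver, which is the accounting shift
this commit makes.
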
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>