CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

Rename the intercube all-reduce identity to lrab_hierarchical_allreduce
(module, config key, distributed test) so the name reflects both levels
it implements: LRAB intra-SIP (local reduce to center root + broadcast)
and the hierarchical inter-SIP topology exchange (ring/torus/mesh).
ADR-0032 slug kept as the stable decision id; pure rename, no logic change.

Also in this batch:
- ADR-0032 (EN+KO): document the shipped center-root bidirectional reduce
  (doc was stale corner-root); annotate ccl.yaml root_cube as a placeholder.
- Rename allreduce + pe2pe latency plots to descriptive, title-matching
  filenames and retitle the in-plot headings; drop overview/overview_log.
- Point the PPTX image refs at the new plot names.

Doc + derived-artifact + rename only; no simulation behavior changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-20 20:50:48 -07:00
parent e77e4a1703
commit ff7d727ddd
38 changed files with 259 additions and 272 deletions
+9 -5
View File
@@ -56,13 +56,17 @@ class Hop:
HOPS = [
Hop("h1_intra_horizontal", "Intra-cube horizontal (pe0 to pe1)",
Hop("latency_intracube_PE0_to_PE1_horizontal",
"Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal)",
(0, 0, 0), (0, 0, 1), "intra_E", "intra_W", True),
Hop("h2_intra_vertical", "Intra-cube vertical (pe0 to pe4)",
Hop("latency_intracube_PE0_to_PE4_vertical",
"Intra-cube PE-to-PE latency: PE0 → PE4 (vertical)",
(0, 0, 0), (0, 0, 4), "intra_S", "intra_N", True),
Hop("h3_inter_cube_horizontal", "Inter-cube horizontal (cube0 to cube1)",
Hop("latency_intercube_C0PE0_to_C1PE0_horizontal",
"Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal)",
(0, 0, 0), (0, 1, 0), "E", "W", True),
Hop("h4_inter_cube_vertical", "Inter-cube vertical (cube0 to cube4)",
Hop("latency_intercube_C0PE0_to_C4PE0_vertical",
"Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical)",
(0, 0, 0), (0, 4, 0), "S", "N", True),
]
@@ -80,7 +84,7 @@ def _measure_ipcq(hop: Hop, nbytes: int) -> float:
engine, spec = _make_engine()
cfg = load_ccl_config()
merged = resolve_algorithm_config(cfg, name="intercube_allreduce")
merged = resolve_algorithm_config(cfg, name="lrab_hierarchical_allreduce")
merged["slot_size"] = max(int(merged.get("slot_size", 4096)), nbytes)
n_elem = nbytes // ELEM_BYTES