CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

Rename the intercube all-reduce identity to lrab_hierarchical_allreduce (module, config key, distributed test) so the name reflects both levels it implements: LRAB intra-SIP (local reduce to center root + broadcast) and the hierarchical inter-SIP topology exchange (ring/torus/mesh). ADR-0032 slug kept as the stable decision id; pure rename, no logic change.

Also in this batch:
- ADR-0032 (EN+KO): document the shipped center-root bidirectional reduce (doc was stale corner-root); annotate ccl.yaml root_cube as a placeholder.
- Rename allreduce + pe2pe latency plots to descriptive, title-matching filenames and retitle the in-plot headings; drop overview/overview_log.
- Point the PPTX image refs at the new plot names.

Doc + derived-artifact + rename only; no simulation behavior changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
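
For readers unfamiliar with the two levels the new name refers to, here is a minimal, illustrative sketch: a local reduce to a per-SIP root, a ring exchange among the roots, then a broadcast back. It is not the repository's implementation. Every name in it (`local_reduce`, `ring_exchange`, `two_level_allreduce`) is hypothetical, the "center root" is approximated as the middle index of each SIP's cube list, and only the ring variant of the inter-SIP exchange is modeled.

```python
from typing import List

Vector = List[int]


def vec_add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def local_reduce(cubes: List[Vector], root: int) -> Vector:
    """Level 1 (intra-SIP): sum every cube's vector into the root cube."""
    acc = list(cubes[root])
    for i, vec in enumerate(cubes):
        if i != root:
            acc = vec_add(acc, vec)
    return acc


def ring_exchange(partials: List[Vector]) -> List[Vector]:
    """Level 2 (inter-SIP): naive ring all-reduce across the SIP roots.

    Each root forwards the message it last received to its right-hand
    neighbour; after n - 1 steps every root has seen every partial sum.
    """
    n = len(partials)
    totals = [list(p) for p in partials]
    in_flight = [list(p) for p in partials]
    for _ in range(n - 1):
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [vec_add(t, m) for t, m in zip(totals, in_flight)]
    return totals


def two_level_allreduce(sips: List[List[Vector]]) -> List[List[Vector]]:
    """Local reduce to an assumed center root, ring exchange, then broadcast."""
    # Assumption: the "center" root is the middle cube of each SIP.
    partials = [local_reduce(cubes, root=len(cubes) // 2) for cubes in sips]
    totals = ring_exchange(partials)
    # Broadcast: every cube in a SIP receives that SIP root's global total.
    return [[list(total) for _ in cubes] for cubes, total in zip(sips, totals)]


if __name__ == "__main__":
    # Two SIPs with two cubes each; every cube holds a 2-element vector.
    sips = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(two_level_allreduce(sips))
```

In this toy model every cube ends up holding the same global sum, which is the property an all-reduce must satisfy regardless of which inter-SIP topology carries the exchange.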
2026-05-20 20:50:48 -07:00
parent e77e4a1703
commit ff7d727ddd
38 changed files with 259 additions and 272 deletions
@@ -56,13 +56,17 @@ class Hop:


 HOPS = [
-    Hop("h1_intra_horizontal", "Intra-cube horizontal (pe0 to pe1)",
+    Hop("latency_intracube_PE0_to_PE1_horizontal",
+        "Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal)",
        (0, 0, 0), (0, 0, 1), "intra_E", "intra_W", True),
-    Hop("h2_intra_vertical", "Intra-cube vertical (pe0 to pe4)",
+    Hop("latency_intracube_PE0_to_PE4_vertical",
+        "Intra-cube PE-to-PE latency: PE0 → PE4 (vertical)",
        (0, 0, 0), (0, 0, 4), "intra_S", "intra_N", True),
-    Hop("h3_inter_cube_horizontal", "Inter-cube horizontal (cube0 to cube1)",
+    Hop("latency_intercube_C0PE0_to_C1PE0_horizontal",
+        "Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal)",
        (0, 0, 0), (0, 1, 0), "E", "W", True),
-    Hop("h4_inter_cube_vertical", "Inter-cube vertical (cube0 to cube4)",
+    Hop("latency_intercube_C0PE0_to_C4PE0_vertical",
+        "Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical)",
        (0, 0, 0), (0, 4, 0), "S", "N", True),
 ]

@@ -80,7 +84,7 @@ def _measure_ipcq(hop: Hop, nbytes: int) -> float:
    engine, spec = _make_engine()

    cfg = load_ccl_config()
-    merged = resolve_algorithm_config(cfg, name="intercube_allreduce")
+    merged = resolve_algorithm_config(cfg, name="lrab_hierarchical_allreduce")
    merged["slot_size"] = max(int(merged.get("slot_size", 4096)), nbytes)

    n_elem = nbytes // ELEM_BYTES