CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

Rename the intercube all-reduce identity to lrab_hierarchical_allreduce
(module, config key, distributed test) so the name reflects both levels
it implements: LRAB intra-SIP (local reduce to center root + broadcast)
and the hierarchical inter-SIP topology exchange (ring/torus/mesh).
ADR-0032 slug kept as the stable decision id; pure rename, no logic change.

Also in this batch:
- ADR-0032 (EN+KO): document the shipped center-root bidirectional reduce
  (doc was stale corner-root); annotate ccl.yaml root_cube as a placeholder.
- Rename allreduce + pe2pe latency plots to descriptive, title-matching
  filenames and retitle the in-plot headings; drop overview/overview_log.
- Point the PPTX image refs at the new plot names.

Doc + derived-artifact + rename only; no simulation behavior changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-20 20:50:48 -07:00
parent e77e4a1703
commit ff7d727ddd
38 changed files with 259 additions and 272 deletions
+6 -6
View File
@@ -1,7 +1,7 @@
"""Phase 1 test for moving the intercube_allreduce root cube from the
"""Phase 1 test for moving the lrab_hierarchical_allreduce root cube from the
bottom-right corner (3,3) to the geometric center (2,2).
Today's algorithm (intercube_allreduce.py) hardcodes
Today's algorithm (lrab_hierarchical_allreduce.py) hardcodes
``root_cube = (cube_h-1) * cube_w + (cube_w-1)`` (= cube 15 in 4×4).
The intra-SIP critical path for one allreduce is therefore::
@@ -55,7 +55,7 @@ def _run_torus_96kb(tmp_path: Path) -> float:
sub,
sip_topology="torus_2d",
n_sips=6,
algorithm="intercube_allreduce",
algorithm="lrab_hierarchical_allreduce",
sip_w=3, sip_h=2,
n_elem_override=49152, # 49152 × 2 = 96 KB / slot
)
@@ -70,7 +70,7 @@ def _run_torus_96kb(tmp_path: Path) -> float:
) as ctx:
result = run_allreduce(
ctx, engine, spec,
algorithm="intercube_allreduce", ccl_yaml=ccl_path,
algorithm="lrab_hierarchical_allreduce", ccl_yaml=ccl_path,
)
assert result["ok_cubes"] > 0
pe_exec_vals = [
@@ -121,7 +121,7 @@ def test_correctness_preserved(tmp_path):
sub,
sip_topology="torus_2d",
n_sips=6,
algorithm="intercube_allreduce",
algorithm="lrab_hierarchical_allreduce",
sip_w=3, sip_h=2,
n_elem_override=128, # tiny payload to keep this fast
)
@@ -136,7 +136,7 @@ def test_correctness_preserved(tmp_path):
) as ctx:
result = run_allreduce(
ctx, engine, spec,
algorithm="intercube_allreduce", ccl_yaml=ccl_path,
algorithm="lrab_hierarchical_allreduce", ccl_yaml=ccl_path,
)
n_cubes = 6 * 16 # 6 SIPs × 16 cubes/SIP
assert result["ok_cubes"] == n_cubes, (