kernbench2/tests at b610cb0d9abfafadb85a58598a5e3e94858430e0 - kernbench2 - YWGitServer

ywkang/kernbench2

mukesh b610cb0d9a sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/

Convert the multidevice allreduce correctness + latency/buffer-kind sweeps
to run through the real PyTorch-distributed path
(init_process_group(backend="ahbm") -> mp.spawn -> dist.all_reduce) instead
of direct ctx.launch, and reorganize the CCL/allreduce tests into a
tests/sccl/ package split one test per file.

Production change (required for the distributed path on non-square SIP grids):
- AhbmCCLBackend now reads explicit system.sips.w/h from the spec, with a
  square-only sqrt fallback that raises on ambiguity, instead of silently
  guessing round(sqrt(count)). This fixes the 2x3 / 3x2 torus + mesh cases,
  which previously resolved to a wrong 2x2 grid. Mirrors the test helper's
  _sip_topo_dims precedence (explicit w/h > square fallback > raise).
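The grid-resolution precedence described above (explicit w/h > square fallback > raise) can be sketched as follows. This is an illustrative sketch, not the code in AhbmCCLBackend or `_sip_topo_dims`. The function name `resolve_sip_grid`, the dict-shaped spec, the `count` key, and the `w*h == count` consistency check are all assumptions made for the example.

```python
import math
from typing import Any, Mapping, Tuple


def resolve_sip_grid(spec: Mapping[str, Any]) -> Tuple[int, int]:
    """Hypothetical resolver returning (w, h) for the SIP grid in a spec dict.

    Precedence, per the commit message:
      1. explicit system.sips.w / system.sips.h
      2. square-only fallback from the SIP count
      3. otherwise raise instead of guessing
    """
    sips = spec.get("system", {}).get("sips", {})
    count = sips.get("count")  # assumed key name for the number of SIPs
    w, h = sips.get("w"), sips.get("h")

    if w is not None and h is not None:
        # Explicit dimensions win. The count cross-check is an assumption.
        if count is not None and w * h != count:
            raise ValueError(f"sips.w*sips.h = {w * h} != sips.count = {count}")
        return int(w), int(h)

    if count is None:
        raise ValueError("spec has neither sips.w/h nor sips.count")

    side = math.isqrt(count)
    if side * side == count:
        return side, side  # unambiguous square grid

    # e.g. count=6 could be 2x3 or 3x2, so refuse rather than guess.
    raise ValueError(
        f"sips.count={count} is not a perfect square; set sips.w and sips.h"
    )
```

This shows why the old guess failed on six devices: `round(math.sqrt(6))` is 2, which yields a 2x2 grid covering only 4 of the 6 SIPs. Under the sketch, `{"w": 2, "h": 3}` and `{"w": 3, "h": 2}` resolve to distinct grids, and a bare count of 6 raises.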

Test reorganization (tests/sccl/):
- _allreduce_helpers.py: shared plumbing (distributed driver, config writers,
  direct-launch run_allreduce parity reference, sweep/buffer-kind constants,
  plot aggregators, topology-diagram + FSIM-comparison emitters).
- test_allreduce_ring_torus_mesh.py: correctness across ring/torus/mesh.
- test_distributed_default_topology.py: full distributed path on topology.yaml.
- test_plot_latency_sweep.py / test_plot_buffer_kind_sweep.py: sweep rows.
- test_plot_topology_diagram.py / test_plot_comparison_fsim.py: plot emitters.
- test_intercube_root_center.py: moved in (ADR-0032 center-root latency guard).

Also:
- Move the FSIM comparison plot generator out of scripts/ into the sccl suite.
- Delete superseded test files (test_allreduce_multidevice,
  test_distributed_lrab_hierarchical_allreduce, test_allreduce_buffer_kind_sweep)
  and repoint conftest aggregators + the ipcq buffer-kind importers.
- Regenerate the allreduce_latency_plots derived artifacts from the full sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 22:24:43 -07:00

Add SIP-level tensor parallelism, component registry YAML, VA offset verification

2026-03-26 01:13:17 -07:00

sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/

2026-05-20 22:24:43 -07:00

conftest.py

sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/

2026-05-20 22:24:43 -07:00

test_adr0026_dppolicy_intra_device.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_bench_registry.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_bw_occupancy.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_ccl_framework.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_ccl_helpers.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_ccl_install.py

Intercube allreduce: pe0 cube-mesh reduce + multi-SIP ring/torus/mesh

2026-04-16 17:33:42 -07:00

test_ccl_strict_mode.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_ccl_topologies.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_cli_list.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_cli_verify_data.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_cli.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_component_registry.py

Calibrate 3 tests for ADR-0033 Phase 2c per-flit wire timing

2026-05-14 23:06:33 -07:00

test_composite_epilogue.py

tl.composite: fused epilogue ops with per-op scope

2026-05-15 10:16:47 -07:00

test_cross_sip_routing.py

ADR-0019 D1/D4: per-PE HBM CTRL partitioning

2026-05-15 01:04:30 -07:00

test_d5_barrier_invariant.py

ADR-0009 D5: chain-aware target_start_ns + zero-byte launch fanout

2026-04-27 15:12:58 -07:00

test_data_executor.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_e2e_data.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_e2e_pipeline.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_emit_ipcq_diagram.py

Add tests/test_emit_ipcq_diagram.py (missed from earlier commit)

2026-04-27 21:42:44 -07:00

test_engine.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_flit_streaming.py

ADR-0033 Phase 2c-3 finish: op_log test + ADR doc reflect chunk-streaming

2026-05-14 23:12:50 -07:00

test_hbm_address_based_pc.py

ADR-0019 D1/D4: per-PE HBM CTRL partitioning

2026-05-15 01:04:30 -07:00

test_hbm_pc_striping.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_intercube_sfr_config.py

CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

2026-05-20 20:50:48 -07:00

test_iochiplet_noc_d2h.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_ipcq_buffer_kind_latency.py

sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/

2026-05-20 22:24:43 -07:00

test_ipcq_buffer_kind_locations.py

sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/

2026-05-20 22:24:43 -07:00

test_ipcq_types.py

Fix ADR-0025: IPCQ direction addressing via address-based matching

2026-04-14 00:38:41 -07:00

test_kernel_launch_sync.py

Kernel-launch sync (ADR-0009 D5) and IPCQ drain at inbound (ADR-0023)

2026-04-23 15:30:29 -07:00

test_kernel_runner.py

Implement ADR-0020: 2-pass data execution with greenlet kernel runner

2026-04-08 00:22:44 -07:00

test_memory_store.py

Implement ADR-0020: 2-pass data execution with greenlet kernel runner

2026-04-08 00:22:44 -07:00

test_mmu_component.py

Rename impl names: add builtin. prefix for clear provenance

2026-04-09 00:16:24 -07:00

test_mmu_fabric.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_noc_mesh.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_op_log.py

Implement ADR-0020: 2-pass data execution with greenlet kernel runner

2026-04-08 00:22:44 -07:00

test_pe_components.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_pe_dma_ipcq.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_pe_ipcq.py

Fix ADR-0025: IPCQ direction addressing via address-based matching

2026-04-14 00:38:41 -07:00

test_pe_mmu.py

Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg

2026-03-26 00:01:47 -07:00

test_pe_pipeline.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_pe_to_pe_diagnostic.py

CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

2026-05-20 20:50:48 -07:00

test_pe_to_pe_latency.py

CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots

2026-05-20 20:50:48 -07:00

test_per_pe_hbm_partition.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_phase_a_components.py

Latency model: HBM PC striping + chunk-loop drain (ADR-0033)

2026-05-14 21:59:07 -07:00

test_phyaddr.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_probe.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_routing.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_runtime_api_tensor.py

ADR-0026: DPPolicy intra-device only + ShardSpec structural coords

2026-04-14 13:02:19 -07:00

test_sip_parallel.py

ADR-0026: DPPolicy intra-device only + ShardSpec structural coords

2026-04-14 13:02:19 -07:00

test_sip_topology_rectangular.py

Rectangular SIP topology + 6-device allreduce sweep

2026-04-27 15:13:14 -07:00

test_tensor_free.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_tensor.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_tl_ipcq_api.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_topology_compile.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_topology_load.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_topology_visualize.py

Cube-view SVG: detailed topology validation rendering

2026-04-04 22:03:38 -07:00

test_tp_layers.py

ADR-0027: Megatron TP API + worker-wait generalization + mp.spawn

2026-04-14 16:31:13 -07:00

test_tp_mlp.py

ADR-0027: Megatron TP API + worker-wait generalization + mp.spawn

2026-04-14 16:31:13 -07:00

test_tp_parallel_state.py

ADR housekeeping: category prefixes, lifecycle folders, retroactive 0034-0037

2026-05-20 01:15:55 -07:00

test_triton_emu.py

Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)

2026-04-12 19:36:59 -07:00

test_va_allocator.py

Add virtual memory support: PE_MMU, VA allocator, fabric MmuMapMsg

2026-03-26 00:01:47 -07:00

test_va_integration.py

ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables

2026-04-27 15:52:29 -07:00

test_va_offset.py

benches: package as kernbench.benches, add @bench registry + list subcommand

2026-05-20 14:42:10 -07:00

test_verify_adr_lang_pairs.py

ADR: translate adr-ko/ to Korean, fix ADR-0013 slug, refine Status check

2026-05-20 08:17:56 -07:00

test_wire_cut_through.py

Latency model: HBM PC striping + chunk-loop drain (ADR-0033)

2026-05-14 21:59:07 -07:00