Speed up regression: 25min → 6min (test matrix + DataExecutor cleanup)

Test matrix restructure:
- 256-rank full-system ring runs only ONCE (marked pytest.mark.slow)
  instead of 7× across matrix + perf tests. Cross-SIP routing is
  verified by the single run; buffer variants (tcm/hbm/sram) are
  tested at 8-rank where they finish in <0.5s.
- Performance tests use 8-rank instead of 256-rank.
- `pytest -m "not slow"` completes in ~2.5min (local dev).
- Full suite including slow: ~6min (CI).

DataExecutor optimization:
- Remove ThreadPoolExecutor from DataExecutor.run(). Same-t_start
  groups are almost always size 1, so the thread pool creation and
  dispatch overhead dominated. Simple sequential loop is faster.
- Skip dma_read ops at the loop level (they are always no-ops in
  Phase 2 but were dispatched through _execute_op → _execute_memory).
- Remove redundant CLI Phase 2 re-execution: engine._flush_data_phase
  already replays during engine.wait(); the CLI now only prints the
  diagnostic summary without re-running DataExecutor.

502 tests pass. Wall time: 25m30s → 5m43s (full), 2m28s (no slow).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
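
The test matrix split described above could look roughly like the sketch below. It is only an illustration: `ring_allreduce` is a toy stand-in rather than the real benchmark, and the test names are hypothetical. Only the `slow` marker and the tcm/hbm/sram buffer variants come from the commit message.

```python
import pytest


def ring_allreduce(values):
    """Toy stand-in for the ring benchmark: every rank ends up with the sum."""
    total = sum(values)
    return [total] * len(values)


# Buffer variants are exercised at 8 ranks, where each case is cheap.
@pytest.mark.parametrize("buffer", ["tcm", "hbm", "sram"])
def test_ring_small(buffer):
    assert ring_allreduce(list(range(8))) == [28] * 8


# The expensive full-system case runs once; `pytest -m "not slow"` deselects it.
@pytest.mark.slow
def test_ring_full_system():
    assert ring_allreduce([1] * 256) == [256] * 256
```

For this to run without warnings under pytest, the `slow` marker would normally be registered under `markers` in the project's pytest configuration.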
@@ -72,15 +72,12 @@ def cmd_run(args) -> int:
     print(format_report(result.traces, title=args.bench, spec=spec))
     print(result.summary_text())
 
-    # Phase 2: data execution (ADR-0020)
+    # Phase 2 diagnostic summary (ADR-0020). The actual Phase 2 replay
+    # already runs inside engine.wait() → _flush_data_phase(). We only
+    # print the summary here; no redundant re-execution.
     if verify_data and result.engine is not None:
-        from kernbench.sim_engine.data_executor import DataExecutor
-
         op_log = result.engine.op_log
-        store = result.engine.memory_store
-        if op_log and store is not None:
-            executor = DataExecutor(op_log, store)
-            executor.run()
+        if op_log:
             n_gemm = sum(1 for r in op_log if r.op_kind == "gemm")
             n_math = sum(1 for r in op_log if r.op_kind == "math")
             print(f"[data] Phase 2 complete: {len(op_log)} ops ({n_gemm} gemm, {n_math} math)")
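
The change to DataExecutor.run() itself is not part of this hunk. As a rough sketch of what the commit message describes (a sequential replay in t_start order that skips dma_read ops before dispatch), something like the following might apply. `OpRecord`, `run_sequential`, and the `execute_op` callback are hypothetical names for illustration. Only `op_kind`, `t_start`, and the op kinds `gemm`, `math`, and `dma_read` appear in the source.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class OpRecord:
    # Hypothetical record shape, assumed for this sketch.
    t_start: int
    op_kind: str
    name: str


def run_sequential(op_log, execute_op):
    """Replay ops in t_start order with a plain loop instead of a thread pool."""
    executed = []
    # sorted() is stable, so ops sharing a t_start keep their log order.
    ordered = sorted(op_log, key=lambda r: r.t_start)
    for _, group in groupby(ordered, key=lambda r: r.t_start):
        for rec in group:
            if rec.op_kind == "dma_read":
                continue  # no-op in Phase 2, so skip it before dispatch
            execute_op(rec)
            executed.append(rec.name)
    return executed


log = [
    OpRecord(0, "dma_read", "load_a"),
    OpRecord(1, "gemm", "matmul"),
    OpRecord(2, "math", "relu"),
    OpRecord(2, "math", "scale"),
]
order = run_sequential(log, execute_op=lambda rec: None)
print(order)  # ['matmul', 'relu', 'scale']
```

With groups that are almost always size 1, the inner loop usually runs a single iteration, which is why dropping the pool costs no parallelism in practice.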