• Joined on 2026-03-04
ywkang pushed to master at ywkang/kernbench2 2026-04-28 04:25:46 +00:00
fca24feac5 Fix all remaining test failures: single-cube allreduce + matplotlib dep
ywkang pushed to master at ywkang/kernbench2 2026-04-28 00:15:03 +00:00
d55dc6cb4f Merge: accept remote pe2pe summary.csv
ywkang pushed to master at ywkang/kernbench2 2026-04-27 22:52:45 +00:00
81cc32c46b ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables
ywkang pushed to master at ywkang/kernbench2 2026-04-14 23:45:29 +00:00
cfc2d74ec4 Refactor ccl_allreduce bench: rank=SIP only, remove rank=PE legacy path
ywkang pushed to master at ywkang/kernbench2 2026-04-14 23:31:16 +00:00
105f1dc09e ADR-0027: Megatron TP API + worker-wait generalization + mp.spawn
ywkang pushed to master at ywkang/kernbench2 2026-04-14 21:13:27 +00:00
e7f376ebaa ADR-0027 rev7 (Megatron TP + worker-wait generalization) + ADR-0026 typo fix
ywkang pushed to master at ywkang/kernbench2 2026-04-14 20:02:27 +00:00
357cab525b ADR-0026: DPPolicy intra-device only + ShardSpec structural coords
ywkang pushed to master at ywkang/kernbench2 2026-04-14 19:46:39 +00:00
787409ced1 ADR-0024 Phase B: update xfail reason with architectural blocker details
ywkang pushed to master at ywkang/kernbench2 2026-04-14 16:14:05 +00:00
79124daab1 ADR-0024 Phase B (partial): scheduler-level collective drain
ywkang pushed to master at ywkang/kernbench2 2026-04-14 16:00:30 +00:00
4ba0a83e71 Implement ADR-0024 Phase A: SIP-level TP launcher MVP
ywkang pushed to master at ywkang/kernbench2 2026-04-14 07:38:45 +00:00
32536daf2e Fix ADR-0025: IPCQ direction addressing via address-based matching
e1084800ab docs: add ADRs 0024–0031 for SIP-TP launcher stack
ywkang pushed to master at ywkang/kernbench2 2026-04-13 23:31:35 +00:00
b2c52f0e34 Add English translations for ADR-0018, 0019, 0020, 0021
ywkang pushed to master at ywkang/kernbench2 2026-04-13 06:52:05 +00:00
10b33b44ba Add Tensor indexing + hierarchical 3-level all-reduce kernel
ywkang pushed to master at ywkang/kernbench2 2026-04-13 06:02:21 +00:00
1c8ddc2d03 Fix Phase 1 slot-overwrite race + PE_MATH latency model (n_slots=4 safe)
ywkang pushed to master at ywkang/kernbench2 2026-04-13 04:13:27 +00:00
74f5f5cf08 Add session-scoped topology fixture in tests/conftest.py
ywkang pushed to master at ywkang/kernbench2 2026-04-13 04:06:42 +00:00
372c987995 Reduce test time to 12s: shrink GEMM dims + enable pytest-xdist
ywkang pushed to master at ywkang/kernbench2 2026-04-13 03:52:14 +00:00
bcf941dcee Speed up regression: 25min → 6min (test matrix + DataExecutor cleanup)
ywkang pushed to master at ywkang/kernbench2 2026-04-13 02:37:05 +00:00
998cc85762 Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)
ywkang pushed to master at ywkang/kernbench2 2026-04-09 23:49:58 +00:00
ff2c677a9c Add 2D grid program_id semantics (ADR-0022)
ywkang pushed to master at ywkang/kernbench2 2026-04-09 16:34:07 +00:00
dc3fb02aed Add --verify-data CLI flag, Tensor.data property, parallel DataExecutor