ywkang
0 Followers · 0 Following
Joined on 2026-03-04
Repositories: 1
Projects: 1
Public Activity
ywkang pushed to master at ywkang/kernbench2 (2026-04-28 04:25:46 +00:00)
fca24feac5 Fix all remaining test failures: single-cube allreduce + matplotlib dep
ywkang pushed to master at ywkang/kernbench2 (2026-04-28 00:15:03 +00:00)
d55dc6cb4f Merge: accept remote pe2pe summary.csv
ywkang pushed to master at ywkang/kernbench2 (2026-04-27 22:52:45 +00:00)
81cc32c46b ADR-0001 Rev 2: 51-bit PhysAddr layout with concrete sub-unit tables
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 23:45:29 +00:00)
cfc2d74ec4 Refactor ccl_allreduce bench: rank=SIP only, remove rank=PE legacy path
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 23:31:16 +00:00)
105f1dc09e ADR-0027: Megatron TP API + worker-wait generalization + mp.spawn
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 21:13:27 +00:00)
e7f376ebaa ADR-0027 rev7 (Megatron TP + worker-wait generalization) + ADR-0026 typo fix
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 20:02:27 +00:00)
357cab525b ADR-0026: DPPolicy intra-device only + ShardSpec structural coords
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 19:46:39 +00:00)
787409ced1 ADR-0024 Phase B: update xfail reason with architectural blocker details
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 16:14:05 +00:00)
79124daab1 ADR-0024 Phase B (partial): scheduler-level collective drain
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 16:00:30 +00:00)
4ba0a83e71 Implement ADR-0024 Phase A: SIP-level TP launcher MVP
ywkang pushed to master at ywkang/kernbench2 (2026-04-14 07:38:45 +00:00)
32536daf2e Fix ADR-0025: IPCQ direction addressing via address-based matching
e1084800ab docs: add ADRs 0024–0031 for SIP-TP launcher stack
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 23:31:35 +00:00)
b2c52f0e34 Add English translations for ADR-0018, 0019, 0020, 0021
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 06:52:05 +00:00)
10b33b44ba Add Tensor indexing + hierarchical 3-level all-reduce kernel
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 06:02:21 +00:00)
1c8ddc2d03 Fix Phase 1 slot-overwrite race + PE_MATH latency model (n_slots=4 safe)
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 04:13:27 +00:00)
74f5f5cf08 Add session-scoped topology fixture in tests/conftest.py
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 04:06:42 +00:00)
372c987995 Reduce test time to 12s: shrink GEMM dims + enable pytest-xdist
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 03:52:14 +00:00)
bcf941dcee Speed up regression: 25min → 6min (test matrix + DataExecutor cleanup)
ywkang pushed to master at ywkang/kernbench2 (2026-04-13 02:37:05 +00:00)
998cc85762 Add PE-level IPCQ collective infra + unified ccl_allreduce bench (ADR-0023)
ywkang pushed to master at ywkang/kernbench2 (2026-04-09 23:49:58 +00:00)
ff2c677a9c Add 2D grid program_id semantics (ADR-0022)
ywkang pushed to master at ywkang/kernbench2 (2026-04-09 16:34:07 +00:00)
dc3fb02aed Add --verify-data CLI flag, Tensor.data property, parallel DataExecutor