mukesh
0 Followers · 0 Following
Joined on 2026-04-13
Public Activity
mukesh pushed to master at ywkang/kernbench2
2026-06-02 05:23:35 +00:00
b3ca532023
attention: milestone-gqa-llama70b figures + MILESTONE_FAST (sub-cycle 4c, 5/6)
e748a62264
attention: land milestone-gqa-llama70b 4-panel sweep bench (ADR-0057 v1)
mukesh pushed to master at ywkang/kernbench2
2026-06-02 02:53:20 +00:00
222815d374
attention: add rank_axis kwarg to mesh kernels for multi_user cube ring
mukesh pushed to master at ywkang/kernbench2
2026-06-02 02:33:43 +00:00
d9e767d048
runtime_api: ctx.launch honors DPPolicy.num_cubes + adds _auto_dim_remap opt-out
mukesh pushed to master at ywkang/kernbench2
2026-06-02 02:15:16 +00:00
313dee503c
sim_engine: fix IPCQ slot-wrap snapshot race in Phase 2 replay
mukesh pushed to master at ywkang/kernbench2
2026-05-22 22:37:31 +00:00
b1d6fafd3a
eval: commit milestone bench output (track generated figures + results)
mukesh pushed to master at ywkang/kernbench2
2026-05-22 22:32:20 +00:00
cc1bbd0ab7
eval: fold GEMM/allreduce harnesses into self-contained milestone benches
mukesh pushed to master at ywkang/kernbench2
2026-05-21 18:07:48 +00:00
fd56b6cacd
adr: add ADR-0043/0044 (eval harnesses); reconcile ADR-0024/0032 for SIP w/h
0e346b939d
gemm: test-generated GEMM plots under tests/gemm/ + docs/diagrams/gemm_plots/
b610cb0d9a
sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/
mukesh pushed to sccl-distributed-allreduce at ywkang/kernbench2
2026-05-21 17:27:28 +00:00
fd56b6cacd
adr: add ADR-0043/0044 (eval harnesses); reconcile ADR-0024/0032 for SIP w/h
mukesh pushed to sccl-distributed-allreduce at ywkang/kernbench2
2026-05-21 16:58:56 +00:00
0e346b939d
gemm: test-generated GEMM plots under tests/gemm/ + docs/diagrams/gemm_plots/
mukesh created branch sccl-distributed-allreduce in ywkang/kernbench2
2026-05-21 05:30:43 +00:00
mukesh pushed to sccl-distributed-allreduce at ywkang/kernbench2
2026-05-21 05:30:43 +00:00
b610cb0d9a
sccl: drive allreduce tests via torch.distributed; reorganize into tests/sccl/
mukesh pushed to master at ywkang/kernbench2
2026-05-21 03:52:00 +00:00
ff7d727ddd
CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots
mukesh pushed to master at ywkang/kernbench2
2026-05-15 17:17:30 +00:00
a7fe785e5f
tl.composite: fused epilogue ops with per-op scope
mukesh pushed to master at ywkang/kernbench2
2026-05-14 21:20:06 +00:00
f6d262e359
Honest measured pipeline efficiency: two timing fixes
mukesh pushed to master at ywkang/kernbench2
2026-05-13 22:03:13 +00:00
83ea97b05f
Composite GEMM: K-loop accumulator residency, pinned operands, sweep + deck
mukesh pushed to master at ywkang/kernbench2
2026-04-29 01:21:48 +00:00
5accd98171
Add deck builder + overview-with-ref diagram scripts
a563169e89
Add tl.recv_no_consume diagnostic API for apples-to-apples pe2pe plot
9c129d6131
ADR-0023 D9.7+: charge PE↔bank fabric hop for SRAM/HBM IPCQ slots
mukesh pushed to master at ywkang/kernbench2
2026-04-28 04:43:19 +00:00
54fcb7e4bc
Add tests/test_emit_ipcq_diagram.py (missed from earlier commit)
ad5f01ab13
Merge origin/master: combine single-cube fast path + center-root reduce
1c5752a9ec
Intercube allreduce: center root + bidirectional reduce
84a1325e5c
ADR-0023 D9.7: IPCQ slot-memory latency model (TCM/SRAM/HBM)
1e39214f89
Move generated diagrams to docs/diagrams/; add IPCQ diagram emitter
mukesh pushed to master at ywkang/kernbench2
2026-04-27 23:43:57 +00:00
46291bf91b
PE-to-PE latency: drop h5 inter-SIP panel from overview
04c912f53e
Allreduce sweep: parametrized + xdist parallelism + topology diagram
1c33afec55
ADR-0032 + intra_* opposite directions in IPCQ install
mukesh pushed to master at ywkang/kernbench2
2026-04-27 22:13:37 +00:00
e9cc40f74d
Rectangular SIP topology + 6-device allreduce sweep
c1a5cf3a2a
ADR-0009 D5: chain-aware target_start_ns + zero-byte launch fanout
90874abbfe
ADR-0023 D9: blocking credit-emit with full-path latency
mukesh pushed to master at ywkang/kernbench2
2026-04-27 17:16:47 +00:00
19dfc86dc3
Allreduce latency sweep across topologies and data sizes