eval: commit milestone bench output (track generated figures + results)

Per request, the milestone bench output is now tracked in git instead of
being gitignored, so the figures and results are viewable on the remote:

- src/kernbench/benches/1H_milestone_output/gemm/  (3 PNGs + gemm_sweep.json)
- src/kernbench/benches/1H_milestone_output/ccl/   (3 per-topology PNGs,
  buffer-kind PNG+CSV, FSIM comparison PNG, topology.png, summary.csv)

Drop the .gitignore rule and update ADR-0054 D3 and the Negative section
(EN+KO) to say the output is committed (regenerable by rerunning the bench).
The artifacts were produced by full bench runs (milestone-1h-gemm without
MILESTONE_FAST, and milestone-1h-ccl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:37:27 -07:00
parent cc1bbd0ab7
commit b1d6fafd3a
15 changed files with 1695 additions and 9 deletions
-3
@@ -6,9 +6,6 @@
 # Auto-generated mesh file
 cube_mesh.yaml
-# Milestone bench output (regenerable: kernbench run --bench milestone-1h-*)
-src/kernbench/benches/1H_milestone_output/
 # Python
 __pycache__/
 *.py[cod]
@@ -59,8 +59,8 @@ ADR-0045 D5 ... the bench to a single configuration (single-SIP, or ADR-0024 multi-SIP CCL
 writes there (per user request: artifacts beside the bench). The directory holds only
 generated PNG/CSV/JSON (no `.py`/`__init__.py`), so the eager-import audit (ADR-0045
 first action) ignores it: `pkgutil.iter_modules` does not yield non-package
-subdirectories. Unlike the committed `docs/diagrams/` artifacts, it is
-**git-ignored** (regenerable on demand).
+subdirectories. Like the `docs/diagrams/` artifacts, it is **committed** (so the
+figures are viewable on the remote); rerunning the bench regenerates it in place.
 ### D4. GEMM heavy sweep: fresh by default, `MILESTONE_FAST` to reuse
@@ -115,7 +115,8 @@ Running the bench is itself the regeneration. The slow path is the `@pytest.mark.slow` bench
 drawing). A "bench" that is mostly an eval harness is unusual, and this ADR
 legitimizes it.
 - Generated artifacts live inside the source tree (`src/kernbench/benches/`)
-by explicit request; git-ignored to avoid committing them.
+by explicit request and are committed (so the figures are viewable on the
+remote); rerunning the bench regenerates them.
 - `milestone-1h-ccl` (and the default `milestone-1h-gemm`) take minutes:
 acceptable for an on-demand milestone artifact, not for routine runs.
+5 -3
@@ -61,8 +61,9 @@ Both benches write to `src/kernbench/benches/1H_milestone_output/{gemm,ccl}/`
 (per user request — artifacts beside the bench). The directory holds only
 generated PNG/CSV/JSON (never a `.py`/`__init__.py`), so the eager-import
 audit (ADR-0045 first action) ignores it — `pkgutil.iter_modules` does not
-yield non-package subdirectories. It is **git-ignored** (regenerable on
-demand), unlike the committed `docs/diagrams/` artifacts.
+yield non-package subdirectories. It is **committed** (like the
+`docs/diagrams/` artifacts) so the figures are viewable on the remote;
+rerunning the bench regenerates it in place.
 ### D4. GEMM heavy sweep — fresh by default, `MILESTONE_FAST` to reuse
@@ -118,7 +119,8 @@ ADR-0045 D1).
 sweeps, and matplotlib drawing). A "bench" that is mostly an eval harness
 is unusual; this ADR legitimizes it.
 - Generated artifacts live inside the source tree (`src/kernbench/benches/`)
-by explicit request; git-ignored to avoid committing them.
+by explicit request and are committed (so the figures are viewable on the
+remote); rerunning the bench regenerates them.
 - `milestone-1h-ccl` (and the default `milestone-1h-gemm`) take minutes —
 acceptable for an on-demand milestone artifact, not for routine runs.
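D3 relies on `pkgutil.iter_modules` skipping subdirectories that are not packages. The standard-library sketch below illustrates that behavior in a throwaway temporary directory. It mimics only the directory layout, not the ADR-0045 audit itself; the names `mod_a`, `pkg_b`, and the helper `discovered_modules` are hypothetical.

```python
import pathlib
import pkgutil
import tempfile


def discovered_modules(root: pathlib.Path) -> list[str]:
    """Return the names pkgutil.iter_modules reports for one directory (hypothetical helper)."""
    return sorted(info.name for info in pkgutil.iter_modules([str(root)]))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # A plain module and a real package (it has __init__.py): both are yielded.
    (root / "mod_a.py").write_text("")
    (root / "pkg_b").mkdir()
    (root / "pkg_b" / "__init__.py").write_text("")

    # A data-only directory shaped like the milestone output: no __init__.py,
    # only generated artifacts, so iter_modules treats it as "not a package".
    out = root / "1H_milestone_output" / "ccl"
    out.mkdir(parents=True)
    (out / "summary.csv").write_text("algorithm,latency_ns\n")

    names = discovered_modules(root)

print(names)  # ['mod_a', 'pkg_b']
```

Adding an `__init__.py` to the output directory would turn it into a package and make it visible to the scan, which is why D3 insists the directory holds only PNG/CSV/JSON.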
Binary file not shown (image, 38 KiB).

Binary file not shown (image, 36 KiB).

@@ -0,0 +1,13 @@
buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
hbm,torus_2d,6,128,256,2120.040000000012
hbm,torus_2d,6,1024,2048,2717.2783333333473
hbm,torus_2d,6,8192,16384,7315.184999999989
hbm,torus_2d,6,32768,65536,23081.26500000037
sram,torus_2d,6,128,256,2060.040000000012
sram,torus_2d,6,1024,2048,2909.2783333333473
sram,torus_2d,6,8192,16384,9523.184999999869
sram,torus_2d,6,32768,65536,32201.265000000385
tcm,torus_2d,6,128,256,1964.040000000012
tcm,torus_2d,6,1024,2048,2477.2783333333473
tcm,torus_2d,6,8192,16384,6403.185000000109
tcm,torus_2d,6,32768,65536,19865.265000000378
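To compare buffer kinds without opening the PNG, a minimal sketch like the following pivots the buffer-kind rows by size and normalizes each latency to hbm. It works on an inline copy of the 12 rows above because this page does not show the CSV's filename; to read the committed file instead, pass in the contents of the buffer-kind CSV under `src/kernbench/benches/1H_milestone_output/ccl/`. The helper `latency_by_size` is hypothetical and not part of kernbench.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the buffer-kind rows committed above (6-SIP torus_2d, latency in ns).
BUFFER_KIND_CSV = """\
buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
hbm,torus_2d,6,128,256,2120.040000000012
hbm,torus_2d,6,1024,2048,2717.2783333333473
hbm,torus_2d,6,8192,16384,7315.184999999989
hbm,torus_2d,6,32768,65536,23081.26500000037
sram,torus_2d,6,128,256,2060.040000000012
sram,torus_2d,6,1024,2048,2909.2783333333473
sram,torus_2d,6,8192,16384,9523.184999999869
sram,torus_2d,6,32768,65536,32201.265000000385
tcm,torus_2d,6,128,256,1964.040000000012
tcm,torus_2d,6,1024,2048,2477.2783333333473
tcm,torus_2d,6,8192,16384,6403.185000000109
tcm,torus_2d,6,32768,65536,19865.265000000378
"""


def latency_by_size(text: str) -> dict[int, dict[str, float]]:
    """Pivot CSV rows into {n_elem: {buffer_kind: latency_ns}}."""
    table: dict[int, dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        table[int(row["n_elem"])][row["buffer_kind"]] = float(row["latency_ns"])
    return dict(table)


table = latency_by_size(BUFFER_KIND_CSV)
for n_elem, by_kind in sorted(table.items()):
    fastest = min(by_kind, key=by_kind.get)
    # Latency of each buffer kind relative to hbm at the same size.
    vs_hbm = {kind: round(lat / by_kind["hbm"], 3) for kind, lat in sorted(by_kind.items())}
    print(n_elem, fastest, vs_hbm)
```

On these rows, tcm has the lowest latency at every n_elem, and sram is below hbm only at n_elem=128.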
Binary file not shown (image, 75 KiB).

Binary file not shown (image, 37 KiB).

Binary file not shown (image, 86 KiB).


@@ -0,0 +1,37 @@
algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8,16,256,2666.552500000015
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32,64,1024,2747.7400000000152
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,64,128,2048,2855.990000000018
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,128,256,4096,3072.490000000019
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,512,1024,16384,3337.1133333333582
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,1024,2048,32768,3708.0333333333692
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,2048,4096,65536,4449.873333333393
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,4096,8192,131072,5933.020000000124
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8192,16384,262144,8900.379999999863
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,16384,32768,524288,14835.099999999224
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,26704.540000000765
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,38573.97999999701
lrab_hierarchical_allreduce,ring_1d,6,8,16,256,2365.255833333347
lrab_hierarchical_allreduce,ring_1d,6,32,64,1024,2436.9433333333473
lrab_hierarchical_allreduce,ring_1d,6,64,128,2048,2532.526666666683
lrab_hierarchical_allreduce,ring_1d,6,128,256,4096,2723.693333333349
lrab_hierarchical_allreduce,ring_1d,6,512,1024,16384,3048.635000000021
lrab_hierarchical_allreduce,ring_1d,6,1024,2048,32768,3393.4016666666957
lrab_hierarchical_allreduce,ring_1d,6,2048,4096,65536,4082.401666666714
lrab_hierarchical_allreduce,ring_1d,6,4096,8192,131072,5458.80166666677
lrab_hierarchical_allreduce,ring_1d,6,8192,16384,262144,8216.934999999943
lrab_hierarchical_allreduce,ring_1d,6,16384,32768,524288,13733.201666665835
lrab_hierarchical_allreduce,ring_1d,6,32768,65536,1048576,24765.73500000064
lrab_hierarchical_allreduce,ring_1d,6,49152,98304,1572864,35798.268333331536
lrab_hierarchical_allreduce,torus_2d,6,8,16,256,1700.6025000000095
lrab_hierarchical_allreduce,torus_2d,6,32,64,1024,1753.2900000000102
lrab_hierarchical_allreduce,torus_2d,6,64,128,2048,1823.540000000012
lrab_hierarchical_allreduce,torus_2d,6,128,256,4096,1964.040000000012
lrab_hierarchical_allreduce,torus_2d,6,512,1024,16384,2196.8183333333463
lrab_hierarchical_allreduce,torus_2d,6,1024,2048,32768,2477.2783333333473
lrab_hierarchical_allreduce,torus_2d,6,2048,4096,65536,3038.1983333333583
lrab_hierarchical_allreduce,torus_2d,6,4096,8192,131072,4159.5050000000665
lrab_hierarchical_allreduce,torus_2d,6,8192,16384,262144,6403.185000000109
lrab_hierarchical_allreduce,torus_2d,6,16384,32768,524288,10890.5449999995
lrab_hierarchical_allreduce,torus_2d,6,32768,65536,1048576,19865.265000000378
lrab_hierarchical_allreduce,torus_2d,6,49152,98304,1572864,28839.98500000059
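The summary rows (presumably summary.csv, given the column set) also carry two derived size columns. In every committed row, bytes_per_pe is 2 × n_elem and bytes_per_sip is 16 × bytes_per_pe. That is consistent with 2-byte elements and 16 PEs per SIP, but this page states neither, so the sketch below checks the ratios instead of assuming them, and also picks the lowest-latency topology per size. It runs on six rows copied from the table above (smallest and largest n_elem per topology); the helpers `size_ratios` and `fastest_topology` are hypothetical.

```python
import csv
import io

# Six of the 36 committed summary rows: smallest and largest n_elem per topology.
SUMMARY_SAMPLE = """\
algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8,16,256,2666.552500000015
lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,38573.97999999701
lrab_hierarchical_allreduce,ring_1d,6,8,16,256,2365.255833333347
lrab_hierarchical_allreduce,ring_1d,6,49152,98304,1572864,35798.268333331536
lrab_hierarchical_allreduce,torus_2d,6,8,16,256,1700.6025000000095
lrab_hierarchical_allreduce,torus_2d,6,49152,98304,1572864,28839.98500000059
"""


def size_ratios(rows):
    """Collect the distinct bytes_per_pe/n_elem and bytes_per_sip/bytes_per_pe ratios."""
    elem_ratios, sip_ratios = set(), set()
    for row in rows:
        n_elem = int(row["n_elem"])
        per_pe = int(row["bytes_per_pe"])
        per_sip = int(row["bytes_per_sip"])
        elem_ratios.add(per_pe / n_elem)
        sip_ratios.add(per_sip / per_pe)
    return elem_ratios, sip_ratios


def fastest_topology(rows):
    """Map each n_elem to the sip_topology with the lowest latency_ns."""
    best = {}
    for row in rows:
        n_elem, lat = int(row["n_elem"]), float(row["latency_ns"])
        if n_elem not in best or lat < best[n_elem][1]:
            best[n_elem] = (row["sip_topology"], lat)
    return {n: topo for n, (topo, _) in best.items()}


rows = list(csv.DictReader(io.StringIO(SUMMARY_SAMPLE)))
elem_ratios, sip_ratios = size_ratios(rows)
print(elem_ratios, sip_ratios)  # {2.0} {16.0}
print(fastest_topology(rows))   # {8: 'torus_2d', 49152: 'torus_2d'}
```

Run over all 36 committed rows, the same checks find exactly one ratio of each kind (2 and 16), and torus_2d has the lowest latency at all twelve sizes, with ring_1d second and mesh_2d_no_wrap third.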
Binary file not shown (image, 194 KiB).

Binary file not shown (image, 40 KiB).

Binary file not shown (image, 45 KiB).

Binary file not shown (image, 45 KiB).

File diff suppressed because it is too large.