eval: commit milestone bench output (track generated figures + results)

Per request, the milestone bench output is now tracked in git instead of
gitignored, so the figures/results are viewable on the remote:

- src/kernbench/benches/1H_milestone_output/gemm/ (3 PNGs + gemm_sweep.json)
- src/kernbench/benches/1H_milestone_output/ccl/ (3 per-topology PNGs,
  buffer-kind PNG+CSV, FSIM comparison PNG, topology.png, summary.csv)

Drop the .gitignore rule; update ADR-0054 D3 + Negative (EN+KO) to say the
output is committed (regenerable by rerunning the bench).

Artifacts produced by full bench runs (milestone-1h-gemm non-FAST,
milestone-1h-ccl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -6,9 +6,6 @@
 # Auto-generated mesh file
 cube_mesh.yaml
 
-# Milestone bench output (regenerable: kernbench run --bench milestone-1h-*)
-src/kernbench/benches/1H_milestone_output/
-
 # Python
 __pycache__/
 *.py[cod]
@@ -59,8 +59,8 @@ ADR-0045 D5는 bench를 단일 구성(single-SIP, 또는 ADR-0024 multi-SIP CCL
 write (per user request: artifacts beside the bench). The directory holds only
 generated PNG/CSV/JSON (no `.py`/`__init__.py`), so the eager-import audit (ADR-0045
 first action) ignores it: `pkgutil.iter_modules` does not
-yield non-package subdirectories. Unlike the committed `docs/diagrams/` artifacts,
-it is **git-ignored** (regenerable on request).
+yield non-package subdirectories. Like the `docs/diagrams/` artifacts, it is **committed**
+(so the figures are viewable on the remote); rerunning the bench regenerates it in place.
 
 ### D4. GEMM heavy sweep: fresh by default, `MILESTONE_FAST` to reuse
 
@@ -115,7 +115,8 @@ bench 실행이 곧 재생성이다. slow 경로는 `@pytest.mark.slow` bench
 drawing mixed together). A "bench" that is mostly an eval harness is unusual; this ADR
 legitimizes it.
 - Generated artifacts live inside the source tree (`src/kernbench/benches/`)
-by explicit request; git-ignored to avoid committing them.
+by explicit request and are committed (so the figures are viewable on the remote);
+rerunning the bench regenerates them.
 - `milestone-1h-ccl` (and the default `milestone-1h-gemm`) take minutes:
 acceptable for an on-demand milestone artifact, not for routine runs.
 
@@ -61,8 +61,9 @@ Both benches write to `src/kernbench/benches/1H_milestone_output/{gemm,ccl}/`
 (per user request — artifacts beside the bench). The directory holds only
 generated PNG/CSV/JSON (never a `.py`/`__init__.py`), so the eager-import
 audit (ADR-0045 first action) ignores it — `pkgutil.iter_modules` does not
-yield non-package subdirectories. It is **git-ignored** (regenerable on
-demand), unlike the committed `docs/diagrams/` artifacts.
+yield non-package subdirectories. It is **committed** (like the
+`docs/diagrams/` artifacts) so the figures are viewable on the remote;
+rerunning the bench regenerates it in place.
 
 ### D4. GEMM heavy sweep — fresh by default, `MILESTONE_FAST` to reuse
 
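
The D3 claim that `pkgutil.iter_modules` skips an artifact-only directory can be checked with a small sketch. The layout is hypothetical: `real_bench.py`, `real_pkg`, and the temporary root are illustration names, not kernbench paths, and the actual ADR-0045 audit may walk modules differently. Only `pkgutil.iter_modules` is the real standard-library call.

```python
import pkgutil
import tempfile
from pathlib import Path


def discovered_module_names() -> list[str]:
    """Build a throwaway tree and return what pkgutil.iter_modules reports."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A plain module and a real package (hypothetical names).
        (root / "real_bench.py").write_text("")
        (root / "real_pkg").mkdir()
        (root / "real_pkg" / "__init__.py").write_text("")
        # An artifact-only directory: no .py and no __init__.py inside.
        out = root / "1H_milestone_output" / "ccl"
        out.mkdir(parents=True)
        (out / "summary.csv").write_text("algorithm,latency_ns\n")
        return sorted(m.name for m in pkgutil.iter_modules([str(root)]))


if __name__ == "__main__":
    print(discovered_module_names())
```

Under these assumptions only the module and the package are listed; the output directory is not, because it contains no `__init__.py`.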
@@ -118,7 +119,8 @@ ADR-0045 D1).
 sweeps, and matplotlib drawing). A "bench" that is mostly an eval harness
 is unusual; this ADR legitimizes it.
 - Generated artifacts live inside the source tree (`src/kernbench/benches/`)
-by explicit request; git-ignored to avoid committing them.
+by explicit request and are committed (so the figures are viewable on the
+remote); rerunning the bench regenerates them.
 - `milestone-1h-ccl` (and the default `milestone-1h-gemm`) take minutes —
 acceptable for an on-demand milestone artifact, not for routine runs.
 
New image file (PNG), 38 KiB
New image file (PNG), 36 KiB
@@ -0,0 +1,13 @@
+buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
+hbm,torus_2d,6,128,256,2120.040000000012
+hbm,torus_2d,6,1024,2048,2717.2783333333473
+hbm,torus_2d,6,8192,16384,7315.184999999989
+hbm,torus_2d,6,32768,65536,23081.26500000037
+sram,torus_2d,6,128,256,2060.040000000012
+sram,torus_2d,6,1024,2048,2909.2783333333473
+sram,torus_2d,6,8192,16384,9523.184999999869
+sram,torus_2d,6,32768,65536,32201.265000000385
+tcm,torus_2d,6,128,256,1964.040000000012
+tcm,torus_2d,6,1024,2048,2477.2783333333473
+tcm,torus_2d,6,8192,16384,6403.185000000109
+tcm,torus_2d,6,32768,65536,19865.265000000378
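
One way to read the buffer-kind CSV above is to ask which buffer kind has the lowest latency at each message size. The following is a minimal sketch, not part of the bench: the rows are copied from the committed file, while `BUFFER_KIND_CSV` and `fastest_kind_per_size` are hypothetical names introduced for illustration.

```python
import csv
import io

# Rows copied from the committed buffer-kind CSV.
BUFFER_KIND_CSV = """\
buffer_kind,sip_topology,n_sips,n_elem,bytes_per_pe,latency_ns
hbm,torus_2d,6,128,256,2120.040000000012
hbm,torus_2d,6,1024,2048,2717.2783333333473
hbm,torus_2d,6,8192,16384,7315.184999999989
hbm,torus_2d,6,32768,65536,23081.26500000037
sram,torus_2d,6,128,256,2060.040000000012
sram,torus_2d,6,1024,2048,2909.2783333333473
sram,torus_2d,6,8192,16384,9523.184999999869
sram,torus_2d,6,32768,65536,32201.265000000385
tcm,torus_2d,6,128,256,1964.040000000012
tcm,torus_2d,6,1024,2048,2477.2783333333473
tcm,torus_2d,6,8192,16384,6403.185000000109
tcm,torus_2d,6,32768,65536,19865.265000000378
"""


def fastest_kind_per_size(text: str) -> dict[int, str]:
    """Map bytes_per_pe to the buffer kind with the lowest latency_ns."""
    best: dict[int, tuple[float, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        size = int(row["bytes_per_pe"])
        candidate = (float(row["latency_ns"]), row["buffer_kind"])
        # Keep the (latency, kind) pair with the smallest latency per size.
        if size not in best or candidate < best[size]:
            best[size] = candidate
    return {size: kind for size, (_, kind) in sorted(best.items())}


if __name__ == "__main__":
    for size, kind in fastest_kind_per_size(BUFFER_KIND_CSV).items():
        print(f"{size:>6} B/PE: {kind}")
```

In these rows `tcm` has the lowest latency at every listed size; the sketch says nothing about why.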
New image file (PNG), 75 KiB
New image file (PNG), 37 KiB
New image file (PNG), 86 KiB
@@ -0,0 +1,37 @@
+algorithm,sip_topology,n_sips,n_elem,bytes_per_pe,bytes_per_sip,latency_ns
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8,16,256,2666.552500000015
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32,64,1024,2747.7400000000152
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,64,128,2048,2855.990000000018
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,128,256,4096,3072.490000000019
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,512,1024,16384,3337.1133333333582
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,1024,2048,32768,3708.0333333333692
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,2048,4096,65536,4449.873333333393
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,4096,8192,131072,5933.020000000124
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,8192,16384,262144,8900.379999999863
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,16384,32768,524288,14835.099999999224
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,32768,65536,1048576,26704.540000000765
+lrab_hierarchical_allreduce,mesh_2d_no_wrap,6,49152,98304,1572864,38573.97999999701
+lrab_hierarchical_allreduce,ring_1d,6,8,16,256,2365.255833333347
+lrab_hierarchical_allreduce,ring_1d,6,32,64,1024,2436.9433333333473
+lrab_hierarchical_allreduce,ring_1d,6,64,128,2048,2532.526666666683
+lrab_hierarchical_allreduce,ring_1d,6,128,256,4096,2723.693333333349
+lrab_hierarchical_allreduce,ring_1d,6,512,1024,16384,3048.635000000021
+lrab_hierarchical_allreduce,ring_1d,6,1024,2048,32768,3393.4016666666957
+lrab_hierarchical_allreduce,ring_1d,6,2048,4096,65536,4082.401666666714
+lrab_hierarchical_allreduce,ring_1d,6,4096,8192,131072,5458.80166666677
+lrab_hierarchical_allreduce,ring_1d,6,8192,16384,262144,8216.934999999943
+lrab_hierarchical_allreduce,ring_1d,6,16384,32768,524288,13733.201666665835
+lrab_hierarchical_allreduce,ring_1d,6,32768,65536,1048576,24765.73500000064
+lrab_hierarchical_allreduce,ring_1d,6,49152,98304,1572864,35798.268333331536
+lrab_hierarchical_allreduce,torus_2d,6,8,16,256,1700.6025000000095
+lrab_hierarchical_allreduce,torus_2d,6,32,64,1024,1753.2900000000102
+lrab_hierarchical_allreduce,torus_2d,6,64,128,2048,1823.540000000012
+lrab_hierarchical_allreduce,torus_2d,6,128,256,4096,1964.040000000012
+lrab_hierarchical_allreduce,torus_2d,6,512,1024,16384,2196.8183333333463
+lrab_hierarchical_allreduce,torus_2d,6,1024,2048,32768,2477.2783333333473
+lrab_hierarchical_allreduce,torus_2d,6,2048,4096,65536,3038.1983333333583
+lrab_hierarchical_allreduce,torus_2d,6,4096,8192,131072,4159.5050000000665
+lrab_hierarchical_allreduce,torus_2d,6,8192,16384,262144,6403.185000000109
+lrab_hierarchical_allreduce,torus_2d,6,16384,32768,524288,10890.5449999995
+lrab_hierarchical_allreduce,torus_2d,6,32768,65536,1048576,19865.265000000378
+lrab_hierarchical_allreduce,torus_2d,6,49152,98304,1572864,28839.98500000059
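
To compare the three topologies in the large-message regime, a rough two-point slope (added nanoseconds per added byte per PE) can be taken from the two largest sizes of each topology. This is a sketch for reading the committed numbers, not how the bench computes anything: `SUMMARY_TAIL_CSV` is a trimmed copy of the rows above (only three columns kept), and `ns_per_byte` is a hypothetical helper.

```python
import csv
import io

# Trimmed copy of the two largest sizes per topology from the CSV above.
SUMMARY_TAIL_CSV = """\
sip_topology,bytes_per_pe,latency_ns
mesh_2d_no_wrap,65536,26704.540000000765
mesh_2d_no_wrap,98304,38573.97999999701
ring_1d,65536,24765.73500000064
ring_1d,98304,35798.268333331536
torus_2d,65536,19865.265000000378
torus_2d,98304,28839.98500000059
"""


def ns_per_byte(text: str) -> dict[str, float]:
    """Two-point latency slope per topology, in ns per byte per PE."""
    points: dict[str, list[tuple[int, float]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        point = (int(row["bytes_per_pe"]), float(row["latency_ns"]))
        points.setdefault(row["sip_topology"], []).append(point)
    slopes: dict[str, float] = {}
    for topology, pts in points.items():
        pts.sort()  # order by message size
        (x0, y0), (x1, y1) = pts[0], pts[-1]
        slopes[topology] = (y1 - y0) / (x1 - x0)
    return slopes


if __name__ == "__main__":
    for topology, slope in sorted(ns_per_byte(SUMMARY_TAIL_CSV).items()):
        print(f"{topology}: {slope:.4f} ns/byte")
```

On these rows the slope is smallest for `torus_2d`, then `ring_1d`, then `mesh_2d_no_wrap`, matching the ordering of the absolute latencies.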
New image file (PNG), 194 KiB
New image file (PNG), 40 KiB
New image file (PNG), 45 KiB
New image file (PNG), 45 KiB