CCL allreduce: rename to lrab_hierarchical_allreduce + descriptive plots
Rename the intercube all-reduce identity to lrab_hierarchical_allreduce (module, config key, distributed test) so the name reflects both levels it implements: LRAB intra-SIP (local reduce to center root + broadcast) and the hierarchical inter-SIP topology exchange (ring/torus/mesh). ADR-0032 slug kept as the stable decision id; pure rename, no logic change. Also in this batch: - ADR-0032 (EN+KO): document the shipped center-root bidirectional reduce (doc was stale corner-root); annotate ccl.yaml root_cube as a placeholder. - Rename allreduce + pe2pe latency plots to descriptive, title-matching filenames and retitle the in-plot headings; drop overview/overview_log. - Point the PPTX image refs at the new plot names. Doc + derived-artifact + rename only; no simulation behavior changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -31,7 +31,7 @@ pe0만의 same-lane 큐브 간 reduce**, 그 다음 루트 큐브에서 SIP 간
|
||||
|
||||
### 현재 상태
|
||||
|
||||
- `src/kernbench/ccl/algorithms/intercube_allreduce.py` — 커널
|
||||
- `src/kernbench/ccl/algorithms/lrab_hierarchical_allreduce.py` — 커널
|
||||
- `src/kernbench/ccl/sfr_config.py` — `configure_sfr_intercube_multisip`
|
||||
- `src/kernbench/runtime_api/distributed.py` — `AhbmCCLBackend`가
|
||||
`init_process_group` 시점에 자동으로 와이어링한다.
|
||||
@@ -42,29 +42,46 @@ pe0만의 same-lane 큐브 간 reduce**, 그 다음 루트 큐브에서 SIP 간
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. 알고리즘 구조 — 5단계
|
||||
### D1. 알고리즘 구조 — 5단계 (center-root, 양방향)
|
||||
|
||||
루트 큐브는 큐브 메시의 기하학적 **중심**에 위치한다:
|
||||
|
||||
```
|
||||
root_col = cube_w // 2
|
||||
root_row = cube_h // 2
|
||||
root_cube = root_row * cube_w + root_col # 중심; 4×4 메시에서 10
|
||||
```
|
||||
|
||||
각 reduce/broadcast 단계는 이 중심을 향해 **양방향으로** 수렴/발산하여,
|
||||
corner-root 워크 대비 SIP 내부 임계 경로를 절반으로 줄인다 (4×4 메시:
|
||||
reduce 4홉 + broadcast 4홉 vs SE-코너 루트의 6+6).
|
||||
|
||||
각 SIP에 대해 (`mp.spawn`으로 동시에 launch):
|
||||
|
||||
```
|
||||
Phase 1 — Row reduce W → E (큐브 메시, pe0만):
|
||||
col=0이 E로 송신 → col=1이 누적, E로 송신 → ... → col=3이 row sum 보유.
|
||||
Phase 1 — col == root_col에서 수렴하는 Row reduce (큐브 메시, pe0만):
|
||||
좌측 절반(col < root_col)은 W→E로, 우측 절반(col > root_col)은
|
||||
E→W로 진행; root_col 큐브가 양쪽을 병합 → row sum 보유.
|
||||
|
||||
Phase 2 — 최우측 열에서 Col reduce N → S (pe0, col = mesh_w-1):
|
||||
row=0이 S로 송신 → row=1이 누적, S로 송신 → ... → 루트 큐브 (15)가
|
||||
전체 SIP sum 보유.
|
||||
Phase 2 — col == root_col에서 row == root_row로 수렴하는 Col reduce:
|
||||
위쪽(row < root_row)은 N→S로, 아래쪽(row > root_row)은 S→N로 진행;
|
||||
루트 큐브가 양쪽을 병합 → 전체 SIP sum 보유.
|
||||
|
||||
Phase 3 — 루트 큐브에서 SIP 간 교환 (루트 큐브의 pe0만):
|
||||
Phase 3 — cube_id == root_cube에서 SIP 간 교환 (pe0만):
|
||||
Ring / torus-2d row+col ring / mesh-2d chain reduce+broadcast —
|
||||
sip_topo_kind(topology.yaml의 sips.topology)로 선택.
|
||||
|
||||
Phase 4 — 최우측 열에서 Col 브로드캐스트 S → N.
|
||||
Phase 4 — col == root_col에서 root_row로부터 바깥쪽으로 Col 브로드캐스트.
|
||||
|
||||
Phase 5 — 큐브 메시 전반에 걸친 Row 브로드캐스트 E → W.
|
||||
Phase 5 — root_col로부터 바깥쪽으로 큐브 메시 전반에 Row 브로드캐스트.
|
||||
```
|
||||
|
||||
모든 단계가 끝나면 모든 큐브의 pe0이 전역 sum을 보유한다.
|
||||
|
||||
**단일 큐브 fast-path**: `cube_w == cube_h == 1`(rank당 큐브 하나, 일반적인
|
||||
TP 케이스)인 경우 SIP 내부 reduce/broadcast 단계를 건너뛰고 곧바로
|
||||
Phase 3 SIP 간 교환으로 진행한다.
|
||||
|
||||
커널은 `sip_topo_kind ∈ {0, 1, 2}`(ring_1d, torus_2d, mesh_2d_no_wrap)로
|
||||
파라미터화된 단일 함수이다. Phase 1-2와 4-5는 토폴로지 전반에서 동일하며,
|
||||
phase 3만 분기한다. 헬퍼 함수 `_inter_sip_ring`, `_inter_sip_torus_2d`,
|
||||
@@ -152,17 +169,19 @@ system:
|
||||
|
||||
```yaml
|
||||
defaults:
|
||||
algorithm: intercube_allreduce
|
||||
algorithm: lrab_hierarchical_allreduce
|
||||
buffer_kind: tcm
|
||||
...
|
||||
|
||||
algorithms:
|
||||
intercube_allreduce:
|
||||
module: kernbench.ccl.algorithms.intercube_allreduce
|
||||
lrab_hierarchical_allreduce:
|
||||
module: kernbench.ccl.algorithms.lrab_hierarchical_allreduce
|
||||
topology: none
|
||||
buffer_kind: tcm
|
||||
n_elem: 8
|
||||
root_cube: 15
|
||||
root_cube: 15 # 현재 사용되지 않음 — 커널이 루트를 기하학적 중심으로
|
||||
# 동적으로 선출한다 (D1 참조). 향후 명시적 루트 override /
|
||||
# 런타임 선출 훅을 위한 placeholder로 유지한다.
|
||||
```
|
||||
|
||||
`topology.yaml`:
|
||||
@@ -205,10 +224,11 @@ sip:
|
||||
- **비대칭 SIP 토폴로지** (정사각형이 아닌 메시/토러스).
|
||||
`torus_2d`와 `mesh_2d_no_wrap`은 `n_sips = k²`를 요구한다.
|
||||
- **파이프라인 청크**: 큐브당 단일 타일, 아직 파이프라이닝 없음.
|
||||
- **루트 큐브의 런타임 선출**: 커널은 현재 SE 코너로 하드코딩된
|
||||
`root_cube = (mesh_h - 1) * mesh_w + (mesh_w - 1)`을 사용한다. SFR
|
||||
와이어링이 모든 큐브를 커버하므로, 필요해질 때 런타임 선출은 순수
|
||||
커널 변경이다.
|
||||
- **루트 큐브의 런타임 선출**: 커널은 현재 SIP 내부 임계 경로를
|
||||
최소화하기 위해 기하학적 중심인
|
||||
`root_cube = (mesh_h // 2) * mesh_w + (mesh_w // 2)`을 사용한다. SFR
|
||||
와이어링이 모든 큐브를 커버하므로, 필요해질 때 다른 루트를 런타임에
|
||||
선출하는 것은 순수 커널 변경이다.
|
||||
|
||||
---
|
||||
|
||||
@@ -241,15 +261,15 @@ sip:
|
||||
|
||||
| File | Change |
|
||||
|---|---|
|
||||
| `src/kernbench/ccl/algorithms/intercube_allreduce.py` (신규) | 커널 + `_inter_sip_*` 헬퍼 + `TOPO_NAME_TO_KIND` |
|
||||
| `src/kernbench/ccl/algorithms/lrab_hierarchical_allreduce.py` (신규) | 커널 + `_inter_sip_*` 헬퍼 + `TOPO_NAME_TO_KIND` |
|
||||
| `src/kernbench/ccl/sfr_config.py` (신규) | `configure_sfr_intercube_multisip` |
|
||||
| `src/kernbench/ccl/topologies.py` | `torus_2d`, `mesh_2d_no_wrap` 추가 |
|
||||
| `src/kernbench/ccl/install.py` | `_OPPOSITE_DIR`을 `global_*` 쌍으로 확장 |
|
||||
| `src/kernbench/runtime_api/distributed.py` | `AhbmCCLBackend`가 `configure_sfr_intercube_multisip` 사용 + sip_rank/topo 인자 추가 |
|
||||
| `ccl.yaml` | 단일 `intercube_allreduce` 항목 |
|
||||
| `ccl.yaml` | 단일 `lrab_hierarchical_allreduce` 항목 |
|
||||
| `topology.yaml` | `system.sips.topology` 추가 |
|
||||
| `benches/ccl_allreduce.py` | Row-wise 큐브-메시 텐서 레이아웃 |
|
||||
| `tests/test_allreduce_multidevice.py` (신규) | 구성 기반 ring/torus/mesh |
|
||||
| `tests/test_distributed_intercube_allreduce.py` (신규) | 전체 `dist.all_reduce` 경로 |
|
||||
| `tests/test_distributed_lrab_hierarchical_allreduce.py` (신규) | 전체 `dist.all_reduce` 경로 |
|
||||
| `tests/test_intercube_sfr_config.py` (신규) | SFR 와이어링 검증 |
|
||||
| 제거 | `ring_allreduce.py`, `mesh_allreduce.py`, `tree_allreduce.py`, `hierarchical_allreduce.py`, `hello_send.py`, `testing.py` 및 그 테스트 |
|
||||
|
||||
@@ -32,7 +32,7 @@ bandwidth characteristics for the common per-cube DP workload.
|
||||
|
||||
### Current state
|
||||
|
||||
- `src/kernbench/ccl/algorithms/intercube_allreduce.py` — kernel
|
||||
- `src/kernbench/ccl/algorithms/lrab_hierarchical_allreduce.py` — kernel
|
||||
- `src/kernbench/ccl/sfr_config.py` — `configure_sfr_intercube_multisip`
|
||||
- `src/kernbench/runtime_api/distributed.py` — `AhbmCCLBackend` wires this
|
||||
automatically at `init_process_group` time.
|
||||
@@ -43,29 +43,46 @@ bandwidth characteristics for the common per-cube DP workload.
|
||||
|
||||
## Decision
|
||||
|
||||
### D1. Algorithm structure — 5 phases
|
||||
### D1. Algorithm structure — 5 phases (center-root, bidirectional)
|
||||
|
||||
The root cube sits at the geometric **center** of the cube mesh:
|
||||
|
||||
```
|
||||
root_col = cube_w // 2
|
||||
root_row = cube_h // 2
|
||||
root_cube = root_row * cube_w + root_col # center; 10 on a 4×4 mesh
|
||||
```
|
||||
|
||||
Each reduce/broadcast phase converges/diverges **bidirectionally** toward
|
||||
this center, halving the intra-SIP critical path versus a corner-root walk
|
||||
(4×4 mesh: 4 hops reduce + 4 hops broadcast vs 6+6 with an SE-corner root).
|
||||
|
||||
For each SIP (launched concurrently by `mp.spawn`):
|
||||
|
||||
```
|
||||
Phase 1 — Row reduce W → E (cube mesh, pe0 only):
|
||||
col=0 sends E → col=1 accumulates, sends E → ... → col=3 holds row sum.
|
||||
Phase 1 — Row reduce converging at col == root_col (cube mesh, pe0 only):
|
||||
left half (col < root_col) walks W→E; right half (col > root_col)
|
||||
walks E→W; the root_col cube merges both sides → holds row sum.
|
||||
|
||||
Phase 2 — Col reduce N → S on rightmost column (pe0, col = mesh_w-1):
|
||||
row=0 sends S → row=1 accumulates, sends S → ... → root cube (15)
|
||||
holds the full SIP sum.
|
||||
Phase 2 — Col reduce on col == root_col converging at row == root_row:
|
||||
above (row < root_row) walks N→S; below (row > root_row) walks S→N;
|
||||
the root cube merges both → holds the full SIP sum.
|
||||
|
||||
Phase 3 — Inter-SIP exchange on root cube (pe0 of root cube only):
|
||||
Phase 3 — Inter-SIP exchange on cube_id == root_cube (pe0 only):
|
||||
Ring / torus-2d row+col ring / mesh-2d chain reduce+broadcast —
|
||||
selected by sip_topo_kind (from topology.yaml sips.topology).
|
||||
|
||||
Phase 4 — Col broadcast S → N on rightmost column.
|
||||
Phase 4 — Col broadcast on col == root_col, outward from root_row.
|
||||
|
||||
Phase 5 — Row broadcast E → W across the cube mesh.
|
||||
Phase 5 — Row broadcast outward from root_col across the cube mesh.
|
||||
```
|
||||
|
||||
After all phases every cube's pe0 holds the global sum.
|
||||
|
||||
**Single-cube fast-path**: when `cube_w == cube_h == 1` (one cube per rank,
|
||||
the common TP case), the intra-SIP reduce/broadcast phases are skipped and
|
||||
the kernel goes straight to the Phase 3 inter-SIP exchange.
|
||||
|
||||
The kernel is a single function parameterised by `sip_topo_kind ∈ {0, 1, 2}`
|
||||
(ring_1d, torus_2d, mesh_2d_no_wrap). Phases 1-2 and 4-5 are identical
|
||||
across topologies; only phase 3 branches. Helper functions
|
||||
@@ -154,17 +171,19 @@ At each `dist.all_reduce(tensor)` call:
|
||||
|
||||
```yaml
|
||||
defaults:
|
||||
algorithm: intercube_allreduce
|
||||
algorithm: lrab_hierarchical_allreduce
|
||||
buffer_kind: tcm
|
||||
...
|
||||
|
||||
algorithms:
|
||||
intercube_allreduce:
|
||||
module: kernbench.ccl.algorithms.intercube_allreduce
|
||||
lrab_hierarchical_allreduce:
|
||||
module: kernbench.ccl.algorithms.lrab_hierarchical_allreduce
|
||||
topology: none
|
||||
buffer_kind: tcm
|
||||
n_elem: 8
|
||||
root_cube: 15
|
||||
root_cube: 15 # NOT read today — the kernel elects the root dynamically
|
||||
# as the geometric center (see D1). Kept as a placeholder
|
||||
# for a future explicit-root override / runtime election.
|
||||
```
|
||||
|
||||
`topology.yaml`:
|
||||
@@ -207,9 +226,10 @@ Modules loaded via `cfg["module"]` must export:
|
||||
`mesh_2d_no_wrap` require `n_sips = k²`.
|
||||
- **Pipelined chunks**: single-tile per cube, no pipelining yet.
|
||||
- **Root cube runtime election**: the kernel currently uses
|
||||
`root_cube = (mesh_h - 1) * mesh_w + (mesh_w - 1)` hardcoded to the SE
|
||||
corner. SFR wiring covers all cubes, so runtime election is a pure kernel
|
||||
change when needed.
|
||||
`root_cube = (mesh_h // 2) * mesh_w + (mesh_w // 2)` — the geometric
|
||||
center, chosen to minimize the intra-SIP critical path. SFR wiring
|
||||
covers all cubes, so electing a different root at runtime is a pure
|
||||
kernel change when needed.
|
||||
|
||||
---
|
||||
|
||||
@@ -242,15 +262,15 @@ Modules loaded via `cfg["module"]` must export:
|
||||
|
||||
| File | Change |
|
||||
|---|---|
|
||||
| `src/kernbench/ccl/algorithms/intercube_allreduce.py` (new) | Kernel + `_inter_sip_*` helpers + `TOPO_NAME_TO_KIND` |
|
||||
| `src/kernbench/ccl/algorithms/lrab_hierarchical_allreduce.py` (new) | Kernel + `_inter_sip_*` helpers + `TOPO_NAME_TO_KIND` |
|
||||
| `src/kernbench/ccl/sfr_config.py` (new) | `configure_sfr_intercube_multisip` |
|
||||
| `src/kernbench/ccl/topologies.py` | Added `torus_2d`, `mesh_2d_no_wrap` |
|
||||
| `src/kernbench/ccl/install.py` | Extended `_OPPOSITE_DIR` with `global_*` pairs |
|
||||
| `src/kernbench/runtime_api/distributed.py` | `AhbmCCLBackend` uses `configure_sfr_intercube_multisip` + appends sip_rank/topo args |
|
||||
| `ccl.yaml` | Single `intercube_allreduce` entry |
|
||||
| `ccl.yaml` | Single `lrab_hierarchical_allreduce` entry |
|
||||
| `topology.yaml` | Added `system.sips.topology` |
|
||||
| `benches/ccl_allreduce.py` | Row-wise cube-mesh tensor layout |
|
||||
| `tests/test_allreduce_multidevice.py` (new) | Config-driven ring/torus/mesh |
|
||||
| `tests/test_distributed_intercube_allreduce.py` (new) | Full `dist.all_reduce` path |
|
||||
| `tests/test_distributed_lrab_hierarchical_allreduce.py` (new) | Full `dist.all_reduce` path |
|
||||
| `tests/test_intercube_sfr_config.py` (new) | SFR wiring verification |
|
||||
| Removed | `ring_allreduce.py`, `mesh_allreduce.py`, `tree_allreduce.py`, `hierarchical_allreduce.py`, `hello_send.py`, `testing.py` and their tests |
|
||||
|
||||
|
After Width: | Height: | Size: 38 KiB |
|
After Width: | Height: | Size: 36 KiB |
|
After Width: | Height: | Size: 75 KiB |
|
After Width: | Height: | Size: 37 KiB |
|
Before Width: | Height: | Size: 75 KiB |
|
After Width: | Height: | Size: 86 KiB |
|
Before Width: | Height: | Size: 39 KiB |
|
Before Width: | Height: | Size: 79 KiB |
|
Before Width: | Height: | Size: 80 KiB |
|
Before Width: | Height: | Size: 75 KiB |
|
Before Width: | Height: | Size: 37 KiB |
|
Before Width: | Height: | Size: 36 KiB |
|
Before Width: | Height: | Size: 52 KiB |
|
Before Width: | Height: | Size: 46 KiB |
|
Before Width: | Height: | Size: 51 KiB |
|
Before Width: | Height: | Size: 43 KiB |
|
After Width: | Height: | Size: 53 KiB |
|
After Width: | Height: | Size: 52 KiB |
|
After Width: | Height: | Size: 51 KiB |
|
After Width: | Height: | Size: 52 KiB |
|
Before Width: | Height: | Size: 135 KiB After Width: | Height: | Size: 137 KiB |
@@ -1,81 +1,81 @@
|
||||
hop,label,size_bytes,path,total_ns
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),128,ipcq,24.88749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),128,raw,33.57999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),256,ipcq,28.13749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),256,raw,36.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),384,ipcq,29.88749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),384,raw,37.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),512,ipcq,31.63749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),512,raw,38.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),768,ipcq,35.13749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),768,raw,40.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,ipcq,38.63749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),1024,raw,42.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,ipcq,52.63749999999891
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),2048,raw,50.07999999999811
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),4096,ipcq,80.63750000000073
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),4096,raw,66.08000000000175
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),8192,ipcq,136.63750000000073
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),8192,raw,98.08000000000175
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),10240,ipcq,164.63750000000073
|
||||
h1_intra_horizontal,Intra-cube horizontal (pe0 to pe1),10240,raw,114.08000000000175
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),128,ipcq,38.49749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),128,raw,47.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),256,ipcq,43.24749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),256,raw,51.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),384,ipcq,44.99749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),384,raw,52.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),512,ipcq,46.74749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),512,raw,53.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),768,ipcq,50.24749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),768,raw,55.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),1024,ipcq,53.74749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),1024,raw,57.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),2048,ipcq,67.74749999999585
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),2048,raw,65.18999999999505
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),4096,ipcq,95.74750000000131
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),4096,raw,81.19000000000233
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),8192,ipcq,151.7475000000013
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),8192,raw,113.19000000000233
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),10240,ipcq,179.7475000000013
|
||||
h2_intra_vertical,Intra-cube vertical (pe0 to pe4),10240,raw,129.19000000000233
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),128,ipcq,81.15999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),128,raw,89.28999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),256,ipcq,88.65999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),256,raw,95.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),384,ipcq,90.90999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),384,raw,96.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),512,ipcq,93.15999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),512,raw,97.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,ipcq,97.65999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),768,raw,99.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,ipcq,103.15999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),1024,raw,102.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),2048,ipcq,125.15999999999804
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),2048,raw,114.53999999999724
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),4096,ipcq,169.15999999999985
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),4096,raw,138.54000000000087
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),8192,ipcq,257.15999999999985
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),8192,raw,186.54000000000087
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),10240,ipcq,301.15999999999985
|
||||
h3_inter_cube_horizontal,Inter-cube horizontal (cube0 to cube1),10240,raw,210.54000000000087
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),128,ipcq,103.15999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),128,raw,111.28999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),256,ipcq,112.65999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),256,raw,119.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),384,ipcq,114.90999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),384,raw,120.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),512,ipcq,117.15999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),512,raw,121.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),768,ipcq,121.65999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),768,raw,123.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),1024,ipcq,127.15999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),1024,raw,126.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),2048,ipcq,149.15999999999804
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),2048,raw,138.53999999999724
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),4096,ipcq,193.15999999999985
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),4096,raw,162.54000000000087
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),8192,ipcq,281.15999999999985
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),8192,raw,210.54000000000087
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),10240,ipcq,325.15999999999985
|
||||
h4_inter_cube_vertical,Inter-cube vertical (cube0 to cube4),10240,raw,234.54000000000087
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),128,ipcq,24.88749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),128,raw,33.57999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),256,ipcq,28.13749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),256,raw,36.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),384,ipcq,29.88749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),384,raw,37.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),512,ipcq,31.63749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),512,raw,38.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),768,ipcq,35.13749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),768,raw,40.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),1024,ipcq,38.63749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),1024,raw,42.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),2048,ipcq,52.63749999999891
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),2048,raw,50.07999999999811
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),4096,ipcq,80.63750000000073
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),4096,raw,66.08000000000175
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),8192,ipcq,136.63750000000073
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),8192,raw,98.08000000000175
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),10240,ipcq,164.63750000000073
|
||||
latency_intracube_PE0_to_PE1_horizontal,Intra-cube PE-to-PE latency: PE0 → PE1 (horizontal),10240,raw,114.08000000000175
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),128,ipcq,38.49749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),128,raw,47.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),256,ipcq,43.24749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),256,raw,51.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),384,ipcq,44.99749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),384,raw,52.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),512,ipcq,46.74749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),512,raw,53.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),768,ipcq,50.24749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),768,raw,55.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),1024,ipcq,53.74749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),1024,raw,57.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),2048,ipcq,67.74749999999585
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),2048,raw,65.18999999999505
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),4096,ipcq,95.74750000000131
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),4096,raw,81.19000000000233
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),8192,ipcq,151.7475000000013
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),8192,raw,113.19000000000233
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),10240,ipcq,179.7475000000013
|
||||
latency_intracube_PE0_to_PE4_vertical,Intra-cube PE-to-PE latency: PE0 → PE4 (vertical),10240,raw,129.19000000000233
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),128,ipcq,81.15999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),128,raw,89.28999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),256,ipcq,88.65999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),256,raw,95.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),384,ipcq,90.90999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),384,raw,96.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),512,ipcq,93.15999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),512,raw,97.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),768,ipcq,97.65999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),768,raw,99.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),1024,ipcq,103.15999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),1024,raw,102.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),2048,ipcq,125.15999999999804
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),2048,raw,114.53999999999724
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),4096,ipcq,169.15999999999985
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),4096,raw,138.54000000000087
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),8192,ipcq,257.15999999999985
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),8192,raw,186.54000000000087
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),10240,ipcq,301.15999999999985
|
||||
latency_intercube_C0PE0_to_C1PE0_horizontal,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube1.PE0 (horizontal),10240,raw,210.54000000000087
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),128,ipcq,103.15999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),128,raw,111.28999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),256,ipcq,112.65999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),256,raw,119.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),384,ipcq,114.90999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),384,raw,120.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),512,ipcq,117.15999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),512,raw,121.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),768,ipcq,121.65999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),768,raw,123.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),1024,ipcq,127.15999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),1024,raw,126.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),2048,ipcq,149.15999999999804
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),2048,raw,138.53999999999724
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),4096,ipcq,193.15999999999985
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),4096,raw,162.54000000000087
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),8192,ipcq,281.15999999999985
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),8192,raw,210.54000000000087
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),10240,ipcq,325.15999999999985
|
||||
latency_intercube_C0PE0_to_C4PE0_vertical,Inter-cube PE-to-PE latency: Cube0.PE0 → Cube4.PE0 (vertical),10240,raw,234.54000000000087
|
||||
|
||||
|