Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep

- Probe CLI: restructured output (tables first, routes below), per-hop timestamps, split cross-cube into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix cross-cube-best < cross-half latency inversion
- HBM efficiency: added efficiency=0.8 factor to hbm_ctrl, reducing effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram

278 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
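
The UCIe and HBM figures in the message follow from simple arithmetic, sketched here as a sanity check. All constant and function names are hypothetical and are not kernbench identifiers. The sketch assumes that one crossing traverses two UCIe ports, that the efficiency factor scales peak bandwidth linearly, and that GB means 10^9 bytes.

```python
# Hypothetical names for illustration only; not kernbench identifiers.
UCIE_PORT_OVERHEAD_NS = 8.0  # per-port overhead after this commit (was 1 ns)
HBM_PEAK_BW_GBPS = 256.0     # peak HBM bandwidth in GB/s
HBM_EFFICIENCY = 0.8         # efficiency factor added to hbm_ctrl


def ucie_crossing_overhead_ns(port_overhead_ns: float, ports_per_crossing: int = 2) -> float:
    """Overhead of one crossing, assuming one UCIe port on each side."""
    return port_overhead_ns * ports_per_crossing


def effective_hbm_bw_gbps(peak_bw_gbps: float, efficiency: float) -> float:
    """Effective bandwidth, assuming efficiency scales the peak linearly."""
    return peak_bw_gbps * efficiency


def transfer_time_ns(size_bytes: int, bw_gbps: float) -> float:
    """Bandwidth-bound transfer time; 1 GB/s equals 1 byte/ns."""
    return size_bytes / bw_gbps


if __name__ == "__main__":
    bw = effective_hbm_bw_gbps(HBM_PEAK_BW_GBPS, HBM_EFFICIENCY)
    print(f"UCIe crossing overhead: {ucie_crossing_overhead_ns(UCIE_PORT_OVERHEAD_NS)} ns")
    print(f"Effective HBM bandwidth: {bw:.1f} GB/s")
    print(f"32KB bandwidth-bound time: {transfer_time_ns(32 * 1024, bw):.1f} ns")
```

Under these assumptions, the bandwidth-bound time ignores per-hop latency, so it is only a lower bound on what the probe would report.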
@@ -24,6 +24,7 @@ from kernbench.components.impls import (
     IoCpuComponent,
     MCpuComponent,
     PcieEpComponent,
+    PositionAwareXbarComponent,
     SramComponent,
     TransitComponent,
 )
@@ -231,7 +232,7 @@ def test_m_cpu_terminal_no_ctx_completes():
     ("forwarding_v1", TransitComponent),
     ("noc_v1", TransitComponent),
     ("ucie_v1", TransitComponent),
-    ("xbar_v1", TransitComponent),
+    ("xbar_v1", PositionAwareXbarComponent),
     ("pcie_ep_v1", PcieEpComponent),
     ("io_cpu_v1", IoCpuComponent),
     ("m_cpu_v1", MCpuComponent),