Add probe CLI improvements, D2H read, UCIe/HBM tuning, BW sweep
- Probe CLI: restructured output (tables first, routes below), per-hop timestamps, cross-cube split into best/worst cases, D2H read section
- UCIe overhead: 1ns -> 8ns per port (16ns per crossing) to fix the latency inversion in which cross-cube-best was lower than cross-half
- HBM efficiency: added an efficiency=0.8 factor to hbm_ctrl, reducing effective BW from 256 to 204.8 GB/s
- Multi-size BW sweep: saturation tables (4KB-1MB) for all probe cases
- Probe default data size: 4KB -> 32KB for more realistic measurements
- IOChiplet NOC + D2H topology and tests
- NOC mesh, xbar, BW occupancy components and tests
- Cube mesh visualization diagram

278 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
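The UCIe and HBM numbers in the message above come down to simple arithmetic. The following is a minimal sketch that reproduces them. It assumes a die-to-die crossing passes through exactly two UCIe ports, one on each side, which is how 8ns per port becomes 16ns per crossing. The constant and function names are hypothetical and do not correspond to kernbench's configuration keys.

```python
# Hypothetical constants taken from the commit message; the names are
# illustrative and are not kernbench's actual configuration keys.
UCIE_PORT_OVERHEAD_NS = 8.0   # per UCIe port (previously 1 ns)
PORTS_PER_CROSSING = 2        # assumption: one port on each side of the die boundary
HBM_PEAK_GBPS = 256.0         # raw HBM bandwidth before the efficiency factor
HBM_EFFICIENCY = 0.8          # efficiency factor added to hbm_ctrl


def ucie_crossing_ns(port_overhead_ns: float = UCIE_PORT_OVERHEAD_NS,
                     ports: int = PORTS_PER_CROSSING) -> float:
    """Fixed latency added by one die-to-die crossing."""
    return port_overhead_ns * ports


def hbm_effective_gbps(peak_gbps: float = HBM_PEAK_GBPS,
                       efficiency: float = HBM_EFFICIENCY) -> float:
    """Sustained bandwidth after derating the peak by the efficiency factor."""
    return peak_gbps * efficiency


if __name__ == "__main__":
    print(f"UCIe crossing: {ucie_crossing_ns():.0f} ns")          # 16 ns
    print(f"HBM effective BW: {hbm_effective_gbps():.1f} GB/s")   # 204.8 GB/s
```

With the same two-port assumption, the old 1ns setting gave only 2ns per crossing. That small overhead is consistent with the cross-cube-best route coming out faster than the cross-half route.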
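A saturation curve like the one in the multi-size sweep can be approximated with a first-order model. Each transfer pays a fixed latency plus serialization time at the link bandwidth. This model is an assumption for illustration and is not the probe's actual timing model. The 500 ns fixed latency is a made-up value; the 204.8 GB/s default is the derated HBM bandwidth from the message above. Because 1 GB/s equals 1 byte/ns, dividing a size in bytes by GB/s gives nanoseconds directly.

```python
def achieved_gbps(nbytes: int, fixed_latency_ns: float, link_gbps: float) -> float:
    """Throughput of one transfer under a latency + serialization model.

    1 GB/s == 1 byte/ns, so nbytes / link_gbps is already in nanoseconds and
    nbytes / total_ns is already in GB/s.
    """
    serialization_ns = nbytes / link_gbps
    return nbytes / (fixed_latency_ns + serialization_ns)


def sweep(fixed_latency_ns: float = 500.0,   # hypothetical fixed cost per transfer
          link_gbps: float = 204.8) -> list[tuple[int, float]]:
    """Return (size in KB, achieved GB/s) for power-of-two sizes from 4KB to 1MB."""
    sizes = [4 * 1024 * 2**i for i in range(9)]  # 4KB, 8KB, ..., 1MB
    return [(n // 1024, achieved_gbps(n, fixed_latency_ns, link_gbps)) for n in sizes]


if __name__ == "__main__":
    for size_kb, gbps in sweep():
        print(f"{size_kb:>5} KB  {gbps:7.1f} GB/s")
```

Under this model, throughput rises with transfer size and approaches `link_gbps` without reaching it, because the fixed latency is spread over more bytes. The size at which the curve flattens depends entirely on the assumed latency.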
@@ -69,17 +69,37 @@ class HbmCtrlComponent(ComponentBase):
         yield from self._send_response(env, txn)
 
     def _send_response(self, env: simpy.Environment, txn: Any) -> Generator:
-        """Create ResponseMsg and send on reverse path back to originator.
+        """Route completion based on path type.
 
-        PeDmaMsg is a direct probe with no IO_CPU/M_CPU aggregation in the path,
-        so we succeed txn.done directly instead of sending a response Transaction.
+        - PeDmaMsg: succeed done directly (probe).
+        - Bypass path (no m_cpu): MemoryWrite succeeds done; MemoryRead sends
+          data back on reverse path with original done event.
+        - M_CPU DMA path: send ResponseMsg for m_cpu/io_cpu aggregation.
         """
-        from kernbench.runtime_api.kernel import PeDmaMsg
+        from kernbench.runtime_api.kernel import MemoryReadMsg, PeDmaMsg
 
         if isinstance(txn.request, PeDmaMsg):
             txn.done.succeed()
             return
 
+        # Bypass path: no m_cpu in the transaction path
+        is_bypass = not any("m_cpu" in n for n in txn.path)
+        if is_bypass:
+            if isinstance(txn.request, MemoryReadMsg):
+                # D2H: send data back on reverse path to pcie_ep
+                reverse_path = list(reversed(txn.path))
+                if len(reverse_path) >= 2:
+                    resp_txn = Transaction(
+                        request=txn.request, path=reverse_path, step=0,
+                        nbytes=txn.request.nbytes, done=txn.done,
+                    )
+                    yield self.out_ports[reverse_path[1]].put(resp_txn.advance())
+                    return
+            # MemoryWrite bypass or short path: done
+            txn.done.succeed()
+            return
+
+        # M_CPU DMA path: send ResponseMsg for aggregation
         reverse_path = list(reversed(txn.path))
         if len(reverse_path) >= 2 and self.ctx:
             from kernbench.runtime_api.kernel import ResponseMsg
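The updated `_send_response` picks one of three completion routes. The following is a minimal sketch that isolates that decision as a pure function, so the branch order can be checked without a SimPy environment. Several details are assumptions. The message classes are empty stand-ins. `MemoryWriteMsg` is a hypothetical name, because the diff refers only to "MemoryWrite". The node names in the example paths are made up. The `self.ctx` check on the M_CPU path is omitted.

```python
from enum import Enum, auto
from typing import Sequence


# Empty stand-ins for kernbench's message types; only class identity matters here.
class PeDmaMsg: ...
class MemoryReadMsg: ...
class MemoryWriteMsg: ...  # hypothetical name for the write request type


class Completion(Enum):
    SUCCEED_DONE = auto()    # complete txn.done in place
    SEND_DATA_BACK = auto()  # D2H read: forward data on the reverse path
    SEND_RESPONSE = auto()   # M_CPU DMA: ResponseMsg for m_cpu/io_cpu aggregation


def classify_completion(request: object, path: Sequence[str]) -> Completion:
    """Mirror the branch order of the new _send_response."""
    if isinstance(request, PeDmaMsg):
        return Completion.SUCCEED_DONE  # probe: checked first, regardless of path
    # Substring match, so "m_cpu0", "m_cpu1", ... all count as M_CPU hops.
    is_bypass = not any("m_cpu" in node for node in path)
    if is_bypass:
        if isinstance(request, MemoryReadMsg) and len(path) >= 2:
            return Completion.SEND_DATA_BACK
        return Completion.SUCCEED_DONE  # write, or a read with no hop to return to
    return Completion.SEND_RESPONSE


def reverse_next_hop(path: Sequence[str]) -> str:
    """First hop on the way back: reverse_path[1], the node before hbm_ctrl."""
    return list(reversed(path))[1]


if __name__ == "__main__":
    d2h = ["pcie_ep", "io_noc", "hbm_ctrl"]   # hypothetical node names
    dma = ["pcie_ep", "m_cpu0", "hbm_ctrl"]
    print(classify_completion(MemoryReadMsg(), d2h), reverse_next_hop(d2h))
    print(classify_completion(MemoryWriteMsg(), d2h))
    print(classify_completion(MemoryReadMsg(), dma))
```

The `PeDmaMsg` check has to stay first. Because of that ordering, a probe completes in place even when its path happens to include an M_CPU node.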