ADR-0033 D6: address-based PC selection at HBM CTRL

Replaces global round-robin with deterministic address-derived PC
striping:

    pc_shift = log2(burst_bytes)
    pc_mask  = num_pcs - 1
    pc       = (flit.address >> pc_shift) & pc_mask
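A minimal Python sketch of the selection formula. It assumes a standalone helper; the names `select_pc` and `is_pow2` are hypothetical and are not the HBM CTRL API. The sketch also rejects non-power-of-2 `burst_bytes` and `num_pcs`, because the shift-and-mask form only equals `(address // burst_bytes) % num_pcs` when both are powers of two.

```python
def is_pow2(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def select_pc(address: int, burst_bytes: int = 256, num_pcs: int = 8) -> int:
    """Map an HBM byte address to a PC by address striping (hypothetical helper)."""
    if not (is_pow2(burst_bytes) and is_pow2(num_pcs)):
        raise ValueError("burst_bytes and num_pcs must be powers of two")
    pc_shift = burst_bytes.bit_length() - 1  # log2(burst_bytes) for a power of two
    pc_mask = num_pcs - 1                    # keeps the low log2(num_pcs) bits
    return (address >> pc_shift) & pc_mask


# Consecutive 256 B bursts rotate through PCs 0..7 and wrap every 2 KiB.
print([select_pc(a) for a in range(0, 2560, 256)])  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

Because the mapping depends only on the address, two flits that arrive in either order still land on the same PCs, which is what makes the routing deterministic.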

Each Transaction carries base_address (the HBM byte offset of its first
chunk); each Flit derives its own address as base_address + i * flit_bytes,
where i is the flit index.
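The per-flit address rule can be sketched in isolation as follows. `SketchFlit` and `split_into_flits` are hypothetical stand-ins for `Flit` and `Transaction.into_flits`. Only the `address = base_address + i * flit_bytes` rule comes from this change; the ceil-division flit count and the short last flit are assumptions consistent with the `flit_nbytes` comment in the diff.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SketchFlit:
    flit_index: int
    flit_nbytes: int
    is_last: bool
    address: int  # HBM byte offset of this flit's chunk


def split_into_flits(base_address: int, nbytes: int, flit_bytes: int) -> Iterator[SketchFlit]:
    """Yield flits covering nbytes bytes starting at base_address."""
    n_total = -(-nbytes // flit_bytes)  # ceil division
    for i in range(n_total):
        size = min(flit_bytes, nbytes - i * flit_bytes)  # last flit may be short
        yield SketchFlit(
            flit_index=i,
            flit_nbytes=size,
            is_last=(i == n_total - 1),
            address=base_address + i * flit_bytes,
        )


for f in split_into_flits(base_address=4096, nbytes=600, flit_bytes=256):
    print(f.flit_index, f.flit_nbytes, f.is_last, f.address)
```

With flit_bytes equal to burst_bytes, consecutive flits of one transaction fall on consecutive PCs under the pc formula.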
HBM CTRL routes each flit to a PC with the pc formula, replacing the
arrival-order round-robin (RR) pointer. The commit also moves the wait
on the is_last flit into a separate asynchronous _finalize_txn process,
so the worker is no longer blocked on PC commit; transfers to disjoint
addresses can now proceed on their PCs in parallel.
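To illustrate why decoupling finalization matters, here is a deliberately simplified timing model, not the simulator's code. Assumptions: each PC keeps a next-free time (imitating the `_pc_avail` ledger), each flit occupies its PC for a fixed `chunk_time` of 8 ns, and all transactions are ready at t = 0. A blocking worker waits for each transaction's last flit to commit before issuing the next one; a non-blocking worker issues every transaction and leaves completion tracking to a separate finalizer. `simulate_makespan` is a hypothetical name.

```python
def simulate_makespan(txns: list[list[int]], blocking: bool, chunk_time: float = 8.0,
                      burst_bytes: int = 256, num_pcs: int = 8) -> tuple[float, dict[int, float]]:
    """txns[k] lists the flit addresses of transaction k; returns (makespan, pc_avail)."""
    pc_shift = burst_bytes.bit_length() - 1
    pc_avail: dict[int, float] = {}  # pc -> time the PC next becomes free
    issue_t = 0.0                    # earliest time the worker can issue the next txn
    finish = 0.0
    for flits in txns:
        txn_done = issue_t
        for addr in flits:
            pc = (addr >> pc_shift) & (num_pcs - 1)
            start = max(pc_avail.get(pc, 0.0), issue_t)  # wait if the PC is busy
            pc_avail[pc] = start + chunk_time
            txn_done = max(txn_done, pc_avail[pc])
        finish = max(finish, txn_done)
        if blocking:
            issue_t = txn_done  # worker stalls until the is_last flit commits
    return finish, pc_avail


disjoint = [[0], [256]]  # two 256 B transfers on PCs 0 and 1
same = [[0], [0]]        # both transfers on PC 0
print(simulate_makespan(disjoint, blocking=True)[0])   # 16.0
print(simulate_makespan(disjoint, blocking=False)[0])  # 8.0
print(simulate_makespan(same, blocking=False)[0])      # 16.0
```

Only the disjoint-address case benefits: transfers that map to the same PC still serialize on the ledger whether or not the worker blocks.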

phyaddr.py documents the canonical bit layout: with the default
burst_bytes=256 and num_pcs=8, pc_shift = log2(256) = 8 and the 3-bit
pc_mask selects address bits [10:8]. ADR-0033 D6 records the derivation
and the workload scenarios where address striping matters (strided
streams, offset-disjoint parallel transfers).
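A small sketch of the bit-layout arithmetic and of why strided streams are sensitive to it. `pc_bit_range` and `pc_histogram` are hypothetical helpers, not functions from phyaddr.py, and both assume power-of-2 `burst_bytes` and `num_pcs`.

```python
from collections import Counter


def pc_bit_range(burst_bytes: int, num_pcs: int) -> tuple[int, int]:
    """Return the inclusive (hi, lo) address bits that select the PC."""
    lo = burst_bytes.bit_length() - 1  # log2(burst_bytes)
    width = num_pcs.bit_length() - 1   # log2(num_pcs)
    return lo + width - 1, lo


def pc_histogram(start: int, stride: int, count: int,
                 burst_bytes: int = 256, num_pcs: int = 8) -> Counter:
    """Count how many accesses of a strided stream land on each PC."""
    _, lo = pc_bit_range(burst_bytes, num_pcs)
    return Counter(((start + k * stride) >> lo) & (num_pcs - 1) for k in range(count))


print(pc_bit_range(256, 8))              # (10, 8)
print(sorted(pc_histogram(0, 256, 64).items()))   # stride 256 B: 8 accesses on each PC
print(sorted(pc_histogram(0, 2048, 64).items()))  # stride 2 KiB: all 64 on PC 0
```

A stride equal to burst_bytes spreads evenly across all PCs, while a stride of num_pcs * burst_bytes (2 KiB by default) keeps hitting the same PC; arrival-order round-robin would hide this difference.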

Adds tests/test_hbm_address_based_pc.py, covering the canonical bit
mapping, strided 8-way load distribution, same-address PC-0
serialization, PC-aligned 2KB pair collision, dynamic pc_shift from
burst_bytes, and power-of-2 attribute validation. Integration tests
inspect the _pc_avail ledger directly because, in the default
configuration, UCIe's 8 ns per-transaction overhead exactly matches
chunk_time. That masks PC contention at the makespan level, even though
the ledger correctly distinguishes the contended and uncontended cases.
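The masking effect can be reproduced with a toy model under stated assumptions: transactions reach HBM CTRL spaced by a fixed per-transaction link overhead (8 ns, standing in for the UCIe overhead), each transaction occupies its PC for `chunk_time` (also 8 ns), and the `pc_avail` dict imitates the `_pc_avail` ledger. The function `run` and its parameters are hypothetical.

```python
def run(pcs: list[int], link_overhead: float = 8.0,
        chunk_time: float = 8.0) -> tuple[float, dict[int, float]]:
    """pcs[k] is the PC hit by transaction k; returns (makespan, pc_avail)."""
    pc_avail: dict[int, float] = {}
    makespan = 0.0
    for k, pc in enumerate(pcs):
        arrival = k * link_overhead                  # arrivals serialized by the link
        start = max(arrival, pc_avail.get(pc, 0.0))  # wait if the PC is still busy
        pc_avail[pc] = start + chunk_time
        makespan = max(makespan, pc_avail[pc])
    return makespan, pc_avail


print(run([0, 0]))  # (16.0, {0: 16.0})
print(run([0, 1]))  # (16.0, {0: 8.0, 1: 16.0})
```

Both cases finish at 16 ns, but only the ledger shows PC 0 busy until 16 ns in the same-PC case. If chunk_time exceeded the link overhead (say 10 ns), the makespans would diverge to 20 ns and 18 ns.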

Full suite: 631 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 00:18:46 -07:00
parent a44f832be5
commit aaa1cbfaf6
6 changed files with 292 additions and 27 deletions
@@ -29,6 +29,8 @@ class Transaction:
     drain_ns: float = 0.0 # wormhole drain time: nbytes / bottleneck_bw (applied once at terminal)
     is_response: bool = False # True when carrying ResponseMsg on reverse path
     result_data: dict[str, Any] = field(default_factory=dict) # PE-level metrics (pe_exec_ns, etc.)
+    base_address: int = 0 # HBM byte offset of the first chunk; per-flit addresses
+    # derived as base + flit_index * flit_bytes (ADR-0033 D6)
     @property
     def next_hop(self) -> str | None:
@@ -47,6 +49,7 @@ class Transaction:
             drain_ns=self.drain_ns,
             is_response=self.is_response,
             result_data=self.result_data,
+            base_address=self.base_address,
         )
     def into_flits(self, flit_bytes: int) -> Iterator[Flit]:
@@ -71,6 +74,7 @@ class Transaction:
                 flit_index=i,
                 flit_nbytes=size,
                 is_last=(i == n_total - 1),
+                address=self.base_address + i * flit_bytes,
             )
@@ -91,3 +95,4 @@ class Flit:
     flit_index: int # 0..n_flits-1
     flit_nbytes: int # bytes carried (usually flit_bytes; last may be smaller)
     is_last: bool # True for the terminating flit
+    address: int = 0 # HBM byte offset for this flit's chunk (ADR-0033 D6)