ADR-0033 D6: address-based PC selection at HBM CTRL

Replaces global round-robin with deterministic address-derived PC striping:

    pc_shift = log2(burst_bytes)
    pc_mask  = num_pcs - 1
    pc       = (flit.address >> pc_shift) & pc_mask

Each Transaction carries base_address (the HBM byte offset of its first
chunk); each Flit derives its own address as base_address + i * flit_bytes.
HBM CTRL routes flits to PCs via this formula instead of the arrival-order
RR pointer.

Also splits the is_last wait into an asynchronous _finalize_txn process so
the worker isn't blocked on PC commit, exposing true PC parallelism for
disjoint addresses.

phyaddr.py documents the canonical bit layout (bits [10:8] for the default
burst_bytes=256, num_pcs=8 case). ADR-0033 D6 records the derivation and
the workload scenarios where address striping matters (strided streams,
offset-disjoint parallel transfers).

Adds tests/test_hbm_address_based_pc.py:
- canonical bit mapping
- strided 8-way load distribution
- same-address PC-0 serialization
- PC-aligned 2KB pair collision
- dynamic pc_shift from burst_bytes
- power-of-2 attr validation

Integration tests inspect the _pc_avail ledger directly: at the default
config, UCIe's 8 ns per-txn overhead exactly matches chunk_time, which
masks PC contention at the makespan level even though the ledger correctly
distinguishes the cases.

Full suite: 631 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
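A minimal sketch of the PC-selection formula, assuming only what the message states (power-of-2 burst_bytes and num_pcs, pc = (address >> log2(burst_bytes)) & (num_pcs - 1)). The helper names pc_shift_for and pc_for_address are hypothetical; the real code in phyaddr.py and HBM CTRL is not shown in this diff.

```python
def pc_shift_for(burst_bytes: int) -> int:
    """Return log2(burst_bytes); reject values that are not a power of 2."""
    if burst_bytes <= 0 or burst_bytes & (burst_bytes - 1):
        raise ValueError(f"burst_bytes must be a power of 2, got {burst_bytes}")
    # For a power of 2, bit_length() - 1 is the exact base-2 logarithm.
    return burst_bytes.bit_length() - 1


def pc_for_address(address: int, burst_bytes: int = 256, num_pcs: int = 8) -> int:
    """Map an HBM byte offset to a pseudo-channel index (hypothetical helper)."""
    if num_pcs <= 0 or num_pcs & (num_pcs - 1):
        raise ValueError(f"num_pcs must be a power of 2, got {num_pcs}")
    pc_shift = pc_shift_for(burst_bytes)  # 8 for the default 256-byte burst
    pc_mask = num_pcs - 1                 # 0b111 for 8 PCs
    return (address >> pc_shift) & pc_mask


# With the defaults, the PC index is simply address bits [10:8].
print(pc_for_address(0x100))                           # 1
print(pc_for_address(0x7FF))                           # 7
print(pc_for_address(0x800))                           # 0 (wraps every 2 KB)
print([pc_for_address(k * 256) for k in range(8)])     # [0, 1, 2, 3, 4, 5, 6, 7]
print(pc_for_address(0x200, burst_bytes=512))          # 1 (pc_shift follows burst_bytes)
```

The 0x800 case shows why the PC-aligned 2KB pair collides: the stripe period is burst_bytes * num_pcs = 256 * 8 = 2048 bytes, so any two base addresses that differ by a multiple of 2 KB start on the same PC.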
2026-05-15 00:18:46 -07:00
parent a44f832be5
commit aaa1cbfaf6
6 changed files with 292 additions and 27 deletions
@@ -29,6 +29,8 @@ class Transaction:
    drain_ns: float = 0.0   # wormhole drain time: nbytes / bottleneck_bw (applied once at terminal)
    is_response: bool = False  # True when carrying ResponseMsg on reverse path
    result_data: dict[str, Any] = field(default_factory=dict)  # PE-level metrics (pe_exec_ns, etc.)
+    base_address: int = 0   # HBM byte offset of the first chunk; per-flit addresses
+                            # derived as base + flit_index * flit_bytes (ADR-0033 D6)

    @property
    def next_hop(self) -> str | None:
@@ -47,6 +49,7 @@ class Transaction:
            drain_ns=self.drain_ns,
            is_response=self.is_response,
            result_data=self.result_data,
+            base_address=self.base_address,
        )

    def into_flits(self, flit_bytes: int) -> Iterator[Flit]:
@@ -71,6 +74,7 @@ class Transaction:
                flit_index=i,
                flit_nbytes=size,
                is_last=(i == n_total - 1),
+                address=self.base_address + i * flit_bytes,
            )


@@ -91,3 +95,4 @@ class Flit:
    flit_index: int        # 0..n_flits-1
    flit_nbytes: int       # bytes carried (usually flit_bytes; last may be smaller)
    is_last: bool          # True for the terminating flit
+    address: int = 0       # HBM byte offset for this flit's chunk (ADR-0033 D6)
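The sketch below ties the diff's per-flit address (base_address + i * flit_bytes) to PC routing and to a toy availability ledger in the spirit of _pc_avail. Everything here is hypothetical: SketchFlit, sketch_into_flits, pc_of and simulate are illustrative stand-ins, the ceil-division chunking is assumed rather than taken from into_flits, and the ledger model (one single-flit transaction issued every overhead_ns, each occupying its PC for chunk_time_ns) is a simplification chosen to reproduce the masking effect described in the commit message.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator


@dataclass
class SketchFlit:
    flit_index: int
    flit_nbytes: int
    is_last: bool
    address: int = 0  # HBM byte offset of this flit's chunk


def sketch_into_flits(nbytes: int, base_address: int, flit_bytes: int) -> Iterator[SketchFlit]:
    """Split a transfer (assumed nbytes > 0) into flits with per-flit addresses."""
    n_total = -(-nbytes // flit_bytes)  # ceil division
    for i in range(n_total):
        yield SketchFlit(
            flit_index=i,
            flit_nbytes=min(flit_bytes, nbytes - i * flit_bytes),
            is_last=(i == n_total - 1),
            address=base_address + i * flit_bytes,
        )


def pc_of(address: int, burst_bytes: int = 256, num_pcs: int = 8) -> int:
    """Address-derived PC index; assumes power-of-2 burst_bytes and num_pcs."""
    return (address >> (burst_bytes.bit_length() - 1)) & (num_pcs - 1)


def pc_histogram(nbytes: int, base_address: int, flit_bytes: int) -> Counter:
    """Count how many flits of one transfer land on each PC."""
    return Counter(pc_of(f.address) for f in sketch_into_flits(nbytes, base_address, flit_bytes))


def simulate(bases, overhead_ns=8.0, chunk_time_ns=8.0):
    """Toy ledger: one single-flit transaction per base address, issued every overhead_ns.

    Returns (makespan, pc_avail) where pc_avail maps PC -> time it becomes free.
    """
    pc_avail: dict[int, float] = {}
    makespan = 0.0
    for k, base in enumerate(bases):
        arrival = k * overhead_ns
        pc = pc_of(base)
        start = max(arrival, pc_avail.get(pc, 0.0))  # wait if the PC is still busy
        finish = start + chunk_time_ns
        pc_avail[pc] = finish
        makespan = max(makespan, finish)
    return makespan, pc_avail


# A 2 KB transfer in 256-byte flits touches each of the 8 PCs once.
print(pc_histogram(2048, 0, 256))
# Smaller flits share a PC until they cross a 256-byte burst boundary.
print(pc_histogram(512, 0, 64))  # 4 flits on PC 0, 4 flits on PC 1

# When overhead_ns == chunk_time_ns, same-address and disjoint streams have
# the same makespan, but the ledgers differ.
same_ms, same_ledger = simulate([0] * 8)
disj_ms, disj_ledger = simulate([k * 256 for k in range(8)])
print(same_ms, disj_ms)   # 64.0 64.0
print(same_ledger)        # {0: 64.0}
```

Raising chunk_time_ns above overhead_ns in this toy model (for example to 16.0) makes the contention visible at the makespan level: the same-address stream finishes at 128 ns versus 72 ns for the disjoint one.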