Add reverse path response latency for PE DMA and PE_CPU→M_CPU

Model fabric response hop latency for PE-internal operations:

- HBM_CTRL sends PeDmaMsg response on reverse path instead of direct done signal
- PE_CPU sends ResponseMsg via NOC→M_CPU on kernel completion
- Add NOC→PE_DMA and PE_CPU→NOC edges in topology builder
- Make HBM BW test assertions dynamic based on topology efficiency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
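Here is a minimal sketch of what the first two bullets change for completion latency. The commit gives no cycle counts and does not spell out the exact return route, so the per-hop table, the collapsed `XBAR` node (standing in for xbar_top/xbar_bot), and the HBM_CTRL→XBAR→NOC→PE_DMA route are all hypothetical. In the old model the done signal reached PE_DMA with no fabric cost. In the new one the PeDmaMsg response pays for every reverse hop, and a kernel completion pays PE_CPU→NOC→M_CPU.

```python
# Hypothetical per-hop latencies in cycles; illustrative only, not from the simulator.
HOP_LATENCY = {
    ("PE_DMA", "NOC"): 4,
    ("NOC", "XBAR"): 6,
    ("XBAR", "HBM_CTRL"): 3,
    ("HBM_CTRL", "XBAR"): 3,   # reverse path
    ("XBAR", "NOC"): 6,        # reverse path
    ("NOC", "PE_DMA"): 4,      # reverse path (new NOC->PE_DMA edge)
    ("PE_CPU", "NOC"): 4,      # new PE_CPU->NOC edge
    ("NOC", "M_CPU"): 5,
}

# Assumed routes; XBAR collapses xbar_top/xbar_bot into one hop for simplicity.
REQUEST_PATH = ["PE_DMA", "NOC", "XBAR", "HBM_CTRL"]
RESPONSE_PATH = ["HBM_CTRL", "XBAR", "NOC", "PE_DMA"]
KERNEL_DONE_PATH = ["PE_CPU", "NOC", "M_CPU"]


def path_latency(path):
    """Sum the hop latency over each consecutive (src, dst) pair of a path."""
    return sum(HOP_LATENCY[(src, dst)] for src, dst in zip(path, path[1:]))


def dma_completion_latency(service_cycles, model_reverse_path):
    """Request hops + HBM service time + (optionally) response hops."""
    forward = path_latency(REQUEST_PATH)
    # Old behavior: direct done signal, so the return trip costs nothing.
    reverse = path_latency(RESPONSE_PATH) if model_reverse_path else 0
    return forward + service_cycles + reverse


def kernel_done_latency():
    """Fabric cost of the PE_CPU ResponseMsg travelling to M_CPU."""
    return path_latency(KERNEL_DONE_PATH)


if __name__ == "__main__":
    print(dma_completion_latency(100, model_reverse_path=False))  # 113
    print(dma_completion_latency(100, model_reverse_path=True))   # 126
    print(kernel_done_latency())                                  # 9
```

With these made-up numbers the reverse path adds 13 cycles per DMA. That gap is the kind of shift that makes a hard-coded HBM bandwidth expectation brittle, which fits the last bullet's move to topology-derived assertions.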
2026-03-20 15:40:56 -07:00
parent 8b5afef5eb
commit 62fb01ae18
8 changed files with 88 additions and 24 deletions
@@ -29,19 +29,19 @@ def test_full_graph_node_count():

 def test_full_graph_edge_count():
    g = _graph()
-    # Per cube: 168
+    # Per cube: 184
    #   PE-internal: 56
-    #   PE_DMA→noc: 8, noc→pe_cpu: 8
+    #   PE_DMA→noc: 8, noc→pe_dma: 8, noc→pe_cpu: 8, pe_cpu→noc: 8
    #   xbar_top→hbm{0..3}: 4+4=8, xbar_bot→hbm{4..7}: 4+4=8
    #   noc↔xbar_top: 2, noc↔xbar_bot: 2
    #   xbar_top↔bridge.left: 2, bridge.left↔xbar_bot: 2
    #   xbar_top↔bridge.right: 2, bridge.right↔xbar_bot: 2
    #   ucie: 64, m_cpu↔noc: 2, noc↔sram: 2
-    #   Total: 56+8+8+8+8+2+2+2+2+2+2+64+2+2 = 168
+    #   Total: 56+8+8+8+8+8+8+2+2+2+2+2+2+64+2+2 = 184
    # IO edges per SIP: 77
-    # Per SIP: 16*168 + 48 inter-cube + 77 IO = 2813
-    # Total: 2 * 2813 = 5626
-    assert len(g.edges) == 5626
+    # Per SIP: 16*184 + 48 inter-cube + 77 IO = 3069
+    # Total: 2 * 3069 = 6138
+    assert len(g.edges) == 6138


 # ── Full graph: specific nodes exist ─────────────────────────────────
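The edge-count arithmetic in the test comment can be checked mechanically. The sketch below assumes that the per-cube groups listed in the comment are exhaustive. It also takes the comment's figures for the topology: 16 cubes per SIP, 48 inter-cube edges and 77 IO edges per SIP, and 2 SIPs. The dict keys are just labels copied from the comment, not node names from the topology builder.

```python
# Per-cube edge groups, as listed in the test_full_graph_edge_count comment.
PER_CUBE_EDGES = {
    "pe_internal": 56,
    "pe_dma->noc": 8,
    "noc->pe_dma": 8,            # added by this commit
    "noc->pe_cpu": 8,
    "pe_cpu->noc": 8,            # added by this commit
    "xbar_top->hbm{0..3}": 8,
    "xbar_bot->hbm{4..7}": 8,
    "noc<->xbar_top": 2,
    "noc<->xbar_bot": 2,
    "xbar_top<->bridge.left": 2,
    "bridge.left<->xbar_bot": 2,
    "xbar_top<->bridge.right": 2,
    "bridge.right<->xbar_bot": 2,
    "ucie": 64,
    "m_cpu<->noc": 2,
    "noc<->sram": 2,
}

# Topology constants taken from the same comment.
CUBES_PER_SIP = 16
INTER_CUBE_EDGES_PER_SIP = 48
IO_EDGES_PER_SIP = 77
NUM_SIPS = 2


def expected_edge_count(per_cube):
    """Return (per-cube total, per-SIP total, full-graph total)."""
    per_cube_total = sum(per_cube.values())
    per_sip = (CUBES_PER_SIP * per_cube_total
               + INTER_CUBE_EDGES_PER_SIP
               + IO_EDGES_PER_SIP)
    return per_cube_total, per_sip, NUM_SIPS * per_sip


if __name__ == "__main__":
    print(expected_edge_count(PER_CUBE_EDGES))  # (184, 3069, 6138)
    before = {k: v for k, v in PER_CUBE_EDGES.items()
              if k not in ("noc->pe_dma", "pe_cpu->noc")}
    print(expected_edge_count(before))          # (168, 2813, 5626)
```

Dropping the two new groups reproduces the old 5626. This confirms that the +512 delta comes entirely from 16 extra edges per cube across 32 cubes.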