Implement ADR-0020: 2-pass data execution with greenlet kernel runner

Step 1 — Foundation:
- OpRecord/OpLogger: op log infrastructure with t_start stable ordering
- MemoryStore: tensor-granular storage backed by numpy ndarrays (reference semantics)
- data_op=True flag on DmaReadCmd, DmaWriteCmd, GemmCmd, MathCmd, CompositeCmd
- numpy/greenlet dependencies added to pyproject.toml
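
A minimal sketch of what "t_start stable ordering" could look like, assuming records are sorted by start time and ties keep insertion order. The field names on OpRecord and the record()/ordered() methods are illustrative assumptions, not the classes added in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpRecord:
    # Hypothetical fields; the real OpRecord may carry more.
    t_start: float
    component: str
    op: str


class OpLogger:
    """Collects records and returns them ordered by start time."""

    def __init__(self):
        self._records = []

    def record(self, rec):
        self._records.append(rec)

    def ordered(self):
        # sorted() is stable, so records with equal t_start
        # keep the order in which they were logged.
        return sorted(self._records, key=lambda r: r.t_start)


logger = OpLogger()
logger.record(OpRecord(2.0, "pe0", "gemm"))
logger.record(OpRecord(1.0, "dma", "read_a"))
logger.record(OpRecord(1.0, "dma", "read_b"))
ordered_ops = [r.op for r in logger.ordered()]
```

The two reads share t_start=1.0 and stay in logging order, which is what makes a later replay deterministic.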

Step 2 — ComponentBase hooks:
- _on_process_start/end hooks in _forward_txn (fabric messages)
- _handle_with_hooks in PeEngineBase (PE-internal commands)
- op_logger is optional, with zero overhead when disabled
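
The hook pattern above can be sketched as follows, assuming the start hook returns a pending record and the end hook completes it. PeEngine, the dict-based record, and the list used as a logger are hypothetical stand-ins; only the hook names come from this commit.

```python
class PeEngine:
    """Hypothetical stand-in for a PE-internal engine."""

    def __init__(self, name, op_logger=None):
        self.name = name
        self.op_logger = op_logger  # None disables logging

    def _on_process_start(self, cmd, now):
        if self.op_logger is None:
            return None  # disabled path: a single None check
        return {"component": self.name, "cmd": cmd, "t_start": now}

    def _on_process_end(self, pending, now):
        if pending is None:
            return
        pending["t_end"] = now
        self.op_logger.append(pending)

    def _handle(self, cmd):
        return f"done:{cmd}"

    def _handle_with_hooks(self, cmd, now, duration):
        # Wrap the real handler so timing is captured around it.
        pending = self._on_process_start(cmd, now)
        result = self._handle(cmd)
        self._on_process_end(pending, now + duration)
        return result


log = []
logged = PeEngine("pe0", op_logger=log)
silent = PeEngine("pe1")
r1 = logged._handle_with_hooks("gemm", now=10, duration=4)
r2 = silent._handle_with_hooks("gemm", now=10, duration=4)
```

Both engines return the same result; only the one constructed with a logger leaves a record behind.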

Step 3 — KernelRunner + greenlet:
- KernelRunner: greenlet ↔ SimPy bridge in triton_emu/kernel_runner.py
- TLContext: _emit() method routes to greenlet switch or command list
- tl.load() returns real numpy data in greenlet mode
- Dynamic control flow is supported (branching on values read from memory)
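
To illustrate the control transfer between kernel and simulator, here is a sketch that uses a plain Python generator as a stand-in for the greenlet switch. It is not the KernelRunner from this commit: the kernel, the tuple-shaped commands, and run_kernel are all hypothetical. A generator forces the kernel to yield explicitly, whereas a greenlet lets tl.load() look like an ordinary call from arbitrary stack depth.

```python
def kernel(flag_addr):
    # Each yield hands one command to the runner and suspends;
    # the value sent back plays the role of tl.load()'s result.
    flag = yield ("load", flag_addr)
    if flag > 0:  # branch on data that was actually read
        yield ("store", 0x20, flag * 2)
    else:
        yield ("store", 0x20, 0)


def run_kernel(gen, memory):
    """Service commands one at a time, resuming the kernel with replies."""
    trace = []
    try:
        cmd = next(gen)
        while True:
            trace.append(cmd[0])
            if cmd[0] == "load":
                reply = memory[cmd[1]]
            else:
                memory[cmd[1]] = cmd[2]
                reply = None
            cmd = gen.send(reply)
    except StopIteration:
        pass
    return trace


memory = {0x10: 3}
trace = run_kernel(kernel(0x10), memory)
```

In the real bridge the runner would presumably advance simulated time before resuming the kernel; that step is omitted here.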

Step 4 — PE_CPU integration:
- Greenlet mode when ctx.memory_store is set, legacy fallback otherwise
- Refactored into _execute_greenlet/_execute_legacy/_send_response
- ComponentContext gains memory_store and op_logger fields
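
A rough sketch of the mode selection described above, assuming "set" means "not None". PeCpu, the return values, and the method bodies are placeholders; only the method and field names are taken from this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentContext:
    # Simplified: the real context presumably has more fields.
    memory_store: Optional[dict] = None
    op_logger: Optional[list] = None


class PeCpu:
    def __init__(self, ctx):
        self.ctx = ctx

    def execute(self, kernel_name):
        # An empty store still counts as "set", hence "is not None".
        if self.ctx.memory_store is not None:
            result = self._execute_greenlet(kernel_name)
        else:
            result = self._execute_legacy(kernel_name)
        return self._send_response(result)

    def _execute_greenlet(self, kernel_name):
        return ("greenlet", kernel_name)  # placeholder body

    def _execute_legacy(self, kernel_name):
        return ("legacy", kernel_name)  # placeholder body

    def _send_response(self, result):
        return {"mode": result[0], "kernel": result[1]}


legacy = PeCpu(ComponentContext()).execute("k0")
data = PeCpu(ComponentContext(memory_store={})).execute("k0")
```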

Step 5 — DataExecutor:
- Phase 2 numpy execution for GEMM/Math ops from op_log
- _compute_math: all unary/binary/reduction ops
- verify(): compare MemoryStore against expected with dtype tolerance
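
The second pass can be pictured as a replay of the op log over stored tensors, followed by a tolerance check. The sketch below uses nested lists and the standard library instead of numpy so it stays dependency-free. The record layout, the execute() and verify() signatures, and the tolerance values are assumptions made for illustration.

```python
import math

# Assumed per-dtype tolerances; the commit does not state the values.
TOLERANCE = {"f32": 1e-5, "f16": 1e-2}


def gemm(a, b, m, k, n):
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def execute(op_log, store):
    # Replay in t_start order; sorted() is stable for ties.
    for rec in sorted(op_log, key=lambda r: r["t_start"]):
        inputs = [store[name] for name in rec["inputs"]]
        if rec["op"] == "gemm":
            store[rec["out"]] = gemm(inputs[0], inputs[1],
                                     rec["m"], rec["k"], rec["n"])
        elif rec["op"] == "exp":
            store[rec["out"]] = [[math.exp(v) for v in row]
                                 for row in inputs[0]]


def verify(store, expected, dtype):
    tol = TOLERANCE[dtype]
    for name, want in expected.items():
        got = store[name]
        if len(got) != len(want):
            return False
        for grow, wrow in zip(got, want):
            if len(grow) != len(wrow):
                return False
            if not all(math.isclose(g, w, rel_tol=tol, abs_tol=tol)
                       for g, w in zip(grow, wrow)):
                return False
    return True


store = {"a": [[1.0, 2.0], [3.0, 4.0]], "b": [[1.0, 0.0], [0.0, 1.0]]}
op_log = [  # deliberately logged out of order
    {"t_start": 2, "op": "exp", "inputs": ["c"], "out": "d"},
    {"t_start": 1, "op": "gemm", "inputs": ["a", "b"], "out": "c",
     "m": 2, "k": 2, "n": 2},
]
execute(op_log, store)
ok = verify(store, {"c": [[1.0, 2.0], [3.0, 4.0]]}, "f32")
bad = verify(store, {"c": [[1.0, 2.0], [3.0, 4.5]]}, "f32")
```

Because the exp record depends on the GEMM output, replaying in logged order would fail with a missing key; sorting by t_start first is what makes the replay valid.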

28 new tests, 366 total passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 00:22:44 -07:00
parent 140b85436a
commit 51004c311c
14 changed files with 1181 additions and 59 deletions
@@ -55,6 +55,7 @@ class DmaReadCmd:
     handle: TensorHandle
     src_addr: int
     nbytes: int
+    data_op: bool = True
 
 
 @dataclass(frozen=True)
@@ -64,6 +65,7 @@ class DmaWriteCmd:
     handle: TensorHandle
     dst_addr: int
     nbytes: int
+    data_op: bool = True
 
 
 @dataclass(frozen=True)
@@ -79,6 +81,7 @@ class GemmCmd:
     m: int
     k: int
     n: int
+    data_op: bool = True
 
 
 @dataclass(frozen=True)
@@ -94,6 +97,7 @@ class MathCmd:
     inputs: tuple[TensorHandle, ...]
     out: TensorHandle
     axis: int | None = None  # for reductions
+    data_op: bool = True
 
 
 @dataclass(frozen=True)
@@ -111,6 +115,7 @@ class CompositeCmd:
     out_addr: int
     out_nbytes: int
     math_op: str | None = None  # for op="math": which math operation
+    data_op: bool = True
 
 
 @dataclass(frozen=True)
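
One way a consumer might use the new flag is to filter a command stream down to the commands that carry data. This is a sketch under assumptions: GemmCmd is reduced to three fields, BarrierCmd is a hypothetical command without the flag, and data_ops() is not a function from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmCmd:  # simplified stand-in for the real command
    m: int
    k: int
    n: int
    data_op: bool = True  # defaulted field goes last, as in the diff


@dataclass(frozen=True)
class BarrierCmd:  # hypothetical command with no data_op field
    barrier_id: int


def data_ops(cmds):
    # Commands lacking the attribute are treated as non-data ops.
    return [c for c in cmds if getattr(c, "data_op", False)]


cmds = [GemmCmd(4, 8, 4), BarrierCmd(0), GemmCmd(2, 2, 2, data_op=False)]
selected = data_ops(cmds)
```

Giving the field a default keeps existing constructor calls valid, which fits the small size of this diff.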