# ADR-0008: Tensor Deployment and Allocation (Host Allocator, PA-first)

## Status

Accepted

## Context

Benchmarks require PyTorch-like tensor semantics:

- tensor creation (empty, fill),
- deployment to accelerator devices (tensor.to()).

In the realistic system, host software manages allocation/mapping and installs mappings for DMA/MMU.

For Phase 0 we simplify (ADR-0011):

- device memory operations use PA only,
- VA/MMU/IOMMU is not modeled.

To keep the host↔device interface minimal, we avoid a separate AllocateTensorMeta message. Instead, host allocation produces a PA shard map that is used directly by MemoryWrite/Read and KernelLaunch.

---

## Decision

### D1. Tensor is a host-owned handle with PA shard mapping

A Tensor object is a host-owned handle that encapsulates:

- shape and dtype,
- initialization intent,
- device placement and allocation metadata as a PA shard map.

After deployment, the Tensor handle MUST contain:

- a list of shards, each with (sip,cube,pe,pa,nbytes,offset_bytes).

This PA shard mapping is the single source of truth for kernel argument binding.

---

### D2. Deployment uses a host allocator (Phase 0)

In Phase 0, tensor deployment produces PA shard mappings via a host allocator:

- placement (split/replicate/hybrid) is decided by a DP policy,
- allocation assigns PA ranges at the PE level and returns shard mappings,
- the Tensor handle stores the resulting shard list deterministically.

No separate host-visible device allocation RPC is required in Phase 0.

---

### D3. Data initialization and transfer uses MemoryWrite/Read only

Any data initialization or transfer implied by a tensor (e.g., fill, copy) MUST be represented using Host ↔ IO_CPU messages only:

- MemoryWrite
- MemoryRead

Rules:

- MemoryWrite/Read MUST reference PA + (sip,cube,pe) tags (ADR-0012).
- Allocation metadata MUST NOT be embedded as a separate allocation message.
- Bulk tensor data MUST NOT be embedded in Phase 0 messages.

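
The following is a minimal Python sketch of how D1 to D3 could fit together; it is illustrative only and not the project's implementation. All class, function, and field names (`Shard`, `Tensor`, `HostAllocator`, `deploy_split`, `memory_writes`, the message dictionary keys) are hypothetical, as are the bump-allocation strategy, the base PA, the 64-byte alignment, and the even-split placement policy. The actual message schema is defined by ADR-0012.

```python
from dataclasses import dataclass, field
from math import prod

DTYPE_BYTES = {"float16": 2, "float32": 4}  # assumed dtype table


@dataclass(frozen=True)
class Shard:
    """One entry of the PA shard map (D1)."""
    sip: int
    cube: int
    pe: int
    pa: int            # physical address assigned at the PE level
    nbytes: int
    offset_bytes: int  # byte offset of this shard in the flat tensor


@dataclass
class Tensor:
    """Host-owned handle: shape, dtype, init intent, shard map."""
    shape: tuple
    dtype: str
    init: str = "empty"
    shards: list = field(default_factory=list)

    @property
    def nbytes(self):
        return prod(self.shape) * DTYPE_BYTES[self.dtype]


class HostAllocator:
    """Assumed bump allocator: one PA cursor per (sip, cube, pe)."""

    def __init__(self, base_pa=0x1000, align=64):
        self.base_pa, self.align = base_pa, align
        self.cursor = {}

    def alloc(self, pe_id, nbytes):
        pa = self.cursor.get(pe_id, self.base_pa)
        size = -(-nbytes // self.align) * self.align  # round up to align
        self.cursor[pe_id] = pa + size
        return pa


def deploy_split(tensor, pes, allocator):
    """Split the tensor evenly across PEs, in whole elements (D2)."""
    item = DTYPE_BYTES[tensor.dtype]
    numel = tensor.nbytes // item
    chunk = -(-numel // len(pes)) * item  # ceil-divided elements, in bytes
    shards, offset = [], 0
    for sip, cube, pe in pes:
        n = min(chunk, tensor.nbytes - offset)
        if n <= 0:
            break
        pa = allocator.alloc((sip, cube, pe), n)
        shards.append(Shard(sip, cube, pe, pa, n, offset))
        offset += n
    tensor.shards = shards
    return tensor


def memory_writes(tensor):
    """One MemoryWrite descriptor per shard, with no payload bytes (D3)."""
    return [
        {"type": "MemoryWrite", "sip": s.sip, "cube": s.cube, "pe": s.pe,
         "pa": s.pa, "nbytes": s.nbytes}
        for s in tensor.shards
    ]


t = deploy_split(Tensor((4, 8), "float32", init="fill"),
                 [(0, 0, 0), (0, 0, 1)], HostAllocator())
for msg in memory_writes(t):
    print(msg)
```

In this sketch the shard list is a pure function of the tensor, the PE list, and the allocator state, which is one way to satisfy the determinism requirement in D2. The messages carry only PA, tags, and size, never tensor bytes.
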

The simulation engine schedules MemoryWrite/Read through the graph so that latency is computed by explicit traversal.

---

### D4. Extension path (non-breaking)

Future ADRs MAY introduce optional VA/MMU/IOMMU modeling by adding:

- virtual addressing in tensor handles,
- mapping install steps,
- translation latency/page granularity.

The Phase 0 PA shard map remains a valid fast-path configuration.

---

## Consequences

- Host↔IO_CPU contract remains minimal (MemoryRead/Write + KernelLaunch).
- KernelLaunch can pass per-PE data placement explicitly via shard tags.
- Early implementation stays simple and testable.

---

## Links

- ADR-0011 (Memory Addressing — PA / VA / LA)
- ADR-0012 (Host↔IO_CPU schema)
- ADR-0007 (runtime_api vs sim_engine boundaries)
- ADR-0009 (Kernel execution)