Add CHANGES.md, README, update SPEC/ADRs for release 2

- CHANGES.md: detailed changelog for release 1 and 2
- README.md: full project docs with install, probe, run, test usage
- SPEC.md: add ADR-0014~0017 references, update R7 for pcie_ep endpoint
- ADR-0003: update NOC description to reference ADR-0017
- ADR-0004: add HBM efficiency factor (0.8) to BW guarantee contract
- ADR-0014: status Proposed -> Accepted
- ADR-0015: update D4 to M_CPU bypass for Memory R/W, add ADR-0016/0017 links
- ADR-0016 (new): IOChiplet NOC and memory data path
- ADR-0017 (new): Cube NOC 2D mesh architecture
- Fix MD lint warnings (unfenced code blocks) across all docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@@ -2,7 +2,7 @@
 ## Status

-Proposed
+Accepted

 ## Context
@@ -123,7 +123,7 @@ Examples include:
 Execution flow:

-```
+```text
 PE_CPU → SubmissionQueue → PE_SCHEDULER → engine queue → engine execution → completion event → PE_SCHEDULER → CompletionQueue
 ```
@@ -133,7 +133,7 @@ Composite commands implement tiled pipelined execution across engines.
 Each tile executes the following pipeline:

-```
+```text
 Input DMA (READ)
 → Compute (GEMM or MATH)
 → Output DMA (WRITE)
@@ -158,7 +158,7 @@ Operations for different tiles may overlap when engine resources permit.
 Allowed overlaps:

-```
+```text
 DMA_READ(t+1) ∥ COMPUTE(t)
 DMA_WRITE(t−1) ∥ COMPUTE(t)
 DMA_READ(t) ∥ DMA_WRITE(t)
@@ -166,7 +166,7 @@ DMA_READ(t) ∥ DMA_WRITE(t)
 Disallowed overlaps:

-```
+```text
 GEMM(t) ∥ GEMM(t′)
 MATH(t) ∥ MATH(t′)
 GEMM(t) ∥ MATH(t′)
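
The allowed and disallowed overlap lists in the two hunks above are consistent with one rule: two operations may run concurrently only when they occupy different resources. The following is a minimal Python sketch of that rule, not the ADR's implementation. It assumes each operation kind maps to exactly one resource and that GEMM and MATH share a single compute resource; the `RESOURCE` table and `may_overlap` are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from operation kind to the resource it occupies.
# The operation names come from the overlap lists; the resource names are assumed.
RESOURCE = {
    "DMA_READ": "dma_read",
    "DMA_WRITE": "dma_write",
    "GEMM": "compute",
    "MATH": "compute",
    "COMPUTE": "compute",  # generic name used in the allowed-overlap list
}


def may_overlap(op_a: str, op_b: str) -> bool:
    """Return True if the two operation kinds occupy different resources."""
    return RESOURCE[op_a] != RESOURCE[op_b]


if __name__ == "__main__":
    for pair in [("DMA_READ", "COMPUTE"), ("DMA_READ", "DMA_WRITE"),
                 ("GEMM", "MATH"), ("GEMM", "GEMM")]:
        print(pair, "allowed" if may_overlap(*pair) else "not allowed")
```

Under this assumption the tile index plays no role: `GEMM(t) ∥ MATH(t′)` is rejected for any pair of tiles because both operations need the same resource.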
@@ -182,7 +182,7 @@ Each engine behaves as a deterministic service resource.
 PE_DMA contains two independent channels.

-```
+```text
 DMA_READ capacity = 1
 DMA_WRITE capacity = 1
 ```
@@ -195,13 +195,13 @@ Rules:
 Example allowed:

-```
+```text
 DMA_READ(t+1) ∥ DMA_WRITE(t)
 ```

 Example not allowed:

-```
+```text
 DMA_READ(t) ∥ DMA_READ(t+1)
 DMA_WRITE(t) ∥ DMA_WRITE(t+1)
 ```
@@ -210,7 +210,7 @@ DMA_WRITE(t) ∥ DMA_WRITE(t+1)
 Compute operations share a single compute resource.

-```
+```text
 PE_ACCEL capacity = 1
 ```
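
With DMA_READ, DMA_WRITE, and PE_ACCEL each at capacity 1, the tiled pipeline can be pictured as three single-server stages. The sketch below is one illustrative way to compute such a schedule and is not taken from the ADR. It assumes tiles are issued in order, every stage has a fixed duration (one time unit by default), and PE_TCM buffering never stalls a stage; `schedule_tiles` and its parameters are hypothetical.

```python
def schedule_tiles(num_tiles, durations=(1, 1, 1)):
    """Greedy in-order schedule for a READ -> COMPUTE -> WRITE tile pipeline.

    Returns a dict mapping (stage, tile) to its (start, end) time.
    Each stage is a capacity-1 resource, so it serves one tile at a time.
    """
    stages = ("DMA_READ", "COMPUTE", "DMA_WRITE")
    free_at = {stage: 0 for stage in stages}  # time each resource is next idle
    plan = {}
    for tile in range(num_tiles):
        ready = 0  # time at which this tile's previous stage finished
        for stage, duration in zip(stages, durations):
            start = max(ready, free_at[stage])
            end = start + duration
            plan[(stage, tile)] = (start, end)
            free_at[stage] = end
            ready = end
    return plan


if __name__ == "__main__":
    plan = schedule_tiles(3)
    for (stage, tile), (start, end) in sorted(plan.items(), key=lambda kv: kv[1]):
        print(f"t={start}..{end}  {stage}({tile})")
```

With three tiles and unit durations, `DMA_READ(1)` and `COMPUTE(0)` both occupy the interval 1..2, which matches the allowed overlap `DMA_READ(t+1) ∥ COMPUTE(t)`, and the whole run finishes at time 5 instead of 9 for fully serial execution.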
@@ -230,7 +230,7 @@ Composite commands contain one compute opcode only.
 Examples:

-```
+```text
 COMPOSITE_GEMM
 COMPOSITE_MATH
 ```
@@ -250,13 +250,13 @@ Compute operations use a TCM-centric dataflow model.
 **Input path (HBM)**

-```
+```text
 HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
 ```

 **Input path (shared SRAM)**

-```
+```text
 Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
 ```
@@ -264,7 +264,7 @@ Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
 Compute engines read input tensors from PE_TCM.

-```
+```text
 PE_TCM → GEMM / MATH
 ```
@@ -274,13 +274,13 @@ Weights for GEMM may optionally stream directly from HBM (via XBAR).
 Compute results are written to PE_TCM, then DMA writes to HBM.

-```
+```text
 PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
 ```

 **Output path (shared SRAM)**

-```
+```text
 PE_TCM → PE_DMA (DMA_WRITE) → NOC → Shared SRAM
 ```
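
The four data paths shown in the hunks above can be collected into a small lookup table, which makes the pattern easy to check: HBM traffic crosses the XBAR, shared SRAM traffic crosses the NOC, and every path passes through PE_DMA and PE_TCM. This is only an illustrative Python sketch; the hop names are copied from the diff, while `PATHS`, `route`, and the key spelling are hypothetical.

```python
# Hop lists copied from the input/output path blocks in the diff.
# The (direction, memory) keys are an assumed naming scheme.
PATHS = {
    ("input", "HBM"): ["HBM", "XBAR", "PE_DMA (DMA_READ)", "PE_TCM"],
    ("input", "SHARED_SRAM"): ["Shared SRAM", "NOC", "PE_DMA (DMA_READ)", "PE_TCM"],
    ("output", "HBM"): ["PE_TCM", "PE_DMA (DMA_WRITE)", "XBAR", "HBM"],
    ("output", "SHARED_SRAM"): ["PE_TCM", "PE_DMA (DMA_WRITE)", "NOC", "Shared SRAM"],
}


def route(direction: str, memory: str) -> str:
    """Render the hop sequence for one direction and memory target."""
    return " → ".join(PATHS[(direction, memory)])


if __name__ == "__main__":
    for key in PATHS:
        print(key, ":", route(*key))
```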