Add CHANGES.md and README; update SPEC/ADRs for release 2

- CHANGES.md: detailed changelog for releases 1 and 2
- README.md: full project docs with install, probe, run, and test usage
- SPEC.md: add references to ADR-0014 through ADR-0017, update R7 for pcie_ep endpoint
- ADR-0003: update NOC description to reference ADR-0017
- ADR-0004: add HBM efficiency factor (0.8) to BW guarantee contract
- ADR-0014: status Proposed -> Accepted
- ADR-0015: update D4 to M_CPU bypass for Memory R/W, add ADR-0016/0017 links
- ADR-0016 (new): IOChiplet NOC and memory data path
- ADR-0017 (new): Cube NOC 2D mesh architecture
- Fix MD lint warnings (fenced code blocks missing a language tag) across all docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 01:43:15 -07:00
parent d75da439c6
commit fc6abbc8ee
10 changed files with 613 additions and 65 deletions
@@ -2,7 +2,7 @@
## Status
-Proposed
+Accepted
## Context
@@ -123,7 +123,7 @@ Examples include:
Execution flow:
-```
+```text
PE_CPU → SubmissionQueue → PE_SCHEDULER → engine queue → engine execution → completion event → PE_SCHEDULER → CompletionQueue
```
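The execution flow above can be sketched as a simple queue model. This is a minimal single-threaded sketch, assuming one queue per engine and that an engine raises its completion event as soon as it finishes a command; the `Command` record, the `PeScheduler` class, and its method names are hypothetical and do not come from the ADR. It models only the queue hand-offs, not dependencies between commands.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Command:
    # Hypothetical command record; fields are illustrative only.
    cmd_id: int
    engine: str  # target engine queue, e.g. "DMA_READ" or "GEMM"


class PeScheduler:
    """Hypothetical model of PE_SCHEDULER's queue hand-offs."""

    def __init__(self, engines):
        self.submission_queue = deque()
        self.completion_queue = deque()
        self.engine_queues = {name: deque() for name in engines}

    def submit(self, cmd):
        # PE_CPU -> SubmissionQueue
        self.submission_queue.append(cmd)

    def dispatch(self):
        # SubmissionQueue -> PE_SCHEDULER -> engine queue
        while self.submission_queue:
            cmd = self.submission_queue.popleft()
            self.engine_queues[cmd.engine].append(cmd)

    def run_engines(self):
        # engine execution -> completion event, one engine queue at a time
        for queue in self.engine_queues.values():
            while queue:
                self.on_completion(queue.popleft())

    def on_completion(self, cmd):
        # completion event -> PE_SCHEDULER -> CompletionQueue
        self.completion_queue.append(cmd.cmd_id)


sched = PeScheduler(["DMA_READ", "GEMM", "DMA_WRITE"])
for i, engine in enumerate(["DMA_READ", "GEMM", "DMA_WRITE"]):
    sched.submit(Command(i, engine))
sched.dispatch()
sched.run_engines()
print(list(sched.completion_queue))  # [0, 1, 2]
```

Keeping the submission and completion queues separate from the per-engine queues mirrors the flow in the ADR: PE_CPU only touches the two outer queues, while PE_SCHEDULER owns routing to engines.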
@@ -133,7 +133,7 @@ Composite commands implement tiled pipelined execution across engines.
Each tile executes the following pipeline:
-```
+```text
Input DMA (READ)
→ Compute (GEMM or MATH)
→ Output DMA (WRITE)
@@ -158,7 +158,7 @@ Operations for different tiles may overlap when engine resources permit.
Allowed overlaps:
-```
+```text
DMA_READ(t+1) ∥ COMPUTE(t)
DMA_WRITE(t-1) ∥ COMPUTE(t)
DMA_READ(t) ∥ DMA_WRITE(t)
@@ -166,7 +166,7 @@ DMA_READ(t) ∥ DMA_WRITE(t)
Disallowed overlaps:
-```
+```text
GEMM(t) ∥ GEMM(t)
MATH(t) ∥ MATH(t)
GEMM(t) ∥ MATH(t)
@@ -182,7 +182,7 @@ Each engine behaves as a deterministic service resource.
PE_DMA contains two independent channels.
-```
+```text
DMA_READ capacity = 1
DMA_WRITE capacity = 1
```
@@ -195,13 +195,13 @@ Rules:
Example allowed:
-```
+```text
DMA_READ(t+1) ∥ DMA_WRITE(t)
```
Example not allowed:
-```
+```text
DMA_READ(t) ∥ DMA_READ(t+1)
DMA_WRITE(t) ∥ DMA_WRITE(t+1)
```
@@ -210,7 +210,7 @@ DMA_WRITE(t) ∥ DMA_WRITE(t+1)
Compute operations share a single compute resource.
-```
+```text
PE_ACCEL capacity = 1
```
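Taken together, the capacities above (DMA_READ = 1, DMA_WRITE = 1, PE_ACCEL = 1) appear sufficient to reproduce the allowed and disallowed overlap lists: two operations may run concurrently only if they do not exceed the capacity of a resource they share. A minimal sketch, assuming GEMM and MATH both occupy PE_ACCEL and that the tile index does not affect resource usage; `RESOURCE`, `CAPACITY`, and `can_overlap` are hypothetical names. It checks resource conflicts only, not data dependencies such as COMPUTE(t) waiting for DMA_READ(t).

```python
# Resource occupied by each opcode. GEMM and MATH share the single compute
# resource (PE_ACCEL); the two DMA channels are independent.
RESOURCE = {
    "DMA_READ": "DMA_READ",
    "DMA_WRITE": "DMA_WRITE",
    "GEMM": "PE_ACCEL",
    "MATH": "PE_ACCEL",
}
CAPACITY = {"DMA_READ": 1, "DMA_WRITE": 1, "PE_ACCEL": 1}


def can_overlap(*ops):
    """Return True if the (opcode, tile) ops fit every resource capacity at once.

    The tile index is ignored: only resource usage is checked, not data
    dependencies between tiles.
    """
    usage = {}
    for opcode, _tile in ops:
        resource = RESOURCE[opcode]
        usage[resource] = usage.get(resource, 0) + 1
    return all(count <= CAPACITY[res] for res, count in usage.items())


print(can_overlap(("DMA_READ", 1), ("GEMM", 0)))      # True
print(can_overlap(("DMA_READ", 0), ("DMA_READ", 1)))  # False
print(can_overlap(("GEMM", 0), ("MATH", 0)))          # False
```

Deriving the overlap rules from capacities, rather than listing every pair, keeps the rules consistent if another engine or channel is added later.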
@@ -230,7 +230,7 @@ Composite commands contain one compute opcode only.
Examples:
-```
+```text
COMPOSITE_GEMM
COMPOSITE_MATH
```
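Because a composite command carries a single compute opcode, its tiled pipeline can be expressed as a three-stage software pipeline in which step s runs DMA_READ(s), COMPUTE(s-1), and DMA_WRITE(s-2). Each COMPUTE(t) then runs alongside only the read of the next tile and the write of the previous tile, consistent with the allowed-overlap list. This is a sketch under the assumption that every stage takes one equal-length step; the ADR does not specify timing, and `COMPUTE_OPCODES` and `pipeline_schedule` are hypothetical.

```python
# Hypothetical mapping from composite command to its single compute opcode.
COMPUTE_OPCODES = {"COMPOSITE_GEMM": "GEMM", "COMPOSITE_MATH": "MATH"}


def pipeline_schedule(composite, num_tiles):
    """Return a list of steps; each step lists the (opcode, tile) ops that
    run concurrently. Step s runs READ(s), COMPUTE(s-1), WRITE(s-2)."""
    # Exactly one compute opcode per composite; unknown names raise KeyError.
    compute = COMPUTE_OPCODES[composite]
    steps = []
    for s in range(num_tiles + 2):  # two extra steps to drain the pipeline
        step = []
        if s < num_tiles:
            step.append(("DMA_READ", s))
        if 0 <= s - 1 < num_tiles:
            step.append((compute, s - 1))
        if 0 <= s - 2 < num_tiles:
            step.append(("DMA_WRITE", s - 2))
        steps.append(step)
    return steps


for step in pipeline_schedule("COMPOSITE_GEMM", 3):
    print(step)
```

For three tiles this yields five steps: a two-step fill, one steady-state step in which DMA_READ, GEMM, and DMA_WRITE all run at once, and a two-step drain.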
@@ -250,13 +250,13 @@ Compute operations use a TCM-centric dataflow model.
**Input path (HBM)**
-```
+```text
HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
```
**Input path (shared SRAM)**
-```
+```text
Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
```
@@ -264,7 +264,7 @@ Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
Compute engines read input tensors from PE_TCM.
-```
+```text
PE_TCM → GEMM / MATH
```
@@ -274,13 +274,13 @@ Weights for GEMM may optionally stream directly from HBM (via XBAR).
Compute results are written to PE_TCM, then DMA writes to HBM.
-```
+```text
PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
```
**Output path (shared SRAM)**
-```
+```text
PE_TCM → PE_DMA (DMA_WRITE) → NOC → Shared SRAM
```
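The four DMA paths can be captured as a lookup table, which also makes the TCM-centric rule easy to check: every path starts or ends at PE_TCM. A minimal sketch; the `DATA_PATHS` keys (`"HBM"`, `"SHARED_SRAM"`, `"input"`, `"output"`) and the `route` helper are hypothetical, and the optional direct HBM weight stream for GEMM is not modeled.

```python
# Hop sequences from the ADR, keyed by (memory, direction). Key names are
# hypothetical labels chosen for this sketch.
DATA_PATHS = {
    ("HBM", "input"): ["HBM", "XBAR", "PE_DMA (DMA_READ)", "PE_TCM"],
    ("SHARED_SRAM", "input"): ["Shared SRAM", "NOC", "PE_DMA (DMA_READ)", "PE_TCM"],
    ("HBM", "output"): ["PE_TCM", "PE_DMA (DMA_WRITE)", "XBAR", "HBM"],
    ("SHARED_SRAM", "output"): ["PE_TCM", "PE_DMA (DMA_WRITE)", "NOC", "Shared SRAM"],
}


def route(memory, direction):
    """Return the hop sequence for moving a tile between memory and PE_TCM."""
    return " → ".join(DATA_PATHS[(memory, direction)])


# TCM-centric check: inputs land in PE_TCM, outputs leave from PE_TCM.
for (memory, direction), hops in DATA_PATHS.items():
    endpoint = hops[-1] if direction == "input" else hops[0]
    assert endpoint == "PE_TCM", (memory, direction)

print(route("SHARED_SRAM", "output"))
```

The table makes the symmetry explicit: HBM traffic always crosses XBAR and shared-SRAM traffic always crosses NOC, with PE_DMA as the only engine that moves data in or out of PE_TCM.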