Add CHANGES.md, README, update SPEC/ADRs for release 2

- CHANGES.md: detailed changelog for release 1 and 2
- README.md: full project docs with install, probe, run, test usage
- SPEC.md: add ADR-0014~0017 references, update R7 for pcie_ep endpoint
- ADR-0003: update NOC description to reference ADR-0017
- ADR-0004: add HBM efficiency factor (0.8) to BW guarantee contract
- ADR-0014: status Proposed -> Accepted
- ADR-0015: update D4 to M_CPU bypass for Memory R/W, add ADR-0016/0017 links
- ADR-0016 (new): IOChiplet NOC and memory data path
- ADR-0017 (new): Cube NOC 2D mesh architecture
- Fix MD lint warnings (unfenced code blocks) across all docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 01:43:15 -07:00
parent d75da439c6
commit fc6abbc8ee
10 changed files with 613 additions and 65 deletions
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

@@ -123,7 +123,7 @@ Examples include:

 Execution flow:

-```
+```text
 PE_CPU → SubmissionQueue → PE_SCHEDULER → engine queue → engine execution → completion event → PE_SCHEDULER → CompletionQueue
 ```

@@ -133,7 +133,7 @@ Composite commands implement tiled pipelined execution across engines.

 Each tile executes the following pipeline:

-```
+```text
 Input DMA (READ)
 → Compute (GEMM or MATH)
 → Output DMA (WRITE)
@@ -158,7 +158,7 @@ Operations for different tiles may overlap when engine resources permit.

 Allowed overlaps:

-```
+```text
 DMA_READ(t+1) ∥ COMPUTE(t)
 DMA_WRITE(t−1) ∥ COMPUTE(t)
 DMA_READ(t) ∥ DMA_WRITE(t)
@@ -166,7 +166,7 @@ DMA_READ(t) ∥ DMA_WRITE(t)

 Disallowed overlaps:

-```
+```text
 GEMM(t) ∥ GEMM(t′)
 MATH(t) ∥ MATH(t′)
 GEMM(t) ∥ MATH(t′)
@@ -182,7 +182,7 @@ Each engine behaves as a deterministic service resource.

 PE_DMA contains two independent channels.

-```
+```text
 DMA_READ capacity  = 1
 DMA_WRITE capacity = 1
 ```
@@ -195,13 +195,13 @@ Rules:

 Example allowed:

-```
+```text
 DMA_READ(t+1) ∥ DMA_WRITE(t)
 ```

 Example not allowed:

-```
+```text
 DMA_READ(t) ∥ DMA_READ(t+1)
 DMA_WRITE(t) ∥ DMA_WRITE(t+1)
 ```
@@ -210,7 +210,7 @@ DMA_WRITE(t) ∥ DMA_WRITE(t+1)

 Compute operations share a single compute resource.

-```
+```text
 PE_ACCEL capacity = 1
 ```

@@ -230,7 +230,7 @@ Composite commands contain one compute opcode only.

 Examples:

-```
+```text
 COMPOSITE_GEMM
 COMPOSITE_MATH
 ```
@@ -250,13 +250,13 @@ Compute operations use a TCM-centric dataflow model.

 **Input path (HBM)**

-```
+```text
 HBM → XBAR → PE_DMA (DMA_READ) → PE_TCM
 ```

 **Input path (shared SRAM)**

-```
+```text
 Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM
 ```

@@ -264,7 +264,7 @@ Shared SRAM → NOC → PE_DMA (DMA_READ) → PE_TCM

 Compute engines read input tensors from PE_TCM.

-```
+```text
 PE_TCM → GEMM / MATH
 ```

@@ -274,13 +274,13 @@ Weights for GEMM may optionally stream directly from HBM (via XBAR).

 Compute results are written to PE_TCM, then DMA writes to HBM.

-```
+```text
 PE_TCM → PE_DMA (DMA_WRITE) → XBAR → HBM
 ```

 **Output path (shared SRAM)**

-```
+```text
 PE_TCM → PE_DMA (DMA_WRITE) → NOC → Shared SRAM
 ```