kernbench2/docs/adr/ADR-0053-dev-topology-builder-algorithms.md

# ADR-0053: Topology Builder + Visualizer Algorithms

## Status

Accepted (2026-05-22).

Pins down the key algorithmic choices of the topology compile and
visualization pipeline jointly implemented by `topology/builder.py`,
`topology/mesh_gen.py`, and `topology/visualizer.py` —
placement-driven router attachment, mesh auto-layout, the source_hash
cache, view projections, and SVG rendering. ADR-0006 defines the
high-level intent of topology compilation (compiled topology, distance
extraction, automatic diagram generation), but **which algorithms the
builder actually uses** was only discoverable via source grep.

## First action

When `resolve_topology(path_str)` is called, four steps run in order:

1. **Path validation** (`builder.py::resolve_topology`):
   `Path(path_str).expanduser().resolve()`, existence check, file
   check. Failure → `FileNotFoundError` or `ValueError`.
2. **YAML parsing** (`_read_spec`): `yaml.safe_load`. Parse errors
   yield a `ValueError` with line/column. Non-dict roots are
   rejected.
3. **Auto-generate the mesh** (`mesh_gen.ensure_mesh_file`): create or
   reuse a `cube_mesh.yaml` next to the topology file. Cache hit on
   matching source_hash; miss triggers regeneration. This step decides
   the cube NoC's router grid and attachment information.
4. **Compile the graph** (`_compile_graph`): system → IO chiplets →
   cubes → inter-cube edges → IO↔cube edges → system↔IO edges, then
   build four view projections (system, sip, cube, pe) and wrap into
   a `TopologyGraph`.

In short, **topology compilation's first act is "read topology.yaml as
a dict, create/validate cube_mesh.yaml in the same directory, then
build the flat graph + 4-view projection in system → sip → cube → pe
order"**.
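Step 1 can be sketched as follows. This is a minimal illustration that assumes only the behavior listed above; `validate_topology_path` is a hypothetical name, and which failure maps to which exception is an assumption drawn from the ordering in the list.

```python
from pathlib import Path


def validate_topology_path(path_str: str) -> Path:
    """Hypothetical helper mirroring step 1: expand, resolve, then check."""
    path = Path(path_str).expanduser().resolve()
    if not path.exists():
        # Assumption: a missing path maps to FileNotFoundError.
        raise FileNotFoundError(f"topology file not found: {path}")
    if not path.is_file():
        # Assumption: an existing non-file (e.g. a directory) maps to ValueError.
        raise ValueError(f"topology path is not a file: {path}")
    return path
```

Resolving before checking means every later step (including the location of `cube_mesh.yaml`) works from one absolute path.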

## Context

`topology/` package responsibilities:

- **builder.py** (1207 lines): turns topology.yaml into a
  `TopologyGraph` (nodes + edges + 4 view projections).
- **mesh_gen.py** (305 lines): auto-decides the cube NoC's router
  grid and PE/UCIe/M_CPU/SRAM attachment positions and caches them in
  `cube_mesh.yaml`.
- **visualizer.py** (887 lines): generates four SVG diagrams (system /
  sip / cube / pe) from a `TopologyGraph`.

ADR-0006 makes the high-level decision that "the result of topology
compilation is the single source for distance metadata and diagram
generation", but specific algorithms (e.g., placement-driven
nearest-router attachment, the HBM exclusion zone, which fields in
source_hash trigger regeneration) are not in any ADR.

In particular, these decisions are absent at ADR level:

- Why is mesh_gen cached in a separate file (`cube_mesh.yaml`)?
- Which fields are in source_hash, and which changes force
  regeneration?
- Why are placement coordinates given in mm rather than cube coordinates?
- How are the HBM exclusion zone and UCIe N/S/E/W distribution
  decided inside the mesh?
- What is the abstraction-level difference among the four view
  projections (system/sip/cube/pe)?

This ADR captures these decisions in one place.

## Decision

### D1. Compile pipeline — six stages

`_compile_graph(spec)`:

1. **System nodes** (`_instantiate_system`): add system-level nodes
   like `fabric.switch0` and the host CPU.
2. **Per-SIP loop** (`for sip_id in range(system.sips.count)`):
   - **IO chiplets** (`_instantiate_io_chiplets`): create pcie_ep /
     io_cpu / io_noc / io_ucie PHYs / conn nodes and their bidirectional
     internal edges.
   - **Cube instantiation** (`_instantiate_cube`): using
     cube_mesh.yaml's router grid, instantiate cube routers, PE
     sub-components (pe_cpu, pe_dma, pe_fetch_store, pe_gemm, pe_math,
     pe_mmu, pe_tcm, pe_scheduler, pe_ipcq), m_cpu, sram, hbm_ctrl,
     and their internal edges.
   - **Inter-cube edges** (`_add_inter_cube_edges`): the UCIe
     N/S/E/W mesh edges.
   - **IO ↔ cube edges** (`_add_io_to_cube_edges`): connect io_noc to
     each cube's edge UCIe phy.
3. **Switch ↔ IO edges** (`_add_system_to_io_edges`): bidirectional
   edges between `fabric.switch0` and each SIP's `pcie_ep` (the
   cross-SIP IPCQ path of ADR-0038 D3 + ADR-0010).
4. **Build four view projections**:
   - `_build_system_view(spec)` — Tray level: SIPs and the system
     switch.
   - `_build_sip_view(spec)` — inside one SIP: cube mesh + IO
     chiplet.
   - `_build_cube_view(spec)` — inside one cube: router grid + PE /
     M_CPU / SRAM / HBM_CTRL attachments.
   - `_build_pe_view(spec)` — inside one PE: nine sub-components +
     internal edges (pe_internal kind).
5. **Return `TopologyGraph`**: `TopologyGraph(spec, nodes, edges,
   system_view, sip_view, cube_view, pe_view)`.

The six graph-construction stages (step 1, the four per-SIP sub-steps
of step 2, and step 3) are **ordered for a reason**: inter-cube edges
have valid src/dst only after the cubes exist, and IO chiplets must
precede the IO ↔ cube edges that reference them. A new node type must
be inserted at the matching point in this order.
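The ordering constraint can be illustrated with a toy compile loop. This is a sketch, not the builder's code: node names are simplified placeholders (`io0` and the per-cube `ucie` node are assumptions), and `add_edge` is a hypothetical guard that rejects an edge whose endpoints do not exist yet.

```python
def compile_in_order(n_sips=2, n_cubes=2):
    """Toy version of the D1 ordering; assumes n_cubes >= 1."""
    nodes, edges = set(), []

    def add_edge(src, dst, kind):
        # Both endpoints must already exist, which the stage order guarantees.
        for end in (src, dst):
            if end not in nodes:
                raise KeyError(f"{kind} edge references missing node {end}")
        edges.append((src, dst, kind))

    nodes.add("fabric.switch0")                             # system nodes
    for s in range(n_sips):
        io_noc, pcie_ep = f"sip{s}.io0.io_noc", f"sip{s}.io0.pcie_ep"
        nodes.update((io_noc, pcie_ep))                     # IO chiplets
        cubes = [f"sip{s}.cube{c}.ucie" for c in range(n_cubes)]
        nodes.update(cubes)                                 # cubes
        for a, b in zip(cubes, cubes[1:]):                  # inter-cube edges
            add_edge(a, b, "ucie_mesh")
        add_edge(io_noc, cubes[0], "io_to_cube")            # IO <-> cube edges
    for s in range(n_sips):                                 # switch <-> IO edges
        add_edge("fabric.switch0", f"sip{s}.io0.pcie_ep", "pcie")
    return nodes, edges
```

Moving the inter-cube loop above the cube creation line would make `add_edge` raise, which is the failure mode the fixed order avoids.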

### D2. `cube_mesh.yaml` — a separate file with a source_hash cache

`mesh_gen.ensure_mesh_file(cube_spec, mesh_path)`:

1. Compute `source_hash = _compute_source_hash(cube_spec)` from these
   input fields:
   - `geometry` (cube_mm.w/h …).
   - `pe_layout` (corners, pe_per_corner).
   - `ucie.n_connections`.
   - `memory_map.hbm_mapping_mode`.
   - `placement` (m_cpu/sram pos_mm).
2. If `mesh_path` (= `cube_mesh.yaml` next to topology.yaml) exists
   and `existing.source_hash == source_hash`, reuse it (cache hit).
3. Otherwise, generate a new mesh via
   `_generate_mesh(cube_spec, source_hash)` and write to yaml.

The mesh is cached as a separate file because:

- Mesh generation involves nontrivial PE/UCIe/router attachment math
  and is too expensive to redo every time.
- Multiple runs with the same cube spec must guarantee an identical
  mesh.
- The resulting mesh is itself an inspectable / debuggable artifact.

The five fields listed in source_hash are the ones that determine
mesh shape; other changes (e.g., bandwidth, overhead_ns) do not
trigger mesh regeneration.
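The cache rule can be sketched as below. The field selection follows the five inputs listed above; the hash algorithm (SHA-256 over canonical JSON), the function names, and the `capacity_gb` field used in the usage note are assumptions for illustration, not the actual `_compute_source_hash` implementation.

```python
import hashlib
import json


def compute_source_hash(cube_spec: dict) -> str:
    """Hash only the fields that determine mesh shape (per D2)."""
    relevant = {
        "geometry": cube_spec.get("geometry"),
        "pe_layout": cube_spec.get("pe_layout"),
        "ucie_n_connections": cube_spec.get("ucie", {}).get("n_connections"),
        "hbm_mapping_mode": cube_spec.get("memory_map", {}).get("hbm_mapping_mode"),
        "placement": cube_spec.get("placement"),
    }
    # Assumption: canonical JSON keeps the hash stable across dict ordering.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_regeneration(existing_mesh, cube_spec: dict) -> bool:
    """Cache miss when there is no mesh yet or its source_hash differs."""
    if existing_mesh is None:
        return True
    return existing_mesh.get("source_hash") != compute_source_hash(cube_spec)
```

With this shape, editing a field outside the five inputs (for example a hypothetical `memory_map.capacity_gb`) leaves the hash unchanged, while editing `geometry.cube_mm.w` produces a miss.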

### D3. Cube NoC mesh auto-layout

`_generate_mesh(cube_spec, source_hash)`:

#### D3.1. Rows / columns

- `pe_positions = _corner_pe_positions(cube_w, cube_h)`: PE-center
  coordinates (mm) per corner (NW/NE/SW/SE). Hardcoded patterns like
  `(1.5, 1.5)` and `(cube_w-1.5, cube_h-1.5)`; with `pe_per_corner=2`,
  each corner has two PE positions.
- `col_xs = _compute_col_positions(...)`: union of PE x-coordinates,
  plus relay columns inserted when any gap exceeds
  `max_spacing = 3.0 mm`.
- `row_ys, rows_per_half = _compute_row_positions(cube_h,
  n_connections, pe_positions)`:
  - `n_conn = max(n_connections, 2)` (hot-path minimum).
  - `rows_per_half = ceil(n_conn / 2)`.
  - Top half + two HBM rows + bottom half. HBM sits at
    `(cube_h/2 - 1.5, cube_h/2 + 1.5)`. The gap between PE rows and
    HBM rows is `hbm_gap = 1.5 mm`.
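A sketch of the column and row-count arithmetic follows. The `max_spacing` threshold and the `rows_per_half` formula come from the text above; how relay columns are placed inside an oversized gap (equal segments here) is an assumption, and both function names are hypothetical.

```python
import math

MAX_SPACING_MM = 3.0


def compute_col_positions(pe_xs, max_spacing=MAX_SPACING_MM):
    """Union of PE x-coordinates plus relay columns in oversized gaps.

    Expects a non-empty sequence of x-coordinates in mm.
    """
    xs = sorted(set(pe_xs))
    cols = [xs[0]]
    for left, right in zip(xs, xs[1:]):
        gap = right - left
        if gap > max_spacing:
            # Assumption: relays split the gap into equal segments,
            # each no wider than max_spacing.
            segments = math.ceil(gap / max_spacing)
            step = gap / segments
            cols.extend(left + step * k for k in range(1, segments))
        cols.append(right)
    return cols


def rows_per_half(n_connections: int) -> int:
    """ceil(max(n_connections, 2) / 2), as stated in D3.1."""
    return math.ceil(max(n_connections, 2) / 2)
```

For example, PE columns at 1.5 mm and 10.5 mm leave a 9 mm gap, so two relay columns are inserted between them under this assumption.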

#### D3.2. HBM exclusion zone

`hbm_row_start = rows_per_half`,
`hbm_row_end = rows_per_half + 1`.
`hbm_col_start = n_cols // 2 - 1`,
`hbm_col_end = n_cols // 2`.

Router slots inside this (row, col) rectangle are marked `None` (no
router). HBM controllers are added separately as
`hbm_ctrl.pe{X}` nodes following ADR-0017 D9's per-PE partition
pattern.
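The exclusion rectangle can be sketched on a small grid. The four bounds are taken from the formulas above; treating them as inclusive and deriving the total row count as `2 * rows_per_half + 2` (two halves plus two HBM rows, per D3.1) are assumptions, and `build_router_grid` is a hypothetical name.

```python
def build_router_grid(rows_per_half: int, n_cols: int):
    """Return a row-major grid with None in the HBM exclusion zone."""
    n_rows = 2 * rows_per_half + 2  # top half + two HBM rows + bottom half
    hbm_row_start, hbm_row_end = rows_per_half, rows_per_half + 1
    hbm_col_start, hbm_col_end = n_cols // 2 - 1, n_cols // 2
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Assumption: both bounds are inclusive.
            in_hbm = (hbm_row_start <= r <= hbm_row_end
                      and hbm_col_start <= c <= hbm_col_end)
            row.append(None if in_hbm else f"r{r}c{c}")
        grid.append(row)
    return grid
```

With `rows_per_half=2` and `n_cols=4`, rows 2 and 3 lose their two middle routers, leaving a 2 × 2 hole in the center of a 6 × 4 grid.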

#### D3.3. PE attachment

Each corner's PEs map to a row:

- Top half: NW → row 0, NE → row 1 (top_corners index).
- Bottom half: SW → row `hbm_row_end + 1`, SE → row
  `hbm_row_end + 2`.

Each PE attaches to the router in the column nearest to its
x-coordinate
(`min(range(n_cols), key=lambda c: abs(col_xs[c] - pe_x))`).
The attachment items `pe{pe_idx}.dma`, `pe{pe_idx}.cpu`, and
`pe{pe_idx}.hbm` are pushed into that router's attach list.
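The nearest-column rule can be exercised in isolation. The `min(...)` expression is the one quoted above; the surrounding `attach_pe` helper and the dict-of-lists attach structure are assumptions for illustration.

```python
def nearest_col(col_xs, pe_x):
    """Index of the column whose x-position is closest to pe_x."""
    n_cols = len(col_xs)
    return min(range(n_cols), key=lambda c: abs(col_xs[c] - pe_x))


def attach_pe(attach, row, col_xs, pe_idx, pe_x):
    """Push the three PE attachment items onto the chosen router's list."""
    col = nearest_col(col_xs, pe_x)
    attach.setdefault((row, col), []).extend(
        [f"pe{pe_idx}.dma", f"pe{pe_idx}.cpu", f"pe{pe_idx}.hbm"]
    )
    return col
```

Because `min` returns the first minimal element, a PE exactly halfway between two columns lands on the lower column index.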

#### D3.4. M_CPU / SRAM attachment — nearest router by Euclidean distance

For `placement.m_cpu.pos_mm` (default `[1.5, 5.5]`) and
`placement.sram.pos_mm` (default `[1.5, 8.5]`), find the router with
the smallest Euclidean distance and append `"m_cpu"` / `"sram"` to
its attach list.
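A minimal sketch of the Euclidean search, assuming a row-major grid in which excluded slots are `None` and router (r, c) sits at `(col_xs[c], row_ys[r])` in mm. The function name and grid representation are hypothetical.

```python
import math


def nearest_router(grid, col_xs, row_ys, pos_mm):
    """Return (row, col) of the existing router closest to pos_mm."""
    best, best_dist = None, math.inf
    for r, row in enumerate(grid):
        for c, router in enumerate(row):
            if router is None:
                continue  # no router here (e.g. HBM exclusion zone)
            dist = math.hypot(col_xs[c] - pos_mm[0], row_ys[r] - pos_mm[1])
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best
```

Skipping `None` slots matters: a placement that falls over the HBM block still resolves to a real router on the block's border.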

#### D3.5. UCIe N/S/E/W distribution

`ucie_pe_rows = top_pe_rows + bot_pe_rows` (total
`2 * rows_per_half`).

- UCIe-E: for each PE row in turn, attach `ucie_e.c{i}` to the router
  in the rightmost column.
- UCIe-W: attach `ucie_w.c{i}` to the router in the leftmost column
  (the mirror of E).
- UCIe-N/S: split the PE columns into left and right halves and attach
  to the matching columns of the top row (N) and the bottom row (S).

Each UCIe connection name is suffixed with `c{i}`, which distributes
the `ucie.n_connections` PHYs (ADR-0017 D5+).
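The E/W half of this distribution can be sketched as follows. It assumes the connection index `i` is simply the position of the row within `ucie_pe_rows`; the N/S split is left out because the text does not pin down its column mapping. `distribute_ucie_ew` is a hypothetical name.

```python
def distribute_ucie_ew(ucie_pe_rows, n_cols):
    """Attach one E and one W UCIe connection per PE row."""
    attach = {}
    for i, row in enumerate(ucie_pe_rows):
        # East: rightmost column; West: leftmost column (mirror of E).
        attach.setdefault((row, n_cols - 1), []).append(f"ucie_e.c{i}")
        attach.setdefault((row, 0), []).append(f"ucie_w.c{i}")
    return attach
```

With `top_pe_rows = [0, 1]` and `bot_pe_rows = [4, 5]`, four connections per side are spread over the four PE rows, and the HBM rows in between receive none.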

### D4. Node naming convention — single ownership

builder.py creates nodes with the following naming convention (the
single-owner principle from ADR-0051 D5):

- `fabric.switch0` — system-level switch.
- `sip{S}.{io_id}.{pcie_ep|io_cpu|io_noc|io_ucie.{dir}|conn.{id}}` —
  IO chiplet.
- `sip{S}.cube{C}.{m_cpu|sram|hbm_ctrl.pe{X}|noc.r{R}c{C}|...}` —
  inside cube.
- `sip{S}.cube{C}.pe{P}.{pe_cpu|pe_dma|pe_fetch_store|pe_gemm|pe_math|pe_mmu|pe_tcm|pe_scheduler|pe_ipcq}` —
  PE sub-components.

Changing this convention requires updating both builder.py and
router.py's helpers (ADR-0051). Components never know the convention
directly — they only call the helpers.
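The single-owner idea can be sketched as a handful of name helpers. These functions are hypothetical stand-ins, not router.py's actual helpers; only the string patterns are taken from the list above.

```python
PE_COMPONENTS = (
    "pe_cpu", "pe_dma", "pe_fetch_store", "pe_gemm", "pe_math",
    "pe_mmu", "pe_tcm", "pe_scheduler", "pe_ipcq",
)


def pe_component(sip: int, cube: int, pe: int, component: str) -> str:
    """Name of one PE sub-component node."""
    if component not in PE_COMPONENTS:
        raise ValueError(f"unknown PE component: {component}")
    return f"sip{sip}.cube{cube}.pe{pe}.{component}"


def cube_router(sip: int, cube: int, row: int, col: int) -> str:
    """Name of the cube NoC router at grid position (row, col)."""
    return f"sip{sip}.cube{cube}.noc.r{row}c{col}"


def hbm_ctrl(sip: int, cube: int, pe: int) -> str:
    """Name of the per-PE HBM controller node."""
    return f"sip{sip}.cube{cube}.hbm_ctrl.pe{pe}"
```

Callers that go through helpers like these keep working if the string format changes, which is the point of confining the convention to one place.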

### D5. Edge `kind` classification

Every edge gets a `kind`; routing policy (ADR-0051 D2) reads it. Major
kinds:

- `"pe_internal"` — within a PE between sub-components.
- `"pe_to_router"` — PE_DMA ↔ cube NoC router.
- `"router_mesh"` — between cube NoC routers.
- `"router_to_hbm"`, `"router_to_mcpu"`, `"router_to_sram"`,
  `"sram_to_router"`, etc. — between cube-attached components.
- `"ucie_internal"`, `"ucie_conn_to_router"`,
  `"router_to_ucie_conn"`, `"ucie_conn_to_noc"`,
  `"noc_to_ucie_conn"`, `"ucie_mesh"` — UCIe-related.
- `"io_internal"` — inside IO chiplet.
- `"io_to_cube"`, `"cube_to_io"` — at the IO ↔ cube boundary.
- `"pcie"` — switch ↔ pcie_ep.
- `"command"` — control-plane edges only (e.g., M_CPU ↔ NOC; excluded
  from PE DMA paths).

Adding a new edge kind requires picking a category in router.py's
four adjacency graphs (ADR-0051 D2). If you forget, it defaults to
`_adj_all` only, which can produce unintended routes.
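One way to catch a forgotten classification is a small check over the emitted edges. This is a sketch under assumptions: `KNOWN_KINDS` copies only the major kinds named above (the list ends in "etc.", so it is incomplete), and `unclassified_kinds` is a hypothetical helper that router.py is not known to provide.

```python
KNOWN_KINDS = frozenset({
    "pe_internal", "pe_to_router", "router_mesh",
    "router_to_hbm", "router_to_mcpu", "router_to_sram", "sram_to_router",
    "ucie_internal", "ucie_conn_to_router", "router_to_ucie_conn",
    "ucie_conn_to_noc", "noc_to_ucie_conn", "ucie_mesh",
    "io_internal", "io_to_cube", "cube_to_io", "pcie", "command",
})


def unclassified_kinds(edges, known=KNOWN_KINDS):
    """Return edge kinds that have no explicit category.

    Each edge is a (src, dst, kind) tuple. Kinds returned here are the
    ones that would silently end up in the catch-all graph only.
    """
    return sorted({kind for _src, _dst, kind in edges} - known)
```

Running such a check in a test would surface a new kind at review time instead of as an unexpected route.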

### D6. View projection — four abstraction levels

`TopologyGraph` keeps four view projections alongside the flat
nodes+edges:

- **system_view** (`_build_system_view`): Tray level. SIP blocks and
  `fabric.switch0`. PCIe links shown. For external high-level
  overview.
- **sip_view** (`_build_sip_view`): inside one SIP — cube mesh + IO
  chiplet (pcie_ep + io_cpu + io_noc). UCIe N/S/E/W appear as
  cube-cube links.
- **cube_view** (`_build_cube_view`): inside one cube — router grid +
  PE / M_CPU / SRAM / HBM_CTRL attachments + UCIe PHY edges. For
  intra-cube routing / placement debugging.
- **pe_view** (`_build_pe_view`): inside one PE — nine sub-components
  + internal edges (pe_internal kind). For detailed PE-internal
  dataflow review.

Views are selectively rendered via the spec's
`visualization.emit_views: [system, sip, cube]` (ADR-0006). The pe
view is omitted from default output but the code is retained for
detailed debugging.
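View selection can be sketched as a filter over the four view names. The `visualization.emit_views` key is from the text above; falling back to `[system, sip, cube]` when the key is absent and rejecting unknown names are assumptions, and `select_views` is a hypothetical name.

```python
ALL_VIEWS = ("system", "sip", "cube", "pe")
DEFAULT_VIEWS = ("system", "sip", "cube")  # assumption: pe is opt-in


def select_views(spec: dict):
    """Return the views to render, in system -> sip -> cube -> pe order."""
    requested = spec.get("visualization", {}).get("emit_views", DEFAULT_VIEWS)
    unknown = [name for name in requested if name not in ALL_VIEWS]
    if unknown:
        raise ValueError(f"unknown view(s) in emit_views: {unknown}")
    return [name for name in ALL_VIEWS if name in requested]
```

Adding `pe` to `emit_views` in the spec is then enough to include the PE view for a debugging session.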

### D7. visualizer.py — SVG diagram output

`emit_diagrams(graph, out_dir)` renders every view as SVG. Key
functions:

- `_render_view_svg(view)` — generic view render (no router grid).
- `_render_cube_view_svg(view, spec)` — cube-view specific (HBM block,
  router grid layout, PE/M_CPU/SRAM/HBM placement).
- `_draw_node`, `_draw_edge` — node/edge visual representation.
- `_pick_scale`, `_compute_node_sizes` — auto-scaling.

The visualizer is a **derived artifact** (ADR-0006); changes here are
not subject to production checks. This aligns with CLAUDE.md's
"Derived Artifacts" guidance.

### D8. Blast radius of spec changes

| spec field                             | effect                  | mesh regenerated? |
|----------------------------------------|-------------------------|-------------------|
| `system.sips.count`                    | SIP count, node count   | No                |
| `sip.cube_mesh.w/h`                    | cube mesh shape         | No                |
| `cube.geometry.cube_mm.w/h`            | cube size (mm)          | **Yes**           |
| `cube.pe_layout.corners/pe_per_corner` | PE attachment positions | **Yes**           |
| `cube.ucie.n_connections`              | UCIe PHY distribution   | **Yes**           |
| `cube.memory_map.hbm_mapping_mode`     | HBM distribution mode   | **Yes**           |
| `cube.placement`                       | M_CPU/SRAM positions    | **Yes**           |
| `cube.memory_map.*` (besides above)    | HBM capacity / BW       | No                |
| `*.links.*.bw_gbs`                     | edge bandwidth          | No                |
| `*.attrs.overhead_ns`                  | component latency       | No                |

The table mirrors D2's `_compute_source_hash` inputs. Changes that
require mesh regeneration automatically invalidate `cube_mesh.yaml`'s
source_hash.

## Alternatives Considered

### A1. Regenerate the mesh on every compile without a cache file

Rejected. The cost of mesh generation would be paid repeatedly (CLI
runs, probe, tests) for the same spec, and the human-inspectable
artifact would disappear.

### A2. Merge mesh generation into builder.py

Rejected (currently). It is a 305-line algorithm of its own, and the
mesh-layout decisions (placement-driven router attachment, HBM
exclusion zone) are different from builder's general node/edge
emission. Keeping it separate respects single-responsibility.

### A3. Express placement coordinates in cube coordinates (col/row)

Rejected. mm coordinates flow consistently between the visualizer and
mesh layout (for nearest-router computation). Cube coordinates are
undefined until the router grid is fixed, so they are unsuitable as
placement input.

### A4. Lazy view projection generation

Rejected (currently). The four views are cheap to build (typically <
100 ms), and eager construction guarantees `TopologyGraph` as the
single source of truth.

### A5. Visualizer output in formats besides SVG (PNG/PDF)

Rejected. SVG is vector + text-searchable + directly renderable in
browsers. PNG conversion, when required, is downstream
post-processing (e.g., rsvg-convert).

## Consequences

- ADR-0006's high-level intent is fleshed out via D1–D7; topology
  changes can be assessed quickly via D8's table.
- D3's mesh-layout algorithm is ADR-locked, so when a new PE
  attachment pattern is proposed (e.g., a 6-zone HBM split) it is
  clear which stage it affects.
- D5's edge-kind list and D6's view structure are explicit, giving PR
  reviewers a quick map of where (builder + router + visualizer) a
  new component type ripples through.
- D2's source_hash invalidation rules are explicit, so a
  `cube_mesh.yaml` that is left untouched (e.g., when only bandwidth
  changed) is recognized as correct behavior rather than a stale cache.