Documents four cross-cutting surfaces that previously had no ADR backing,
each surfaced as a G4 candidate by /report:
- 0046 prog-tl-context-contract: the kernel-side tl.* API. Enumerates
all primitives (ref/load/store/dot/composite/math/reduction/IPCQ/...),
the two execution modes (command-list vs greenlet runner), scratch
allocator semantics, dispatch-overhead model, and the kernel registry.
- 0047 par-ahbm-ccl-backend: torch.distributed.init_process_group
(backend="ahbm") install path. world_size priority (algorithm >
defaults > topology), the 4-step init sequence (load ccl.yaml, import
algorithm module, derive world_size, install SFR + IPCQ), greenlet-
local rank registry, all_reduce dispatch via _defer_wait, barrier
no-op rationale, and the explicit list of unsupported dist.* APIs.
- 0048 mem-allocator-algorithms: VirtualAllocator + PEMemAllocator
free-list semantics. Offset-keyed first-fit with coalescing, the
no-validation trust model for free(), HBM/TCM channel separation,
page-aligned VA allocation, the page_size dual-default
(VirtualAllocator 2 MiB / _ensure_allocators 4 KiB fallback), and
one-allocator-per-sub-unit rule.
- 0049 ver-probe-subcommand: kernbench probe traffic-pattern catalog.
H2D / D2H / PE DMA categories with their exact cube-index choices,
the 32 KiB reference size, the 5-point utilization sweep, the
formula vs actual column meanings, automatic invariant checks
(monotonicity, D2H >= H2D, best < worst), per-case GraphEngine
isolation, and the human-readable (not machine-parsable) output
contract.
Bilingual pair verifier passes for all four EN/KO pairs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>