# Assembler (asm/)

Last verified: 2026-03-07

## Purpose

Translates dfasm graph assembly source into emulator-ready configurations. Bridges the gap between human-authored dataflow programs and the emulator's PEConfig/SMConfig/token structures.

## Contracts

- **Exposes**: `assemble(source) -> AssemblyResult`, `assemble_to_tokens(source) -> list`, `run_pipeline(source) -> IRGraph`, `serialize_graph(IRGraph) -> str`, `round_trip(source) -> str`
- **Guarantees**: Pipeline is parse -> lower -> expand -> resolve -> place -> allocate -> codegen. Each pass returns a new IRGraph (immutable pass pattern). Errors accumulate in `IRGraph.errors` rather than fail-fast. `AssemblyResult` contains valid PEConfig/SMConfig lists and seed MonadTokens.
- **Expects**: Valid dfasm source conforming to `dfasm.lark`. Raises `ValueError` if any pipeline stage reports errors.
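
The immutable pass pattern and error accumulation described above can be sketched as follows. This is a minimal illustration, not the module's code: `MiniGraph`, `check_duplicates`, `check_sigils`, and `run_passes` are hypothetical stand-ins, and it assumes that all passes run before a single `ValueError` is raised (the real pipeline may check after each stage).

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class MiniGraph:
    """Hypothetical stand-in for IRGraph (the real type lives in ir.py)."""
    nodes: Tuple[str, ...] = ()
    errors: Tuple[str, ...] = ()


def check_duplicates(graph: MiniGraph) -> MiniGraph:
    """Example pass: report every repeated node name, never raise."""
    seen, found = set(), []
    for name in graph.nodes:
        if name in seen:
            found.append(f"duplicate node: {name}")
        seen.add(name)
    # replace() builds a new frozen instance; the input graph is untouched.
    return replace(graph, errors=graph.errors + tuple(found))


def check_sigils(graph: MiniGraph) -> MiniGraph:
    """Example pass: report names that lack the '&' label sigil."""
    found = [f"missing sigil: {n}" for n in graph.nodes if not n.startswith("&")]
    return replace(graph, errors=graph.errors + tuple(found))


def run_passes(
    graph: MiniGraph,
    passes: Iterable[Callable[[MiniGraph], MiniGraph]],
) -> MiniGraph:
    """Run every pass, then surface all accumulated errors at once."""
    for step in passes:
        graph = step(graph)
    if graph.errors:
        raise ValueError("\n".join(graph.errors))
    return graph
```

Because each pass returns a new graph, a caller can keep the input and output of any pass side by side when debugging, and a single run reports both problems in the source.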

## Pipeline Passes

1. **Lower** (`lower.py`): Lark CST -> IRGraph. Creates IRNodes, IREdges, IRRegions (function/location scopes), IRDataDefs, SystemConfig from @system pragma. Qualifies names with function scope (e.g., `$main.&add`). May contain IRMacroCall nodes and MacroDef regions.
2. **Expand** (`expand.py`): Macro expansion and function call wiring. Clones macro bodies, substitutes parameters (including opcodes via `${op}`, placement via `|${pe}`, ports via `:${port}`, context slots via `[${ctx}]`), evaluates const expressions, expands variadic repetition blocks, rewrites `@ret`/`@ret_name` macro outputs, qualifies expanded names with scope prefixes. Processes function call sites with `@ret` trampolines, `free_ctx` insertion, and cross-context wiring. After expansion, the IR contains only concrete IRNode/IREdge entries: no ParamRef placeholders, MacroDef regions, or IRMacroCall entries remain.
3. **Resolve** (`resolve.py`): Validates all edge endpoints exist. Detects scope violations (cross-function label refs). Generates Levenshtein "did you mean" suggestions.
4. **Place** (`place.py`): Validates explicit PE placements. Auto-places unplaced nodes via greedy bin-packing with locality heuristic (prefer PE with most connected neighbours).
5. **Allocate** (`allocate.py`): Assigns IRAM offsets (dyadic first, then monadic). Assigns activation IDs (one per function scope per PE). Computes frame layouts with match/const/dest slot allocation. Assigns frame references (fref) for each instruction. Resolves symbolic destinations to `FrameDest(target_pe, offset, act_id, port, token_kind)`.
6. **Codegen** (`codegen.py`): Generates PEConfig/SMConfig with frame layouts, IRAM, and routing tables. Produces seed tokens (MonadToken) for initialization. Supports direct mode (immediate execution) and token stream mode (initialization via IRAM writes).
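
The Place pass's greedy auto-placement with a locality heuristic can be sketched as below. This is an illustration under stated assumptions, not the code in `place.py`: `auto_place` and its parameters are hypothetical, capacity is modelled as a single per-PE node count, ties are broken toward the less loaded and then lower-numbered PE, and a full system raises instead of recording an error on the graph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def auto_place(
    nodes: List[str],
    edges: List[Tuple[str, str]],
    pinned: Dict[str, int],
    pe_count: int,
    capacity: int,
) -> Dict[str, int]:
    """Assign each unplaced node to the PE holding most of its neighbours."""
    placement = dict(pinned)          # explicit placements are kept as-is
    load = [0] * pe_count
    for pe in placement.values():
        load[pe] += 1

    # Undirected adjacency: locality counts producers and consumers alike.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    for name in nodes:
        if name in placement:
            continue
        candidates = [pe for pe in range(pe_count) if load[pe] < capacity]
        if not candidates:
            raise ValueError(f"no PE has room for {name}")

        def score(pe: int) -> Tuple[int, int, int]:
            connected = sum(1 for n in neighbours[name] if placement.get(n) == pe)
            # Most connected neighbours first, then lighter load, then lower PE id.
            return (connected, -load[pe], -pe)

        best = max(candidates, key=score)
        placement[name] = best
        load[best] += 1
    return placement
```

With `pinned={"a": 1}`, two PEs of capacity 2, and edges `a-b`, `b-c`, node `b` follows `a` onto PE 1; `c` would prefer PE 1 as well but spills to PE 0 because PE 1 is full.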

## Dependencies

- **Uses**: `cm_inst` (Port, MemOp, ALUOp, Instruction, FrameDest, OutputStyle, TokenKind), `tokens` (MonadToken, SMToken, PELocalWriteToken, FrameControlToken), `sm_mod` (Presence, SMCell), `emu/types` (PEConfig, SMConfig), `lark` (parser)
- **Used by**: Test suite, user programs, `dfgraph/` (pipeline, graph_json use ir, lower, resolve, place, allocate, errors, opcodes), `monitor/` (backend uses `run_pipeline` and `generate_direct`; graph_json uses ir, opcodes)
- **Boundary**: `emu/` and root-level modules must NEVER import from `asm/`

## Key Decisions

- Frozen dataclasses for IR types: follows existing `tokens.py`/`cm_inst.py` patterns
- `TypeAwareOpToMnemonicDict` and `TypeAwareMonadicOpsSet` in opcodes.py: required because IntEnum subclasses share numeric values across types (e.g., `ArithOp.ADD == 0 == MemOp.READ`), so plain dict/set lookups would collide
- Errors use `IRGraph.errors` accumulation: all issues are reported rather than stopping at the first error
- `#` sigil for macro namespace: avoids collision with other sigils ($, &, @)
- `@ret` reserved prefix for return markers: in function bodies, creates trampolines with cross-context routing and `free_ctx`; in macro bodies, rewrites edges to call-site destinations (no context management)
- Per-call-site activation ID allocation: each function call site gets its own activation ID on the target PE, managed by CallSite metadata
- Opcode parameters (`${op}`) resolved via `MNEMONIC_TO_OP`: enables generic macros like `#reduce_2 add`
- Parameterized qualifiers (`|${pe}`, `:${port}`) resolved during expansion via `PlacementRef`, `PortRef`
- Built-in macros prepended to user source: `#loop_counted`, `#loop_while`, `#permit_inject` (variadic), `#reduce_2`/`_3`/`_4` (parameterized opcode)
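
The IntEnum collision behind `TypeAwareOpToMnemonicDict` can be reproduced in a few lines. The enums below are stand-ins: only `ArithOp.ADD == 0 == MemOp.READ` comes from this document, the other members and values are invented. `TypeKeyedDict` is a hypothetical sketch of one way to avoid the collision (keying on the enum class as well as the value); the actual classes in `opcodes.py` may work differently.

```python
from enum import IntEnum


class ArithOp(IntEnum):   # stand-in; only ADD = 0 is taken from the docs
    ADD = 0
    SUB = 1


class MemOp(IntEnum):     # stand-in; only READ = 0 is taken from the docs
    READ = 0
    WRITE = 1


# IntEnum members compare and hash as plain ints, so these two keys collide:
plain = {ArithOp.ADD: "add"}
plain[MemOp.READ] = "read"        # overwrites the "add" entry


class TypeKeyedDict:
    """Hypothetical mapping keyed on (enum class, numeric value)."""

    def __init__(self) -> None:
        self._items = {}

    def __setitem__(self, op: IntEnum, value: str) -> None:
        self._items[(type(op), int(op))] = value

    def __getitem__(self, op: IntEnum) -> str:
        return self._items[(type(op), int(op))]

    def __len__(self) -> int:
        return len(self._items)


table = TypeKeyedDict()
table[ArithOp.ADD] = "add"
table[MemOp.READ] = "read"        # distinct key: the class differs
```

After this runs, `plain` holds a single entry whose value is `"read"`, while `table` keeps both mnemonics.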

## Invariants

- Each pass returns a new IRGraph; IRGraphs are never mutated after construction
- Names inside function regions are always qualified: `$funcname.&label`
- Macro scopes (`#macro_N`) don't consume activation IDs: they're inlined label namespaces
- Expanded names are qualified: `#macroname_N.&label` for global macros, `$func.#macro_N.&label` for function-scoped macros
- Double-scoped names in function call bodies: `$func.#macro_N.&label` when a macro is expanded inside a function call site
- `CallSite` metadata drives per-call-site activation ID allocation: each unique call location gets one activation ID on the target PE
- After expansion, IR contains only concrete IRNode/IREdge entries; no ParamRef, MacroDef, or IRMacroCall entries remain
- After placement, every IRNode has `pe is not None`
- After allocation, every IRNode has `iram_offset`, `act_id`, `fref`, and `mode` set; destinations are `ResolvedDest` with concrete `FrameDest`
- Frame layouts are computed per activation with match/const/dest/sink slot regions
- Token stream order is always: SM init -> IRAM writes -> seed tokens
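
The naming invariants above can be condensed into a small helper. `qualify` is hypothetical and only mirrors the name shapes listed in this section; it assumes the `_N` suffix is a per-expansion counter, which this document does not spell out.

```python
from typing import Optional


def qualify(
    label: str,
    func: Optional[str] = None,
    macro: Optional[str] = None,
    expansion: Optional[int] = None,
) -> str:
    """Build a qualified label name from its enclosing scopes."""
    if macro is not None and expansion is None:
        raise ValueError("a macro scope needs an expansion number")
    parts = []
    if func is not None:
        parts.append(f"${func}")                 # function scope
    if macro is not None:
        parts.append(f"#{macro}_{expansion}")    # inlined macro scope
    parts.append(f"&{label}")
    return ".".join(parts)
```

For example, `qualify("add", func="main")` gives `$main.&add`, and adding `macro="loop_counted", expansion=0` gives `$main.#loop_counted_0.&add`.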

## Key Files

- `__init__.py` -- Public API and pipeline orchestration
- `ir.py` -- All IR type definitions (IRNode, IREdge, IRGraph, IRRegion, IRDataDef, SystemConfig, SourceLoc, NameRef, ResolvedDest, MacroDef, IRMacroCall, CallSite, etc.)
- `errors.py` -- Structured error types with source context (ErrorCategory, AssemblyError, format_error)
- `opcodes.py` -- Mnemonic-to-opcode mapping and arity (monadic vs dyadic) classification
- `expand.py` -- Macro expansion and function call wiring pass
- `builtins.py` -- Built-in macro library (BUILTIN_MACROS string constant)
- `codegen.py` -- `AssemblyResult` dataclass and both code generation modes

## Gotchas

- `MemOp.WRITE` arity depends on const: monadic when const is set (cell_addr from const), dyadic when const is None (cell_addr from left operand)
- `RoutingOp.FREE_FRAME` (frame deallocation) and `MemOp.FREE` (SM free) are disambiguated by opcode type: RoutingOp vs MemOp
- Frame layouts are computed per activation, not per instruction: all nodes in an activation share the same slot map but use different frame slots
- `fref` (frame reference offset) is instruction-specific and points to different slot regions depending on instruction mode
- Dyadic instructions must have `iram_offset < matchable_offsets` to use the matching hardware; a dyadic instruction placed at or beyond that offset generates a warning (AC5.8)
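
The `MemOp.WRITE` arity rule and the matching-offset limit can be illustrated together with the allocator's dyadic-first ordering. This is a sketch for a single PE with hypothetical helpers (`write_is_monadic`, `assign_iram_offsets`) and an invented warning text; the real logic lives in `opcodes.py` and `allocate.py` and may differ.

```python
from typing import Dict, List, Optional, Tuple


def write_is_monadic(const: Optional[int]) -> bool:
    """WRITE takes cell_addr from const when set, else from the left operand."""
    return const is not None


def assign_iram_offsets(
    nodes: List[Tuple[str, bool]],
    matchable_offsets: int,
) -> Tuple[Dict[str, int], List[str]]:
    """nodes is a list of (name, is_dyadic); returns (offsets, warnings)."""
    # Dyadic instructions go first so they occupy the low, matchable offsets.
    ordered = [n for n in nodes if n[1]] + [n for n in nodes if not n[1]]
    offsets: Dict[str, int] = {}
    warnings: List[str] = []
    for offset, (name, is_dyadic) in enumerate(ordered):
        offsets[name] = offset
        if is_dyadic and offset >= matchable_offsets:
            warnings.append(
                f"{name}: dyadic at IRAM offset {offset}, "
                f"limit is {matchable_offsets}"
            )
    return offsets, warnings
```

With three dyadic nodes and `matchable_offsets=2`, the third dyadic node lands at offset 2 and is the only one reported, while monadic nodes are placed after it without any warning.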

<!-- freshness: 2026-03-07 -->