# Macro Enhancements: Opcode Parameters, Qualified Ref Parameters, and @ret Wiring

Extends the dfasm macro system with three capabilities that reduce the need for per-variant macro definitions and make macros composable in the same way as functions.

## Current State

The macro system (implemented in `asm/expand.py`, grammar in `dfasm.lark`) supports:

- Parameter substitution in node names via `${param}` (token pasting with prefix/suffix)
- Parameter substitution in edge endpoints via `${param}` in `qualified_ref`
- Parameter substitution in const fields
- Compile-time arithmetic via `ConstExpr` (`${base} + ${_idx} + 1`)
- Variadic parameters with `@each` repetition blocks
- Nested macro invocation (depth limit 32)

Three gaps remain:

1. **Opcode position is not parameterizable.** The grammar defines `opcode: OPCODE` as a keyword terminal. You cannot pass an opcode as a macro argument. This forces per-opcode variants: `#reduce_add_2`, `#reduce_add_3`, etc.

2. **Placement and port qualifiers are not parameterizable.** The grammar defines `placement: "|" IDENT` and `port: ":" PORT_SPEC`; neither accepts `param_ref`. You cannot write `&ref:${port}` or `&ref|${pe}` in a macro body to parameterize which port or PE a reference targets.

3. **Macros have no output wiring convention.** Functions use `@ret` / `@ret_name` markers in their body, and the call syntax `$func args |> outputs` auto-wires return paths. Macros have no equivalent: the user must manually wire to expanded internal node names after invocation.

## Enhancement 1: Opcode Parameters

### Goal

Allow macro parameters to appear in the opcode position of `inst_def`, `strong_edge`, and `weak_edge` rules.

### Grammar Change

```lark
// Current:
opcode: OPCODE

// Proposed:
opcode: OPCODE | param_ref
```

This is the only change needed to the `opcode` rule itself; passing an opcode as a macro argument needs one more, covered under Argument Syntax. `param_ref` (`${name}`) is already a valid production. Earley parsing handles the ambiguity.

### Lower Pass

The `inst_def` handler in `lower.py` currently calls `self._resolve_opcode()`, which maps mnemonic strings to `ALUOp`/`MemOp` values. When the opcode is a `ParamRef`, lowering must defer resolution: store the `ParamRef` on the `IRNode` in a new field (or overload the `opcode` field's type to `Union[ALUOp, MemOp, ParamRef]`).

The `strong_edge` and `weak_edge` handlers need the same treatment: if the opcode token is a `ParamRef`, create the anonymous node with a deferred opcode.

### Expand Pass

During `_clone_and_substitute_node`, if `node.opcode` is a `ParamRef`:

1. Look up the parameter in the substitution map
2. The argument value must be a string matching a known opcode mnemonic
3. Resolve via `MNEMONIC_TO_OP` to get the concrete `ALUOp`/`MemOp`
4. Replace the node's opcode with the resolved value
5. Error if the argument is not a valid opcode mnemonic

### Validation

Opcode validation (monadic/dyadic arity, valid argument combinations) already happens after expansion in the resolve and allocate passes. No additional validation is needed at expansion time beyond confirming the mnemonic exists.

### Example

Before (current, per-opcode variants):

```
#reduce_add_2 |> { &r <| add }
#reduce_add_3 |> { &r0 <| add; &r1 <| add; &r0 |> &r1:L }
#reduce_sub_2 |> { &r <| sub }
; ... N variants per opcode
```

After (parameterized):

```
#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

; Usage:
#reduce_2 add
#reduce_3 sub
```

### Argument Syntax

Opcode arguments are passed as bare identifiers in the macro call. The grammar for `macro_call_stmt` already accepts `argument`, which includes `qualified_ref`, so a bare identifier might look like it already parses. It does not: an unqualified `add` in argument position lexes as the `OPCODE` terminal (priority 2), not as `IDENT`, and `OPCODE` is not a valid `argument`. Two options:

**Option A: Quote opcode arguments.** Pass as string literals: `#reduce_2 "add"`.
Simple, unambiguous. Expand pass strips quotes and resolves. Slightly ugly.

**Option B: Accept OPCODE as a macro argument.** Add `OPCODE` as an alternative in `positional_arg`:

```lark
// Current:
?positional_arg: value | qualified_ref

// Proposed:
?positional_arg: value | qualified_ref | OPCODE
```

The lower pass wraps the bare opcode token as a string argument in the `IRMacroCall`. Expand resolves it against `MNEMONIC_TO_OP`. This reads naturally: `#reduce_2 add`.

Option B is cleaner. The only risk is if someone has an `IDENT` that collides with an opcode name as a label/node, but the priority system already handles that (opcodes win at lexer level), and this collision already exists in the language.

**Recommendation: Option B.**

## Enhancement 2: Parameterized Placement and Port Qualifiers

### Goal

Allow `${param}` in the placement (`|pe0`) and port (`:L`) positions of a `qualified_ref`, so macros can parameterize which PE a node targets, which port an edge uses, and (when exposed) which context slot to use.

### Current State

`qualified_ref` is built from three parts:

```lark
qualified_ref: (node_ref | label_ref | ... | param_ref) placement? port?
placement: "|" IDENT
port: ":" PORT_SPEC
PORT_SPEC: IDENT | HEX_LIT | DEC_LIT
```

`${param}` can already stand in for the entire ref part (the first element). But the `placement` and `port` suffixes only accept literal tokens. So `&node:${port}` and `&node|${pe}` don't parse.

In the lower pass, `qualified_ref` collects its children into a dict:

- The ref part becomes `{"name": ...}`
- `placement` returns a string (e.g., `"pe0"`)
- `port` returns a `Port` enum (`Port.L`, `Port.R`) or raw `int`

In the IR, `IRNode.pe` stores placement as `Optional[int]`, and `IREdge.port`/`IREdge.source_port` store port as `Port`. Neither field currently accepts `ParamRef`.

The expand pass (`_clone_and_substitute_node`, `_clone_and_substitute_edge`) only substitutes `name`, `const`, `source`, and `dest`.
It does not touch `pe`, `port`, or `source_port`.

### Grammar Changes

```lark
// Current:
placement: "|" IDENT
port: ":" PORT_SPEC

// Proposed:
placement: "|" (IDENT | param_ref)
port: ":" (PORT_SPEC | param_ref)
```

### Lower Pass

The `placement` handler currently does `return str(token)`. It needs to handle receiving a `ParamRef` from the parser and return it as-is:

```python
def placement(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    return str(args[-1])
```

Similarly, the `port` handler needs to pass through `ParamRef` instead of resolving to `Port`:

```python
def port(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    # ... existing Port.L / Port.R / int resolution
```

The `qualified_ref` handler already iterates over args by type. It needs a new branch to detect `ParamRef` in placement/port positions (currently it only detects `ParamRef` in the ref-name position). The disambiguation is based on ordering: the ref-name comes first, placement second (prefixed with `|`), port third (prefixed with `:`). Since Lark processes them through their respective rules before `qualified_ref` sees them, the parser distinguishes them. The `qualified_ref` handler just needs to accept `ParamRef` for placement and port:

```python
def qualified_ref(self, *args):
    ref_type = None
    placement = None
    port = None
    for arg in args:
        if isinstance(arg, ParamRef) and ref_type is None:
            ref_type = {"name": arg}
        elif isinstance(arg, ParamRef) and ref_type is not None:
            # Second or third ParamRef: depends on position.
            # But Lark gives us placement/port through their handlers,
            # so we get ParamRef from the placement() or port() handler.
            # Need to distinguish: placement handler adds a marker or
            # we rely on Lark's rule names.
            ...
```

Actually, this is simpler than it looks. Lark calls `placement()` and `port()` before `qualified_ref()`.
So `qualified_ref` receives:

- A dict or `ParamRef` (from the ref-name rules)
- A string or `ParamRef` (from the `placement` handler)
- A `Port`/`int` or `ParamRef` (from the `port` handler)

The existing type-based dispatch in `qualified_ref` needs one addition: if an arg is `ParamRef` and `ref_type` is already set, it's either placement or port. We can distinguish by wrapping them: the placement handler returns `("placement", ParamRef(...))` and port returns `("port", ParamRef(...))` when deferring. Or simpler: use a thin wrapper type.

Alternatively, Lark's `@v_args(inline=True)` on placement/port means the handler already knows which rule matched. The cleanest approach: return a `ParamRef` tagged with its role:

```python
@dataclass(frozen=True)
class PlacementRef:
    """Deferred placement from macro parameter."""
    param: ParamRef

@dataclass(frozen=True)
class PortRef:
    """Deferred port from macro parameter."""
    param: ParamRef
```

Then `qualified_ref` type-dispatches on `PlacementRef`/`PortRef` alongside `str`/`Port`/`int`.

### IR Changes

- `IRNode.pe` type becomes `Optional[Union[int, ParamRef]]`.
- `IREdge.port` type becomes `Union[Port, ParamRef]`.
- `IREdge.source_port` type becomes `Optional[Union[Port, ParamRef]]`.

These wider types only appear in macro template bodies. After expansion, all `ParamRef` values are resolved to concrete types. The resolve, place, and allocate passes never see `ParamRef`; if one leaks through, it's a bug in expand.

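
The "no `ParamRef` survives expansion" invariant from the IR Changes could be checked with a small post-expand assertion. The sketch below is illustrative only. `ParamRef`, `Port`, `IRNode`, and `IREdge` are simplified stand-ins for the real classes, and `find_leaked_param_refs` is a hypothetical helper, not an existing function in `asm/expand.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class Port(Enum):
    """Stand-in for the real Port enum."""
    L = 0
    R = 1


@dataclass(frozen=True)
class ParamRef:
    """Stand-in for the real ParamRef: a reference to a macro parameter."""
    param: str


@dataclass(frozen=True)
class IRNode:
    """Simplified stand-in: only the fields relevant to this sketch."""
    name: str
    pe: Optional[Union[int, ParamRef]] = None


@dataclass(frozen=True)
class IREdge:
    """Simplified stand-in: only the fields relevant to this sketch."""
    source: str
    dest: str
    port: Union[Port, ParamRef] = Port.L
    source_port: Optional[Union[Port, ParamRef]] = None


def find_leaked_param_refs(nodes: List[IRNode], edges: List[IREdge]) -> List[str]:
    """Hypothetical check: describe every ParamRef still present after expand."""
    leaks = []
    for node in nodes:
        if isinstance(node.pe, ParamRef):
            leaks.append(f"node {node.name}: pe=${{{node.pe.param}}}")
    for edge in edges:
        for field_name in ("port", "source_port"):
            value = getattr(edge, field_name)
            if isinstance(value, ParamRef):
                leaks.append(
                    f"edge {edge.source}->{edge.dest}: {field_name}=${{{value.param}}}"
                )
    return leaks
```

An empty result means the expanded graph is safe to hand to the resolve, place, and allocate passes. A non-empty result points at the template field that expand failed to substitute.
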
### Expand Pass

`_clone_and_substitute_node` gains:

```python
# Substitute PE placement if it's a ParamRef
new_pe = node.pe
if isinstance(new_pe, ParamRef):
    resolved = _substitute_param(new_pe, subst_map)
    # Must resolve to a PE identifier string like "pe0" or an int
    new_pe = _resolve_pe_placement(resolved)  # parse "pe0" -> 0, or int -> int
```

`_clone_and_substitute_edge` gains:

```python
# Substitute port if it's a ParamRef
new_port = edge.port
if isinstance(new_port, ParamRef):
    resolved = _substitute_param(new_port, subst_map)
    new_port = _resolve_port(resolved)  # "L" -> Port.L, "R" -> Port.R, int -> int

new_source_port = edge.source_port
if isinstance(new_source_port, ParamRef):
    resolved = _substitute_param(new_source_port, subst_map)
    new_source_port = _resolve_port(resolved)
```

### Validation

Invalid port/placement values (e.g., passing `"banana"` as a port) produce a MACRO error during expansion. Post-expansion, the existing place and allocate passes validate that PE IDs are in range and ports are valid.

### Examples

Parameterized port selection:

```
; Macro that wires to a caller-selected port
#wire_to_port target, port |> {
    &src <| pass
    &src |> ${target}:${port}
}

; Usage: wire to left port
#wire_to_port &dest, L

; Usage: wire to right port
#wire_to_port &dest, R
```

Parameterized PE placement:

```
; Macro that places its node on a specific PE
#placed_const val, pe |> {
    &c <| const, ${val} |${pe}
    &c |> @ret
}

; Usage: place on pe0
#placed_const 42, pe0 |> &target

; Usage: place on pe1
#placed_const 42, pe1 |> &target
```

Combined, as a macro that builds a cross-PE relay:

```
; Route a value from one PE to another
#cross_pe_relay src_pe, dst_pe |> {
    &hop <| pass |${src_pe}
    &hop |> @ret
}

; Usage:
#cross_pe_relay pe0, pe1 |> &destination
```

### Context Slot Syntax

Context slots use bracket syntax `[N]`, distinct from all other qualifiers:

```
&node|pe0[2]      ; place on pe0, context slot 2
&node[0]          ; context slot 0, auto-placed PE
&node|pe1[0..4]   ; reserve context slots 0-4 for this instruction
```

The bracket syntax avoids overloading `:` (which already carries port, cell address, and potentially IRAM address semantics). `[N]` is exclusively context slots.

#### Grammar

```lark
// New production:
ctx_slot: "[" (DEC_LIT | ctx_range | param_ref) "]"
ctx_range: DEC_LIT ".." DEC_LIT

// Updated qualified_ref:
qualified_ref: (node_ref | label_ref | ... | param_ref) placement? ctx_slot? port?
```

`ctx_slot` appears between placement and port in the qualifier chain: `&node|pe0[2]:L`.

#### Use Cases

- **Explicit context partitioning**: place parallel computations in distinct context slots to avoid matching store collisions
- **Debugging**: force a known context layout for inspection
- **Range reservation** (`[0..4]`): reserve a contiguous block of slots for an instruction that will be targeted by multiple parallel sources wired identically (not essential, but a natural extension)

#### Parameterization

Same mechanism as placement/port.
`[${ctx}]` in a macro body, substituted to an integer during expansion:

```
#placed_op op, pe, ctx |> {
    &n <| ${op} |${pe}[${ctx}]
    &n |> @ret
}

; Usage:
#placed_op add, pe0, 2 |> &target
```

## Enhancement 3: @ret Wiring for Macros

### Goal

Allow macros to define output points using `@ret` / `@ret_name` markers, and wire them to destinations at the call site using the `|>` syntax.

### Grammar Change

Add optional output list to `macro_call_stmt`:

```lark
// Current:
macro_call_stmt: "#" IDENT (argument ("," argument)*)?

// Proposed:
macro_call_stmt: "#" IDENT (argument ("," argument)*)? (FLOW_OUT call_output_list)?
```

This reuses the existing `call_output_list` and `call_output` productions from `call_stmt`. Same syntax: `#macro args |> &dest` or `#macro args |> name=&dest`.

### Macro Body Convention

Macro bodies use `@ret` and `@ret_name` in edge destinations, same as function bodies:

```
#loop op, init_val |> {
    &counter <| add
    &compare <| ${op}
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R

    ; Output edges use @ret convention
    &compare |> @ret_body
    &compare |> @ret_exit:R
}
```

### Lower Pass

When lowering `macro_call_stmt` with a `FLOW_OUT` and `call_output_list`:

1. Parse the output list the same way `call_stmt` does (named/positional outputs)
2. Store output destinations on the `IRMacroCall` in a new field: `output_dests: tuple`

The `IRMacroCall` dataclass gains:

```python
@dataclass(frozen=True)
class IRMacroCall:
    name: str
    positional_args: tuple
    named_args: tuple
    output_dests: tuple = ()  # New: output wiring destinations
    loc: Optional[SourceLoc] = None
```

### Expand Pass

After cloning and substituting the macro body, process `@ret` markers:

1. Scan expanded edges for destinations starting with `@ret`
2. For each `@ret` / `@ret_name` destination, look up the corresponding output from `IRMacroCall.output_dests`
3. Replace the `@ret*` destination with the actual target node name
4. If a `@ret*` marker has no matching output dest, report a MACRO error

This is simpler than function call wiring because macros don't need:

- Trampoline nodes (no cross-context routing)
- `ctx_override` edges (macros inline into the caller's context)
- `FREE_CTX` nodes (no context allocation)
- Synthetic PASS nodes (direct edge replacement suffices)

The `@ret` substitution in macros is purely edge rewriting: replace the symbolic `@ret_name` destination with the concrete node reference from the call site.

### Positional @ret Mapping

Same convention as function calls:

- Bare `@ret` maps to the first (or only) positional output
- `@ret_name` maps to the named output `name=&dest`
- Multiple bare `@ret` edges to different ports on the same output are valid

### Example

```
; Define macro with outputs
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Invoke with output wiring
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R
```

Or positionally:

```
#simple_gate |> {
    &g <| gate
    &g |> @ret
    &g |> @ret:R   ; second output port
}

; Invoke: positional @ret maps to first output
#simple_gate |> &body, &exit
```

## Impact on Built-in Macros

With opcode parameters and `@ret` wiring in place, the built-in library collapses significantly:

### Current (11 macros)

```
#loop_counted, #loop_while
#permit_inject_1, #permit_inject_2, #permit_inject_3, #permit_inject_4
#reduce_add_2, #reduce_add_3, #reduce_add_4
```

### Proposed (4-5 macros, more capable)

```
; Counted loop with output wiring
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Condition-tested loop
#loop_while |> {
    &gate <| gate
    &gate |> @ret_body
    &gate |> @ret_exit:R
}

; Permit injection: variadic, outputs via @ret
#permit_inject *nodes |> {
    $(
        &p_${_idx} <| const, 1
        &p_${_idx} |> @ret
    ),*
}

; Binary reduction tree: parameterized opcode + arity
#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

#reduce_4 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r2 <| ${op}
    &r0 |> &r2:L
    &r1 |> &r2:R
}
```

Usage:

```
; Old:
!#loop_counted
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R
#loop_counted_0.&compare |> &body:L
#loop_counted_0.&compare |> &exit:R

; New:
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R

; Old:
!#reduce_add_4

; New:
#reduce_4 add
```

Note: the `#permit_inject` example with variadic `@ret` is aspirational. It requires `@ret` to work inside repetition blocks, which means the `@ret` substitution must happen after repetition expansion. This ordering is already correct since repetition expansion happens before edge rewriting in the expand pass.

## Implementation Order

1. **Opcode parameters**: grammar change (`opcode: OPCODE | param_ref`), argument syntax (`positional_arg: ... | OPCODE`), expand pass substitution. Smallest diff, immediately useful.

2. **Qualified ref parameters**: grammar changes to `placement` and `port`, `PlacementRef`/`PortRef` wrapper types, IR type widening, expand pass substitution. Mechanically similar to opcode params, builds on the same `_substitute_param` infrastructure.

3. **@ret wiring for macros**: grammar change (output list on `macro_call_stmt`), `IRMacroCall.output_dests`, expand pass edge rewriting. Builds on existing `@ret` patterns from function calls.

4. **Built-in macro rewrite**: collapse per-variant macros using the new features. Backwards-incompatible (old macro names removed), but since the built-ins are bundled and the system is pre-1.0, this is acceptable.

## Open Questions

1. **Should macros with `@ret` also support `|>` on inputs?** Function calls use `$func a=&x |> @output`.
Currently macro calls use `#macro arg1, arg2` for inputs. Adding `|>` for outputs is proposed above. Should inputs also support named wiring? Probably not needed: macros already have `${param}` for inputs, and the input wiring is fundamentally different (parameter substitution vs edge creation).

2. **Error messages for mismatched @ret counts.** If a macro body has `@ret_body` and `@ret_exit` but the call site only provides one output, what error? Probably MACRO category: "macro '#loop_counted' defines outputs @ret_body, @ret_exit but call provides 1 output".

3. **Interaction with nested macros.** If macro A calls macro B which has `@ret`, and A also has `@ret`, the scoping should work naturally: B's `@ret` resolves at B's call site (inside A's body), A's `@ret` resolves at A's call site. The existing scope qualification prevents name collisions.
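
The edge rewriting described under Enhancement 3, together with the error wording floated in open question 2, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `Edge` is a simplified stand-in for `IREdge`, `MacroError` stands in for the MACRO error category, `rewire_ret_edges` is a hypothetical name, and the choice to keep the port written on the `@ret` edge is a guess about the intended semantics.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Sequence


class MacroError(Exception):
    """Hypothetical error type standing in for the MACRO error category."""


@dataclass(frozen=True)
class Edge:
    """Simplified stand-in for IREdge: names and ports are plain strings."""
    source: str
    dest: str
    port: str = "L"


def _is_ret_marker(dest: str) -> bool:
    return dest == "@ret" or dest.startswith("@ret_")


def rewire_ret_edges(
    macro_name: str,
    edges: Sequence[Edge],
    positional_outputs: Sequence[str],
    named_outputs: Dict[str, str],
) -> List[Edge]:
    """Replace @ret / @ret_name destinations with call-site targets."""
    markers = sorted({e.dest for e in edges if _is_ret_marker(e.dest)})
    provided = len(positional_outputs) + len(named_outputs)
    rewired: List[Edge] = []
    for edge in edges:
        if not _is_ret_marker(edge.dest):
            rewired.append(edge)
            continue
        target: Optional[str]
        if edge.dest == "@ret":
            # Bare @ret maps to the first (or only) positional output.
            target = positional_outputs[0] if positional_outputs else None
        else:
            # @ret_name maps to the named output name=&dest.
            target = named_outputs.get(edge.dest[len("@ret_"):])
        if target is None:
            noun = "output" if provided == 1 else "outputs"
            raise MacroError(
                f"macro '#{macro_name}' defines outputs {', '.join(markers)} "
                f"but call provides {provided} {noun}"
            )
        # Assumption: the port on the @ret edge carries over to the target.
        rewired.append(replace(edge, dest=target))
    return rewired
```

Because each macro call would run this over its own expanded edges with its own output list, nested macros resolve their `@ret` markers independently, which matches the scoping expectation in open question 3.
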