# Macro Enhancements: Opcode Parameters, Qualified Ref Parameters, and @ret Wiring

Extends the dfasm macro system with three capabilities that reduce the need for per-variant macro definitions and make macros composable in the same way as functions.

## Current State

The macro system (implemented in `asm/expand.py`, grammar in `dfasm.lark`) supports:

- Parameter substitution in node names via `${param}` (token pasting with prefix/suffix)
- Parameter substitution in edge endpoints via `${param}` in `qualified_ref`
- Parameter substitution in const fields
- Compile-time arithmetic via `ConstExpr` (`${base} + ${_idx} + 1`)
- Variadic parameters with `@each` repetition blocks
- Nested macro invocation (depth limit 32)

Three gaps remain:

1. **Opcode position is not parameterizable.** The grammar defines `opcode: OPCODE` as a keyword terminal. You cannot pass an opcode as a macro argument. This forces per-opcode variants: `#reduce_add_2`, `#reduce_add_3`, etc.

2. **Placement and port qualifiers are not parameterizable.** The grammar defines `placement: "|" IDENT` and `port: ":" PORT_SPEC` — neither accepts `param_ref`. You cannot write `&ref:${port}` or `&ref|${pe}` in a macro body to parameterize which port or PE a reference targets.

3. **Macros have no output wiring convention.** Functions use `@ret` / `@ret_name` markers in their body, and the call syntax `$func args |> outputs` auto-wires return paths. Macros have no equivalent — the user must manually wire to expanded internal node names after invocation.

## Enhancement 1: Opcode Parameters

### Goal

Allow macro parameters to appear in the opcode position of `inst_def`, `strong_edge`, and `weak_edge` rules.

### Grammar Change

```lark
// Current:
opcode: OPCODE

// Proposed:
opcode: OPCODE | param_ref
```

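This is the only grammar change needed inside macro bodies (call-site syntax for opcode arguments is covered under Argument Syntax below). `param_ref` (`${name}`) is already a valid production, and Earley parsing handles any ambiguity between the two alternatives.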

### Lower Pass

The `inst_def` handler in `lower.py` currently calls `self._resolve_opcode()` which maps mnemonic strings to `ALUOp`/`MemOp` values. When the opcode is a `ParamRef`, lowering must defer resolution — store the `ParamRef` on the `IRNode` in a new field (or overload the `opcode` field's type to `Union[ALUOp, MemOp, ParamRef]`).

The `strong_edge` and `weak_edge` handlers need the same treatment: if the opcode token is a `ParamRef`, create the anonymous node with a deferred opcode.

### Expand Pass

During `_clone_and_substitute_node`, if `node.opcode` is a `ParamRef`:

1. Look up the parameter in the substitution map
2. The argument value must be a string matching a known opcode mnemonic
3. Resolve via `MNEMONIC_TO_OP` to get the concrete `ALUOp`/`MemOp`
4. Replace the node's opcode with the resolved value
5. Error if the argument is not a valid opcode mnemonic
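A minimal, self-contained sketch of steps 1-5. `ALUOp`, `MemOp`, `MNEMONIC_TO_OP`, `ParamRef`, and `IRNode` are simplified stand-ins for the real types, and their members here are illustrative only. `substitute_opcode` and `MacroError` are hypothetical names. The substitution map is assumed to map parameter names to raw argument values:

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Union


# Hypothetical stand-ins for the real ALUOp/MemOp enums and MNEMONIC_TO_OP table.
class ALUOp(Enum):
    ADD = "add"
    SUB = "sub"
    INC = "inc"


class MemOp(Enum):
    READ = "read"


MNEMONIC_TO_OP: dict = {op.value: op for op in (*ALUOp, *MemOp)}


@dataclass(frozen=True)
class ParamRef:
    name: str


@dataclass(frozen=True)
class IRNode:  # trimmed to the fields this step touches
    name: str
    opcode: Union[ALUOp, MemOp, ParamRef]


class MacroError(Exception):
    """Stand-in for a MACRO-category diagnostic."""


def substitute_opcode(node: IRNode, subst_map: dict) -> IRNode:
    """Return a copy of `node` with a ParamRef opcode resolved to a concrete op."""
    if not isinstance(node.opcode, ParamRef):
        return node  # already concrete; nothing to substitute
    param = node.opcode.name
    if param not in subst_map:                    # step 1: look up the parameter
        raise MacroError(f"unbound opcode parameter '${{{param}}}'")
    arg = subst_map[param]
    if not isinstance(arg, str):                  # step 2: must be a mnemonic string
        raise MacroError(f"parameter '{param}' must be an opcode mnemonic, got {arg!r}")
    op = MNEMONIC_TO_OP.get(arg)                  # step 3: resolve via the table
    if op is None:                                # step 5: unknown mnemonic
        raise MacroError(f"'{arg}' is not a valid opcode mnemonic")
    return replace(node, opcode=op)               # step 4: swap in the resolved op
```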

### Validation

Opcode validation (monadic/dyadic arity, valid argument combinations) already happens after expansion in the resolve and allocate passes. No additional validation needed at expansion time beyond confirming the mnemonic exists.

### Example

Before (current — per-opcode variants):

```
#reduce_add_2 |> { &r <| add }
#reduce_add_3 |> { &r0 <| add; &r1 <| add; &r0 |> &r1:L }
#reduce_sub_2 |> { &r <| sub }
; ... N variants per opcode
```

After (parameterized):

```
#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

; Usage:
#reduce_2 add
#reduce_3 sub
```

### Argument Syntax

Opcode arguments should be passed as bare identifiers in the macro call, but the current grammar does not allow this. `macro_call_stmt` accepts `argument`, which includes `qualified_ref`; however, an unqualified `add` in argument position lexes as the `OPCODE` terminal (priority 2), not as `IDENT`, and `OPCODE` is not a valid `argument`.

Two options:

**Option A: Quote opcode arguments.** Pass as string literals: `#reduce_2 "add"`. Simple, unambiguous. Expand pass strips quotes and resolves. Slightly ugly.

**Option B: Accept OPCODE as a macro argument.** Add `OPCODE` as an alternative in `positional_arg`:

```lark
// Current:
?positional_arg: value | qualified_ref

// Proposed:
?positional_arg: value | qualified_ref | OPCODE
```

The lower pass wraps the bare opcode token as a string argument in the `IRMacroCall`. Expand resolves it against `MNEMONIC_TO_OP`. This reads naturally: `#reduce_2 add`.

Option B is cleaner. Its only risk is an `IDENT` label or node name that collides with an opcode mnemonic, but lexer priority already resolves that collision in the opcode's favor, and the same collision already exists elsewhere in the language.

**Recommendation: Option B.**
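Under Option B, the lower pass has to recognize the raw token. Because `?positional_arg` is an inlined rule in Lark (a `?` rule with a single child is replaced by that child), a bare opcode argument reaches the enclosing handler as an `OPCODE` token rather than as a transformed value. A minimal sketch follows. `Token` is a stand-in that mimics `lark.Token` (a `str` subclass carrying a `.type` attribute) so the block runs without Lark, and `normalize_macro_arg` is a hypothetical helper name:

```python
class Token(str):
    """Minimal stand-in for lark.Token: a str that also records its terminal type."""

    def __new__(cls, type_: str, value: str):
        obj = super().__new__(cls, value)
        obj.type = type_
        return obj


def normalize_macro_arg(arg):
    """Turn a bare OPCODE token into a plain string argument for IRMacroCall."""
    if isinstance(arg, Token) and arg.type == "OPCODE":
        return str(arg)  # e.g. "add"; expand later resolves it via MNEMONIC_TO_OP
    return arg  # values and qualified refs pass through unchanged
```

Storing the mnemonic as a plain `str` keeps `IRMacroCall` free of parser types; expand treats it exactly like any other string argument until it lands in an opcode position.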


## Enhancement 2: Parameterized Placement and Port Qualifiers

### Goal

Allow `${param}` in the placement (`|pe0`) and port (`:L`) positions of a `qualified_ref`, so macros can parameterize which PE a node targets, which port an edge uses, and (when exposed) which context slot to use.

### Current State

`qualified_ref` is built from three parts:

```lark
qualified_ref: (node_ref | label_ref | ... | param_ref) placement? port?
placement: "|" IDENT
port:      ":" PORT_SPEC
PORT_SPEC: IDENT | HEX_LIT | DEC_LIT
```

`${param}` can already stand in for the entire ref part (the first element). But the `placement` and `port` suffixes only accept literal tokens. So `&node:${port}` and `&node|${pe}` don't parse.

In the lower pass, `qualified_ref` collects its children into a dict:
- The ref part becomes `{"name": ...}`
- `placement` returns a string (e.g., `"pe0"`)
- `port` returns a `Port` enum (`Port.L`, `Port.R`) or raw `int`

In the IR, `IRNode.pe` stores placement as `Optional[int]`, and `IREdge.port`/`IREdge.source_port` store port as `Port`. Neither field currently accepts `ParamRef`.

The expand pass (`_clone_and_substitute_node`, `_clone_and_substitute_edge`) only substitutes `name`, `const`, `source`, and `dest`. It does not touch `pe`, `port`, or `source_port`.

### Grammar Changes

```lark
// Current:
placement: "|" IDENT
port:      ":" PORT_SPEC

// Proposed:
placement: "|" (IDENT | param_ref)
port:      ":" (PORT_SPEC | param_ref)
```

### Lower Pass

The `placement` handler currently does `return str(token)`. It needs to handle receiving a `ParamRef` from the parser and return it as-is:

```python
def placement(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    return str(args[-1])
```

Similarly, the `port` handler needs to pass through `ParamRef` instead of resolving to `Port`:

```python
def port(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    # ... existing Port.L / Port.R / int resolution
```

The `qualified_ref` handler already dispatches on argument type, but today it only recognizes a `ParamRef` in the ref-name position. Children arrive in order (ref name, then placement after `|`, then port after `:`), yet position alone cannot identify a trailing `ParamRef`, because both suffixes are optional: with only one present, it could be either. A naive extension runs into exactly this problem:

```python
def qualified_ref(self, *args):
    ref_type = None
    placement = None
    port = None
    for arg in args:
        if isinstance(arg, ParamRef) and ref_type is None:
            ref_type = {"name": arg}
        elif isinstance(arg, ParamRef) and ref_type is not None:
            # Second or third ParamRef — depends on position
            # But Lark gives us placement/port through their handlers,
            # so we get ParamRef from the placement() or port() handler.
            # Need to distinguish: placement handler adds a marker or
            # we rely on Lark's rule names.
            ...
```

Lark's transformer works bottom-up: it calls `placement()` and `port()` before `qualified_ref()`, so `qualified_ref` receives:
- A dict or `ParamRef` (from the ref-name rules)
- A string or `ParamRef` (from the `placement` handler)
- A `Port`/`int` or `ParamRef` (from the `port` handler)

The existing type-based dispatch in `qualified_ref` therefore needs one addition. If an arg is a `ParamRef` and `ref_type` is already set, it is either a placement or a port, and the handlers have to say which. They could return tagged tuples such as `("placement", ParamRef(...))` and `("port", ParamRef(...))`, but a thin wrapper type is simpler.

With `@v_args(inline=True)`, `placement()` and `port()` receive their children directly, and since each is a separate handler, each already knows which rule matched. The cleanest approach is for each to return its `ParamRef` tagged with its role:

```python
from dataclasses import dataclass

# ParamRef is the existing macro-parameter reference type.

@dataclass(frozen=True)
class PlacementRef:
    """Deferred placement from macro parameter."""
    param: ParamRef

@dataclass(frozen=True)
class PortRef:
    """Deferred port from macro parameter."""
    param: ParamRef
```

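Then `qualified_ref` type-dispatches on `PlacementRef`/`PortRef` alongside `str`/`Port`/`int`, unwrapping them so the IR stores the inner `ParamRef`. The `placement()` and `port()` handlers shown earlier change accordingly, returning `PlacementRef(arg)` / `PortRef(arg)` instead of the bare `ParamRef`.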
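A standalone sketch of the wrapper approach. The handlers are written as free functions rather than `Transformer` methods so the block runs without Lark. `ParamRef` and `Port` are minimal stand-ins, the ref part is assumed to arrive as a `{"name": ...}` dict, and the literal port parsing is a simplified assumption rather than the existing resolution code:

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ParamRef:  # stand-in for the existing macro-parameter reference
    name: str


@dataclass(frozen=True)
class PlacementRef:
    """Deferred placement from macro parameter."""
    param: ParamRef


@dataclass(frozen=True)
class PortRef:
    """Deferred port from macro parameter."""
    param: ParamRef


class Port(Enum):  # stand-in for the real Port enum
    L = "L"
    R = "R"


def placement(arg):
    """Tag a deferred placement; literal placements stay strings like "pe0"."""
    return PlacementRef(arg) if isinstance(arg, ParamRef) else str(arg)


def port(arg):
    """Tag a deferred port; literals become Port.L / Port.R / int (simplified)."""
    if isinstance(arg, ParamRef):
        return PortRef(arg)
    text = str(arg)
    if text in ("L", "R"):
        return Port[text]
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def qualified_ref(*args) -> dict:
    """Sort transformed children into name / placement / port by type."""
    result = {}
    for arg in args:
        if isinstance(arg, PlacementRef):
            result["placement"] = arg.param  # unwrap: the IR stores the ParamRef
        elif isinstance(arg, PortRef):
            result["port"] = arg.param
        elif "name" not in result and isinstance(arg, ParamRef):
            result["name"] = arg             # ${param} in the ref-name position
        elif "name" not in result and isinstance(arg, dict):
            result.update(arg)               # ref part, e.g. {"name": "&node"}
        elif isinstance(arg, str):
            result["placement"] = arg        # literal placement
        elif isinstance(arg, (Port, int)):
            result["port"] = arg             # literal port
        else:
            raise TypeError(f"unexpected qualified_ref child: {arg!r}")
    return result
```

Only the ref-name `ParamRef` ever reaches `qualified_ref` unwrapped, so a bare `ParamRef` is unambiguous again.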

### IR Changes

`IRNode.pe` type becomes `Optional[Union[int, ParamRef]]`.

`IREdge.port` type becomes `Union[Port, ParamRef]`.

`IREdge.source_port` type becomes `Optional[Union[Port, ParamRef]]`.

These wider types only appear in macro template bodies. After expansion, all `ParamRef` values are resolved to concrete types. The resolve, place, and allocate passes never see `ParamRef` — if one leaks through, it's a bug in expand.
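Because any surviving `ParamRef` indicates a bug in expand, a cheap post-expansion check can enforce the invariant before resolve runs. A sketch, assuming IR items are dataclasses. `find_leaked_param_refs` and `assert_fully_expanded` are hypothetical helpers, `IRNode` is trimmed to one field, and only top-level fields are inspected, so a `ParamRef` nested inside a tuple would be missed:

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class ParamRef:  # stand-in for the existing macro-parameter reference
    name: str


@dataclass(frozen=True)
class IRNode:  # simplified stand-in: only the field this check needs
    name: str
    pe: Optional[Union[int, ParamRef]] = None


def find_leaked_param_refs(items: Iterable) -> List[Tuple[object, str]]:
    """Return (item, field_name) for every top-level field still holding a ParamRef."""
    leaks = []
    for item in items:
        for f in fields(item):
            if isinstance(getattr(item, f.name), ParamRef):
                leaks.append((item, f.name))
    return leaks


def assert_fully_expanded(items: Iterable) -> None:
    """Fail loudly if expand left any ParamRef behind (a bug in expand)."""
    leaks = find_leaked_param_refs(items)
    if leaks:
        # Explicit raise rather than `assert`, so the check survives `python -O`.
        raise AssertionError(f"ParamRef leaked past expand: {leaks!r}")
```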

### Expand Pass

`_clone_and_substitute_node` gains:

```python
# Substitute PE placement if it's a ParamRef
new_pe = node.pe
if isinstance(new_pe, ParamRef):
    resolved = _substitute_param(new_pe, subst_map)
    # Must resolve to a PE identifier string like "pe0" or an int
    new_pe = _resolve_pe_placement(resolved)  # parse "pe0" -> 0, or int -> int
```

`_clone_and_substitute_edge` gains:

```python
# Substitute port if it's a ParamRef
new_port = edge.port
if isinstance(new_port, ParamRef):
    resolved = _substitute_param(new_port, subst_map)
    new_port = _resolve_port(resolved)  # "L" -> Port.L, "R" -> Port.R, int -> int

new_source_port = edge.source_port
if isinstance(new_source_port, ParamRef):
    resolved = _substitute_param(new_source_port, subst_map)
    new_source_port = _resolve_port(resolved)
```

### Validation

Invalid port/placement values (e.g., passing `"banana"` as a port) produce a MACRO error during expansion. Post-expansion, the existing place and allocate passes validate that PE IDs are in range and ports are valid.

### Examples

Parameterized port selection:

```
; Macro that wires to a caller-selected port
#wire_to_port target, port |> {
    &src <| pass
    &src |> ${target}:${port}
}

; Usage: wire to left port
#wire_to_port &dest, L

; Usage: wire to right port
#wire_to_port &dest, R
```

Parameterized PE placement:

```
; Macro that places its node on a specific PE
#placed_const val, pe |> {
    &c <| const, ${val} |${pe}
    &c |> @ret
}

; Usage: place on pe0
#placed_const 42, pe0 |> &target

; Usage: place on pe1
#placed_const 42, pe1 |> &target
```

Combined — a macro that builds a cross-PE relay:

```
; Route a value from one PE to another
#cross_pe_relay src_pe, dst_pe |> {
    &hop <| pass |${src_pe}
    &land <| pass |${dst_pe}
    &hop |> &land:L
    &land |> @ret
}

; Usage:
#cross_pe_relay pe0, pe1 |> &destination
```

### Context Slot Syntax

Context slots use bracket syntax `[N]`, distinct from all other qualifiers:

```
&node|pe0[2]       ; place on pe0, context slot 2
&node[0]           ; context slot 0, auto-placed PE
&node|pe1[0..4]    ; reserve context slots 0-4 for this instruction
```

The bracket syntax avoids overloading `:` (which already carries port, cell address, and potentially IRAM address semantics). `[N]` is exclusively context slots.

#### Grammar

```lark
// New production:
ctx_slot: "[" (DEC_LIT | ctx_range | param_ref) "]"
ctx_range: DEC_LIT ".." DEC_LIT

// Updated qualified_ref:
qualified_ref: (node_ref | label_ref | ... | param_ref) placement? ctx_slot? port?
```

`ctx_slot` appears between placement and port in the qualifier chain: `&node|pe0[2]:L`.

#### Use Cases

- **Explicit context partitioning**: place parallel computations in distinct context slots to avoid matching store collisions
- **Debugging**: force a known context layout for inspection
- **Range reservation** (`[0..4]`): reserve a contiguous block of slots for an instruction that will be targeted by multiple parallel sources wired identically — not essential but a natural extension

#### Parameterization

Same mechanism as placement/port. `[${ctx}]` in a macro body, substituted to an integer during expansion:

```
#placed_op op, pe, ctx |> {
    &n <| ${op} |${pe}[${ctx}]
    &n |> @ret
}

; Usage:
#placed_op add, pe0, 2 |> &target
```


## Enhancement 3: @ret Wiring for Macros

### Goal

Allow macros to define output points using `@ret` / `@ret_name` markers, and wire them to destinations at the call site using the `|>` syntax.

### Grammar Change

Add optional output list to `macro_call_stmt`:

```lark
// Current:
macro_call_stmt: "#" IDENT (argument ("," argument)*)?

// Proposed:
macro_call_stmt: "#" IDENT (argument ("," argument)*)? (FLOW_OUT call_output_list)?
```

This reuses the existing `call_output_list` and `call_output` productions from `call_stmt`. Same syntax: `#macro args |> &dest` or `#macro args |> name=&dest`.

### Macro Body Convention

Macro bodies use `@ret` and `@ret_name` in edge destinations, same as function bodies:

```
#loop op, init_val |> {
    &counter <| add
    &compare <| ${op}
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    ; Output edges use @ret convention
    &compare |> @ret_body
    &compare |> @ret_exit:R
}
```

### Lower Pass

When lowering `macro_call_stmt` with a `FLOW_OUT` and `call_output_list`:

1. Parse the output list the same way `call_stmt` does (named/positional outputs)
2. Store output destinations on the `IRMacroCall` in a new field: `output_dests: tuple`

The `IRMacroCall` dataclass gains:

```python
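from dataclasses import dataclass
from typing import Optional

# SourceLoc is the existing source-location type.

@dataclass(frozen=True)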
class IRMacroCall:
    name: str
    positional_args: tuple
    named_args: tuple
    output_dests: tuple = ()   # New: output wiring destinations
    loc: Optional[SourceLoc] = None
```

### Expand Pass

After cloning and substituting the macro body, process `@ret` markers:

1. Scan expanded edges for destinations starting with `@ret`
2. For each `@ret` / `@ret_name` destination, look up the corresponding output from `IRMacroCall.output_dests`
3. Replace the `@ret*` destination with the actual target node name
4. If a `@ret*` marker has no matching output dest, report a MACRO error
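A minimal sketch of this rewrite, covering both the named and the positional mapping rules. `IREdge` is trimmed to `source`, `dest`, and a string `port`. `output_dests` is assumed to be a tuple of `(name, dest)` pairs, with `name` set to `None` for positional outputs. That encoding, `rewrite_ret_edges`, and `MacroError` are assumptions, not existing code:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IREdge:  # simplified stand-in for the real IREdge
    source: str
    dest: str
    port: str = "L"


class MacroError(Exception):
    """Stand-in for a MACRO-category diagnostic."""


def _is_ret_marker(dest: str) -> bool:
    return dest == "@ret" or dest.startswith("@ret_")


def rewrite_ret_edges(edges, output_dests, macro_name):
    """Replace @ret / @ret_name destinations with the call-site targets."""
    named = {name: dest for name, dest in output_dests if name is not None}
    positional = [dest for name, dest in output_dests if name is None]
    rewritten = []
    for edge in edges:
        if not _is_ret_marker(edge.dest):
            rewritten.append(edge)  # ordinary internal edge: untouched
            continue
        if edge.dest == "@ret":
            # Bare @ret maps to the first (or only) positional output.
            target = positional[0] if positional else None
        else:
            # @ret_name maps to the named output name=&dest.
            target = named.get(edge.dest[len("@ret_"):])
        if target is None:
            raise MacroError(
                f"macro '#{macro_name}': {edge.dest} has no matching output at the call site"
            )
        rewritten.append(replace(edge, dest=target))  # destination port is preserved
    return rewritten
```

The destination port on the marker (e.g. `@ret_exit:R`) carries over unchanged, since `replace` only swaps `dest`. This sketch does not report call-site outputs that no marker uses; that case is open question 2.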

This is simpler than function call wiring because macros don't need:
- Trampoline nodes (no cross-context routing)
- `ctx_override` edges (macros inline into the caller's context)
- `FREE_CTX` nodes (no context allocation)
- Synthetic PASS nodes (direct edge replacement suffices)

The `@ret` substitution in macros is purely edge rewriting — replace the symbolic `@ret_name` destination with the concrete node reference from the call site.

### Positional @ret Mapping

Same convention as function calls:

- Bare `@ret` maps to the first (or only) positional output
- `@ret_name` maps to the named output `name=&dest`
- Multiple bare `@ret` edges to different ports on the same output are valid

### Example

```
; Define macro with outputs
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Invoke with output wiring
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R
```

Or positionally:

```
#simple_gate |> {
    &g <| gate
    &g |> @ret     ; port L of the output
    &g |> @ret:R   ; port R of the same output
}

; Invoke: both bare @ret edges map to the first (and only) positional output
#simple_gate |> &body
```


## Impact on Built-in Macros

With opcode parameters and `@ret` wiring, the built-in library collapses significantly:

### Current (9 macros)

```
#loop_counted, #loop_while
#permit_inject_1, #permit_inject_2, #permit_inject_3, #permit_inject_4
#reduce_add_2, #reduce_add_3, #reduce_add_4
```

### Proposed (6 macros, more capable)

```
; Counted loop with output wiring
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Condition-tested loop
#loop_while |> {
    &gate <| gate
    &gate |> @ret_body
    &gate |> @ret_exit:R
}

; Permit injection — variadic, outputs via @ret
#permit_inject *nodes |> {
    $(
        &p_${_idx} <| const, 1
        &p_${_idx} |> @ret
    ),*
}

; Binary reduction tree — parameterized opcode + arity
#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

#reduce_4 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r2 <| ${op}
    &r0 |> &r2:L
    &r1 |> &r2:R
}
```

Usage:

```
; Old:
!#loop_counted
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R
#loop_counted_0.&compare |> &body:L
#loop_counted_0.&compare |> &exit:R

; New:
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R

; Old:
!#reduce_add_4

; New:
#reduce_4 add
```

Note: the `#permit_inject` example, with `@ret` inside a variadic repetition block, is aspirational. It requires `@ret` substitution to run after repetition expansion. The expand pass already expands repetitions before it rewrites edges, so that ordering is in place.


## Implementation Order

1. **Opcode parameters** — grammar change (`opcode: OPCODE | param_ref`), argument syntax (`positional_arg: ... | OPCODE`), expand pass substitution. Smallest diff, immediately useful.

2. **Qualified ref parameters** — grammar changes to `placement` and `port`, `PlacementRef`/`PortRef` wrapper types, IR type widening, expand pass substitution. Mechanically similar to opcode params, builds on the same `_substitute_param` infrastructure.

3. **@ret wiring for macros** — grammar change (output list on `macro_call_stmt`), `IRMacroCall.output_dests`, expand pass edge rewriting. Builds on existing `@ret` patterns from function calls.

4. **Built-in macro rewrite** — collapse per-variant macros using the new features. Backwards-incompatible (old macro names removed), but since the built-ins are bundled and the system is pre-1.0, this is acceptable.

## Open Questions

1. **Should macros with `@ret` also support `|>` on inputs?** Function calls use `$func a=&x |> @output`. Currently macro calls use `#macro arg1, arg2` for inputs. Adding `|>` for outputs is proposed above. Should inputs also support named wiring? Probably not needed — macros already have `${param}` for inputs, and the input wiring is fundamentally different (parameter substitution vs edge creation).

2. **Error messages for mismatched @ret counts.** If a macro body has `@ret_body` and `@ret_exit` but the call site only provides one output, what error? Probably MACRO category: "macro '#loop_counted' defines outputs @ret_body, @ret_exit but call provides 1 output".

3. **Interaction with nested macros.** If macro A calls macro B which has `@ret`, and A also has `@ret`, the scoping should work naturally — B's `@ret` resolves at B's call site (inside A's body), A's `@ret` resolves at A's call site. The existing scope qualification prevents name collisions.
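For question 2, one way to produce the suggested message is to collect the distinct `@ret` markers in first-appearance order and compare their count with the number of call-site outputs. A sketch with hypothetical helper names (`ret_markers`, `output_count_error`). A pure count comparison would not catch a call that supplies the right number of outputs under the wrong names; that case is left to the per-marker lookup:

```python
from typing import Iterable, List, Optional


def ret_markers(dests: Iterable[str]) -> List[str]:
    """Distinct @ret markers among edge destinations, in first-appearance order."""
    seen: List[str] = []
    for d in dests:
        if (d == "@ret" or d.startswith("@ret_")) and d not in seen:
            seen.append(d)
    return seen


def output_count_error(macro_name: str, markers: List[str], n_outputs: int) -> Optional[str]:
    """Return a MACRO error message if the counts differ, else None."""
    if len(markers) == n_outputs:
        return None
    noun = "output" if n_outputs == 1 else "outputs"
    return (
        f"macro '#{macro_name}' defines outputs {', '.join(markers)} "
        f"but call provides {n_outputs} {noun}"
    )
```

Counting distinct markers (rather than edges) keeps multiple bare `@ret` edges to different ports of one output from being miscounted as several outputs.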