OR-1 dataflow CPU sketch

Macro Enhancements: Opcode Parameters, Qualified Ref Parameters, and @ret Wiring#

Extends the dfasm macro system with three capabilities that reduce the need for per-variant macro definitions and make macros composable in the same way as functions.

Current State#

The macro system (implemented in asm/expand.py, grammar in dfasm.lark) supports:

  • Parameter substitution in node names via ${param} (token pasting with prefix/suffix)
  • Parameter substitution in edge endpoints via ${param} in qualified_ref
  • Parameter substitution in const fields
  • Compile-time arithmetic via ConstExpr (${base} + ${_idx} + 1)
  • Variadic parameters with @each repetition blocks
  • Nested macro invocation (depth limit 32)

Three gaps remain:

  1. Opcode position is not parameterizable. The grammar defines opcode: OPCODE as a keyword terminal. You cannot pass an opcode as a macro argument. This forces per-opcode variants: #reduce_add_2, #reduce_add_3, etc.

  2. Placement and port qualifiers are not parameterizable. The grammar defines placement: "|" IDENT and port: ":" PORT_SPEC — neither accepts param_ref. You cannot write &ref:${port} or &ref|${pe} in a macro body to parameterize which port or PE a reference targets.

  3. Macros have no output wiring convention. Functions use @ret / @ret_name markers in their body, and the call syntax $func args |> outputs auto-wires return paths. Macros have no equivalent — the user must manually wire to expanded internal node names after invocation.

Enhancement 1: Opcode Parameters#

Goal#

Allow macro parameters to appear in the opcode position of inst_def, strong_edge, and weak_edge rules.

Grammar Change#

// Current:
opcode: OPCODE

// Proposed:
opcode: OPCODE | param_ref

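This is the only change needed to the opcode rule itself; passing an opcode as a macro argument needs one more grammar change (see Argument Syntax below). param_ref (${name}) is already a valid production, and Earley parsing handles the added alternative.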

Lower Pass#

The inst_def handler in lower.py currently calls self._resolve_opcode() which maps mnemonic strings to ALUOp/MemOp values. When the opcode is a ParamRef, lowering must defer resolution — store the ParamRef on the IRNode in a new field (or overload the opcode field's type to Union[ALUOp, MemOp, ParamRef]).

The strong_edge and weak_edge handlers need the same treatment: if the opcode token is a ParamRef, create the anonymous node with a deferred opcode.
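A minimal sketch of the deferral, assuming stand-in definitions for ALUOp, ParamRef, and MNEMONIC_TO_OP (the real ones live in the assembler and may differ). The helper name lower_opcode is hypothetical; it plays the role of _resolve_opcode with the extra pass-through branch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class ALUOp(Enum):
    """Stand-in for the real ALUOp enum (assumption)."""
    ADD = "add"
    SUB = "sub"


@dataclass(frozen=True)
class ParamRef:
    """Stand-in for a ${name} reference (assumption)."""
    name: str


# Stand-in mnemonic table (assumption).
MNEMONIC_TO_OP = {op.value: op for op in ALUOp}


def lower_opcode(token: Union[str, ParamRef]) -> Union[ALUOp, ParamRef]:
    """Resolve a literal mnemonic now; pass a ParamRef through untouched."""
    if isinstance(token, ParamRef):
        return token  # deferred until the expand pass
    return MNEMONIC_TO_OP[token]


literal = lower_opcode("add")
deferred = lower_opcode(ParamRef("op"))
```

The same helper could serve inst_def, strong_edge, and weak_edge, so the deferral logic lives in one place.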

Expand Pass#

During _clone_and_substitute_node, if node.opcode is a ParamRef:

  1. Look up the parameter in the substitution map
  2. The argument value must be a string matching a known opcode mnemonic
  3. Resolve via MNEMONIC_TO_OP to get the concrete ALUOp/MemOp
  4. Replace the node's opcode with the resolved value
  5. Error if the argument is not a valid opcode mnemonic
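The steps above can be sketched as follows. IRNode, ParamRef, ALUOp, MNEMONIC_TO_OP, and MacroError are simplified stand-ins, and substitute_opcode is a hypothetical helper name, not the actual code in asm/expand.py.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Union


class ALUOp(Enum):
    """Stand-in for the real ALUOp enum (assumption)."""
    ADD = "add"
    SUB = "sub"


@dataclass(frozen=True)
class ParamRef:
    name: str


@dataclass(frozen=True)
class IRNode:
    """Reduced to the two fields this sketch needs (assumption)."""
    name: str
    opcode: Union[ALUOp, ParamRef]


class MacroError(Exception):
    """Stand-in for a MACRO-category error (assumption)."""


MNEMONIC_TO_OP = {op.value: op for op in ALUOp}


def substitute_opcode(node: IRNode, subst_map: Dict[str, object]) -> IRNode:
    """Replace a deferred opcode with the concrete op named by the argument."""
    if not isinstance(node.opcode, ParamRef):
        return node  # already concrete, nothing to do
    param = node.opcode.name
    if param not in subst_map:
        raise MacroError(f"unbound macro parameter {param!r} in opcode position")
    mnemonic = subst_map[param]
    op = MNEMONIC_TO_OP.get(mnemonic) if isinstance(mnemonic, str) else None
    if op is None:
        raise MacroError(f"{mnemonic!r} is not a valid opcode mnemonic")
    return replace(node, opcode=op)


resolved = substitute_opcode(IRNode("&r", ParamRef("op")), {"op": "sub"})
```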

Validation#

Opcode validation (monadic/dyadic arity, valid argument combinations) already happens after expansion in the resolve and allocate passes. No additional validation is needed at expansion time beyond confirming that the mnemonic exists.

Example#

Before (current — per-opcode variants):

#reduce_add_2 |> { &r <| add }
#reduce_add_3 |> { &r0 <| add; &r1 <| add; &r0 |> &r1:L }
#reduce_sub_2 |> { &r <| sub }
; ... N variants per opcode

After (parameterized):

#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

; Usage:
#reduce_2 add
#reduce_3 sub

Argument Syntax#

Opcode arguments should be passed as bare identifiers in the macro call, but the current grammar does not allow this. macro_call_stmt accepts argument, which includes qualified_ref, yet an unqualified add in argument position lexes as the OPCODE terminal (priority 2), not as IDENT, and OPCODE is not a valid argument.

Two options:

Option A: Quote opcode arguments. Pass as string literals: #reduce_2 "add". Simple, unambiguous. Expand pass strips quotes and resolves. Slightly ugly.

Option B: Accept OPCODE as a macro argument. Add OPCODE as an alternative in positional_arg:

// Current:
?positional_arg: value | qualified_ref

// Proposed:
?positional_arg: value | qualified_ref | OPCODE

The lower pass wraps the bare opcode token as a string argument in the IRMacroCall. Expand resolves it against MNEMONIC_TO_OP. This reads naturally: #reduce_2 add.

Option B is cleaner. The only risk is a label or node name that collides with an opcode mnemonic, but the priority system already handles that (opcodes win at the lexer level), and the collision already exists in the language.

Recommendation: Option B.

Enhancement 2: Parameterized Placement and Port Qualifiers#

Goal#

Allow ${param} in the placement (|pe0) and port (:L) positions of a qualified_ref, so macros can parameterize which PE a node targets, which port an edge uses, and (when exposed) which context slot to use.

Current State#

qualified_ref is built from three parts:

qualified_ref: (node_ref | label_ref | ... | param_ref) placement? port?
placement: "|" IDENT
port:      ":" PORT_SPEC
PORT_SPEC: IDENT | HEX_LIT | DEC_LIT

${param} can already stand in for the entire ref part (the first element). But the placement and port suffixes only accept literal tokens. So &node:${port} and &node|${pe} don't parse.

In the lower pass, qualified_ref collects its children into a dict:

  • The ref part becomes {"name": ...}
  • placement returns a string (e.g., "pe0")
  • port returns a Port enum (Port.L, Port.R) or raw int

In the IR, IRNode.pe stores placement as Optional[int], and IREdge.port/IREdge.source_port store port as Port. Neither field currently accepts ParamRef.

The expand pass (_clone_and_substitute_node, _clone_and_substitute_edge) only substitutes name, const, source, and dest. It does not touch pe, port, or source_port.

Grammar Changes#

// Current:
placement: "|" IDENT
port:      ":" PORT_SPEC

// Proposed:
placement: "|" (IDENT | param_ref)
port:      ":" (PORT_SPEC | param_ref)

Lower Pass#

The placement handler currently does return str(token). It needs to handle receiving a ParamRef from the parser and return it as-is:

def placement(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    return str(args[-1])

Similarly, the port handler needs to pass through ParamRef instead of resolving to Port:

def port(self, *args):
    for arg in args:
        if isinstance(arg, ParamRef):
            return arg
    # ... existing Port.L / Port.R / int resolution

The qualified_ref handler already dispatches on argument type, but it only detects ParamRef in the ref-name position. It needs to accept ParamRef for placement and port as well. The parser has no trouble telling the three apart: the ref-name comes first, placement second (prefixed with |), and port third (prefixed with :), and Lark runs each through its own rule before qualified_ref sees the result. A first attempt at the handler shows where the difficulty lies:

def qualified_ref(self, *args):
    ref_type = None
    placement = None
    port = None
    for arg in args:
        if isinstance(arg, ParamRef) and ref_type is None:
            ref_type = {"name": arg}
        elif isinstance(arg, ParamRef) and ref_type is not None:
            # Second or third ParamRef: this is placement or port, but a
            # bare ParamRef does not say which. The placement() and port()
            # handlers have already run by this point, so they need to
            # tag their result for qualified_ref to tell them apart.
            ...

The fix is straightforward. Lark's transformer calls placement() and port() before qualified_ref(), so qualified_ref receives:

  • A dict or ParamRef (from the ref-name rules)
  • A string or ParamRef (from the placement handler)
  • A Port/int or ParamRef (from the port handler)

The existing type-based dispatch in qualified_ref needs one addition: if an arg is a ParamRef and ref_type is already set, it is either placement or port. To tell them apart, the deferring handlers must tag their result. One option is a tuple: placement returns ("placement", ParamRef(...)) and port returns ("port", ParamRef(...)).

A thin wrapper type is cleaner. Each transformer method is named after its grammar rule, so the placement and port handlers already know which rule matched and can return a ParamRef tagged with its role:

@dataclass(frozen=True)
class PlacementRef:
    """Deferred placement from macro parameter."""
    param: ParamRef

@dataclass(frozen=True)
class PortRef:
    """Deferred port from macro parameter."""
    param: ParamRef

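The placement and port handlers then return PlacementRef(arg) and PortRef(arg) instead of a bare ParamRef, and qualified_ref type-dispatches on PlacementRef/PortRef alongside str/Port/int.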
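A sketch of that dispatch, assuming simplified stand-ins for Port and ParamRef and a plain dict as the handler's result. The result keys and the function body are illustrative, not the actual lower.py handler.

```python
from dataclasses import dataclass
from enum import Enum


class Port(Enum):
    """Stand-in for the real Port enum (assumption)."""
    L = 0
    R = 1


@dataclass(frozen=True)
class ParamRef:
    name: str


@dataclass(frozen=True)
class PlacementRef:
    """Deferred placement from macro parameter."""
    param: ParamRef


@dataclass(frozen=True)
class PortRef:
    """Deferred port from macro parameter."""
    param: ParamRef


def qualified_ref(*args):
    """Sort already-transformed children into name / placement / port."""
    result = {"name": None, "placement": None, "port": None}
    for arg in args:
        if isinstance(arg, dict):
            result["name"] = arg["name"]       # literal ref part
        elif isinstance(arg, ParamRef):
            result["name"] = arg               # bare ParamRef: ref-name only
        elif isinstance(arg, PlacementRef):
            result["placement"] = arg.param    # deferred placement
        elif isinstance(arg, PortRef):
            result["port"] = arg.param         # deferred port
        elif isinstance(arg, str):
            result["placement"] = arg          # literal placement, e.g. "pe0"
        elif isinstance(arg, (Port, int)):
            result["port"] = arg               # literal port
    return result


deferred = qualified_ref(
    {"name": "&node"}, PlacementRef(ParamRef("pe")), PortRef(ParamRef("port"))
)
literal = qualified_ref(ParamRef("target"), "pe0", Port.L)
```

Because the wrappers carry the role, a bare ParamRef can only mean the ref-name position, and the ordering check on ref_type is no longer needed.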

IR Changes#

IRNode.pe type becomes Optional[Union[int, ParamRef]].

IREdge.port type becomes Union[Port, ParamRef].

IREdge.source_port type becomes Optional[Union[Port, ParamRef]].

These wider types only appear in macro template bodies. After expansion, all ParamRef values are resolved to concrete types. The resolve, place, and allocate passes never see ParamRef — if one leaks through, it's a bug in expand.

Expand Pass#

_clone_and_substitute_node gains:

# Substitute PE placement if it's a ParamRef
new_pe = node.pe
if isinstance(new_pe, ParamRef):
    resolved = _substitute_param(new_pe, subst_map)
    # Must resolve to a PE identifier string like "pe0" or an int
    new_pe = _resolve_pe_placement(resolved)  # parse "pe0" -> 0, or int -> int

_clone_and_substitute_edge gains:

# Substitute port if it's a ParamRef
new_port = edge.port
if isinstance(new_port, ParamRef):
    resolved = _substitute_param(new_port, subst_map)
    new_port = _resolve_port(resolved)  # "L" -> Port.L, "R" -> Port.R, int -> int

new_source_port = edge.source_port
if isinstance(new_source_port, ParamRef):
    resolved = _substitute_param(new_source_port, subst_map)
    new_source_port = _resolve_port(resolved)

Validation#

Invalid port/placement values (e.g., passing "banana" as a port) produce a MACRO error during expansion. Post-expansion, the existing place and allocate passes validate that PE IDs are in range and ports are valid.

Examples#

Parameterized port selection:

; Macro that wires to a caller-selected port
#wire_to_port target, port |> {
    &src <| pass
    &src |> ${target}:${port}
}

; Usage: wire to left port
#wire_to_port &dest, L

; Usage: wire to right port
#wire_to_port &dest, R

Parameterized PE placement:

; Macro that places its node on a specific PE
#placed_const val, pe |> {
    &c <| const, ${val} |${pe}
    &c |> @ret
}

; Usage: place on pe0
#placed_const 42, pe0 |> &target

; Usage: place on pe1
#placed_const 42, pe1 |> &target

Combined — a macro that builds a cross-PE relay:

; Route a value from one PE to another
#cross_pe_relay src_pe, dst_pe |> {
    &hop <| pass |${src_pe}
    &hop |> @ret
}

; Usage:
#cross_pe_relay pe0, pe1 |> &destination

Context Slot Syntax#

Context slots use bracket syntax [N], distinct from all other qualifiers:

&node|pe0[2]       ; place on pe0, context slot 2
&node[0]           ; context slot 0, auto-placed PE
&node|pe1[0..4]    ; reserve context slots 0-4 for this instruction

The bracket syntax avoids overloading : (which already carries port, cell address, and potentially IRAM address semantics). [N] is exclusively context slots.

Grammar#

// New production:
ctx_slot: "[" (DEC_LIT | ctx_range | param_ref) "]"
ctx_range: DEC_LIT ".." DEC_LIT

// Updated qualified_ref:
qualified_ref: (node_ref | label_ref | ... | param_ref) placement? ctx_slot? port?

ctx_slot appears between placement and port in the qualifier chain: &node|pe0[2]:L.
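To illustrate the qualifier order only, here is a regex-based sketch that splits a fully literal reference into its parts. This is not how the Lark grammar parses; the function name, the result keys, and the (lo, hi) tuple for ranges are assumptions made for the example.

```python
import re

# name, then optional |placement, then optional [ctx], then optional :port
_QUALIFIERS = re.compile(
    r"(?P<name>&\w+)"
    r"(?:\|(?P<pe>\w+))?"
    r"(?:\[(?P<lo>\d+)(?:\.\.(?P<hi>\d+))?\])?"
    r"(?::(?P<port>\w+))?"
)


def split_qualified_ref(text: str) -> dict:
    """Split "&node|pe0[2]:L" into name, pe, ctx, and port."""
    match = _QUALIFIERS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a qualified ref: {text!r}")
    ctx = None
    if match["lo"] is not None:
        lo = int(match["lo"])
        # A single slot stays an int; a range becomes a (lo, hi) tuple.
        ctx = lo if match["hi"] is None else (lo, int(match["hi"]))
    return {
        "name": match["name"],
        "pe": match["pe"],
        "ctx": ctx,
        "port": match["port"],
    }


full = split_qualified_ref("&node|pe0[2]:L")
ranged = split_qualified_ref("&node|pe1[0..4]")
```

A reference with the qualifiers out of order, such as &node:L[2], does not match and raises ValueError.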

Use Cases#

  • Explicit context partitioning: place parallel computations in distinct context slots to avoid matching store collisions
  • Debugging: force a known context layout for inspection
  • Range reservation ([0..4]): reserve a contiguous block of slots for an instruction that will be targeted by multiple parallel sources wired identically — not essential but a natural extension

Parameterization#

Same mechanism as placement/port. [${ctx}] in a macro body, substituted to an integer during expansion:

#placed_op op, pe, ctx |> {
    &n <| ${op} |${pe}[${ctx}]
    &n |> @ret
}

; Usage:
#placed_op add, pe0, 2 |> &target

Enhancement 3: @ret Wiring for Macros#

Goal#

Allow macros to define output points using @ret / @ret_name markers, and wire them to destinations at the call site using the |> syntax.

Grammar Change#

Add optional output list to macro_call_stmt:

// Current:
macro_call_stmt: "#" IDENT (argument ("," argument)*)?

// Proposed:
macro_call_stmt: "#" IDENT (argument ("," argument)*)? (FLOW_OUT call_output_list)?

This reuses the existing call_output_list and call_output productions from call_stmt. Same syntax: #macro args |> &dest or #macro args |> name=&dest.

Macro Body Convention#

Macro bodies use @ret and @ret_name in edge destinations, same as function bodies:

#loop op, init_val |> {
    &counter <| add
    &compare <| ${op}
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    ; Output edges use @ret convention
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

Lower Pass#

When lowering macro_call_stmt with a FLOW_OUT and call_output_list:

  1. Parse the output list the same way call_stmt does (named/positional outputs)
  2. Store output destinations on the IRMacroCall in a new field: output_dests: tuple

The IRMacroCall dataclass gains:

@dataclass(frozen=True)
class IRMacroCall:
    name: str
    positional_args: tuple
    named_args: tuple
    output_dests: tuple = ()   # New: output wiring destinations
    loc: Optional[SourceLoc] = None

Expand Pass#

After cloning and substituting the macro body, process @ret markers:

  1. Scan expanded edges for destinations starting with @ret
  2. For each @ret / @ret_name destination, look up the corresponding output from IRMacroCall.output_dests
  3. Replace the @ret* destination with the actual target node name
  4. If a @ret* marker has no matching output dest, report a MACRO error

This is simpler than function call wiring because macros don't need:

  • Trampoline nodes (no cross-context routing)
  • ctx_override edges (macros inline into the caller's context)
  • FREE_CTX nodes (no context allocation)
  • Synthetic PASS nodes (direct edge replacement suffices)

The @ret substitution in macros is purely edge rewriting — replace the symbolic @ret_name destination with the concrete node reference from the call site.

Positional @ret Mapping#

Same convention as function calls:

  • Bare @ret maps to the first (or only) positional output
  • @ret_name maps to the named output name=&dest
  • Multiple bare @ret edges to different ports on the same output are valid
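The edge rewriting and the mapping rules can be sketched together. The IREdge here is reduced to three fields, and output_dests is assumed to be split into a positional tuple and a name-to-destination dict; the document does not fix that shape, so treat it as one possibility.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Sequence


class MacroError(Exception):
    """Stand-in for a MACRO-category error (assumption)."""


@dataclass(frozen=True)
class IREdge:
    """Reduced to the fields this sketch needs (assumption)."""
    source: str
    dest: str
    port: Optional[str] = None


def rewrite_ret_edges(
    edges: Sequence[IREdge],
    positional: Sequence[str],
    named: Dict[str, str],
    macro_name: str,
) -> List[IREdge]:
    """Replace @ret / @ret_name destinations with call-site targets."""
    rewritten = []
    for edge in edges:
        if edge.dest == "@ret":
            # Bare @ret maps to the first positional output.
            target = positional[0] if positional else None
        elif edge.dest.startswith("@ret_"):
            # @ret_name maps to the named output name=&dest.
            target = named.get(edge.dest[len("@ret_"):])
        else:
            rewritten.append(edge)  # ordinary edge, left untouched
            continue
        if target is None:
            raise MacroError(
                f"macro '#{macro_name}': no output provided for {edge.dest}"
            )
        rewritten.append(replace(edge, dest=target))  # port is preserved
    return rewritten


body = [
    IREdge("&compare", "@ret_body"),
    IREdge("&compare", "@ret_exit", "R"),
    IREdge("&counter", "&compare", "L"),
]
wired = rewrite_ret_edges(
    body, (), {"body": "&process", "exit": "&done"}, "loop_counted"
)
```

No nodes are created or removed; only the dest field of matching edges changes, which is what keeps this simpler than function call wiring.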

Example#

; Define macro with outputs
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Invoke with output wiring
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R

Or positionally:

#simple_gate |> {
    &g <| gate
    &g |> @ret
    &g |> @ret:R   ; second output port
}

; Invoke — positional @ret maps to first output
#simple_gate |> &body, &exit

Impact on Built-in Macros#

With opcode parameters and @ret wiring, the built-in library collapses significantly:

Current (11 macros)#

#loop_counted, #loop_while
#permit_inject_1, #permit_inject_2, #permit_inject_3, #permit_inject_4
#reduce_add_2, #reduce_add_3, #reduce_add_4

Proposed (4-5 macros, more capable)#

; Counted loop with output wiring
#loop_counted |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &inc <| inc
    &compare |> &inc:L
    &inc |> &counter:R
    &compare |> @ret_body
    &compare |> @ret_exit:R
}

; Condition-tested loop
#loop_while |> {
    &gate <| gate
    &gate |> @ret_body
    &gate |> @ret_exit:R
}

; Permit injection — variadic, outputs via @ret
#permit_inject *nodes |> {
    $(
        &p_${_idx} <| const, 1
        &p_${_idx} |> @ret
    ),*
}

; Binary reduction tree — parameterized opcode + arity
#reduce_2 op |> {
    &r <| ${op}
}

#reduce_3 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r0 |> &r1:L
}

#reduce_4 op |> {
    &r0 <| ${op}
    &r1 <| ${op}
    &r2 <| ${op}
    &r0 |> &r2:L
    &r1 |> &r2:R
}

Usage:

; Old:
!#loop_counted
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R
#loop_counted_0.&compare |> &body:L
#loop_counted_0.&compare |> &exit:R

; New:
#loop_counted |> body=&process, exit=&done
&init |> #loop_counted_0.&counter:L
&limit |> #loop_counted_0.&compare:R

; Old:
!#reduce_add_4

; New:
#reduce_4 add

Note: the #permit_inject example with variadic @ret is aspirational. It requires @ret to work inside repetition blocks, so @ret substitution must happen after repetition expansion. The expand pass already orders things this way: repetition expansion runs before edge rewriting.

Implementation Order#

  1. Opcode parameters — grammar change (opcode: OPCODE | param_ref), argument syntax (positional_arg: ... | OPCODE), expand pass substitution. Smallest diff, immediately useful.

  2. Qualified ref parameters — grammar changes to placement and port, PlacementRef/PortRef wrapper types, IR type widening, expand pass substitution. Mechanically similar to opcode params, builds on the same _substitute_param infrastructure.

  3. @ret wiring for macros — grammar change (output list on macro_call_stmt), IRMacroCall.output_dests, expand pass edge rewriting. Builds on existing @ret patterns from function calls.

  4. Built-in macro rewrite — collapse per-variant macros using the new features. Backwards-incompatible (old macro names removed), but since the built-ins are bundled and the system is pre-1.0, this is acceptable.

Open Questions#

  1. Should macros with @ret also support |> on inputs? Function calls use $func a=&x |> @output. Currently macro calls use #macro arg1, arg2 for inputs. Adding |> for outputs is proposed above. Should inputs also support named wiring? Probably not needed — macros already have ${param} for inputs, and the input wiring is fundamentally different (parameter substitution vs edge creation).

  2. Error messages for mismatched @ret counts. If a macro body has @ret_body and @ret_exit but the call site only provides one output, what error? Probably MACRO category: "macro '#loop_counted' defines outputs @ret_body, @ret_exit but call provides 1 output".

  3. Interaction with nested macros. If macro A calls macro B which has @ret, and A also has @ret, the scoping should work naturally — B's @ret resolves at B's call site (inside A's body), A's @ret resolves at A's call site. The existing scope qualification prevents name collisions.
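For question 2, one way to produce the suggested message is sketched below. The helper name and its inputs (a list of edge destination names and a count of provided outputs) are hypothetical, and the rule that only too few outputs is an error is an assumption; whether surplus outputs should also be rejected is left open.

```python
from typing import Iterable, Optional


def ret_mismatch_message(
    macro_name: str, edge_dests: Iterable[str], provided: int
) -> Optional[str]:
    """Return an error message if the call provides too few outputs, else None."""
    # Collect the distinct @ret markers used as edge destinations.
    markers = sorted(
        {d for d in edge_dests if d == "@ret" or d.startswith("@ret_")}
    )
    if len(markers) <= provided:
        return None
    noun = "output" if provided == 1 else "outputs"
    return (
        f"macro '#{macro_name}' defines outputs {', '.join(markers)} "
        f"but call provides {provided} {noun}"
    )


message = ret_mismatch_message(
    "loop_counted", ["@ret_body", "@ret_exit", "&inc"], 1
)
```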