design-notes/dfasm-primer.md at main · nonbinary.computer/or1-design

Orual docs updates, pulled in datasheets 3w ago
3416127c
# dfasm - Dataflow Graph Assembly Language

A primer on the assembly dialect used by the OR-1 dataflow CPU.

See `assembler-architecture.md` for the assembler's internal pipeline.
See `architecture-overview.md` for the hardware model dfasm targets.

## What dfasm Is

dfasm is a textual representation of low-level dataflow program graphs. Each instruction forms a **node** with zero, one, or two inputs and up to two outputs. Connections between nodes are graph edges. Execution is entirely data-driven: a node fires when all its required operands have arrived as tokens.

It is *not* conventional assembly with strong implicit sequential behaviour. There is no program counter; control transfer and execution order are determined primarily by the graph topology. Writing statements in a sensible order is up to you, and is purely for your own benefit.

## Syntax Overview
 14
 15### Comments
 16
 17Semicolons start line comments, as is fairly common for assembly. While the parser is sophisticated, I've chosen to do this to set a specific tone. dfasm, while it has a number of seemingly sophisticated features, remains an *assembly language*, tightly coupled to the low-level functions of the hardware.
 18
 19```dfasm
 20; This is a comment
 21&add <| add   ; inline comment
 22```
 23
 24### Names and Sigils
 25
 26dfasm uses four sigil-prefixed naming conventions:
 27
 28| Sigil    | Scope                             | Use                               |
 29| -------- | --------------------------------- | --------------------------------- |
 30| `@name`  | Global (top-level)                | Node references, data definitions |
 31| `&name`  | Local (within enclosing function) | Labels for instructions           |
 32| `$name`  | Global                            | Function / subgraph definitions   |
 33| `#name`  | Global                            | Macro definitions and invocations |
 34
 35Additionally, `${name}` is used within macro bodies for parameter substitution (see Macros section).
 36
 37Names are composed of `[a-zA-Z_][a-zA-Z0-9_]*`.
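
As a rough illustration, the naming rule can be checked with a regular expression. This is a Python sketch only; `SIGILS`, `NAME_RE`, and `classify_name` are hypothetical helpers, not names from the assembler.

```python
import re

# Sigil -> kind of name, following the sigil table above.
SIGILS = {"@": "node", "&": "label", "$": "function", "#": "macro"}

# One sigil followed by a name matching [a-zA-Z_][a-zA-Z0-9_]*.
NAME_RE = re.compile(r"([@&$#])([a-zA-Z_][a-zA-Z0-9_]*)")


def classify_name(text):
    """Return (kind, name) for a sigil-prefixed name, or None if it is invalid."""
    match = NAME_RE.fullmatch(text)
    if match is None:
        return None
    return SIGILS[match.group(1)], match.group(2)
```

For example, `classify_name("&sum")` yields `("label", "sum")`, while `classify_name("&9x")` yields `None` because a name may not start with a digit.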
 38
 39### Qualifier Chains
 40
 41Names can be chained with placement and port qualifiers. No spaces are allowed within a chain. Placement indicators are mostly optional. The assembler will attempt to resolve and auto-place instructions, currently using a basic greedy locality heuristic. If an instruction cannot be placed, or the assembler places it badly, you can manually assign its placement.
 42
 43```dfasm
 44&sum|pe0:L       ; label "sum", placed on PE 0, left port
 45@data|sm0:5      ; node "data", placed on SM 0, cell address 5
 46&branch|pe1:R    ; label "branch", PE 1, right port
 47```
 48
 49| Qualifier | Syntax | Meaning |
 50|-----------|--------|---------|
 51| Placement | `\|peN` or `\|smN` | Assign to a specific PE or SM |
 52| Port | `:L` or `:R` | Left or right input port (for edges) |
 53| Cell address | `:N` | SM cell address (for data definitions) |
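
To make the chain structure concrete, here is a Python sketch that splits a chain into its parts. It is not the assembler's grammar: `CHAIN_RE` and `parse_chain` are hypothetical, and the sketch assumes cell addresses are written in decimal.

```python
import re

# sigil + name, optional |peN or |smN placement, optional :L / :R / :N suffix.
CHAIN_RE = re.compile(
    r"(?P<sigil>[@&$#])(?P<name>[a-zA-Z_][a-zA-Z0-9_]*)"
    r"(?:\|(?P<unit>pe|sm)(?P<index>\d+))?"
    r"(?::(?P<suffix>L|R|\d+))?"
)


def parse_chain(text):
    """Split a qualifier chain into sigil, name, placement, port and cell address."""
    match = CHAIN_RE.fullmatch(text)  # fullmatch rejects embedded spaces
    if match is None:
        raise ValueError(f"not a qualifier chain: {text!r}")
    suffix = match.group("suffix")
    port = suffix if suffix in ("L", "R") else None
    cell = int(suffix) if suffix is not None and port is None else None
    placement = None
    if match.group("unit"):
        placement = (match.group("unit"), int(match.group("index")))
    return {
        "sigil": match.group("sigil"),
        "name": match.group("name"),
        "placement": placement,
        "port": port,
        "cell": cell,
    }
```

Under these assumptions, `parse_chain("@data|sm0:5")` reports placement `("sm", 0)` and cell `5`, and a chain containing a space raises `ValueError`.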
 54
 55## Statement Types
 56
 57### Pragma
 58
 59Pragmas are built-in nodes that provide specific information about how the program should be assembled.
 60
#### `@system`
 62
 63Declares hardware configuration. Required for programs that need specific PE/SM counts:
 64
 65```dfasm
 66@system pe=4, sm=1, iram=128, ctx=4
 67```
 68
 69| Parameter | Required | Default | Meaning                                  |
 70| --------- | -------- | ------- | ---------------------------------------- |
 71| `pe`      | yes      | —       | Number of processing elements            |
 72| `sm`      | yes      | —       | Number of structure memory modules       |
 73| `iram`    | no       | 128     | IRAM capacity per PE (instruction slots) |
 74| `ctx`     | no       | 16      | Context slots per PE                     |
 75
 76At most one `@system` pragma per program.
 77
#### `@rom_data`
 79
 80Declares that the following section will be placed in ROM with an optional name and base address. If the address is not specified, it will be placed after the contents of the reset vector. It will *not* be emitted as tokens during bootstrapping. 
 81
 82```dfasm
 83@rom_data [name=, addr=...]
 84```
 85
Unlike the `@system` pragma, the `@rom_data` pragma can be used more than once. Functions placed in a `@rom_data` section can be loaded via `exec`, as can a named `@rom_data` section. A function with seed tokens in its scope will be called with those tokens.

### Instruction Definition
 88
 89Defines a named node with an opcode and optional arguments:
 90
 91```dfasm
 92&label <| opcode [, arg ...]
 93```
 94
 95The `<|` operator reads as "receives from". 
 96The node receives data from whatever edges point to it.
 97
 98**Examples:**
 99
100```dfasm
101&c1|pe0 <| const, 42       ; constant node, value 42, placed on PE 0
102&adder <| add               ; dyadic add, auto-placed
103&reader|pe1 <| read, 5     ; SM read at cell 5, placed on PE 1
104&branch <| sweq             ; switch-on-equal (routing op)
105```
106
107Named arguments are supported for clarity:
108
109```dfasm
110@serial <| ior dest=0x45, addr=0x91, data=0x43
111```
112
113### Plain Edge
114
115Wires a named source to one or more named destinations:
116
117```dfasm
118&source |> &dest:L              ; single edge, left port
119&source |> &dest1:L, &dest2:R   ; fan-out to two destinations
120```
121
122The `|>` operator reads as "flows to". Data flows from source to destination. Port qualifiers on the destination specify which input the data arrives on. Port qualifiers on the source specify which
123output slot it leaves from (relevant for dual-output nodes like switch operations).
124
125> `const` instructions on the left/source side create 'seed' tokens, injected into the machine after loading, at startup, or when their function enters scope.
126
127**When no port is specified, the default is L**
128
129### Strong Edge (Inline Anonymous Node)
130
131Creates an anonymous node with explicit input wiring:
132
133```dfasm
134opcode inputs... |> outputs...
135```
136
137```dfasm
138add &a, &b |> &result:L        ; anonymous add of &a and &b → &result
139```
140
141This is shorthand. The assembler creates a hidden node (named `&__anon_N`) and wires the inputs and outputs. Useful for small, one-off operations that don't need a label.
142
143### Weak Edge (Reverse Inline)
144
145Same as strong edge but with reversed syntax:
146
147```dfasm
148outputs... opcode <| inputs...
149```
150
151```dfasm
152&result:L add <| &a, &b        ; same as: add &a, &b |> &result:L
153```
154
The distinction between strong and weak edges is currently purely syntactic; they produce identical IR. Future iterations of the OR-1 will execute a series of strong edges as a pseudo-sequential block.
156
157### Function Definition
158
159Groups instructions into a named scope:
160
161```dfasm
162$fib |> {
163  &c_n <| const, 10
164  &sub1 <| sub
165  &branch <| sweq
166
167  &c_n |> &branch:L
168  &c_n |> &sub1:L
169  ; ...
170}
171```
172
173Labels (`&name`) inside a function are scoped to that function. You cannot reference `&sub1` from outside `$fib`. Internally, the assembler qualifies the name as `$fib.&sub1`.
174
175> Node references (`@name`) are always global, and can be referenced from anywhere.
176
177### Data Definition
178
179Initializes a structure memory cell before execution begins:
180
181```dfasm
182@data|sm0:5 = 0x42             ; SM 0, cell 5, value 0x42
183@pair|sm0:0 = 'h', 'i'         ; two chars packed big-endian → 0x6869
184@msg|sm1:10 = "hello"          ; string chars as packed 16-bit words
185```
186
187Data definitions require SM placement (`|smN`) and a cell address (`:N`). The assembler translates these into SM write tokens during bootstrap, if placed in RAM, or into a text section of the ROM image if placed in ROM.
188
189### Location Directive
190
191Sets a location context for subsequent definitions:
192
193```dfasm
194@compute_region
195&a <| const, 5      ; these nodes are inside @compute_region
196&b <| add
197```
198
199Statements following a location directive are collected into that location's scope until the next function or location directive.
200
201## Opcodes
202
### Arithmetic

| Mnemonic  | Arity   | Description                     |
| --------- | ------- | ------------------------------- |
| `add`     | dyadic  | L + R                           |
| `sub`     | dyadic  | L − R                           |
| `inc`     | monadic | data + 1                        |
| `dec`     | monadic | data − 1                        |
| `shiftl`  | monadic | shift left by 1 bit             |
| `shiftr`  | monadic | logical shift right by 1 bit    |
| `ashiftr` | monadic | arithmetic shift right by 1 bit |
214
215### Logical
216
| Mnemonic | Arity   | Description |
| -------- | ------- | ----------- |
| `and`    | dyadic  | bitwise AND |
| `or`     | dyadic  | bitwise OR  |
| `xor`    | dyadic  | bitwise XOR |
| `not`    | monadic | bitwise NOT |
224
225### Comparison (dyadic, produce bool_out)
226
227| Mnemonic | Description    |
228| -------- | -------------- |
229| `eq`     | L == R         |
230| `lt`     | L < R (signed) |
231| `lte`    | L ≤ R (signed) |
232| `gt`     | L > R (signed) |
233| `gte`    | L ≥ R (signed) |
234
Comparisons interpret both 16-bit operands as signed two's-complement values.
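
The signed interpretation can be sketched in Python as follows. `to_signed16` and `lt` are illustrative helpers, not part of the assembler or emulator.

```python
def to_signed16(word):
    """Reinterpret a 16-bit word as a signed two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def lt(left, right):
    """Model of the `lt` comparison: L < R on signed 16-bit values."""
    return to_signed16(left) < to_signed16(right)
```

With this reading, `0xFFFF` is −1, so `lt(0xFFFF, 1)` is true even though `0xFFFF` is the larger value when read as unsigned.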
236
237### Routing / Switching / Branching (dyadic)
238
239These operations route tokens based on a comparison result. They are all dyadic — they compare L and R, then route accordingly.
240
241**Branch operations** (`br*`): compare L and R, then emit data to `dest_l` (taken) or `dest_r` (not taken). Both outputs carry the data value; the branch condition selects the destination:
242
243| Mnemonic | Condition  |
244| -------- | ---------- |
245| `breq`   | L == R     |
246| `brgt`   | L > R      |
247| `brge`   | L ≥ R      |
248| `brof`   | overflow   |
249
250> NOTE:
251> `br*` ops use predicate register and internal-to-PE loopback route if supported by hardware. Future strongly-connected block execution will change the behaviour of `br*` ops to support pseudo-sequential execution within a PE.
252
253**Switch operations** (`sw*`): like branch, but when the condition is true, data goes to `dest_l` and a trigger token (value 0) goes to `dest_r`.
254When false, trigger goes to `dest_l` and data goes to `dest_r`:
255
256| Mnemonic | Condition |
257|----------|-----------|
258| `sweq`   | L == R    |
259| `swgt`   | L > R     |
260| `swge`   | L ≥ R     |
261| `swof`   | overflow  |
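
The difference between the two families can be sketched in Python. This is a model of the description above, not emulator code: `branch`, `switch`, and `sweq` are hypothetical functions, `None` stands for "no token emitted", and the sketch assumes the routed data value is the L operand, which this primer does not state.

```python
TRIGGER = 0  # trigger tokens carry the value 0


def branch(condition, data):
    """br*: data goes to dest_l if taken, otherwise to dest_r. Returns (dest_l, dest_r)."""
    return (data, None) if condition else (None, data)


def switch(condition, data):
    """sw*: like branch, but the other output receives a trigger token."""
    return (data, TRIGGER) if condition else (TRIGGER, data)


def sweq(left, right):
    """Switch-on-equal. Assumption: the L operand is the data that gets routed."""
    return switch(left == right, left)
```

So `sweq(5, 5)` gives `(5, 0)`: data on `dest_l`, trigger on `dest_r`. With unequal operands the two swap.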
262
263**Other routing:**
264
265| Mnemonic | Arity  | Description                                                         |
266| -------- | ------ | ------------------------------------------------------------------- |
267| `gate`   | dyadic | pass data through if bool_out is true, suppress if false            |
268| `sel`    | dyadic | select between L and R based on a condition                         |
269| `merge`  | dyadic | merge two token streams (non-deterministic: fires on either input)  |
270
271### Data
272
273| Mnemonic   | Arity   | Description                             |
274| ---------- | ------- | --------------------------------------- |
275| `pass`     | monadic | pass data through unchanged             |
276| `const`    | monadic | emit constant value (from const field)  |
277| `free_ctx` | monadic | deallocate context slot, no data output |
278
279- `free_ctx` is a special instruction used to handle function body and loop exits. It frees the context slot so it can be reused.
280
281### Structure Memory
282
283| Mnemonic    | Arity             | Description                                                                                                           |
284| ----------- | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
285| `read`      | monadic           | read from SM cell (const = cell address)                                                                              |
286| `write`     | context-dependent | write to SM cell — monadic if const is set (cell addr from const), dyadic if const is None (cell addr from L operand) |
287| `clear`     | monadic           | clear SM cell (reset to EMPTY state)                                                                                  |
288| `alloc`     | monadic           | allocate SM cell                                                                                                      |
289| `free`      | monadic           | free SM cell                                                                                                          |
290| `rd_inc`    | monadic           | atomic read-and-increment                                                                                             |
291| `rd_dec`    | monadic           | atomic read-and-decrement                                                                                             |
292| `cmp_sw`    | dyadic            | compare-and-swap (L = expected, R = new value)                                                                        |
293| `exec`      | monadic           | trigger EXEC on SM (inject tokens from T0 storage into network)                                                       |
294| `raw_read`  | monadic           | raw read from T0 storage (no I-structure semantics)                                                                   |
295| `set_page`  | monadic           | set SM page register (T0 operation)                                                                                   |
296| `write_imm` | monadic           | immediate write to SM cell (T0 operation)                                                                             |
297| `ext`       | monadic           | extended SM operation                                                                                                 |
298
299Note: `free` (SM cell deallocation) and `free_ctx` (PE context slot deallocation) are distinct operations targeting different resources.
300
301SM opcodes use a variable-width bus encoding. See `sm-design.md` for the full opcode table and encoding tiers.
302
303## Literals
304
305| Syntax | Example | Description |
306|--------|---------|-------------|
307| Decimal | `42` | Decimal integer |
308| Hex | `0xFF` | Hexadecimal integer |
309| Char | `'A'` | Single character (ASCII value) |
310| String | `"hello"` | String of char codes |
311| Raw string | `r"no\escapes"` | No escape processing |
312| Byte string | `b"\x00\xFF"` | Explicit byte values |
313
314**Escape sequences** (in regular strings and char literals):
315`\n`, `\t`, `\r`, `\0`, `\\`, `\'`, `\"`, `\xHH`
316
317**Multi-char packing:** when multiple char values appear in a data definition, they are packed big-endian into 16-bit words:
318
```dfasm
@data|sm0:0 = 'h', 'i'    ; → 0x6869 (h=0x68 in high byte, i=0x69 in low)
```
322
323All data values are 16-bit unsigned.
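
A minimal Python sketch of the packing rule. `pack_chars` is a hypothetical helper, and padding an odd-length sequence with a zero low byte is an assumption that this primer does not confirm.

```python
def pack_chars(chars):
    """Pack characters big-endian, two per 16-bit word."""
    codes = [ord(c) for c in chars]
    if any(code > 0xFF for code in codes):
        raise ValueError("only single-byte characters can be packed")
    if len(codes) % 2:
        codes.append(0)  # assumption: pad the final low byte with zero
    return [(high << 8) | low for high, low in zip(codes[0::2], codes[1::2])]
```

`pack_chars("hi")` returns `[0x6869]`, matching the example above.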
324
325## Complete Example
326
327A simple program that adds two constants and routes the result across PEs:
328
```dfasm
; Hardware: 2 PEs, no structure memory
@system pe=2, sm=0

; Define nodes
&c1|pe0 <| const, 3          ; constant 3 on PE 0
&c2|pe0 <| const, 7          ; constant 7 on PE 0
&result|pe0 <| add            ; adder on PE 0
&output|pe1 <| pass           ; output relay on PE 1

; Wire the graph
&c1 |> &result:L              ; const 3 → adder left input
&c2 |> &result:R              ; const 7 → adder right input
&result |> &output:L          ; sum → output relay
```
344
345**What happens at runtime:**
346
3471. The assembler emits two seed tokens (for `&c1` and `&c2`) since they are `CONST` nodes with no incoming edges.
3482. Both tokens arrive at PE 0's matching store. `&result` is a dyadic instruction — it waits for both operands.
3493. When both arrive, the matching store pairs them. The left operand (3) and right operand (7) feed the ALU.
3504. The ALU computes `3 + 7 = 10` and emits a token to `&output` on PE 1.
3515. `&output` is monadic (`pass`) — it bypasses the matching store and immediately emits the value 10.
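
The firing rule in the steps above can be imitated with a small token loop. This is a Python sketch, not the OR-1 emulator: `NODES`, `run`, and the `(node, port, value)` token layout are made up for illustration, and PE placement, the network, and context slots are ignored.

```python
from collections import deque

# Node table for the example: name -> (opcode, list of (destination, port)).
NODES = {
    "result": ("add", [("output", "L")]),
    "output": ("pass", []),
}
DYADIC = {"add"}


def run(seeds):
    """Deliver tokens until none remain; return values emitted by nodes with no destinations."""
    queue = deque(seeds)  # pending tokens: (node, port, value)
    waiting = {}          # stand-in for the matching store: node -> {port: value}
    emitted = []
    while queue:
        node, port, value = queue.popleft()
        opcode, dests = NODES[node]
        if opcode in DYADIC:
            slot = waiting.setdefault(node, {})
            slot[port] = value
            if len(slot) < 2:
                continue  # first operand: wait for its partner
            del waiting[node]
            result = (slot["L"] + slot["R"]) & 0xFFFF  # 16-bit add
        else:
            result = value  # monadic pass: no matching needed
        if dests:
            queue.extend((dest, dest_port, result) for dest, dest_port in dests)
        else:
            emitted.append((node, result))
    return emitted
```

Seeding it with the two constants, `run([("result", "L", 3), ("result", "R", 7)])`, returns `[("output", 10)]`.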
352
353## Structure Memory Example
354
Pre-initialise an SM cell, then read it back:

```dfasm
@system pe=3, sm=1

; Pre-initialise SM cell 5 with value 0x42
@val|sm0:5 = 0x42

; Trigger a read of cell 5
&trigger|pe0 <| const, 1
&reader|pe0 <| read, 5        ; read SM 0, cell 5
&relay|pe1 <| pass
&sink|pe2 <| pass

&trigger |> &reader:L          ; trigger the read operation
&reader |> &relay:L            ; SM result → relay
&relay |> &sink:L              ; relay → final sink
```
373
374The `read` instruction is monadic — it takes a trigger token and issues
375a read request to SM. The `const` argument (`5`) specifies the cell
376address. The SM returns the stored value (0x42 = 66 decimal) as a
377token routed back to the `read` node's destination.
378
379## Switch Routing Example
380
Switch on equality, sending data down the taken path and a trigger token down the not-taken path:

```dfasm
@system pe=3, sm=0

&val|pe0 <| const, 5           ; value to test
&cmp|pe0 <| const, 5           ; comparison target
&branch|pe0 <| sweq            ; switch-on-equal

&taken|pe1 <| pass
&not_taken|pe1 <| pass
&output|pe2 <| pass

; Wire inputs
&val |> &branch:L
&cmp |> &branch:R

; Wire outputs: source port qualifiers select output slot
&branch:L |> &taken:L          ; taken path (data)
&branch:R |> &not_taken:L      ; not-taken path (trigger)

; Merge for downstream
&taken |> &output:L
&not_taken |> &output:R
```
406
407Since `val == cmp` (both 5), `sweq` evaluates to true: data (5) goes to `dest_l` (taken) and a trigger token (0) goes to `dest_r` (not_taken).
408
409## Auto-Placement
410
411Nodes without explicit `|peN` qualifiers are automatically placed by the assembler:
412
```dfasm
@system pe=3, sm=0

&c1 <| const, 5
&c2 <| const, 3
&result <| add
&output <| pass

&c1 |> &result:L
&c2 |> &result:R
&result |> &output:L
```
425
426The assembler's greedy placer assigns PEs based on connectivity. Nodes connected by edges prefer to share a PE (minimizing cross-PE traffic). The result is functionally identical to explicit placement.
427
428## From Source to Execution
429
430### Lowering and Resolution
431
432After parsing, the assembler lowers the CST to an intermediate representation (`IRGraph`). Names are qualified, scopes are created, and edges are validated. The resolve pass checks that every edge endpoint exists and produces suggestions for typos.
433
434### Placement and Allocation
435
436Unplaced nodes get PE assignments. Then the allocator assigns each node an IRAM offset and context slot. Dyadic instructions are packed at low IRAM offsets (0..D-1), monadic above (D..D+M-1). This layout matches the hardware contract: the token's offset field doubles as the matching store entry for dyadic instructions.
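
The dyadic-first layout can be sketched in Python. `assign_iram_offsets` is a hypothetical function, and keeping source order within each group is an assumption rather than documented allocator behaviour.

```python
def assign_iram_offsets(nodes):
    """Map node name -> IRAM offset for one PE, with dyadic nodes packed first.

    `nodes` is a list of (name, arity) pairs, arity being "dyadic" or "monadic".
    """
    dyadic = [name for name, arity in nodes if arity == "dyadic"]
    monadic = [name for name, arity in nodes if arity == "monadic"]
    # Dyadic nodes take offsets 0..D-1, monadic nodes take D..D+M-1.
    return {name: offset for offset, name in enumerate(dyadic + monadic)}
```

Because the dyadic offsets start at zero and are contiguous, an offset below D can index the matching store directly.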
437
438Context slots are assigned per function scope per PE. Each function body sharing a PE gets its own context slot, enabling concurrent activations to coexist without operand interference.
439
440### Code Generation
441
442The assembler currently offers two output modes:
443
444**Direct mode** produces `PEConfig` objects (IRAM contents, route restrictions, context slot count) and `SMConfig` objects (initial cell values), plus seed tokens. This is the fast path for the emulator. Configuration is applied directly.
445
**Token stream mode** produces a bootstrap sequence: SM initialization writes, then IRAM write tokens, then seed tokens. This mirrors the bootstrap process, in which the machine loads the code stored at the reset vector.
447
448## Macros
449
450Macros define reusable template subgraphs that are expanded inline at their call sites. The macro system supports parameterisation, variadic arguments, repetition blocks, constant arithmetic, token pasting, opcode parameters, parameterized qualifiers, and `@ret` output wiring.
451
452### Macro Definition
453
454```dfasm
455#macro_name param1, param2, *variadic_param |> {
456    ; body — instructions and edges using ${param} substitution
457    &node <| add
458    ${param1} |> &node:L
459    ${param2} |> &node:R
460}
461```
462
463- Macro names use the `#` sigil
464- Parameters are declared before `|>`
465- Variadic parameters are prefixed with `*` and collect remaining arguments
466- The body contains standard dfasm statements with `${param}` placeholders
467
468### Parameter Substitution
469
470Within a macro body, `${name}` references are replaced with the actual argument values during expansion:
471
472```dfasm
473#add_const val |> {
474    &adder <| add
475    &c <| const, ${val}
476    &c |> &adder:R
477}
478```
479
480**Token pasting:** Parameters can be combined with literal text to synthesise unique names. The `${param}` reference within a label name produces a label that incorporates the argument value:
481
482```dfasm
483#make_pair name |> {
484    &${name}_left <| pass
485    &${name}_right <| pass
486}
487```
488
489### Opcode Parameters
490
491Parameters can appear in the opcode position of instruction definitions. This allows a single macro to work with any ALU or memory operation:
492
493```dfasm
494#reduce_2 op |> {
495    &r <| ${op}
496    &r |> @ret
497}
498
499; Usage — the opcode is passed as a bare mnemonic:
500#reduce_2 add |> &result
501#reduce_2 sub |> &result
502```
503
504Opcode arguments are passed as bare identifiers (not strings). The expand pass resolves them via `MNEMONIC_TO_OP` during expansion. An invalid mnemonic produces a MACRO error.
505
506### Parameterized Qualifiers
507
508Parameters can appear in placement (`|pe0`) and port (`:L`) positions within a macro body:
509
510```dfasm
511; Parameterized port
512#wire_to target, port |> {
513    &src <| pass
514    &src |> ${target}:${port}
515}
516#wire_to &dest, L
517
518; Parameterized placement
519#placed_const val, pe |> {
520    &c <| const, ${val} |${pe}
521    &c |> @ret
522}
523#placed_const 42, pe0 |> &target
524
525; Parameterized context slot
526#placed_op op, pe, ctx |> {
527    &n <| ${op} |${pe}[${ctx}]
528    &n |> @ret
529}
530```
531
532The expand pass resolves placement strings (e.g., `"pe0"` → `0`), port strings (`"L"` → `Port.L`), and context slot values to their concrete types. Invalid values produce MACRO errors.
533
534### Repetition Blocks
535
536The `$(  ),*` syntax expands its body once per element of a variadic parameter. Within a repetition block, `${_idx}` provides the current iteration index (0-based):
537
538```dfasm
539#fan_out *targets |> {
540    &src <| pass
541    $(
542        &src |> ${targets}
543    ),*
544}
545```
546
547### Constant Arithmetic
548
549Macro const fields support compile-time arithmetic with `+`, `-`, `*`, `//` on integer values and parameters:
550
551```dfasm
552#indexed_read base, *cells |> {
553    $(
554        &r${_idx} <| read, ${base} + ${_idx}
555    ),*
556}
557```
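
A rough Python sketch of what expanding `#indexed_read` could produce. It models `${name}` substitution, the repetition index, and constant folding only. `eval_const`, `substitute`, and `expand_indexed_read` are hypothetical, and name qualification (see Scoping) is left out.

```python
import ast
import operator
import re

# The four operators the macro system supports in const fields.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def eval_const(expr):
    """Evaluate an integer expression that uses only +, -, * and //."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported constant expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval").body)


def substitute(template, params):
    """Replace each ${name} in the template with its value from params."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), template)


def expand_indexed_read(base, cells):
    """Expand the repetition block once per element of the variadic argument."""
    lines = []
    for idx, _cell in enumerate(cells):
        params = {"base": base, "_idx": idx}
        label = substitute("&r${_idx}", params)
        address = eval_const(substitute("${base} + ${_idx}", params))
        lines.append(f"{label} <| read, {address}")
    return lines
```

Under these assumptions, `expand_indexed_read(10, ["&dest1", "&dest2", "&dest3"])` yields three `read` instructions at cell addresses 10, 11, and 12.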
558
559### Macro Invocation and Output Wiring (@ret)
560
561Macros are invoked as standalone statements. Arguments can be positional or named:
562
563```dfasm
564#fan_out &a:L, &b:R, &c:L
565#indexed_read 10, &dest1, &dest2, &dest3
566#make_pair name=foo
567```
568
569**Output wiring with `@ret`:** Macro bodies can define output points using `@ret` / `@ret_name` markers. At the call site, `|>` wires these outputs to destinations:
570
571```dfasm
572; Macro body defines outputs via @ret markers
573#loop_counted init, limit |> {
574    &counter <| add
575    &compare <| brgt
576    &counter |> &compare:L
577    &body_fan <| pass
578    &compare |> &body_fan:L
579    &inc <| inc
580    &body_fan |> &inc:L
581    &inc |> &counter:R
582    ${init} |> &counter:L
583    ${limit} |> &compare:R
584    &body_fan |> @ret_body       ; named output: body
585    &compare |> @ret_exit:R      ; named output: exit
586}
587
588; Call with named output wiring:
589#loop_counted &init, &limit |> body=&process, exit=&done
590
591; Or positional @ret for single-output macros:
592#reduce_2 add |> &result
593```
594
595Unlike function calls, macro `@ret` wiring is purely edge rewriting — the `@ret_name` destination is replaced with the concrete node reference from the call site. No trampolines, no cross-context routing, no `free_ctx` insertion. Macros inline into the caller's context.
596
597**Bare `@ret`** maps to the first (or only) positional output. **`@ret_name`** maps to the named output `name=&dest` at the call site. Multiple `@ret` edges to different ports on the same output are valid.
598
599### Scoping
600
601Expanded macro names are automatically qualified to prevent collisions between multiple invocations of the same macro:
602
603- Top-level invocation: `#macro_N.&label` (N is the invocation counter)
604- Inside a function: `$func.#macro_N.&label`
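
The qualification scheme can be written out as a short Python sketch. `qualify` is a hypothetical helper, and the invocation counter is passed in because this primer does not say where it starts.

```python
def qualify(label, macro, invocation, func=None):
    """Build the qualified name of a label produced by a macro expansion."""
    scoped = f"#{macro}_{invocation}.{label}"
    # Inside a function, the function name is prefixed as well.
    return f"${func}.{scoped}" if func else scoped
```

For example, `qualify("&adder", "add_const", 0)` gives `#add_const_0.&adder`, and passing `func="fib"` gives `$fib.#add_const_0.&adder`.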
605
606### Built-in Macros
607
608The following macros are automatically available in all programs (defined in `asm/builtins.py`):
609
610| Macro | Parameters | Outputs | Purpose |
611|-------|------------|---------|---------|
612| `#loop_counted` | `init, limit` | `body`, `exit` | Counted loop: counter + compare + increment feedback. Call with `#loop_counted &init, &limit \|> body=&process, exit=&done` |
613| `#loop_while` | `test` | `body`, `exit` | Condition-tested loop: gate node. Call with `#loop_while &test_src \|> body=&process, exit=&done` |
614| `#permit_inject` | `*targets` | (none — routes directly to targets) | Inject one const(1) seed per target. `#permit_inject &gate_a, &gate_b` |
615| `#reduce_2` | `op` | (positional) | Binary reduction: 1 node. `#reduce_2 add \|> &result` |
616| `#reduce_3` | `op` | (positional) | Binary reduction tree: 2 nodes. `#reduce_3 sub \|> &result` |
617| `#reduce_4` | `op` | (positional) | Binary reduction tree: 3 nodes. `#reduce_4 add \|> &result` |
618
619All built-in macros use `@ret` output wiring except `#permit_inject`, which routes directly to its variadic target arguments. The `#reduce_*` family accepts any opcode as a parameter.
620
621## Function Calls
622
623Function calls wire argument values across context boundaries using the expand pass. The call syntax declares which arguments feed into the callee and where results flow back.
624
625### Call Syntax
626
627```dfasm
628$func_name arg1=&source1, arg2=&source2 |> @result
629```
630
631The function must be defined as a `$name |> { ... }` region. Arguments are named (matching the function's parameter labels) or positional. Outputs after `|>` specify where results are routed.
632
633Multiple outputs can be named:
634
635```dfasm
636$divmod a=&dividend, b=&divisor |> @quotient, remainder=@remainder
637```
638
639### What the Expand Pass Does
640
641When processing a function call:
642
6431. Allocates a fresh context slot for the callee activation
6442. Generates cross-context edges with `ctx_override=True` (becomes `ctx_mode=01` / CTX_OVRD in hardware)
6453. Creates trampoline `PASS` nodes for return routing
6464. Generates `FREE_CTX` nodes to clean up the callee's context on completion
6475. Synthesises `@ret` marker nodes for return paths
648
649### Return Convention
650
651Inside function bodies, `@ret` and `@ret_name` are reserved markers for return points. The expand pass replaces them with return trampolines — synthetic `pass` nodes that route results back to the caller's context via CTX_OVRD, with auto-inserted `free_ctx` nodes for context teardown. Port-qualified returns (`@ret:L`, `@ret:R`) handle dual-output return nodes. Named returns (`@ret_body`, `@ret_exit`) handle multiple independent return paths, wired at the call site via `name=@dest`.
652
653Note: `@ret` in function bodies creates trampolines with cross-context routing. `@ret` in macro bodies is simpler — pure edge rewriting with no context management, since macros inline into the caller's context.
654
655### Example
656
657```dfasm
658@system pe=2, sm=0
659
660; Define a function that adds two values
661$add_pair |> {
662    &sum <| add
663}
664
665; Call it
666&a <| const, 3
667&b <| const, 7
668$add_pair a=&a, b=&b |> @result
669@result <| pass
670```
671
672After expansion, the assembler generates the cross-context wiring, trampoline nodes, and context cleanup automatically. The programmer does not need to manage context slots or return routing manually.