OR-1 dataflow CPU sketch
# dfasm - Dataflow Graph Assembly Language

A primer on the assembly dialect used in the OR-1

See `assembler-architecture.md` for the assembler's internal pipeline.
See `architecture-overview.md` for the hardware model dfasm targets.

## What dfasm Is

dfasm is a representation of low-level dataflow program graphs in text form. Each instruction forms a **node** with zero, one, or two inputs, and up to two outputs. Connections between nodes are conceived of as graph edges. Execution is entirely data-driven. A node fires when all its required operands have arrived as tokens.

It is *not* conventional assembly with strong implicit sequential behaviour. There is no program counter, and jumps and execution order are driven primarily by the graph topology. Writing things in a sensible order is up to you, and is for your own benefit.

## Syntax Overview

### Comments

Semicolons start line comments, as is fairly common for assembly. While the parser is sophisticated, I've chosen to do this to set a specific tone. dfasm, while it has a number of seemingly sophisticated features, remains an *assembly language*, tightly coupled to the low-level functions of the hardware.

```dfasm
; This is a comment
&add <| add    ; inline comment
```

### Names and Sigils

dfasm uses four sigil-prefixed naming conventions:

| Sigil   | Scope                             | Use                               |
| ------- | --------------------------------- | --------------------------------- |
| `@name` | Global (top-level)                | Node references, data definitions |
| `&name` | Local (within enclosing function) | Labels for instructions           |
| `$name` | Global                            | Function / subgraph definitions   |
| `#name` | Global                            | Macro definitions and invocations |

Additionally, `${name}` is used within macro bodies for parameter substitution (see Macros section).

Names are composed of `[a-zA-Z_][a-zA-Z0-9_]*`.

### Qualifier Chains

Names can be chained with placement and port qualifiers.
No spaces are allowed within a chain. Placement indicators are mostly optional. The assembler will attempt to resolve and auto-place instructions, currently using a basic greedy locality heuristic. If an instruction cannot be placed, or the assembler places it badly, you can manually assign its placement.

```dfasm
&sum|pe0:L      ; label "sum", placed on PE 0, left port
@data|sm0:5     ; node "data", placed on SM 0, cell address 5
&branch|pe1:R   ; label "branch", PE 1, right port
```

| Qualifier    | Syntax             | Meaning                                |
|--------------|--------------------|----------------------------------------|
| Placement    | `\|peN` or `\|smN` | Assign to a specific PE or SM          |
| Port         | `:L` or `:R`       | Left or right input port (for edges)   |
| Cell address | `:N`               | SM cell address (for data definitions) |

## Statement Types

### Pragma

Pragmas are built-in nodes that provide specific information about how the program should be assembled.

### `@system`

Declares hardware configuration. Required for programs that need specific PE/SM counts:

```dfasm
@system pe=4, sm=1, iram=128, ctx=4
```

| Parameter | Required | Default | Meaning                                  |
| --------- | -------- | ------- | ---------------------------------------- |
| `pe`      | yes      | —       | Number of processing elements            |
| `sm`      | yes      | —       | Number of structure memory modules       |
| `iram`    | no       | 128     | IRAM capacity per PE (instruction slots) |
| `ctx`     | no       | 16      | Context slots per PE                     |

At most one `@system` pragma per program.

### `@rom_data`

Declares that the following section will be placed in ROM with an optional name and base address. If the address is not specified, it will be placed after the contents of the reset vector. It will *not* be emitted as tokens during bootstrapping.

```dfasm
@rom_data [name=, addr=...]
```

Unlike the `@system` pragma, the `@rom_data` pragma can be used more than once. Functions placed in a `@rom_data` section can be loaded via `exec`, as can a named `@rom_data` section.
A function with its seed tokens in its scope will be called with those tokens.

### Instruction Definition

Defines a named node with an opcode and optional arguments:

```dfasm
&label <| opcode [, arg ...]
```

The `<|` operator reads as "receives from".
The node receives data from whatever edges point to it.

**Examples:**

```dfasm
&c1|pe0 <| const, 42      ; constant node, value 42, placed on PE 0
&adder <| add             ; dyadic add, auto-placed
&reader|pe1 <| read, 5    ; SM read at cell 5, placed on PE 1
&branch <| sweq           ; switch-on-equal (routing op)
```

Named arguments are supported for clarity:

```dfasm
@serial <| ior dest=0x45, addr=0x91, data=0x43
```

### Plain Edge

Wires a named source to one or more named destinations:

```dfasm
&source |> &dest:L              ; single edge, left port
&source |> &dest1:L, &dest2:R   ; fan-out to two destinations
```

The `|>` operator reads as "flows to". Data flows from source to destination. Port qualifiers on the destination specify which input the data arrives on. Port qualifiers on the source specify which output slot it leaves from (relevant for dual-output nodes like switch operations).

> `const` instructions on the left/source side create 'seed' tokens, injected into the machine after loading, at startup, or when their function enters scope.

**When no port is specified, the default is L.**

### Strong Edge (Inline Anonymous Node)

Creates an anonymous node with explicit input wiring:

```dfasm
opcode inputs... |> outputs...
```

```dfasm
add &a, &b |> &result:L    ; anonymous add of &a and &b → &result
```

This is shorthand. The assembler creates a hidden node (named `&__anon_N`) and wires the inputs and outputs. Useful for small, one-off operations that don't need a label.
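
As a rough illustration of the shorthand above, the following Python sketch expands an inline node into a hidden `&__anon_N` node plus plain edges. It is not the assembler's code: `desugar_inline` and the counter are hypothetical names, numbering from 0 is a guess, and the first-input-to-L, second-input-to-R ordering is an assumption based on the `add &a, &b` example.

```python
from itertools import count

# Hypothetical counter backing the &__anon_N names (starting at 0 is an assumption).
_anon_ids = count()

def desugar_inline(opcode, inputs, outputs):
    """Expand `opcode inputs... |> outputs...` into one node plus plain edges.

    Assumption: the first input feeds port L and the second feeds port R.
    """
    if len(inputs) > 2:
        raise ValueError("a node has at most two inputs")
    name = f"&__anon_{next(_anon_ids)}"
    node = (name, opcode)
    # Input edges: source -> hidden node, with an explicit destination port.
    edges = [(src, f"{name}:{port}") for src, port in zip(inputs, "LR")]
    # Output edges: hidden node -> each destination as written at the call site.
    edges += [(name, dst) for dst in outputs]
    return node, edges

node, edges = desugar_inline("add", ["&a", "&b"], ["&result:L"])
```

Here `node` names the hidden adder and `edges` holds the three plain edges that the one-line form stands for.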

### Weak Edge (Reverse Inline)

Same as strong edge but with reversed syntax:

```dfasm
outputs... opcode <| inputs...
```

```dfasm
&result:L add <| &a, &b    ; same as: add &a, &b |> &result:L
```

The distinction between strong and weak edges is currently purely syntactic: they produce identical IR. Future iterations of the OR-1 will execute a series of strong edges as a pseudo-sequential block.

### Function Definition

Groups instructions into a named scope:

```dfasm
$fib |> {
    &c_n <| const, 10
    &sub1 <| sub
    &branch <| sweq

    &c_n |> &branch:L
    &c_n |> &sub1:L
    ; ...
}
```

Labels (`&name`) inside a function are scoped to that function. You cannot reference `&sub1` from outside `$fib`. Internally, the assembler qualifies the name as `$fib.&sub1`.

> Node references (`@name`) are always global, and can be referenced from anywhere.

### Data Definition

Initializes a structure memory cell before execution begins:

```dfasm
@data|sm0:5 = 0x42        ; SM 0, cell 5, value 0x42
@pair|sm0:0 = 'h', 'i'    ; two chars packed big-endian → 0x6869
@msg|sm1:10 = "hello"     ; string chars as packed 16-bit words
```

Data definitions require SM placement (`|smN`) and a cell address (`:N`). The assembler translates these into SM write tokens during bootstrap, if placed in RAM, or into a text section of the ROM image if placed in ROM.

### Location Directive

Sets a location context for subsequent definitions:

```dfasm
@compute_region
&a <| const, 5    ; these nodes are inside @compute_region
&b <| add
```

Statements following a location directive are collected into that location's scope until the next function or location directive.
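
Going back to the data definitions earlier in this section, the big-endian character packing can be sketched in Python as below. `pack_chars` is a hypothetical helper, not part of the assembler, and padding an odd trailing character with a zero low byte is an assumption; the primer does not say how odd-length strings are handled.

```python
def pack_chars(text):
    """Pack ASCII characters big-endian into 16-bit words.

    Assumption: an odd trailing character is padded with a zero low byte.
    """
    data = text.encode("ascii")
    if len(data) % 2:
        data += b"\x00"
    # First character of each pair goes in the high byte, second in the low byte.
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]

words = pack_chars("hi")
```

For `'h', 'i'` this gives the single word `0x6869`, matching the `@pair` example.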

## Opcodes

### Arithmetic (dyadic unless noted)

| Mnemonic  | Arity   | Description                     |
| --------- | ------- | ------------------------------- |
| `add`     | dyadic  | L + R                           |
| `sub`     | dyadic  | L − R                           |
| `inc`     | monadic | data + 1                        |
| `dec`     | monadic | data − 1                        |
| `shiftl`  | monadic | shift left by 1 bit             |
| `shiftr`  | monadic | logical shift right by 1 bit    |
| `ashiftr` | monadic | arithmetic shift right by 1 bit |

### Logical

| Mnemonic | Arity   | Description |
| -------- | ------- | ----------- |
| `and`    | dyadic  | bitwise AND |
| `or`     | dyadic  | bitwise OR  |
| `xor`    | dyadic  | bitwise XOR |
| `not`    | monadic | bitwise NOT |

### Comparison (dyadic, produce bool_out)

| Mnemonic | Description    |
| -------- | -------------- |
| `eq`     | L == R         |
| `lt`     | L < R (signed) |
| `lte`    | L ≤ R (signed) |
| `gt`     | L > R (signed) |
| `gte`    | L ≥ R (signed) |

Comparisons interpret both 16-bit operands as signed two's-complement values.

### Routing / Switching / Branching (dyadic)

These operations route tokens based on a comparison result. They are all dyadic — they compare L and R, then route accordingly.

**Branch operations** (`br*`): compare L and R, then emit data to `dest_l` (taken) or `dest_r` (not taken). Both outputs carry the data value; the branch condition selects the destination:

| Mnemonic | Condition |
| -------- | --------- |
| `breq`   | L == R    |
| `brgt`   | L > R     |
| `brge`   | L ≥ R     |
| `brof`   | overflow  |

> NOTE:
> `br*` ops use predicate register and internal-to-PE loopback route if supported by hardware. Future strongly-connected block execution will change the behaviour of `br*` ops to support pseudo-sequential execution within a PE.

**Switch operations** (`sw*`): like branch, but when the condition is true, data goes to `dest_l` and a trigger token (value 0) goes to `dest_r`.
When false, trigger goes to `dest_l` and data goes to `dest_r`:

| Mnemonic | Condition |
|----------|-----------|
| `sweq`   | L == R    |
| `swgt`   | L > R     |
| `swge`   | L ≥ R     |
| `swof`   | overflow  |

**Other routing:**

| Mnemonic | Arity  | Description                                                        |
| -------- | ------ | ------------------------------------------------------------------ |
| `gate`   | dyadic | pass data through if bool_out is true, suppress if false           |
| `sel`    | dyadic | select between L and R based on a condition                        |
| `merge`  | dyadic | merge two token streams (non-deterministic: fires on either input) |

### Data

| Mnemonic   | Arity   | Description                             |
| ---------- | ------- | --------------------------------------- |
| `pass`     | monadic | pass data through unchanged             |
| `const`    | monadic | emit constant value (from const field)  |
| `free_ctx` | monadic | deallocate context slot, no data output |

- `free_ctx` is a special instruction used to handle function body and loop exits. It frees the context slot so it can be reused.
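
The `br*` and `sw*` routing rules from the Routing section above can be modelled with a small Python sketch. It is an illustration only: `route` is a hypothetical helper, the overflow variants (`brof`, `swof`) are left out, and two points are assumptions because the primer does not state them: that routing ops compare operands as signed 16-bit values like the comparison ops do, and which operand is forwarded as the data value (so it is passed in explicitly here).

```python
def to_signed16(value):
    """Interpret a 16-bit word as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

CONDITIONS = {
    "eq": lambda l, r: l == r,
    "gt": lambda l, r: l > r,
    "ge": lambda l, r: l >= r,
}

def route(mnemonic, left, right, data):
    """Return (dest_l, dest_r) emissions; None means no token on that output.

    Assumption: `data` is supplied explicitly because the primer does not say
    which operand is forwarded.
    """
    kind, cond = mnemonic[:2], mnemonic[2:]
    taken = CONDITIONS[cond](to_signed16(left), to_signed16(right))
    if kind == "br":
        # Branch: data goes to exactly one destination.
        return (data, None) if taken else (None, data)
    if kind == "sw":
        # Switch: data on one side, a trigger token (value 0) on the other.
        return (data, 0) if taken else (0, data)
    raise ValueError(f"not a routing mnemonic: {mnemonic}")

taken_case = route("sweq", 5, 5, 5)
not_taken_case = route("sweq", 5, 6, 5)
```

The difference between the two families is visible in the return values: a branch leaves one output silent, while a switch always emits on both.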

### Structure Memory

| Mnemonic    | Arity             | Description                                                                                                           |
| ----------- | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
| `read`      | monadic           | read from SM cell (const = cell address)                                                                              |
| `write`     | context-dependent | write to SM cell — monadic if const is set (cell addr from const), dyadic if const is None (cell addr from L operand) |
| `clear`     | monadic           | clear SM cell (reset to EMPTY state)                                                                                  |
| `alloc`     | monadic           | allocate SM cell                                                                                                      |
| `free`      | monadic           | free SM cell                                                                                                          |
| `rd_inc`    | monadic           | atomic read-and-increment                                                                                             |
| `rd_dec`    | monadic           | atomic read-and-decrement                                                                                             |
| `cmp_sw`    | dyadic            | compare-and-swap (L = expected, R = new value)                                                                        |
| `exec`      | monadic           | trigger EXEC on SM (inject tokens from T0 storage into network)                                                       |
| `raw_read`  | monadic           | raw read from T0 storage (no I-structure semantics)                                                                   |
| `set_page`  | monadic           | set SM page register (T0 operation)                                                                                   |
| `write_imm` | monadic           | immediate write to SM cell (T0 operation)                                                                             |
| `ext`       | monadic           | extended SM operation                                                                                                 |

Note: `free` (SM cell deallocation) and `free_ctx` (PE context slot deallocation) are distinct operations targeting different resources.

SM opcodes use a variable-width bus encoding. See `sm-design.md` for the full opcode table and encoding tiers.

## Literals

| Syntax      | Example         | Description                    |
|-------------|-----------------|--------------------------------|
| Decimal     | `42`            | Decimal integer                |
| Hex         | `0xFF`          | Hexadecimal integer            |
| Char        | `'A'`           | Single character (ASCII value) |
| String      | `"hello"`       | String of char codes           |
| Raw string  | `r"no\escapes"` | No escape processing           |
| Byte string | `b"\x00\xFF"`   | Explicit byte values           |

**Escape sequences** (in regular strings and char literals):
`\n`, `\t`, `\r`, `\0`, `\\`, `\'`, `\"`, `\xHH`

**Multi-char packing:** when multiple char values appear in a data definition, they are packed big-endian into 16-bit words:

```vhdl
@data|sm0:0 = 'h', 'i'    ; 0x6869 (h=0x68 in high byte, i=0x69 in low)
```

All data values are 16-bit unsigned.

## Complete Example

A simple program that adds two constants and routes the result across PEs:

```vhdl
; Hardware: 2 PEs, no structure memory
@system pe=2, sm=0

; Define nodes
&c1|pe0 <| const, 3     ; constant 3 on PE 0
&c2|pe0 <| const, 7     ; constant 7 on PE 0
&result|pe0 <| add      ; adder on PE 0
&output|pe1 <| pass     ; output relay on PE 1

; Wire the graph
&c1 |> &result:L        ; const 3 → adder left input
&c2 |> &result:R        ; const 7 → adder right input
&result |> &output:L    ; sum → output relay
```

**What happens at runtime:**

1. The assembler emits two seed tokens (for `&c1` and `&c2`) since they are `CONST` nodes with no incoming edges.
2. Both tokens arrive at PE 0's matching store. `&result` is a dyadic instruction — it waits for both operands.
3. When both arrive, the matching store pairs them. The left operand (3) and right operand (7) feed the ALU.
4. The ALU computes `3 + 7 = 10` and emits a token to `&output` on PE 1.
5. `&output` is monadic (`pass`) — it bypasses the matching store and immediately emits the value 10.
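
The runtime steps above can be mimicked with a toy Python model of operand matching. This is a sketch for intuition only: `MatchingStore` is a hypothetical class, keyed by node name rather than by the IRAM offset and context slot the hardware uses, and it ignores PEs and token routing entirely.

```python
class MatchingStore:
    """Toy model: a dyadic node fires once both L and R operands have arrived."""

    def __init__(self):
        self.waiting = {}  # node name -> {port: value}

    def arrive(self, node, port, value):
        """Record an operand; return (L, R) when the pair is complete, else None."""
        slot = self.waiting.setdefault(node, {})
        slot[port] = value
        if "L" in slot and "R" in slot:
            del self.waiting[node]  # the entry is consumed when the node fires
            return slot["L"], slot["R"]
        return None

store = MatchingStore()
first = store.arrive("&result", "L", 3)   # seed from &c1: only one operand so far
pair = store.arrive("&result", "R", 7)    # seed from &c2: completes the pair
total = pair[0] + pair[1]                 # the ALU step for `add`
```

The first arrival returns nothing because `&result` is still waiting; the second returns the operand pair, which is the point at which the node fires.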

## Structure Memory Example

Write a value to SM, then read it back:

```vhdl
@system pe=3, sm=1

; Pre-initialise SM cell 5 with value 0x42
@val|sm0:5 = 0x42

; Trigger a read of cell 5
&trigger|pe0 <| const, 1
&reader|pe0 <| read, 5    ; read SM 0, cell 5
&relay|pe1 <| pass
&sink|pe2 <| pass

&trigger |> &reader:L     ; trigger the read operation
&reader |> &relay:L       ; SM result → relay
&relay |> &sink:L         ; relay → final sink
```

The `read` instruction is monadic — it takes a trigger token and issues a read request to SM. The `const` argument (`5`) specifies the cell address. The SM returns the stored value (0x42 = 66 decimal) as a token routed back to the `read` node's destination.

## Switch Routing Example

Branch on equality, routing data to the taken or not-taken path:

```vhdl
@system pe=3, sm=0

&val|pe0 <| const, 5       ; value to test
&cmp|pe0 <| const, 5       ; comparison target
&branch|pe0 <| sweq        ; switch-on-equal

&taken|pe1 <| pass
&not_taken|pe1 <| pass
&output|pe2 <| pass

; Wire inputs
&val |> &branch:L
&cmp |> &branch:R

; Wire outputs: source port qualifiers select output slot
&branch:L |> &taken:L        ; taken path (data)
&branch:R |> &not_taken:L    ; not-taken path (trigger)

; Merge for downstream
&taken |> &output:L
&not_taken |> &output:R
```

Since `val == cmp` (both 5), `sweq` evaluates to true: data (5) goes to `dest_l` (taken) and a trigger token (0) goes to `dest_r` (not_taken).

## Auto-Placement

Nodes without explicit `|peN` qualifiers are automatically placed by the assembler:

```vhdl
@system pe=3, sm=0

&c1 <| const, 5
&c2 <| const, 3
&result <| add
&output <| pass

&c1 |> &result:L
&c2 |> &result:R
&result |> &output:L
```

The assembler's greedy placer assigns PEs based on connectivity.
Nodes connected by edges prefer to share a PE (minimizing cross-PE traffic). The result is functionally identical to explicit placement.

## From Source to Execution

### Lowering and Resolution

After parsing, the assembler lowers the CST to an intermediate representation (`IRGraph`). Names are qualified, scopes are created, and edges are validated. The resolve pass checks that every edge endpoint exists and produces suggestions for typos.

### Placement and Allocation

Unplaced nodes get PE assignments. Then the allocator assigns each node an IRAM offset and context slot. Dyadic instructions are packed at low IRAM offsets (0..D-1), monadic above (D..D+M-1). This layout matches the hardware contract: the token's offset field doubles as the matching store entry for dyadic instructions.

Context slots are assigned per function scope per PE. Each function body sharing a PE gets its own context slot, enabling concurrent activations to coexist without operand interference.

### Code Generation

The assembler currently offers two output modes:

**Direct mode** produces `PEConfig` objects (IRAM contents, route restrictions, context slot count) and `SMConfig` objects (initial cell values), plus seed tokens. This is the fast path for the emulator. Configuration is applied directly.

**Token stream mode** produces a bootstrap sequence: SM initialization writes, IRAM write tokens, then seed tokens. This mirrors the bootstrap process, loading the code stored at the reset vector.

## Macros

Macros define reusable template subgraphs that are expanded inline at their call sites. The macro system supports parameterisation, variadic arguments, repetition blocks, constant arithmetic, token pasting, opcode parameters, parameterized qualifiers, and `@ret` output wiring.

### Macro Definition

```dfasm
#macro_name param1, param2, *variadic_param |> {
    ; body — instructions and edges using ${param} substitution
    &node <| add
    ${param1} |> &node:L
    ${param2} |> &node:R
}
```

- Macro names use the `#` sigil
- Parameters are declared before `|>`
- Variadic parameters are prefixed with `*` and collect remaining arguments
- The body contains standard dfasm statements with `${param}` placeholders

### Parameter Substitution

Within a macro body, `${name}` references are replaced with the actual argument values during expansion:

```dfasm
#add_const val |> {
    &adder <| add
    &c <| const, ${val}
    &c |> &adder:R
}
```

**Token pasting:** Parameters can be combined with literal text to synthesise unique names. The `${param}` reference within a label name produces a label that incorporates the argument value:

```dfasm
#make_pair name |> {
    &${name}_left <| pass
    &${name}_right <| pass
}
```

### Opcode Parameters

Parameters can appear in the opcode position of instruction definitions. This allows a single macro to work with any ALU or memory operation:

```dfasm
#reduce_2 op |> {
    &r <| ${op}
    &r |> @ret
}

; Usage — the opcode is passed as a bare mnemonic:
#reduce_2 add |> &result
#reduce_2 sub |> &result
```

Opcode arguments are passed as bare identifiers (not strings). The expand pass resolves them via `MNEMONIC_TO_OP` during expansion. An invalid mnemonic produces a MACRO error.
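
To make the substitution rules above concrete, here is a minimal text-level sketch in Python. It is not how the expand pass works internally (that presumably operates on parsed statements, not raw text); `substitute`, `expand_opcode`, and the `KNOWN_MNEMONICS` set are hypothetical stand-ins, the last one standing in for the real `MNEMONIC_TO_OP` table.

```python
import re

# Hypothetical stand-in for the assembler's MNEMONIC_TO_OP table.
KNOWN_MNEMONICS = {"add", "sub", "pass", "const"}

# Matches ${name}, using the identifier rule from the Names and Sigils section.
PLACEHOLDER = re.compile(r"\$\{([a-zA-Z_][a-zA-Z0-9_]*)\}")

def substitute(body, args):
    """Replace every ${name} in `body` with the matching argument value."""
    def replace(match):
        name = match.group(1)
        if name not in args:
            raise KeyError(f"unbound macro parameter: {name}")
        return str(args[name])
    return PLACEHOLDER.sub(replace, body)

def expand_opcode(body, op):
    """Substitute an opcode parameter, rejecting unknown mnemonics."""
    if op not in KNOWN_MNEMONICS:
        raise ValueError(f"MACRO error: unknown mnemonic {op!r}")
    return substitute(body, {"op": op})

pasted = substitute("&${name}_left <| pass", {"name": "foo"})
node = expand_opcode("&r <| ${op}", "add")
```

Token pasting falls out of plain substitution: `${name}` sits inside a longer label, so the argument text becomes part of the label.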

### Parameterized Qualifiers

Parameters can appear in placement (`|pe0`) and port (`:L`) positions within a macro body:

```dfasm
; Parameterized port
#wire_to target, port |> {
    &src <| pass
    &src |> ${target}:${port}
}
#wire_to &dest, L

; Parameterized placement
#placed_const val, pe |> {
    &c <| const, ${val} |${pe}
    &c |> @ret
}
#placed_const 42, pe0 |> &target

; Parameterized context slot
#placed_op op, pe, ctx |> {
    &n <| ${op} |${pe}[${ctx}]
    &n |> @ret
}
```

The expand pass resolves placement strings (e.g., `"pe0"` → `0`), port strings (`"L"` → `Port.L`), and context slot values to their concrete types. Invalid values produce MACRO errors.

### Repetition Blocks

The `$( ),*` syntax expands its body once per element of a variadic parameter. Within a repetition block, `${_idx}` provides the current iteration index (0-based):

```dfasm
#fan_out *targets |> {
    &src <| pass
    $(
        &src |> ${targets}
    ),*
}
```

### Constant Arithmetic

Macro const fields support compile-time arithmetic with `+`, `-`, `*`, `//` on integer values and parameters:

```dfasm
#indexed_read base, *cells |> {
    $(
        &r${_idx} <| read, ${base} + ${_idx}
    ),*
}
```

### Macro Invocation and Output Wiring (@ret)

Macros are invoked as standalone statements. Arguments can be positional or named:

```dfasm
#fan_out &a:L, &b:R, &c:L
#indexed_read 10, &dest1, &dest2, &dest3
#make_pair name=foo
```

**Output wiring with `@ret`:** Macro bodies can define output points using `@ret` / `@ret_name` markers.
At the call site, `|>` wires these outputs to destinations:

```dfasm
; Macro body defines outputs via @ret markers
#loop_counted init, limit |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &body_fan <| pass
    &compare |> &body_fan:L
    &inc <| inc
    &body_fan |> &inc:L
    &inc |> &counter:R
    ${init} |> &counter:L
    ${limit} |> &compare:R
    &body_fan |> @ret_body     ; named output: body
    &compare |> @ret_exit:R    ; named output: exit
}

; Call with named output wiring:
#loop_counted &init, &limit |> body=&process, exit=&done

; Or positional @ret for single-output macros:
#reduce_2 add |> &result
```

Unlike function calls, macro `@ret` wiring is purely edge rewriting — the `@ret_name` destination is replaced with the concrete node reference from the call site. No trampolines, no cross-context routing, no `free_ctx` insertion. Macros inline into the caller's context.

**Bare `@ret`** maps to the first (or only) positional output. **`@ret_name`** maps to the named output `name=&dest` at the call site. Multiple `@ret` edges to different ports on the same output are valid.

### Scoping

Expanded macro names are automatically qualified to prevent collisions between multiple invocations of the same macro:

- Top-level invocation: `#macro_N.&label` (N is the invocation counter)
- Inside a function: `$func.#macro_N.&label`

### Built-in Macros

The following macros are automatically available in all programs (defined in `asm/builtins.py`):

| Macro | Parameters | Outputs | Purpose |
|-------|------------|---------|---------|
| `#loop_counted` | `init, limit` | `body`, `exit` | Counted loop: counter + compare + increment feedback. Call with `#loop_counted &init, &limit \|> body=&process, exit=&done` |
| `#loop_while` | `test` | `body`, `exit` | Condition-tested loop: gate node. Call with `#loop_while &test_src \|> body=&process, exit=&done` |
| `#permit_inject` | `*targets` | (none — routes directly to targets) | Inject one const(1) seed per target. `#permit_inject &gate_a, &gate_b` |
| `#reduce_2` | `op` | (positional) | Binary reduction: 1 node. `#reduce_2 add \|> &result` |
| `#reduce_3` | `op` | (positional) | Binary reduction tree: 2 nodes. `#reduce_3 sub \|> &result` |
| `#reduce_4` | `op` | (positional) | Binary reduction tree: 3 nodes. `#reduce_4 add \|> &result` |

All built-in macros use `@ret` output wiring except `#permit_inject`, which routes directly to its variadic target arguments. The `#reduce_*` family accepts any opcode as a parameter.

## Function Calls

Function calls wire argument values across context boundaries using the expand pass. The call syntax declares which arguments feed into the callee and where results flow back.

### Call Syntax

```dfasm
$func_name arg1=&source1, arg2=&source2 |> @result
```

The function must be defined as a `$name |> { ... }` region. Arguments are named (matching the function's parameter labels) or positional. Outputs after `|>` specify where results are routed.

Multiple outputs can be named:

```dfasm
$divmod a=&dividend, b=&divisor |> @quotient, remainder=@remainder
```

### What the Expand Pass Does

When processing a function call:

1. Allocates a fresh context slot for the callee activation
2. Generates cross-context edges with `ctx_override=True` (becomes `ctx_mode=01` / CTX_OVRD in hardware)
3. Creates trampoline `PASS` nodes for return routing
4. Generates `FREE_CTX` nodes to clean up the callee's context on completion
5. Synthesises `@ret` marker nodes for return paths

### Return Convention

Inside function bodies, `@ret` and `@ret_name` are reserved markers for return points.
The expand pass replaces them with return trampolines — synthetic `pass` nodes that route results back to the caller's context via CTX_OVRD, with auto-inserted `free_ctx` nodes for context teardown. Port-qualified returns (`@ret:L`, `@ret:R`) handle dual-output return nodes. Named returns (`@ret_body`, `@ret_exit`) handle multiple independent return paths, wired at the call site via `name=@dest`.

Note: `@ret` in function bodies creates trampolines with cross-context routing. `@ret` in macro bodies is simpler — pure edge rewriting with no context management, since macros inline into the caller's context.

### Example

```dfasm
@system pe=2, sm=0

; Define a function that adds two values
$add_pair |> {
    &sum <| add
}

; Call it
&a <| const, 3
&b <| const, 7
$add_pair a=&a, b=&b |> @result
@result <| pass
```

After expansion, the assembler generates the cross-context wiring, trampoline nodes, and context cleanup automatically. The programmer does not need to manage context slots or return routing manually.