OR-1 dataflow CPU sketch

dfasm - Dataflow Graph Assembly Language

A primer on the assembly dialect used in the OR-1

See assembler-architecture.md for the assembler's internal pipeline. See architecture-overview.md for the hardware model dfasm targets.

What dfasm Is

dfasm is a representation of low-level dataflow program graphs in text form. Each instruction forms a node with zero, one, or two inputs, and up to two outputs. Connections between nodes are conceived of as graph edges. Execution is entirely data-driven. A node fires when all its required operands have arrived as tokens.

It is not conventional assembly with strong implicit sequential behaviour. There is no program counter, and jumps and execution order are driven primarily by the graph topology. Writing things in a sensible order is up to you, and is for your own benefit.

Syntax Overview

Comments

Semicolons start line comments, as is common in assembly languages. Although the parser is sophisticated, I chose this convention deliberately to set a tone: despite a number of seemingly sophisticated features, dfasm remains an assembly language, tightly coupled to the low-level functions of the hardware.

; This is a comment
&add <| add   ; inline comment

Names and Sigils

dfasm uses four sigil-prefixed naming conventions:

Sigil   Scope                              Use
@name   Global (top-level)                 Node references, data definitions
&name   Local (within enclosing function)  Labels for instructions
$name   Global                             Function / subgraph definitions
#name   Global                             Macro definitions and invocations

Additionally, ${name} is used within macro bodies for parameter substitution (see the Macros section).

Names are composed of [a-zA-Z_][a-zA-Z0-9_]*.
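The naming rules above are easy to check mechanically. The following is a minimal Python sketch, not the assembler's own lexer: the `classify_name` helper and the `SIGILS` table are hypothetical names introduced here, and only the character class is taken from the text.

```python
import re

# Sigil meanings as listed in the table above.
SIGILS = {
    "@": "node reference",
    "&": "local label",
    "$": "function",
    "#": "macro",
}

# Name body pattern quoted from the text: [a-zA-Z_][a-zA-Z0-9_]*
NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def classify_name(token: str) -> str:
    """Return the kind of a sigil-prefixed name, or raise ValueError."""
    if not token or token[0] not in SIGILS:
        raise ValueError(f"missing or unknown sigil: {token!r}")
    if NAME_RE.fullmatch(token[1:]) is None:
        raise ValueError(f"invalid name body: {token!r}")
    return SIGILS[token[0]]
```

For example, `classify_name("&sum")` yields `"local label"`, while `classify_name("&9lives")` raises because a name may not start with a digit.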

Qualifier Chains

Names can be chained with placement and port qualifiers. No spaces are allowed within a chain. Placement indicators are mostly optional. The assembler will attempt to resolve and auto-place instructions, currently using a basic greedy locality heuristic. If an instruction cannot be placed, or the assembler places it badly, you can manually assign its placement.

&sum|pe0:L       ; label "sum", placed on PE 0, left port
@data|sm0:5      ; node "data", placed on SM 0, cell address 5
&branch|pe1:R    ; label "branch", PE 1, right port

Qualifier     Syntax        Meaning
Placement     |peN or |smN  Assign to a specific PE or SM
Port          :L or :R      Left or right input port (for edges)
Cell address  :N            SM cell address (for data definitions)
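To make the chain grammar concrete, here is a rough Python sketch that splits a chain into its parts. It is an illustration only: `Ref`, `CHAIN_RE` and `parse_chain` are hypothetical names, the sketch accepts only decimal cell addresses, and the real parser may well be structured differently.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    sigil: str
    name: str
    unit: Optional[str]    # "pe" or "sm"; None when unplaced
    index: Optional[int]   # PE or SM number
    port: Optional[str]    # "L" or "R"
    cell: Optional[int]    # SM cell address


# name, then optional |peN / |smN, then optional :L / :R / :N, no spaces.
CHAIN_RE = re.compile(
    r"(?P<sigil>[@&$#])(?P<name>[a-zA-Z_][a-zA-Z0-9_]*)"
    r"(?:\|(?P<unit>pe|sm)(?P<index>\d+))?"
    r"(?::(?P<qual>[LR]|\d+))?"
)


def parse_chain(text: str) -> Ref:
    """Parse one qualifier chain; whitespace inside the chain is rejected."""
    m = CHAIN_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a qualifier chain: {text!r}")
    qual = m.group("qual")
    port = qual if qual in ("L", "R") else None
    cell = int(qual) if qual is not None and qual.isdigit() else None
    index = int(m.group("index")) if m.group("index") is not None else None
    return Ref(m.group("sigil"), m.group("name"), m.group("unit"), index, port, cell)
```

With this sketch, `parse_chain("@data|sm0:5")` gives unit `"sm"`, index `0` and cell `5`, and a chain containing a space such as `"&sum |pe0"` is rejected.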

Statement Types

Pragma

Pragmas are built-in nodes that provide specific information about how the program should be assembled.

@system

Declares hardware configuration. Required for programs that need specific PE/SM counts:

@system pe=4, sm=1, iram=128, ctx=4

Parameter  Required  Default  Meaning
pe         yes       -        Number of processing elements
sm         yes       -        Number of structure memory modules
iram       no        128      IRAM capacity per PE (instruction slots)
ctx        no        16       Context slots per PE

At most one @system pragma per program.
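The required/default behaviour of the parameters can be sketched in a few lines of Python. This is an illustration under assumptions: `parse_system` is a hypothetical helper, and the real assembler handles pragmas through its grammar rather than string splitting.

```python
# Defaults and required keys taken from the parameter table above.
DEFAULTS = {"iram": 128, "ctx": 16}
REQUIRED = ("pe", "sm")
KNOWN = REQUIRED + tuple(DEFAULTS)


def parse_system(line: str) -> dict:
    """Parse '@system pe=4, sm=1, ...' into a config dict with defaults applied."""
    head, _, rest = line.strip().partition(" ")
    if head != "@system":
        raise ValueError("not a @system pragma")
    config = dict(DEFAULTS)
    for item in rest.split(","):
        key, sep, value = item.strip().partition("=")
        key = key.strip()
        if not sep or key not in KNOWN:
            raise ValueError(f"bad parameter: {item.strip()!r}")
        config[key] = int(value, 0)  # base 0 accepts both 42 and 0x2A
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required parameter(s): {missing}")
    return config
```

`parse_system("@system pe=2, sm=0")` fills in `iram=128` and `ctx=16`, while omitting `sm` raises an error.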

@rom_data

Declares that the following section will be placed in ROM with an optional name and base address. If the address is not specified, it will be placed after the contents of the reset vector. It will not be emitted as tokens during bootstrapping.

@rom_data [name=, addr=...]

Unlike the @system pragma, the @rom_data pragma can be used more than once. Functions placed in a @rom_data section can be loaded via exec, as can a named @rom_data section. A function with its seed tokens in its scope will be called with those tokens.

Instruction Definition

Defines a named node with an opcode and optional arguments:

&label <| opcode [, arg ...]

The <| operator reads as "receives from". The node receives data from whatever edges point to it.

Examples:

&c1|pe0 <| const, 42       ; constant node, value 42, placed on PE 0
&adder <| add               ; dyadic add, auto-placed
&reader|pe1 <| read, 5     ; SM read at cell 5, placed on PE 1
&branch <| sweq             ; switch-on-equal (routing op)

Named arguments are supported for clarity:

@serial <| ior dest=0x45, addr=0x91, data=0x43

Plain Edge

Wires a named source to one or more named destinations:

&source |> &dest:L              ; single edge, left port
&source |> &dest1:L, &dest2:R   ; fan-out to two destinations

The |> operator reads as "flows to". Data flows from source to destination. Port qualifiers on the destination specify which input the data arrives on. Port qualifiers on the source specify which output slot it leaves from (relevant for dual-output nodes like switch operations).

const instructions on the left/source side create 'seed' tokens, injected into the machine after loading, at startup, or when their function enters scope.

When no port is specified, the default is L.

Strong Edge (Inline Anonymous Node)

Creates an anonymous node with explicit input wiring:

opcode inputs... |> outputs...
add &a, &b |> &result:L        ; anonymous add of &a and &b → &result

This is shorthand. The assembler creates a hidden node (named &__anon_N) and wires the inputs and outputs. Useful for small, one-off operations that don't need a label.

Weak Edge (Reverse Inline)

Same as strong edge but with reversed syntax:

outputs... opcode <| inputs...
&result:L add <| &a, &b        ; same as: add &a, &b |> &result:L

The distinction between strong and weak edges is currently purely syntactic: they produce identical IR. Future iterations of the OR-1 will execute a series of strong edges as a pseudo-sequential block.

Function Definition

Groups instructions into a named scope:

$fib |> {
  &c_n <| const, 10
  &sub1 <| sub
  &branch <| sweq

  &c_n |> &branch:L
  &c_n |> &sub1:L
  ; ...
}

Labels (&name) inside a function are scoped to that function. You cannot reference &sub1 from outside $fib. Internally, the assembler qualifies the name as $fib.&sub1.

Node references (@name) are always global, and can be referenced from anywhere.

Data Definition

Initializes a structure memory cell before execution begins:

@data|sm0:5 = 0x42             ; SM 0, cell 5, value 0x42
@pair|sm0:0 = 'h', 'i'         ; two chars packed big-endian → 0x6869
@msg|sm1:10 = "hello"          ; string chars as packed 16-bit words

Data definitions require SM placement (|smN) and a cell address (:N). The assembler translates these into SM write tokens during bootstrap, if placed in RAM, or into a text section of the ROM image if placed in ROM.

Location Directive

Sets a location context for subsequent definitions:

@compute_region
&a <| const, 5      ; these nodes are inside @compute_region
&b <| add

Statements following a location directive are collected into that location's scope until the next function or location directive.

Opcodes

Arithmetic (dyadic unless noted)

Mnemonic  Arity    Description
add       dyadic   L + R
sub       dyadic   L − R
inc       monadic  data + 1
dec       monadic  data − 1
shiftl    monadic  shift left by 1 bit
shiftr    monadic  logical shift right by 1 bit
ashiftr   monadic  arithmetic shift right by 1 bit

Logical

Mnemonic  Arity    Description
and       dyadic   bitwise AND
or        dyadic   bitwise OR
xor       dyadic   bitwise XOR
not       monadic  bitwise NOT

Comparison (dyadic, produce bool_out)

Mnemonic  Description
eq        L == R
lt        L < R (signed)
lte       L ≤ R (signed)
gt        L > R (signed)
gte       L ≥ R (signed)

Comparisons interpret their 16-bit operands as signed two's complement values.
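The signed interpretation matters once the top bit is set. A small Python sketch of the rule (the helper names `to_signed16` and `lt` are hypothetical and model only the arithmetic, not the ALU):

```python
def to_signed16(word: int) -> int:
    """Reinterpret a 16-bit word as a signed two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def lt(left: int, right: int) -> bool:
    """Signed less-than on 16-bit operands, as described for the lt opcode."""
    return to_signed16(left) < to_signed16(right)
```

Under this reading, `lt(0xFFFF, 1)` is true because 0xFFFF is treated as -1, even though 0xFFFF is the larger value when read as unsigned.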

Routing / Switching / Branching (dyadic)

These operations route tokens based on a comparison result. They are all dyadic — they compare L and R, then route accordingly.

Branch operations (br*): compare L and R, then emit data to dest_l (taken) or dest_r (not taken). Both outputs carry the data value; the branch condition selects the destination:

Mnemonic  Condition
breq      L == R
brgt      L > R
brge      L ≥ R
brof      overflow

NOTE: br* ops use the predicate register and an internal-to-PE loopback route where the hardware supports them. Future strongly-connected block execution will change the behaviour of br* ops to support pseudo-sequential execution within a PE.

Switch operations (sw*): like branch, but when the condition is true, data goes to dest_l and a trigger token (value 0) goes to dest_r. When false, trigger goes to dest_l and data goes to dest_r:

Mnemonic  Condition
sweq      L == R
swgt      L > R
swge      L ≥ R
swof      overflow
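The difference between the br* and sw* families is only in what the unselected output receives. The Python sketch below models that, with `None` standing for "no token emitted". The function names are hypothetical, and the condition and data value are passed in explicitly because the text does not say which operand is forwarded as data.

```python
TRIGGER = 0  # value carried by the trigger token, per the description above


def route_br(cond: bool, data: int):
    """br*: data goes to dest_l when taken, to dest_r otherwise.

    Returns (dest_l, dest_r); None means no token on that output.
    """
    return (data, None) if cond else (None, data)


def route_sw(cond: bool, data: int):
    """sw*: data goes to one output, a trigger token to the other."""
    return (data, TRIGGER) if cond else (TRIGGER, data)
```

So `route_sw(5 == 5, 5)` produces `(5, 0)`: data on dest_l and a trigger on dest_r, whereas `route_br(5 == 5, 5)` produces `(5, None)`.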

Other routing:

Mnemonic  Arity   Description
gate      dyadic  pass data through if bool_out is true, suppress if false
sel       dyadic  select between L and R based on a condition
merge     dyadic  merge two token streams (non-deterministic: fires on either input)

Data

Mnemonic  Arity    Description
pass      monadic  pass data through unchanged
const     monadic  emit constant value (from const field)
free_ctx  monadic  deallocate context slot, no data output

  • free_ctx is a special instruction used to handle function body and loop exits. It frees the context slot so it can be reused.

Structure Memory

Mnemonic   Arity              Description
read       monadic            read from SM cell (const = cell address)
write      context-dependent  write to SM cell; monadic if const is set (cell addr from const), dyadic if const is None (cell addr from L operand)
clear      monadic            clear SM cell (reset to EMPTY state)
alloc      monadic            allocate SM cell
free       monadic            free SM cell
rd_inc     monadic            atomic read-and-increment
rd_dec     monadic            atomic read-and-decrement
cmp_sw     dyadic             compare-and-swap (L = expected, R = new value)
exec       monadic            trigger EXEC on SM (inject tokens from T0 storage into network)
raw_read   monadic            raw read from T0 storage (no I-structure semantics)
set_page   monadic            set SM page register (T0 operation)
write_imm  monadic            immediate write to SM cell (T0 operation)
ext        monadic            extended SM operation

Note: free (SM cell deallocation) and free_ctx (PE context slot deallocation) are distinct operations targeting different resources.

SM opcodes use a variable-width bus encoding. See sm-design.md for the full opcode table and encoding tiers.

Literals

Syntax       Example        Description
Decimal      42             Decimal integer
Hex          0xFF           Hexadecimal integer
Char         'A'            Single character (ASCII value)
String       "hello"        String of char codes
Raw string   r"no\escapes"  No escape processing
Byte string  b"\x00\xFF"    Explicit byte values

Escape sequences (in regular strings and char literals): \n, \t, \r, \0, \\, \', \", \xHH

Multi-char packing: when multiple char values appear in a data definition, they are packed big-endian into 16-bit words:

@data|sm0:0 = 'h', 'i'    ; → 0x6869 (h=0x68 in high byte, i=0x69 in low)

All data values are 16-bit unsigned.
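The big-endian packing rule can be written out as a short Python sketch. `pack_chars` is a hypothetical helper, and zero-padding the low byte of an odd-length tail is my assumption: the text does not say how a trailing single character is handled.

```python
from typing import List


def pack_chars(chars: bytes) -> List[int]:
    """Pack character codes big-endian into 16-bit words, two per word."""
    words = []
    for i in range(0, len(chars), 2):
        high = chars[i]
        # ASSUMPTION: an odd trailing char gets a zero low byte.
        low = chars[i + 1] if i + 1 < len(chars) else 0
        words.append((high << 8) | low)
    return words
```

`pack_chars(b"hi")` gives `[0x6869]`, matching the example above. Under the padding assumption, `pack_chars(b"hello")` gives `[0x6865, 0x6C6C, 0x6F00]`.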

Complete Example

A simple program that adds two constants and routes the result across PEs:

; Hardware: 2 PEs, no structure memory
@system pe=2, sm=0

; Define nodes
&c1|pe0 <| const, 3          ; constant 3 on PE 0
&c2|pe0 <| const, 7          ; constant 7 on PE 0
&result|pe0 <| add            ; adder on PE 0
&output|pe1 <| pass           ; output relay on PE 1

; Wire the graph
&c1 |> &result:L              ; const 3 → adder left input
&c2 |> &result:R              ; const 7 → adder right input
&result |> &output:L          ; sum → output relay

What happens at runtime:

  1. The assembler emits two seed tokens (for &c1 and &c2) since they are CONST nodes with no incoming edges.
  2. Both tokens arrive at PE 0's matching store. &result is a dyadic instruction — it waits for both operands.
  3. When both arrive, the matching store pairs them. The left operand (3) and right operand (7) feed the ALU.
  4. The ALU computes 3 + 7 = 10 and emits a token to &output on PE 1.
  5. &output is monadic (pass) — it bypasses the matching store and immediately emits the value 10.
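The steps above can be mimicked with a toy token-driven interpreter. This Python sketch is a deliberately simplified model, not the emulator: it ignores PE placement, contexts and timing, supports only `add` and `pass`, and every name in it (`run`, `DYADIC`, `MONADIC`, the dict shapes) is hypothetical.

```python
from collections import deque

DYADIC = {"add": lambda left, right: (left + right) & 0xFFFF}
MONADIC = {"pass": lambda value: value}


def run(nodes, edges, seeds):
    """Fire nodes as operands arrive.

    nodes: label -> opcode
    edges: label -> list of (dest_label, port)
    seeds: list of (dest_label, port, value) tokens
    Returns label -> value for nodes that have no outgoing edges.
    """
    waiting = {}            # stand-in for the matching store
    queue = deque(seeds)
    outputs = {}
    while queue:
        dest, port, value = queue.popleft()
        op = nodes[dest]
        if op in DYADIC:
            slot = waiting.setdefault(dest, {})
            slot[port] = value
            if len(slot) < 2:
                continue    # partner operand has not arrived yet
            del waiting[dest]
            result = DYADIC[op](slot["L"], slot["R"])
        else:
            result = MONADIC[op](value)  # monadic: no matching needed
        targets = edges.get(dest, [])
        if not targets:
            outputs[dest] = result
        for nxt, nxt_port in targets:
            queue.append((nxt, nxt_port, result))
    return outputs


# The example graph: the two const nodes become seed tokens aimed at &result.
example_nodes = {"result": "add", "output": "pass"}
example_edges = {"result": [("output", "L")]}
example_seeds = [("result", "L", 3), ("result", "R", 7)]
example_outputs = run(example_nodes, example_edges, example_seeds)
```

Running it leaves `example_outputs` as `{"output": 10}`: the first seed parks in the matching store, the second completes the pair, and the sum flows through the relay.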

Structure Memory Example

Pre-initialise an SM cell with a value, then read it back:

@system pe=3, sm=1

; Pre-initialise SM cell 5 with value 0x42
@val|sm0:5 = 0x42

; Trigger a read of cell 5
&trigger|pe0 <| const, 1
&reader|pe0 <| read, 5        ; read SM 0, cell 5
&relay|pe1 <| pass
&sink|pe2 <| pass

&trigger |> &reader:L          ; trigger the read operation
&reader |> &relay:L            ; SM result → relay
&relay |> &sink:L              ; relay → final sink

The read instruction is monadic — it takes a trigger token and issues a read request to SM. The const argument (5) specifies the cell address. The SM returns the stored value (0x42 = 66 decimal) as a token routed back to the read node's destination.

Switch Routing Example

Switch on equality, routing data and trigger tokens between the taken and not-taken paths:

@system pe=3, sm=0

&val|pe0 <| const, 5           ; value to test
&cmp|pe0 <| const, 5           ; comparison target
&branch|pe0 <| sweq            ; switch-on-equal

&taken|pe1 <| pass
&not_taken|pe1 <| pass
&output|pe2 <| pass

; Wire inputs
&val |> &branch:L
&cmp |> &branch:R

; Wire outputs: source port qualifiers select the output slot
&branch:L |> &taken:L          ; taken path (data)
&branch:R |> &not_taken:L      ; not-taken path (trigger)

; Merge for downstream
&taken |> &output:L
&not_taken |> &output:R

Since val == cmp (both 5), sweq evaluates to true: data (5) goes to dest_l (taken) and a trigger token (0) goes to dest_r (not_taken).

Auto-Placement

Nodes without explicit |peN qualifiers are automatically placed by the assembler:

@system pe=3, sm=0

&c1 <| const, 5
&c2 <| const, 3
&result <| add
&output <| pass

&c1 |> &result:L
&c2 |> &result:R
&result |> &output:L

The assembler's greedy placer assigns PEs based on connectivity. Nodes connected by edges prefer to share a PE (minimizing cross-PE traffic). The result is functionally identical to explicit placement.

From Source to Execution

Lowering and Resolution

After parsing, the assembler lowers the CST to an intermediate representation (IRGraph). Names are qualified, scopes are created, and edges are validated. The resolve pass checks that every edge endpoint exists and produces suggestions for typos.

Placement and Allocation

Unplaced nodes get PE assignments. Then the allocator assigns each node an IRAM offset and context slot. Dyadic instructions are packed at low IRAM offsets (0..D-1), monadic above (D..D+M-1). This layout matches the hardware contract: the token's offset field doubles as the matching store entry for dyadic instructions.
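The dyadic-first layout can be illustrated with a short Python sketch. `assign_offsets` is a hypothetical helper working on one PE's instruction list. Keeping source order within each group is my assumption, as the text only fixes the dyadic/monadic split.

```python
def assign_offsets(instrs):
    """Map labels to IRAM offsets: dyadic at 0..D-1, monadic at D..D+M-1.

    instrs: list of (label, is_dyadic) pairs for a single PE.
    """
    dyadic = [label for label, is_dyadic in instrs if is_dyadic]
    monadic = [label for label, is_dyadic in instrs if not is_dyadic]
    return {label: offset for offset, label in enumerate(dyadic + monadic)}
```

Given `[("c1", False), ("result", True), ("output", False), ("cmp", True)]`, the two dyadic instructions take offsets 0 and 1, and the monadic ones take 2 and 3.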

Context slots are assigned per function scope per PE. Each function body sharing a PE gets its own context slot, enabling concurrent activations to coexist without operand interference.

Code Generation

The assembler currently offers two output modes:

Direct mode produces PEConfig objects (IRAM contents, route restrictions, context slot count) and SMConfig objects (initial cell values), plus seed tokens. This is the fast path for the emulator. Configuration is applied directly.

Token stream mode produces a bootstrap sequence: SM initialization writes, IRAM write tokens, then seed tokens. This mirrors the bootstrap process, loading the code stored at the reset vector.
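The ordering of the token stream is simple enough to state as code. In this Python sketch the token kinds (`"sm_write"`, `"iram_write"`, `"seed"`) and the tuple shape are placeholders of my own, not the assembler's token classes. Only the phase order comes from the text.

```python
# Phase order quoted from the text: SM init writes, IRAM writes, then seeds.
PHASES = ("sm_write", "iram_write", "seed")


def bootstrap_sequence(tokens):
    """Order (kind, payload) tuples by bootstrap phase.

    sorted() is stable, so tokens keep their relative order within a phase.
    """
    rank = {kind: position for position, kind in enumerate(PHASES)}
    return sorted(tokens, key=lambda token: rank[token[0]])
```

Seeds always come last in this model, so no operand token can reach a PE before its IRAM has been written.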

Macros

Macros define reusable template subgraphs that are expanded inline at their call sites. The macro system supports parameterisation, variadic arguments, repetition blocks, constant arithmetic, token pasting, opcode parameters, parameterized qualifiers, and @ret output wiring.

Macro Definition

#macro_name param1, param2, *variadic_param |> {
    ; body — instructions and edges using ${param} substitution
    &node <| add
    ${param1} |> &node:L
    ${param2} |> &node:R
}

  • Macro names use the # sigil
  • Parameters are declared before |>
  • Variadic parameters are prefixed with * and collect remaining arguments
  • The body contains standard dfasm statements with ${param} placeholders

Parameter Substitution

Within a macro body, ${name} references are replaced with the actual argument values during expansion:

#add_const val |> {
    &adder <| add
    &c <| const, ${val}
    &c |> &adder:R
}

Token pasting: Parameters can be combined with literal text to synthesise unique names. The ${param} reference within a label name produces a label that incorporates the argument value:

#make_pair name |> {
    &${name}_left <| pass
    &${name}_right <| pass
}
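Substitution and token pasting can be approximated as plain text replacement. The Python sketch below is a simplification: I assume textual substitution for illustration, whereas the actual expand pass presumably works on parsed structures. `substitute` and `PARAM_RE` are hypothetical names.

```python
import re

# ${name} placeholder, using the name pattern from the Names section.
PARAM_RE = re.compile(r"\$\{([a-zA-Z_][a-zA-Z0-9_]*)\}")


def substitute(body: str, args: dict) -> str:
    """Replace every ${param} in a macro body line with its argument."""
    def replace(match):
        name = match.group(1)
        if name not in args:
            raise KeyError(f"unbound macro parameter: {name}")
        return str(args[name])
    return PARAM_RE.sub(replace, body)
```

With `{"name": "foo"}`, the line `&${name}_left <| pass` becomes `&foo_left <| pass`, which is the token-pasting behaviour described above.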

Opcode Parameters

Parameters can appear in the opcode position of instruction definitions. This allows a single macro to work with any ALU or memory operation:

#reduce_2 op |> {
    &r <| ${op}
    &r |> @ret
}

; Usage — the opcode is passed as a bare mnemonic:
#reduce_2 add |> &result
#reduce_2 sub |> &result

Opcode arguments are passed as bare identifiers (not strings). The expand pass resolves them via MNEMONIC_TO_OP during expansion. An invalid mnemonic produces a MACRO error.

Parameterized Qualifiers

Parameters can appear in placement (|pe0), port (:L), and context slot positions within a macro body:

; Parameterized port
#wire_to target, port |> {
    &src <| pass
    &src |> ${target}:${port}
}
#wire_to &dest, L

; Parameterized placement
#placed_const val, pe |> {
    &c <| const, ${val} |${pe}
    &c |> @ret
}
#placed_const 42, pe0 |> &target

; Parameterized context slot
#placed_op op, pe, ctx |> {
    &n <| ${op} |${pe}[${ctx}]
    &n |> @ret
}

The expand pass resolves placement strings (e.g., "pe0" → 0), port strings ("L" → Port.L), and context slot values to their concrete types. Invalid values produce MACRO errors.

Repetition Blocks

The $( ),* syntax expands its body once per element of a variadic parameter. Within a repetition block, ${_idx} provides the current iteration index (0-based):

#fan_out *targets |> {
    &src <| pass
    $(
        &src |> ${targets}
    ),*
}

Constant Arithmetic

Macro const fields support compile-time arithmetic with +, -, *, // on integer values and parameters:

#indexed_read base, *cells |> {
    $(
        &r${_idx} <| read, ${base} + ${_idx}
    ),*
}
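As a rough model of how the repetition block and the constant arithmetic combine, the Python sketch below evaluates `${base} + ${_idx}` once per variadic element. It is illustrative only: `eval_const` and `expand_indexed_read` are hypothetical, the evaluator restricts itself to the four listed operators (plus unary minus) on integers, and the output is rendered as text purely for readability.

```python
import ast
import operator
import re

_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def eval_const(expr: str, env: dict) -> int:
    """Evaluate an integer expression using only +, -, * and //."""
    text = re.sub(r"\$\{(\w+)\}", lambda m: str(int(env[m.group(1)])), expr)

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(text, mode="eval").body)


def expand_indexed_read(base: int, cells: list) -> list:
    """Expand the #indexed_read body once per variadic element."""
    lines = []
    for idx, _cell in enumerate(cells):
        addr = eval_const("${base} + ${_idx}", {"base": base, "_idx": idx})
        lines.append(f"&r{idx} <| read, {addr}")
    return lines
```

In this model, `#indexed_read 10, &dest1, &dest2, &dest3` expands to reads at cells 10, 11 and 12, labelled `&r0` through `&r2`.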

Macro Invocation and Output Wiring (@ret)

Macros are invoked as standalone statements. Arguments can be positional or named:

#fan_out &a:L, &b:R, &c:L
#indexed_read 10, &dest1, &dest2, &dest3
#make_pair name=foo

Output wiring with @ret: Macro bodies can define output points using @ret / @ret_name markers. At the call site, |> wires these outputs to destinations:

; Macro body defines outputs via @ret markers
#loop_counted init, limit |> {
    &counter <| add
    &compare <| brgt
    &counter |> &compare:L
    &body_fan <| pass
    &compare |> &body_fan:L
    &inc <| inc
    &body_fan |> &inc:L
    &inc |> &counter:R
    ${init} |> &counter:L
    ${limit} |> &compare:R
    &body_fan |> @ret_body       ; named output: body
    &compare |> @ret_exit:R      ; named output: exit
}

; Call with named output wiring:
#loop_counted &init, &limit |> body=&process, exit=&done

; Or positional @ret for single-output macros:
#reduce_2 add |> &result

Unlike function calls, macro @ret wiring is purely edge rewriting — the @ret_name destination is replaced with the concrete node reference from the call site. No trampolines, no cross-context routing, no free_ctx insertion. Macros inline into the caller's context.

Bare @ret maps to the first (or only) positional output. @ret_name maps to the named output name=&dest at the call site. Multiple @ret edges to different ports on the same output are valid.
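Because macro @ret wiring is just edge rewriting, it can be sketched as a pass over an edge list. The Python below is a simplified model under assumptions: edges are plain `(src, dest)` string pairs, call-site destinations are given without port qualifiers, and the positional output is stored under the key `None`. `rewire_ret` is a hypothetical name.

```python
def rewire_ret(edges, outputs):
    """Replace @ret / @ret_name destinations with call-site destinations.

    edges: list of (src, dest) strings; dest may carry a :L or :R port.
    outputs: output name -> concrete destination; key None is positional.
    """
    rewritten = []
    for src, dest in edges:
        target, sep, port = dest.partition(":")
        if target == "@ret":
            concrete = outputs[None]
        elif target.startswith("@ret_"):
            concrete = outputs[target[len("@ret_"):]]
        else:
            rewritten.append((src, dest))   # ordinary edge, left untouched
            continue
        rewritten.append((src, concrete + sep + port))
    return rewritten
```

Applied to the #loop_counted body with `body=&process, exit=&done`, the edge to `@ret_body` becomes an edge to `&process` and the edge to `@ret_exit:R` becomes an edge to `&done:R`. No extra nodes are created.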

Scoping

Expanded macro names are automatically qualified to prevent collisions between multiple invocations of the same macro:

  • Top-level invocation: #macro_N.&label (N is the invocation counter)
  • Inside a function: $func.#macro_N.&label
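The two qualification patterns can be reproduced with a one-line formatter. This Python sketch only mirrors the naming scheme listed above. `qualify` is a hypothetical helper and the invocation counter is passed in, since the text does not say where counting starts.

```python
from typing import Optional


def qualify(label: str, macro: str, invocation: int, func: Optional[str] = None) -> str:
    """Build the qualified name of a label produced by a macro expansion."""
    scoped = f"#{macro}_{invocation}.&{label}"
    return f"${func}.{scoped}" if func else scoped
```

Two invocations of the same macro therefore yield distinct names, for example `#add_const_0.&adder` and `#add_const_1.&adder`, and an invocation inside `$fib` is further prefixed with `$fib.`.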

Built-in Macros

The following macros are automatically available in all programs (defined in asm/builtins.py):

Macro           Parameters   Outputs                             Purpose
#loop_counted   init, limit  body, exit                          Counted loop: counter + compare + increment feedback. Call with #loop_counted &init, &limit |> body=&process, exit=&done
#loop_while     test         body, exit                          Condition-tested loop: gate node. Call with #loop_while &test_src |> body=&process, exit=&done
#permit_inject  *targets     (none; routes directly to targets)  Inject one const(1) seed per target. #permit_inject &gate_a, &gate_b
#reduce_2       op           (positional)                        Binary reduction: 1 node. #reduce_2 add |> &result
#reduce_3       op           (positional)                        Binary reduction tree: 2 nodes. #reduce_3 sub |> &result
#reduce_4       op           (positional)                        Binary reduction tree: 3 nodes. #reduce_4 add |> &result

All built-in macros use @ret output wiring except #permit_inject, which routes directly to its variadic target arguments. The #reduce_* family accepts any opcode as a parameter.

Function Calls

Function calls wire argument values across context boundaries using the expand pass. The call syntax declares which arguments feed into the callee and where results flow back.

Call Syntax

$func_name arg1=&source1, arg2=&source2 |> @result

The function must be defined as a $name |> { ... } region. Arguments are named (matching the function's parameter labels) or positional. Outputs after |> specify where results are routed.

Multiple outputs can be named:

$divmod a=&dividend, b=&divisor |> @quotient, remainder=@remainder

What the Expand Pass Does

When processing a function call:

  1. Allocates a fresh context slot for the callee activation
  2. Generates cross-context edges with ctx_override=True (becomes ctx_mode=01 / CTX_OVRD in hardware)
  3. Creates trampoline PASS nodes for return routing
  4. Generates FREE_CTX nodes to clean up the callee's context on completion
  5. Synthesises @ret marker nodes for return paths

Return Convention

Inside function bodies, @ret and @ret_name are reserved markers for return points. The expand pass replaces them with return trampolines — synthetic pass nodes that route results back to the caller's context via CTX_OVRD, with auto-inserted free_ctx nodes for context teardown. Port-qualified returns (@ret:L, @ret:R) handle dual-output return nodes. Named returns (@ret_body, @ret_exit) handle multiple independent return paths, wired at the call site via name=@dest.

Note: @ret in function bodies creates trampolines with cross-context routing. @ret in macro bodies is simpler — pure edge rewriting with no context management, since macros inline into the caller's context.

Example

@system pe=2, sm=0

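; Define a function that adds two values
$add_pair |> {
    &a <| pass          ; parameter label (assumed form), bound by a= at the call site
    &b <| pass          ; parameter label (assumed form), bound by b= at the call site
    &sum <| add
    &a |> &sum:L
    &b |> &sum:R
    &sum |> @ret        ; return point, wired to @result by the caller
}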

; Call it
&a <| const, 3
&b <| const, 7
$add_pair a=&a, b=&b |> @result
@result <| pass

After expansion, the assembler generates the cross-context wiring, trampoline nodes, and context cleanup automatically. The programmer does not need to manage context slots or return routing manually.