CLAUDE.md — OR1 Dataflow CPU

Version Control: jj (Jujutsu)

This is a jj-colocated repository (both .jj and .git present). You MUST use jj commands, NOT git commands, for all version control operations.

Critical Rules

NEVER use git add, git commit, git status, git diff, or git log. Using raw git commands in a jj repo creates orphan commits and corrupts working copy tracking.
All file changes are automatically tracked by jj. There is no staging area.

Command Mapping

Instead of	Use
`git status`	`jj status`
`git diff`	`jj diff`
`git diff --staged`	`jj diff` (no staging concept)
`git log`	`jj log`
`git add . && git commit -m "msg"`	`jj commit -m "msg"`
`git add file && git commit -m "msg"`	`jj commit -m "msg"` (tracks all changes)
`git log --oneline -10`	`jj log --limit 10`
`git diff HEAD~3..HEAD`	`jj diff --from @--- --to @`
`git rev-parse HEAD`	`jj log -r @ --no-graph -T 'commit_id ++ "\n"'`
`git rev-parse HEAD~N`	`jj log -r 'ancestors(@,N)' --no-graph -T 'commit_id ++ "\n"' --limit 1 --reversed`

Commit Workflow

# Make your changes to files (no git add needed)
# Then commit all changes:
jj commit -m "feat: description of change"

# The working copy (@) is always a new empty change after committing.
# To see what you just committed:
jj log --limit 5

Bookmarks (Branches)

# List bookmarks
jj bookmark list

# Current bookmark is 'emu'
# After committing, move bookmark forward:
jj bookmark set emu -r @-

Project Structure

cm_inst.py — Instruction set definitions (Port, MemOp, ALUOp hierarchy, Instruction, FrameDest, OutputStyle, TokenKind, FrameOp, FrameSlotValue)
tokens.py — Token type hierarchy (Token -> PEToken -> CMToken -> DyadToken/MonadToken; PEToken -> PELocalWriteToken/FrameControlToken; Token -> SMToken). Imports ISA enums from cm_inst.
encoding.py — Pack/unpack boundary between semantic types and 16-bit hardware words (instruction encoding, flit1 routing words)
sm_mod.py — Structure Memory cell model (Presence enum, SMCell dataclass with is_wide metadata flag)
dfasm.lark — Lark grammar for dfasm graph assembly language
emu/ — Behavioural emulator package (SimPy-based discrete event simulation)
- emu/events.py — Simulation event types: SimEvent union (TokenReceived, Matched, Executed, Emitted, IRAMWritten, FrameSlotWritten, FrameAllocated, FrameFreed, TokenRejected, CellWritten, DeferredRead, DeferredSatisfied, ResultSent) and EventCallback type alias
- emu/types.py — Config and internal types (PEConfig, SMConfig, DeferredRead). PEConfig/SMConfig accept on_event: EventCallback | None for simulation observability.
- emu/alu.py — Pure-function ALU: execute(op, left, right, const) -> (result, bool_out)
- emu/pe.py — ProcessingElement: matching store, IRAM fetch, output routing, event emission via on_event callback
- emu/sm.py — StructureMemory: I-structure semantics with deferred reads, event emission via on_event callback
- emu/network.py — build_topology() wiring and System container; passes on_event from configs to PE/SM constructors
- emu/__init__.py — Public API: exports System, build_topology, PEConfig, SMConfig, all event types from emu/events
asm/ — Assembler package: dfasm source to emulator-ready config (see asm/CLAUDE.md)
- asm/__init__.py — Public API: assemble(), assemble_to_tokens(), run_pipeline(), round_trip(), serialize_graph()
- asm/ir.py — IR types (IRNode, IREdge, IRGraph, IRDataDef, IRRegion, SystemConfig, MacroDef, IRMacroCall, CallSite, etc.)
- asm/errors.py — Structured error types with source context (ErrorCategory includes MACRO and CALL)
- asm/opcodes.py — Opcode mnemonic mapping and arity classification
- asm/lower.py — CST to IRGraph lowering pass
- asm/expand.py — Macro expansion (opcode params, parameterized qualifiers, @ret wiring, variadic repetition) and function call wiring pass
- asm/builtins.py — Built-in macro library (#loop_counted, #loop_while, #permit_inject, #reduce_2/_3/_4)
- asm/resolve.py — Name resolution pass
- asm/place.py — Placement validation and auto-placement
- asm/allocate.py — IRAM offset and context slot allocation
- asm/codegen.py — Code generation (direct mode + token stream mode)
- asm/serialize.py — IRGraph to dfasm source serializer
dfgraph/ — Interactive dataflow graph renderer (see dfgraph/CLAUDE.md)
- dfgraph/__main__.py — CLI: python -m dfgraph path/to/file.dfasm [--port 8420]
- dfgraph/pipeline.py — Progressive pipeline runner (parse -> lower -> expand -> resolve -> place -> allocate with error accumulation)
- dfgraph/categories.py — Opcode-to-category mapping via isinstance dispatch on ALUOp hierarchy
- dfgraph/graph_json.py — IRGraph-to-JSON conversion for frontend consumption
- dfgraph/server.py — FastAPI backend with WebSocket push and file watcher (watchdog, 300ms debounce)
- dfgraph/frontend/ — TypeScript frontend: Cytoscape.js graph with ELK layout, SVG/PNG export (uses frontend-common/)
frontend-common/ — Shared TypeScript modules extracted from dfgraph for reuse by monitor frontend
- frontend-common/src/layout.ts — ELK layout configurations
- frontend-common/src/style.ts — Cytoscape style definitions
- frontend-common/src/export.ts — SVG/PNG export utilities
- frontend-common/src/types.ts — Shared TypeScript interfaces for graph data
monitor/ — Interactive simulation monitor with CLI REPL and web UI (see monitor/CLAUDE.md)
- monitor/__init__.py — Public API: exports SimulationBackend, command types, result types, StateSnapshot, capture()
- monitor/__main__.py — CLI: python -m monitor [path/to/file.dfasm] [--web] [--port 8421]
- monitor/backend.py — SimulationBackend: threaded simulation controller with command/result queue protocol
- monitor/commands.py — Command types (LoadCmd, StepTickCmd, StepEventCmd, RunUntilCmd, InjectCmd, SendCmd, ResetCmd, StopCmd) and result types (GraphLoaded, StepResult, ErrorResult)
- monitor/snapshot.py — StateSnapshot frozen dataclass and capture(system) function for state extraction
- monitor/graph_json.py — IRGraph + StateSnapshot to JSON serialization with execution overlay
- monitor/server.py — FastAPI server with bidirectional WebSocket protocol
- monitor/repl.py — MonitorREPL(cmd.Cmd) interactive CLI for simulation control
- monitor/formatting.py — ANSI colour formatting for REPL output
- monitor/frontend/ — TypeScript frontend: Cytoscape.js graph with execution overlay, event log, state inspector (uses frontend-common/)
tests/ — pytest + hypothesis test suite
- tests/conftest.py — Hypothesis strategies for token/op generation
- tests/test_sm_tiers.py — T0/T1 memory tier and EXEC bootstrap tests
- tests/test_exec_bootstrap.py — EXEC opcode acceptance criteria tests
- tests/test_migration_cleanup.py — Verifies removed types (SysToken, CfgOp, etc.) are absent from codebase
- tests/test_pe_events.py — PE event emission tests (TokenReceived, Matched, Executed, Emitted, IRAMWritten, FrameAllocated, FrameFreed, FrameSlotWritten, TokenRejected)
- tests/test_pe_frames.py — Frame-based PE matching, routing, and lifecycle tests
- tests/test_pe_lanes.py — Lane-based matching tests (ALLOC_SHARED, FREE_LANE, smart FREE, lane exhaustion, pipelining)
- tests/test_sm_events.py — SM event emission tests (CellWritten, DeferredRead, DeferredSatisfied, ResultSent)
- tests/test_cycle_timing.py — Cycle-accurate timing verification tests
- tests/test_network_events.py — Network-level event propagation tests
- tests/test_network_routing.py — Network routing and connectivity tests
- tests/test_foundation_types.py — Foundation type (Instruction, FrameDest, encoding) tests
- tests/test_encoding.py — Pack/unpack encoding tests for instructions and flit1 words
- tests/test_ir_frame_types.py — IR frame-related type tests
- tests/test_allocate_frames.py — Frame layout allocation tests
- tests/test_codegen_frames.py — Frame-based code generation tests
- tests/test_sm_t0_raw.py — SM T0 raw storage tests
- tests/test_backend.py — SimulationBackend command/result protocol tests
- tests/test_snapshot.py — StateSnapshot capture tests
- tests/test_repl.py — MonitorREPL command tests
- tests/test_monitor_graph_json.py — Monitor graph JSON serialization tests
- tests/test_monitor_server.py — Monitor FastAPI server and WebSocket protocol tests
docs/ — Design documents, implementation plans, test plans

Tech Stack

Python 3.12
SimPy 4.1 (discrete event simulation)
Lark (Earley parser for dfasm grammar)
FastAPI + uvicorn (dfgraph and monitor web servers)
watchdog (file system monitoring for dfgraph live reload)
Cytoscape.js + cytoscape-elk (frontend graph rendering and layout)
pytest + hypothesis (property-based testing)
Nix flake for dev environment

Running Tests

python -m pytest tests/ -v

Architecture Contracts

Token Hierarchy (tokens.py)

All tokens inherit from Token(target: int). The hierarchy:

PEToken(Token) -- base for all PE-targeted tokens (used for routing in network.py)
- CMToken(PEToken) -- adds offset: int, act_id: int, data: int (frozen dataclass, base for data-carrying tokens)
  - DyadToken(CMToken) -- adds port: Port (dyadic operand requiring match)
  - MonadToken(CMToken) -- adds inline: bool (monadic operand, no match required)
- PELocalWriteToken(PEToken) -- adds act_id, region, slot, data, is_dest (writes frame slots directly)
- FrameControlToken(PEToken) -- adds act_id, op: FrameOp, payload: int (frame alloc/free)
SMToken(Token) -- addr: int, op: MemOp, flags, data, ret: Optional[CMToken]
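
The hierarchy above can be sketched with frozen dataclasses. This is a minimal illustration, not the contents of tokens.py: the Port values are assumptions, SMToken is cut down to addr only, and route_kind() is a hypothetical helper showing how routing can dispatch on the base type.

```python
from dataclasses import dataclass
from enum import IntEnum


class Port(IntEnum):
    # Stand-in for cm_inst.Port; the numeric values are assumptions.
    L = 0
    R = 1


@dataclass(frozen=True)
class Token:
    target: int


@dataclass(frozen=True)
class PEToken(Token):
    pass


@dataclass(frozen=True)
class CMToken(PEToken):
    offset: int
    act_id: int
    data: int


@dataclass(frozen=True)
class DyadToken(CMToken):
    port: Port


@dataclass(frozen=True)
class MonadToken(CMToken):
    inline: bool


@dataclass(frozen=True)
class SMToken(Token):
    addr: int  # op, flags, data and ret are omitted to keep the sketch short


def route_kind(token: Token) -> str:
    """Hypothetical helper: pick the destination class from the base type."""
    if isinstance(token, SMToken):
        return "sm"
    if isinstance(token, PEToken):
        return "pe"
    raise TypeError(f"unroutable token: {token!r}")


left = DyadToken(target=0, offset=4, act_id=1, data=7, port=Port.L)
print(route_kind(left))                         # pe
print(route_kind(SMToken(target=2, addr=16)))   # sm
```

Because every PE-bound token shares the PEToken base, a single isinstance check is enough to separate PE traffic from SM traffic.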

Instruction Set (cm_inst.py)

ALUOp(IntEnum) base with subclasses: ArithOp, LogicOp, RoutingOp
MemOp(IntEnum) -- read/write/atomic ops (READ, WRITE, EXEC, ALLOC, FREE, EXT, CLEAR, RD_INC, RD_DEC, CMP_SW, RAW_READ, SET_PAGE, WRITE_IMM)
Instruction(opcode, output: OutputStyle, has_const, dest_count, wide, fref) -- unified instruction type for both ALU and memory ops, stored in PE IRAM
FrameDest(target_pe: int, offset: int, act_id: int, port: Port, token_kind: TokenKind) -- destination address resolved from frame slot
FrameSlotValue = int | FrameDest | None -- type alias for frame slot contents
OutputStyle enum -- INHERIT, CHANGE_TAG, SINK for output routing decisions
TokenKind enum -- DYADIC, MONADIC, INLINE for token kind classification
FrameOp(IntEnum) -- ALLOC, FREE, ALLOC_SHARED, FREE_LANE for frame lifecycle control tokens
is_monadic_alu(op: ALUOp) -> bool -- canonical source of truth for monadic ALU op classification (used by emu/pe.py and asm/opcodes.py)

ALU (emu/alu.py)

Pure function, no state. execute(op, left, right, const) -> (result: int, bool_out: bool).

Invariants:

All results masked to 16-bit unsigned (& 0xFFFF)
Comparisons interpret values as signed 2's complement via to_signed()
ArithOp: bool_out always False
RoutingOp: bool_out drives branch/switch/gate decisions
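
A short sketch of the masking and signed-comparison invariants. Only to_signed() is named in this document; add16() and lt16() are hypothetical helpers written for illustration, and the value a comparison returns as its result is an assumption.

```python
MASK16 = 0xFFFF


def to_signed(value: int) -> int:
    """Interpret a 16-bit unsigned value as signed two's complement."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def add16(left: int, right: int) -> tuple[int, bool]:
    """Hypothetical arithmetic op: result masked to 16 bits, bool_out False."""
    return (left + right) & MASK16, False


def lt16(left: int, right: int) -> tuple[int, bool]:
    """Hypothetical signed less-than; returning 0/1 as the result is assumed."""
    outcome = to_signed(left) < to_signed(right)
    return int(outcome), outcome


print(add16(0xFFFF, 1))   # (0, False): the sum wraps around at 16 bits
print(to_signed(0xFFFF))  # -1
print(lt16(0xFFFF, 1))    # (1, True): 0xFFFF compares as -1
```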

Processing Element (emu/pe.py)

Frame-based processing element with activation context management.

Frame Storage:

frames: list[list[FrameSlotValue]] -- 2D array [frame_id][slot_idx] holding FrameDest objects and constants (shared across all lanes)
tag_store: dict[int, tuple[int, int]] -- maps act_id → (frame_id, lane) for activation-to-frame-and-lane lookup
match_data: list[list[list[Optional[int]]]] -- 3D array [frame_id][match_slot][lane] for operand values waiting for partner
presence: list[list[list[bool]]] -- 3D array [frame_id][match_slot][lane] for dyadic operand waiting state
port_store: list[list[list[Optional[Port]]]] -- 3D array [frame_id][match_slot][lane] for operand port metadata
lane_count: int -- number of matching lanes per frame
lane_free: dict[int, set[int]] -- per-frame set of available lane IDs (created on ALLOC, deleted on full FREE)
free_frames: list[int] -- pool of unallocated frame IDs
iram: dict[int, Instruction] -- instruction memory indexed by offset

Token Processing Pipeline:

Side paths (FrameControlToken, PELocalWriteToken): 1 cycle
Dyadic CMToken: 5 cycles (dequeue + IFETCH + MATCH + EXECUTE + EMIT)
Monadic CMToken: 4 cycles (dequeue + IFETCH + EXECUTE + EMIT)

Matching Logic:

DyadToken arrives with act_id: look up (frame_id, lane) via tag_store
Match slot is derived from token.offset: match_slot = token.offset % matchable_offsets
If presence[frame_id][match_slot][lane] is False: store token.data in match_data[frame_id][match_slot][lane], store token.port in port_store[frame_id][match_slot][lane], set presence bit to True, wait for partner
If presence[frame_id][match_slot][lane] is True: retrieve partner data and port from match_data and port_store, clear presence bit, fire instruction with both operands
Port ordering: partner with Port.L goes to left operand; Port.R to right operand
Match data, presence, and port storage are per-lane; frame constants/destinations (in frames) remain shared across all lanes
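
The matching steps above can be sketched as a small standalone class. MatchStore and offer() are hypothetical names; the real ProcessingElement also handles rejection (a missing act_id raises KeyError here instead), instruction fetch, and timing.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    # Stand-in for cm_inst.Port.
    L = "L"
    R = "R"


class MatchStore:
    """Toy per-lane matching store, indexed [frame_id][match_slot][lane]."""

    def __init__(self, frame_count: int = 8, matchable_offsets: int = 8,
                 lane_count: int = 4) -> None:
        self.matchable_offsets = matchable_offsets

        def grid(fill):
            return [[[fill] * lane_count for _ in range(matchable_offsets)]
                    for _ in range(frame_count)]

        self.match_data = grid(None)
        self.presence = grid(False)
        self.port_store = grid(None)
        self.tag_store: dict[int, tuple[int, int]] = {}

    def offer(self, act_id: int, offset: int, data: int,
              port: Port) -> Optional[tuple[int, int]]:
        """Return (left, right) when a pair completes, else None."""
        frame_id, lane = self.tag_store[act_id]
        slot = offset % self.matchable_offsets
        if not self.presence[frame_id][slot][lane]:
            # First operand: park it and wait for the partner.
            self.match_data[frame_id][slot][lane] = data
            self.port_store[frame_id][slot][lane] = port
            self.presence[frame_id][slot][lane] = True
            return None
        # Second operand: retrieve the partner and clear the presence bit.
        partner_data = self.match_data[frame_id][slot][lane]
        partner_port = self.port_store[frame_id][slot][lane]
        self.presence[frame_id][slot][lane] = False
        if partner_port is Port.L:
            return partner_data, data
        return data, partner_data


store = MatchStore()
store.tag_store[1] = (0, 0)  # act_id 1 -> frame 0, lane 0
first = store.offer(act_id=1, offset=9, data=10, port=Port.R)
second = store.offer(act_id=1, offset=9, data=3, port=Port.L)
print(first, second)  # None (3, 10)
```

The operand order follows the stored port, not arrival order, so the right operand arriving first still ends up on the right.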

Output Routing (determined by Instruction.output):

OutputStyle.INHERIT -- routes to destinations specified in frame slots
OutputStyle.CHANGE_TAG -- routes with different act_id tag (context switch)
OutputStyle.SINK -- writes result to frame slot, emits no token

Frame Initialization:

PE constructor loads initial_frames from PEConfig: dict[int, dict[int, FrameSlotValue] | list[FrameSlotValue]]
Dict value format: {slot_idx: value} where int values are unpacked via unpack_flit1() to FrameDest
List value format: [slot0, slot1, ...] raw values assigned by index
Handles both codegen-produced packed integers and test-produced FrameDest objects

Output logging:

PE.output_log: list records every token emitted (for testing and tracing)

Frame Control Operations (_handle_frame_control):

ALLOC -- allocates a fresh frame from free_frames, assigns lane 0, initializes lane_free with remaining lanes
FREE -- smart free: removes act_id from tag_store, clears lane match state. If other activations share the frame, returns lane to lane_free (frame_freed=False). If last lane, returns frame to free_frames and clears frame slots (frame_freed=True)
ALLOC_SHARED -- shared allocation: looks up parent act_id (from payload) in tag_store, finds parent's frame_id, assigns next free lane from lane_free. Rejects if parent not found or no free lanes
FREE_LANE -- lane-only free: removes act_id, clears lane match state, returns lane to lane_free. Never returns frame to free_frames (frame_freed always False)
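
The lane bookkeeping for the four operations can be sketched as follows. FrameAllocator is a hypothetical class; picking the first free frame and the lowest free lane are assumptions, and clearing of match state and frame slots is left out.

```python
from typing import Optional


class FrameAllocator:
    """Toy model of tag_store, lane_free and free_frames bookkeeping."""

    def __init__(self, frame_count: int = 8, lane_count: int = 4) -> None:
        self.lane_count = lane_count
        self.free_frames = list(range(frame_count))
        self.tag_store: dict[int, tuple[int, int]] = {}
        self.lane_free: dict[int, set[int]] = {}

    def alloc(self, act_id: int) -> tuple[int, int]:
        # ALLOC: fresh frame, lane 0, remaining lanes become available.
        frame_id = self.free_frames.pop(0)
        self.tag_store[act_id] = (frame_id, 0)
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        return frame_id, 0

    def alloc_shared(self, act_id: int,
                     parent_act_id: int) -> Optional[tuple[int, int]]:
        # ALLOC_SHARED: reject (None) if parent is unknown or lanes are gone.
        if parent_act_id not in self.tag_store:
            return None
        frame_id, _ = self.tag_store[parent_act_id]
        if not self.lane_free[frame_id]:
            return None
        lane = min(self.lane_free[frame_id])
        self.lane_free[frame_id].remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return frame_id, lane

    def free(self, act_id: int) -> bool:
        # Smart FREE: returns frame_freed.
        frame_id, lane = self.tag_store.pop(act_id)
        if any(fid == frame_id for fid, _ in self.tag_store.values()):
            self.lane_free[frame_id].add(lane)
            return False
        del self.lane_free[frame_id]
        self.free_frames.append(frame_id)
        return True

    def free_lane(self, act_id: int) -> bool:
        # FREE_LANE: never returns the frame to the pool.
        frame_id, lane = self.tag_store.pop(act_id)
        self.lane_free[frame_id].add(lane)
        return False


alloc = FrameAllocator(frame_count=2, lane_count=2)
print(alloc.alloc(1))            # (0, 0)
print(alloc.alloc_shared(2, 1))  # (0, 1)
print(alloc.alloc_shared(3, 1))  # None: no free lanes left
print(alloc.free(1))             # False: act_id 2 still uses frame 0
print(alloc.free(2))             # True: last lane, frame returned
```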

ALLOC_REMOTE (RoutingOp in _run pipeline):

Reads fref+0 (target PE), fref+1 (target act_id), fref+2 (parent act_id) from frame constants
If fref+2 is non-zero: emits FrameControlToken with ALLOC_SHARED op and parent act_id as payload
If fref+2 is zero: emits FrameControlToken with ALLOC op (fresh frame allocation)
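
The choice between the two control tokens can be sketched like this. The FrameOp values, the flat list used as a frame, the alloc_remote() helper, and the zero payload on a plain ALLOC are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class FrameOp(IntEnum):
    # Names come from this document; the numeric values are assumptions.
    ALLOC = 0
    FREE = 1
    ALLOC_SHARED = 2
    FREE_LANE = 3


@dataclass(frozen=True)
class FrameControlToken:
    target: int
    act_id: int
    op: FrameOp
    payload: int


def alloc_remote(frame: list[int], fref: int) -> FrameControlToken:
    """Hypothetical helper: build the control token from frame constants."""
    target_pe = frame[fref]
    target_act_id = frame[fref + 1]
    parent_act_id = frame[fref + 2]
    if parent_act_id != 0:
        return FrameControlToken(target_pe, target_act_id,
                                 FrameOp.ALLOC_SHARED, parent_act_id)
    return FrameControlToken(target_pe, target_act_id, FrameOp.ALLOC, 0)


fresh = alloc_remote([0, 0, 1, 5, 0], fref=2)
shared = alloc_remote([0, 0, 1, 6, 5], fref=2)
print(fresh.op.name, shared.op.name, shared.payload)  # ALLOC ALLOC_SHARED 5
```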

PELocalWriteToken handling:

Writes data to frame slot at specified region/slot within the act_id's frame (1 cycle)

Structure Memory (emu/sm.py)

SimPy process with I-structure (single-assignment) semantics.

Cell states (Presence enum): EMPTY, RESERVED, FULL, WAITING

Cycle-accurate pipeline timing:

Dequeue: 1 cycle (yields env.timeout(1) after input_store.get())
Each handler yields 1 cycle for its processing stage (process/write/read-modify-write cycle)
_send_result() yields 1 cycle for the response/delivery cycle (blocks the SM while delivering)
Total varies by operation: READ on FULL = dequeue(1) + process(1) + result(1) + put = 3+ cycles; WRITE = dequeue(1) + write(1) = 2 cycles

Deferred read contract:

READ on non-FULL cell: sets cell to WAITING, stores DeferredRead
Subsequent WRITE to that cell: satisfies deferred read, sends result via return route
Only one deferred read at a time per SM instance
CLEAR on a WAITING cell cancels the deferred read
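
A toy model of the deferred read contract, without SimPy or timing. ToySM and its method names are hypothetical; raising on a second pending deferred read and overwriting on WRITE to a FULL cell are assumptions, since this document does not say how the real SM handles those cases.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Presence(Enum):
    EMPTY = auto()
    RESERVED = auto()
    FULL = auto()
    WAITING = auto()


@dataclass
class Cell:
    pres: Presence = Presence.EMPTY
    data: Optional[int] = None


class ToySM:
    def __init__(self, cell_count: int = 4) -> None:
        self.cells = [Cell() for _ in range(cell_count)]
        self.deferred: Optional[tuple[int, str]] = None  # (addr, return route)
        self.sent: list[tuple[str, int]] = []            # delivered results

    def read(self, addr: int, ret: str) -> None:
        cell = self.cells[addr]
        if cell.pres is Presence.FULL:
            self.sent.append((ret, cell.data))
            return
        if self.deferred is not None:
            raise RuntimeError("a deferred read is already pending")
        cell.pres = Presence.WAITING
        self.deferred = (addr, ret)

    def write(self, addr: int, data: int) -> None:
        cell = self.cells[addr]
        was_waiting = cell.pres is Presence.WAITING
        cell.pres, cell.data = Presence.FULL, data
        if was_waiting and self.deferred and self.deferred[0] == addr:
            _, ret = self.deferred
            self.deferred = None
            self.sent.append((ret, data))  # satisfy the deferred read

    def clear(self, addr: int) -> None:
        cell = self.cells[addr]
        if cell.pres is Presence.WAITING and self.deferred \
                and self.deferred[0] == addr:
            self.deferred = None  # cancel the deferred read
        cell.pres, cell.data = Presence.EMPTY, None


sm = ToySM()
sm.read(1, ret="pe:0")   # cell 1 is EMPTY, so the read is deferred
sm.write(1, 42)          # the write satisfies it
print(sm.sent)           # [('pe:0', 42)]
```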

Atomic operations (RD_INC, RD_DEC, CMP_SW):

Restricted to cell addresses < ATOMIC_CELL_LIMIT (256)
Cell must be FULL; returns old value via return route
CMP_SW: compares token.flags (expected) with current; swaps to token.data on match
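
A sketch of the atomic read-modify-write behaviour on a plain dict of FULL cells. The function names are hypothetical, a missing key stands in for "cell is not FULL", and wrapping RD_INC at 16 bits is an assumption.

```python
ATOMIC_CELL_LIMIT = 256
MASK16 = 0xFFFF


def check_atomic_addr(addr: int) -> None:
    if addr >= ATOMIC_CELL_LIMIT:
        raise ValueError(f"address {addr} is outside the atomic range")


def rd_inc(cells: dict[int, int], addr: int) -> int:
    """Return the old value and store old + 1 (16-bit wrap assumed)."""
    check_atomic_addr(addr)
    old = cells[addr]
    cells[addr] = (old + 1) & MASK16
    return old


def cmp_sw(cells: dict[int, int], addr: int, expected: int, new: int) -> int:
    """Return the old value; swap in `new` only if old equals `expected`."""
    check_atomic_addr(addr)
    old = cells[addr]
    if old == expected:
        cells[addr] = new
    return old


cells = {3: 7}
print(rd_inc(cells, 3))          # 7, cell now holds 8
print(cmp_sw(cells, 3, 8, 100))  # 8, swap succeeds
print(cmp_sw(cells, 3, 8, 5))    # 100, swap fails and the cell keeps 100
```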

Memory Tiers:

T1 (below tier_boundary): Per-SM I-structure cells with presence tracking, deferred reads, atomic ops. Default tier_boundary: 256.
T0 (at/above tier_boundary): Shared raw storage across all SMs. No presence tracking. list[Token] shared by all SM instances.
T0 operations: READ (immediate return), WRITE (no presence check), EXEC (inject tokens from T0 into network)
I-structure ops on T0 addresses are errors (logged and dropped)
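
The tier split reduces to one comparison against tier_boundary. The helpers below are hypothetical and use string op names as a stand-in for MemOp.

```python
T0_OPS = frozenset({"READ", "WRITE", "EXEC"})


def tier_of(addr: int, tier_boundary: int = 256) -> str:
    """Addresses below the boundary are T1; at or above are T0."""
    return "T1" if addr < tier_boundary else "T0"


def t0_accepts(op_name: str) -> bool:
    """Only READ, WRITE and EXEC are valid on T0 addresses."""
    return op_name in T0_OPS


print(tier_of(255), tier_of(256))               # T1 T0
print(t0_accepts("EXEC"), t0_accepts("RD_INC"))  # True False
```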

Network Topology (emu/network.py)

build_topology(env, pe_configs, sm_configs, fifo_capacity) -> System

Wiring contract:

Every PE gets a route_table mapping pe_id -> simpy.Store for all PEs
Every PE gets sm_routes mapping sm_id -> simpy.Store for all SMs
Every SM gets a route_table mapping pe_id -> simpy.Store for all PEs
Default is full-mesh connectivity (any PE can send to any PE or SM)
If PEConfig.allowed_pe_routes or allowed_sm_routes is set, build_topology restricts routes at construction time
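
A sketch of the PE-to-PE part of the wiring contract, with collections.deque standing in for simpy.Store so it runs without SimPy. build_route_tables() is a hypothetical helper, not build_topology() itself.

```python
from collections import deque
from typing import Optional


def build_route_tables(
    pe_ids: list[int],
    allowed: dict[int, Optional[set[int]]],
) -> tuple[dict[int, deque], dict[int, dict[int, deque]]]:
    """Give each PE a route table; restrict it when an allow-set is given."""
    stores = {pe_id: deque() for pe_id in pe_ids}  # one input store per PE
    tables: dict[int, dict[int, deque]] = {}
    for pe_id in pe_ids:
        allow = allowed.get(pe_id)  # None means full mesh
        tables[pe_id] = {
            dst: store for dst, store in stores.items()
            if allow is None or dst in allow
        }
    return stores, tables


stores, tables = build_route_tables([0, 1, 2], allowed={1: {0}})
print(sorted(tables[0]))  # [0, 1, 2]
print(sorted(tables[1]))  # [0]
```

Every table entry points at the destination's single input store, so restricting a route only removes a reference and never copies a queue.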

System API:

System.inject(token: Token) -- route token by type: SMToken → target SM, PEToken → target PE (direct append, bypasses FIFO)
System.send(token: Token) -- same routing as inject() but yields env.timeout(1) for 1-cycle delivery latency then store.put() (SimPy generator, respects FIFO backpressure)
System.load(tokens: list[Token]) -- spawns SimPy process that calls send() for each token in order

PEConfig (emu/types.py):

pe_id: int, iram: dict[int, Instruction] | None, frame_count: int = 8, frame_slots: int = 64, matchable_offsets: int = 8, lane_count: int = 4
initial_frames: Optional[dict[int, dict[int, FrameSlotValue] | list[FrameSlotValue]]] -- pre-loaded frame data (dict or list format per frame, see Frame Initialization)
initial_tag_store: Optional[dict[int, tuple[int, int]]] -- pre-loaded act_id → (frame_id, lane) mappings
allowed_pe_routes: Optional[set[int]] -- if set, restrict PE route_table to these PE IDs
allowed_sm_routes: Optional[set[int]] -- if set, restrict PE sm_routes to these SM IDs
on_event: EventCallback | None -- if set, PE fires SimEvent for every token receive, match, execute, emit, frame alloc/free, slot write, and rejection

SMConfig (emu/types.py):

sm_id: int, cell_count: int = 512, initial_cells: Optional[dict], tier_boundary: int = 256
on_event: EventCallback | None -- if set, SM fires SimEvent for every token receive, cell write, deferred read/satisfy, and result send
tier_boundary controls the T0/T1 split: addresses below are T1 (I-structure), at/above are T0 (shared raw storage)
All SM instances share the same t0_store: list[Token] (wired by build_topology)

Simulation Events (emu/events.py)

Frozen dataclass event types emitted by PE and SM when on_event callback is set.

Event types:

TokenReceived(time, component, token) -- PE/SM received a token
Matched(time, component, left, right, act_id, offset, frame_id) -- PE matched a dyadic pair
Executed(time, component, op, result, bool_out) -- PE executed an ALU instruction
Emitted(time, component, token) -- PE emitted an output token
IRAMWritten(time, component, offset, count) -- PE wrote instructions to IRAM
FrameAllocated(time, component, act_id, frame_id, lane) -- PE allocated a frame (lane indicates which matching lane was assigned)
FrameFreed(time, component, act_id, frame_id, lane, frame_freed) -- PE freed a frame lane (frame_freed=True if physical frame returned to pool)
FrameSlotWritten(time, component, frame_id, slot, value) -- PE wrote to a frame slot
TokenRejected(time, component, token, reason) -- PE rejected a token (e.g., act_id not in tag store)
CellWritten(time, component, addr, old_pres, new_pres) -- SM cell presence changed
DeferredRead(time, component, addr) -- SM registered a deferred read
DeferredSatisfied(time, component, addr, data) -- SM satisfied a deferred read
ResultSent(time, component, token) -- SM sent a result token back

Callback type: EventCallback = Callable[[SimEvent], None]

Invariants:

component field is always "pe:{id}" or "sm:{id}" format
All events have time: float matching env.now when the event occurred
Events are emitted synchronously within the SimPy process step; no buffering

Monitor (monitor/)

Interactive simulation monitor providing both CLI REPL and web UI for controlling and observing OR1 simulations.

Command/Result protocol (monitor/commands.py):

Commands: LoadCmd(source), StepTickCmd(), StepEventCmd(), RunUntilCmd(until), InjectCmd(token), SendCmd(token), ResetCmd(reload), StopCmd()
Results: GraphLoaded(ir_graph, snapshot), StepResult(events, snapshot, sim_time, finished), ErrorResult(message, errors)
SimCommand union: LoadCmd | StepTickCmd | StepEventCmd | RunUntilCmd | InjectCmd | SendCmd | ResetCmd | StopCmd

SimulationBackend (monitor/backend.py):

Owns SimPy environment in a dedicated daemon thread
start() / stop() lifecycle; send_command(cmd, timeout) -> result
Wires on_event callback into all PEConfig/SMConfig during LoadCmd handling
Reset with reload=True reloads last source; reload=False tears down and awaits new LoadCmd
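
The command/result queue pattern can be sketched with the standard library alone. ToyBackend is hypothetical: it has no SimPy environment, handles only two commands, and its StepResult carries just sim_time rather than the full field list above.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StepTickCmd:
    pass


@dataclass(frozen=True)
class StopCmd:
    pass


@dataclass(frozen=True)
class StepResult:
    sim_time: float


class ToyBackend:
    """Worker thread that owns the state; callers talk to it via queues."""

    def __init__(self) -> None:
        self._commands: queue.Queue = queue.Queue()
        self._results: queue.Queue = queue.Queue()
        self._thread: threading.Thread | None = None
        self._sim_time = 0.0

    def start(self) -> None:
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        if self._thread is not None:
            self._commands.put(StopCmd())
            self._thread.join()
            self._thread = None

    def send_command(self, cmd, timeout: float = 5.0):
        self._commands.put(cmd)
        return self._results.get(timeout=timeout)

    def _loop(self) -> None:
        while True:
            cmd = self._commands.get()
            if isinstance(cmd, StopCmd):
                return
            if isinstance(cmd, StepTickCmd):
                self._sim_time += 1.0  # stand-in for advancing the simulation
                self._results.put(StepResult(self._sim_time))


backend = ToyBackend()
backend.start()
try:
    first_result = backend.send_command(StepTickCmd())
    second_result = backend.send_command(StepTickCmd())
finally:
    backend.stop()
print(first_result.sim_time, second_result.sim_time)  # 1.0 2.0
```

Keeping all simulation state on one thread means callers never touch it directly; they only exchange immutable command and result objects.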

StateSnapshot (monitor/snapshot.py):

capture(system) -> StateSnapshot reads live PE/SM state into frozen dataclasses
StateSnapshot(sim_time, next_time, pes: dict[int, PESnapshot], sms: dict[int, SMSnapshot])
PESnapshot(pe_id, iram, frames, tag_store, presence, port_store, match_data, free_frames, lane_count, input_queue, output_log) -- frame-based PE state with 3D match storage (presence, port_store, match_data are all [frame_id][match_slot][lane]), tag_store mapping act_id → (frame_id, lane) tuples, and lane_count field
SMSnapshot(sm_id, cells: dict[int, SMCellSnapshot], deferred_read, t0_store, input_queue)

WebSocket protocol (monitor/server.py):

Client sends JSON commands: {"type": "load", "source": "..."}, {"type": "step_tick"}, etc.
Server responds with JSON results containing event arrays and state snapshots
create_app(backend) -> FastAPI factory function

CLI REPL (monitor/repl.py):

MonitorREPL(cmd.Cmd) with commands: load, step, event, run, inject, send, reset, pe, sm, state, quit

Module Dependency Graph

cm_inst.py defines ISA enums and instruction types (no dependencies).
tokens.py imports from cm_inst.py and defines the token hierarchy.
sm_mod.py is independent.
emu/events.py imports from cm_inst, sm_mod, and tokens.
The emu/ package imports from root-level modules, but root-level modules never import from emu/.
The asm/ package imports from both root-level modules and emu/types.py (for PEConfig/SMConfig), but neither root-level modules nor emu/ import from asm/.
The dfgraph/ package imports from cm_inst, asm/ (ir, lower, resolve, place, allocate, errors, opcodes), and internally between its own modules.
The monitor/ package imports from cm_inst, tokens, sm_mod, emu/ (events, types, network), asm/ (ir, codegen, run_pipeline), and dfgraph/ (categories).
Root-level modules, emu/, and asm/ never import from dfgraph/ or monitor/.

cm_inst.py  <--  tokens.py  <--  emu/events.py  <--  emu/types.py
    |               |                  |                    |
    v               v                  v                    v
 emu/alu.py     sm_mod.py         emu/pe.py  <-->  emu/sm.py
                                       \           /
                                     emu/network.py
                                           ^
                                           |
asm/ir.py  <--  asm/opcodes.py        asm/codegen.py
    |               |                      |
    v               v                      v
asm/lower.py    asm/resolve.py    asm/allocate.py
    |                                  |
    |                             asm/place.py
    |                                  |
    +--- dfgraph/pipeline.py ----------+
    |         |
    |   dfgraph/categories.py    dfgraph/graph_json.py
    |        (cm_inst)               (asm/ir, asm/opcodes)
    |                                      |
    |                             dfgraph/server.py
    |                                      |
    |                             dfgraph/frontend/
    |                                      |
    +--- monitor/backend.py --------+      |
    |        (asm, emu)             |      |
    |                               v      |
    +--- monitor/snapshot.py    monitor/commands.py
    |        (emu/network)          |
    |                               v
    +--- monitor/graph_json.py -----+
    |    (asm/ir, dfgraph/categories, emu/events)
    |                               |
    +--- monitor/server.py ---------+
    |        (FastAPI, WebSocket)
    |                               |
    +--- monitor/repl.py           monitor/frontend/
         (cmd.Cmd)