CLAUDE.md — OR1 Dataflow CPU#
Version Control: jj (Jujutsu)#
This is a jj-colocated repository (both .jj and .git present). You MUST use jj commands, NOT git commands, for all version control operations.
Critical Rules#
- NEVER use `git add`, `git commit`, `git status`, `git diff`, or `git log`. Using raw git commands in a jj repo creates orphan commits and corrupts working copy tracking.
- All file changes are automatically tracked by jj. There is no staging area.
Command Mapping#
| Instead of | Use |
|---|---|
| `git status` | `jj status` |
| `git diff` | `jj diff` |
| `git diff --staged` | `jj diff` (no staging concept) |
| `git log` | `jj log` |
| `git add . && git commit -m "msg"` | `jj commit -m "msg"` |
| `git add file && git commit -m "msg"` | `jj commit -m "msg"` (tracks all changes) |
| `git log --oneline -10` | `jj log --limit 10` |
| `git diff HEAD~3..HEAD` | `jj diff --from 'ancestors(@,3)'` |
| `git rev-parse HEAD` | `jj log -r @ --no-graph -T 'commit_id ++ "\n"'` |
| `git rev-parse HEAD~N` | `jj log -r 'ancestors(@,N)' --no-graph -T 'commit_id ++ "\n"' --limit 1 --reversed` |
Commit Workflow#
# Make your changes to files (no git add needed)
# Then commit all changes:
jj commit -m "feat: description of change"
# The working copy (@) is always a new empty change after committing.
# To see what you just committed:
jj log --limit 5
Bookmarks (Branches)#
# List bookmarks
jj bookmark list
# Current bookmark is 'emu'
# After committing, move bookmark forward:
jj bookmark set emu -r @-
Project Structure#
- `cm_inst.py` — Instruction set definitions (Port, MemOp, ALUOp hierarchy, Instruction, FrameDest, OutputStyle, TokenKind, FrameOp, FrameSlotValue)
- `tokens.py` — Token type hierarchy (Token -> PEToken -> CMToken -> DyadToken/MonadToken; PEToken -> PELocalWriteToken/FrameControlToken; Token -> SMToken). Imports ISA enums from cm_inst.
- `encoding.py` — Pack/unpack boundary between semantic types and 16-bit hardware words (instruction encoding, flit1 routing words)
- `sm_mod.py` — Structure Memory cell model (Presence enum, SMCell dataclass with `is_wide` metadata flag)
- `dfasm.lark` — Lark grammar for dfasm graph assembly language
- `emu/` — Behavioural emulator package (SimPy-based discrete event simulation)
  - `emu/events.py` — Simulation event types: `SimEvent` union (TokenReceived, Matched, Executed, Emitted, IRAMWritten, FrameSlotWritten, FrameAllocated, FrameFreed, TokenRejected, CellWritten, DeferredRead, DeferredSatisfied, ResultSent) and `EventCallback` type alias
  - `emu/types.py` — Config and internal types (PEConfig, SMConfig, DeferredRead). PEConfig/SMConfig accept `on_event: EventCallback | None` for simulation observability.
  - `emu/alu.py` — Pure-function ALU: `execute(op, left, right, const) -> (result, bool_out)`
  - `emu/pe.py` — ProcessingElement: matching store, IRAM fetch, output routing, event emission via `on_event` callback
  - `emu/sm.py` — StructureMemory: I-structure semantics with deferred reads, event emission via `on_event` callback
  - `emu/network.py` — `build_topology()` wiring and `System` container; passes `on_event` from configs to PE/SM constructors
  - `emu/__init__.py` — Public API: exports `System`, `build_topology`, `PEConfig`, `SMConfig`, all event types from `emu/events`
- `asm/` — Assembler package: dfasm source to emulator-ready config (see `asm/CLAUDE.md`)
  - `asm/__init__.py` — Public API: `assemble()`, `assemble_to_tokens()`, `run_pipeline()`, `round_trip()`, `serialize_graph()`
  - `asm/ir.py` — IR types (IRNode, IREdge, IRGraph, IRDataDef, IRRegion, SystemConfig, MacroDef, IRMacroCall, CallSite, etc.)
  - `asm/errors.py` — Structured error types with source context (ErrorCategory includes MACRO and CALL)
  - `asm/opcodes.py` — Opcode mnemonic mapping and arity classification
  - `asm/lower.py` — CST to IRGraph lowering pass
  - `asm/expand.py` — Macro expansion (opcode params, parameterized qualifiers, `@ret` wiring, variadic repetition) and function call wiring pass
  - `asm/builtins.py` — Built-in macro library (`#loop_counted`, `#loop_while`, `#permit_inject`, `#reduce_2/_3/_4`)
  - `asm/resolve.py` — Name resolution pass
  - `asm/place.py` — Placement validation and auto-placement
  - `asm/allocate.py` — IRAM offset and context slot allocation
  - `asm/codegen.py` — Code generation (direct mode + token stream mode)
  - `asm/serialize.py` — IRGraph to dfasm source serializer
- `dfgraph/` — Interactive dataflow graph renderer (see `dfgraph/CLAUDE.md`)
  - `dfgraph/__main__.py` — CLI: `python -m dfgraph path/to/file.dfasm [--port 8420]`
  - `dfgraph/pipeline.py` — Progressive pipeline runner (parse -> lower -> expand -> resolve -> place -> allocate with error accumulation)
  - `dfgraph/categories.py` — Opcode-to-category mapping via isinstance dispatch on ALUOp hierarchy
  - `dfgraph/graph_json.py` — IRGraph-to-JSON conversion for frontend consumption
  - `dfgraph/server.py` — FastAPI backend with WebSocket push and file watcher (watchdog, 300ms debounce)
  - `dfgraph/frontend/` — TypeScript frontend: Cytoscape.js graph with ELK layout, SVG/PNG export (uses `frontend-common/`)
- `frontend-common/` — Shared TypeScript modules extracted from dfgraph for reuse by monitor frontend
  - `frontend-common/src/layout.ts` — ELK layout configurations
  - `frontend-common/src/style.ts` — Cytoscape style definitions
  - `frontend-common/src/export.ts` — SVG/PNG export utilities
  - `frontend-common/src/types.ts` — Shared TypeScript interfaces for graph data
- `monitor/` — Interactive simulation monitor with CLI REPL and web UI (see `monitor/CLAUDE.md`)
  - `monitor/__init__.py` — Public API: exports `SimulationBackend`, command types, result types, `StateSnapshot`, `capture()`
  - `monitor/__main__.py` — CLI: `python -m monitor [path/to/file.dfasm] [--web] [--port 8421]`
  - `monitor/backend.py` — `SimulationBackend`: threaded simulation controller with command/result queue protocol
  - `monitor/commands.py` — Command types (LoadCmd, StepTickCmd, StepEventCmd, RunUntilCmd, InjectCmd, SendCmd, ResetCmd, StopCmd) and result types (GraphLoaded, StepResult, ErrorResult)
  - `monitor/snapshot.py` — `StateSnapshot` frozen dataclass and `capture(system)` function for state extraction
  - `monitor/graph_json.py` — IRGraph + StateSnapshot to JSON serialization with execution overlay
  - `monitor/server.py` — FastAPI server with bidirectional WebSocket protocol
  - `monitor/repl.py` — `MonitorREPL(cmd.Cmd)` interactive CLI for simulation control
  - `monitor/formatting.py` — ANSI colour formatting for REPL output
  - `monitor/frontend/` — TypeScript frontend: Cytoscape.js graph with execution overlay, event log, state inspector (uses `frontend-common/`)
- `tests/` — pytest + hypothesis test suite
  - `tests/conftest.py` — Hypothesis strategies for token/op generation
  - `tests/test_sm_tiers.py` — T0/T1 memory tier and EXEC bootstrap tests
  - `tests/test_exec_bootstrap.py` — EXEC opcode acceptance criteria tests
  - `tests/test_migration_cleanup.py` — Verifies removed types (SysToken, CfgOp, etc.) are absent from codebase
  - `tests/test_pe_events.py` — PE event emission tests (TokenReceived, Matched, Executed, Emitted, IRAMWritten, FrameAllocated, FrameFreed, FrameSlotWritten, TokenRejected)
  - `tests/test_pe_frames.py` — Frame-based PE matching, routing, and lifecycle tests
  - `tests/test_pe_lanes.py` — Lane-based matching tests (ALLOC_SHARED, FREE_LANE, smart FREE, lane exhaustion, pipelining)
  - `tests/test_sm_events.py` — SM event emission tests (CellWritten, DeferredRead, DeferredSatisfied, ResultSent)
  - `tests/test_cycle_timing.py` — Cycle-accurate timing verification tests
  - `tests/test_network_events.py` — Network-level event propagation tests
  - `tests/test_network_routing.py` — Network routing and connectivity tests
  - `tests/test_foundation_types.py` — Foundation type (Instruction, FrameDest, encoding) tests
  - `tests/test_encoding.py` — Pack/unpack encoding tests for instructions and flit1 words
  - `tests/test_ir_frame_types.py` — IR frame-related type tests
  - `tests/test_allocate_frames.py` — Frame layout allocation tests
  - `tests/test_codegen_frames.py` — Frame-based code generation tests
  - `tests/test_sm_t0_raw.py` — SM T0 raw storage tests
  - `tests/test_backend.py` — SimulationBackend command/result protocol tests
  - `tests/test_snapshot.py` — StateSnapshot capture tests
  - `tests/test_repl.py` — MonitorREPL command tests
  - `tests/test_monitor_graph_json.py` — Monitor graph JSON serialization tests
  - `tests/test_monitor_server.py` — Monitor FastAPI server and WebSocket protocol tests
- `docs/` — Design documents, implementation plans, test plans
Tech Stack#
- Python 3.12
- SimPy 4.1 (discrete event simulation)
- Lark (Earley parser for dfasm grammar)
- FastAPI + uvicorn (dfgraph and monitor web servers)
- watchdog (file system monitoring for dfgraph live reload)
- Cytoscape.js + cytoscape-elk (frontend graph rendering and layout)
- pytest + hypothesis (property-based testing)
- Nix flake for dev environment
Running Tests#
python -m pytest tests/ -v
Architecture Contracts#
Token Hierarchy (tokens.py)#
All tokens inherit from Token(target: int). The hierarchy:
- `PEToken(Token)` -- base for all PE-targeted tokens (used for routing in network.py)
  - `CMToken(PEToken)` -- adds `offset: int`, `act_id: int`, `data: int` (frozen dataclass, base for data-carrying tokens)
    - `DyadToken(CMToken)` -- adds `port: Port` (dyadic operand requiring match)
    - `MonadToken(CMToken)` -- adds `inline: bool` (monadic operand, no match required)
  - `PELocalWriteToken(PEToken)` -- adds `act_id`, `region`, `slot`, `data`, `is_dest` (writes frame slots directly)
  - `FrameControlToken(PEToken)` -- adds `act_id`, `op: FrameOp`, `payload: int` (frame alloc/free)
- `SMToken(Token)` -- `addr: int`, `op: MemOp`, `flags`, `data`, `ret: Optional[CMToken]`
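The hierarchy above can be sketched with frozen dataclasses. This is a minimal illustration, not the contents of `tokens.py`: the `Port` values, the field order, the `Optional[int]` types on `flags`/`data`, the plain-`int` `op` field, and the `routes_to_pe` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Port(IntEnum):
    # Stand-in for cm_inst.Port; the numeric values are assumed.
    L = 0
    R = 1


@dataclass(frozen=True)
class Token:
    target: int


@dataclass(frozen=True)
class PEToken(Token):
    """Base for everything routed to a processing element."""


@dataclass(frozen=True)
class CMToken(PEToken):
    offset: int
    act_id: int
    data: int


@dataclass(frozen=True)
class DyadToken(CMToken):
    port: Port  # dyadic operand: must be matched with a partner


@dataclass(frozen=True)
class MonadToken(CMToken):
    inline: bool  # monadic operand: fires without matching


@dataclass(frozen=True)
class SMToken(Token):
    addr: int
    op: int  # MemOp in the real module; a plain int here
    flags: Optional[int]
    data: Optional[int]
    ret: Optional[CMToken]


def routes_to_pe(token: Token) -> bool:
    """Hypothetical helper: type-based dispatch between PE and SM targets."""
    return isinstance(token, PEToken)


tok = DyadToken(target=0, offset=3, act_id=1, data=42, port=Port.L)
```

Dispatching on `isinstance(token, PEToken)` is one way the "used for routing" note could play out; the actual routing code may be structured differently.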
Instruction Set (cm_inst.py)#
- `ALUOp(IntEnum)` base with subclasses: `ArithOp`, `LogicOp`, `RoutingOp`
- `MemOp(IntEnum)` -- read/write/atomic ops (READ, WRITE, EXEC, ALLOC, FREE, EXT, CLEAR, RD_INC, RD_DEC, CMP_SW, RAW_READ, SET_PAGE, WRITE_IMM)
- `Instruction(opcode, output: OutputStyle, has_const, dest_count, wide, fref)` -- unified instruction type for both ALU and memory ops, stored in PE IRAM
- `FrameDest(target_pe: int, offset: int, act_id: int, port: Port, token_kind: TokenKind)` -- destination address resolved from frame slot
- `FrameSlotValue = int | FrameDest | None` -- type alias for frame slot contents
- `OutputStyle` enum -- INHERIT, CHANGE_TAG, SINK for output routing decisions
- `TokenKind` enum -- DYADIC, MONADIC, INLINE for token kind classification
- `FrameOp(IntEnum)` -- ALLOC, FREE, ALLOC_SHARED, FREE_LANE for frame lifecycle control tokens
- `is_monadic_alu(op: ALUOp) -> bool` -- canonical source of truth for monadic ALU op classification (used by `emu/pe.py` and `asm/opcodes.py`)
ALU (emu/alu.py)#
Pure function, no state. execute(op, left, right, const) -> (result: int, bool_out: bool).
Invariants:
- All results masked to 16-bit unsigned (`& 0xFFFF`)
- Comparisons interpret values as signed 2's complement via `to_signed()`
- ArithOp: `bool_out` always `False`
- RoutingOp: `bool_out` drives branch/switch/gate decisions
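A minimal sketch of the masking and signed-comparison invariants, assuming nothing about the real `execute()` beyond what is stated above. The helper names `add16` and `lt16`, and the choice to return `1`/`0` as the comparison result, are illustrative assumptions.

```python
MASK16 = 0xFFFF


def to_signed(value: int) -> int:
    """Interpret a 16-bit unsigned value as signed two's complement."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def add16(left: int, right: int) -> tuple[int, bool]:
    """Hypothetical arithmetic op: result masked to 16 bits, bool_out False."""
    return (left + right) & MASK16, False


def lt16(left: int, right: int) -> tuple[int, bool]:
    """Hypothetical signed less-than; the 1/0 result encoding is assumed."""
    flag = to_signed(left) < to_signed(right)
    return int(flag), flag
```

With these definitions `0xFFFF` compares as `-1`, so `lt16(0xFFFF, 1)` is true even though the raw unsigned value is larger.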
Processing Element (emu/pe.py)#
Frame-based processing element with activation context management.
Frame Storage:
- `frames: list[list[FrameSlotValue]]` -- 2D array [frame_id][slot_idx] holding FrameDest objects and constants (shared across all lanes)
- `tag_store: dict[int, tuple[int, int]]` -- maps act_id → (frame_id, lane) for activation-to-frame-and-lane lookup
- `match_data: list[list[list[Optional[int]]]]` -- 3D array [frame_id][match_slot][lane] for operand values waiting for partner
- `presence: list[list[list[bool]]]` -- 3D array [frame_id][match_slot][lane] for dyadic operand waiting state
- `port_store: list[list[list[Optional[Port]]]]` -- 3D array [frame_id][match_slot][lane] for operand port metadata
- `lane_count: int` -- number of matching lanes per frame
- `lane_free: dict[int, set[int]]` -- per-frame set of available lane IDs (created on ALLOC, deleted on full FREE)
- `free_frames: list[int]` -- pool of unallocated frame IDs
- `iram: dict[int, Instruction]` -- instruction memory indexed by offset
Token Processing Pipeline:
- Side paths (FrameControlToken, PELocalWriteToken): 1 cycle
- Dyadic CMToken: 5 cycles (dequeue + IFETCH + MATCH + EXECUTE + EMIT)
- Monadic CMToken: 4 cycles (dequeue + IFETCH + EXECUTE + EMIT)
Matching Logic:
- DyadToken arrives with act_id: look up (frame_id, lane) via tag_store
- Match slot is derived from token.offset: match_slot = token.offset % matchable_offsets
- If presence[frame_id][match_slot][lane] is False: store token.data in match_data[frame_id][match_slot][lane], store token.port in port_store[frame_id][match_slot][lane], set presence bit to True, wait for partner
- If presence[frame_id][match_slot][lane] is True: retrieve partner data and port from match_data and port_store, clear presence bit, fire instruction with both operands
- Port ordering: partner with Port.L goes to left operand; Port.R to right operand
- Match data, presence, and port storage are per-lane; frame constants/destinations (in frames) remain shared across all lanes
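The matching steps above can be sketched as a small standalone class. It is a toy model under stated assumptions: the `tag_store` lookup is left to the caller (who passes `frame_id` and `lane` directly), `Port` is a stand-in enum, and the `MatchStore`/`offer` names are invented for the example.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    # Stand-in for cm_inst.Port; only the L/R names come from the document.
    L = 0
    R = 1


def _grid(frames: int, slots: int, lanes: int, fill):
    """Build a [frame_id][match_slot][lane] array filled with `fill`."""
    return [[[fill] * lanes for _ in range(slots)] for _ in range(frames)]


class MatchStore:
    """Toy per-lane matching store (hypothetical name)."""

    def __init__(self, frame_count: int = 8, matchable_offsets: int = 8,
                 lane_count: int = 4) -> None:
        self.matchable_offsets = matchable_offsets
        self.presence = _grid(frame_count, matchable_offsets, lane_count, False)
        self.match_data = _grid(frame_count, matchable_offsets, lane_count, None)
        self.port_store = _grid(frame_count, matchable_offsets, lane_count, None)

    def offer(self, frame_id: int, lane: int, offset: int,
              port: Port, data: int) -> Optional[tuple[int, int]]:
        """Return (left, right) when a pair completes, else None."""
        slot = offset % self.matchable_offsets
        if not self.presence[frame_id][slot][lane]:
            # First operand: park it and wait for the partner.
            self.match_data[frame_id][slot][lane] = data
            self.port_store[frame_id][slot][lane] = port
            self.presence[frame_id][slot][lane] = True
            return None
        # Second operand: retrieve the partner and clear the presence bit.
        partner_data = self.match_data[frame_id][slot][lane]
        partner_port = self.port_store[frame_id][slot][lane]
        self.presence[frame_id][slot][lane] = False
        if partner_port is Port.L:
            return partner_data, data
        return data, partner_data
```

Because the slot is `offset % matchable_offsets`, offsets 3 and 11 share a match slot under the default of 8; the sketch does nothing to guard against that aliasing.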
Output Routing (determined by Instruction.output):
- `OutputStyle.INHERIT` -- routes to destinations specified in frame slots
- `OutputStyle.CHANGE_TAG` -- routes with different act_id tag (context switch)
- `OutputStyle.SINK` -- writes result to frame slot, emits no token
Frame Initialization:
- PE constructor loads initial_frames from PEConfig: `dict[int, dict[int, FrameSlotValue] | list[FrameSlotValue]]`
- Dict value format: `{slot_idx: value}` where int values are unpacked via `unpack_flit1()` to FrameDest
- List value format: `[slot0, slot1, ...]` raw values assigned by index
- Handles both codegen-produced packed integers and test-produced FrameDest objects
Output logging:
- `PE.output_log: list` records every token emitted (for testing and tracing)
Frame Control Operations (_handle_frame_control):
- `ALLOC` -- allocates a fresh frame from free_frames, assigns lane 0, initializes lane_free with remaining lanes
- `FREE` -- smart free: removes act_id from tag_store, clears lane match state. If other activations share the frame, returns lane to lane_free (frame_freed=False). If last lane, returns frame to free_frames and clears frame slots (frame_freed=True)
- `ALLOC_SHARED` -- shared allocation: looks up parent act_id (from payload) in tag_store, finds parent's frame_id, assigns next free lane from lane_free. Rejects if parent not found or no free lanes
- `FREE_LANE` -- lane-only free: removes act_id, clears lane match state, returns lane to lane_free. Never returns frame to free_frames (frame_freed always False)
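The lane bookkeeping behind these four operations can be sketched as follows. This is an illustrative model, not `_handle_frame_control` itself: the `FrameTable` name is invented, clearing of match state and frame slots is omitted, unknown `act_id`s and an empty frame pool are not handled, and the choices of "first free frame" and "lowest free lane" are assumptions.

```python
from typing import Optional


class FrameTable:
    """Toy model of tag_store / lane_free / free_frames bookkeeping."""

    def __init__(self, frame_count: int = 8, lane_count: int = 4) -> None:
        self.lane_count = lane_count
        self.free_frames: list[int] = list(range(frame_count))
        self.tag_store: dict[int, tuple[int, int]] = {}
        self.lane_free: dict[int, set[int]] = {}

    def alloc(self, act_id: int) -> tuple[int, int]:
        """ALLOC: fresh frame, lane 0, remaining lanes marked free."""
        frame_id = self.free_frames.pop(0)
        self.tag_store[act_id] = (frame_id, 0)
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        return frame_id, 0

    def alloc_shared(self, act_id: int, parent_act_id: int) -> Optional[tuple[int, int]]:
        """ALLOC_SHARED: next lane in the parent's frame, or None if rejected."""
        parent = self.tag_store.get(parent_act_id)
        if parent is None:
            return None
        frame_id = parent[0]
        free = self.lane_free[frame_id]
        if not free:
            return None
        lane = min(free)
        free.remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return frame_id, lane

    def free(self, act_id: int) -> bool:
        """Smart FREE: returns frame_freed."""
        frame_id, lane = self.tag_store.pop(act_id)
        if any(f == frame_id for f, _ in self.tag_store.values()):
            self.lane_free[frame_id].add(lane)
            return False
        del self.lane_free[frame_id]
        self.free_frames.append(frame_id)
        return True

    def free_lane(self, act_id: int) -> bool:
        """FREE_LANE: never releases the frame, so frame_freed is False."""
        frame_id, lane = self.tag_store.pop(act_id)
        self.lane_free[frame_id].add(lane)
        return False
```

In this model, freeing the parent activation before its children leaves the frame allocated until the last sharing activation issues `FREE`.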
ALLOC_REMOTE (RoutingOp in _run pipeline):
- Reads fref+0 (target PE), fref+1 (target act_id), fref+2 (parent act_id) from frame constants
- If fref+2 is non-zero: emits FrameControlToken with ALLOC_SHARED op and parent act_id as payload
- If fref+2 is zero: emits FrameControlToken with ALLOC op (fresh frame allocation)
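As a rough sketch of the ALLOC_REMOTE decision, assuming the frame constants are plain integers: the `FrameOp` values, the reduced `FrameControlToken` stand-in, the `alloc_remote` helper name, and the payload of `0` for a plain ALLOC are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class FrameOp(IntEnum):
    # Member names are from the document; numeric values are assumed.
    ALLOC = 0
    FREE = 1
    ALLOC_SHARED = 2
    FREE_LANE = 3


@dataclass(frozen=True)
class FrameControlToken:
    target: int
    act_id: int
    op: FrameOp
    payload: int


def alloc_remote(frame: list[int], fref: int) -> FrameControlToken:
    """Build the control token from frame constants at fref+0..fref+2."""
    target_pe = frame[fref]
    target_act_id = frame[fref + 1]
    parent_act_id = frame[fref + 2]
    if parent_act_id != 0:
        # Non-zero parent: share the parent's frame on a new lane.
        return FrameControlToken(target_pe, target_act_id,
                                 FrameOp.ALLOC_SHARED, parent_act_id)
    # Zero parent: request a fresh frame (payload value assumed unused).
    return FrameControlToken(target_pe, target_act_id, FrameOp.ALLOC, 0)
```

One consequence of this encoding is that act_id 0 cannot be named as a parent, since zero is the "no parent" marker.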
PELocalWriteToken handling:
- Writes data to frame slot at specified region/slot within the act_id's frame (1 cycle)
Structure Memory (emu/sm.py)#
SimPy process with I-structure (single-assignment) semantics.
Cell states (Presence enum): EMPTY, RESERVED, FULL, WAITING
Cycle-accurate pipeline timing:
- Dequeue: 1 cycle (yields `env.timeout(1)` after `input_store.get()`)
- Each handler yields 1 cycle for its processing stage (process/write/read-modify-write cycle)
- `_send_result()` yields 1 cycle for the response/delivery cycle (blocks the SM while delivering)
- Total varies by operation: READ on FULL = dequeue(1) + process(1) + result(1) + put = 3+ cycles; WRITE = dequeue(1) + write(1) = 2 cycles
Deferred read contract:
- READ on non-FULL cell: sets cell to `WAITING`, stores `DeferredRead`
- Subsequent WRITE to that cell: satisfies deferred read, sends result via return route
- Only one deferred read at a time per SM instance
- CLEAR on a WAITING cell cancels the deferred read
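The deferred read contract can be modelled without SimPy as a small state machine. This is a sketch under assumptions: the `ToySM` class and its string return routes are invented, timing is ignored, single-assignment checks on WRITE are not modelled, and raising on a second pending read is a placeholder because the real behaviour in that case is not described here.

```python
from enum import Enum, auto
from typing import Optional


class Presence(Enum):
    EMPTY = auto()
    RESERVED = auto()
    FULL = auto()
    WAITING = auto()


class ToySM:
    """Toy structure memory with a single deferred-read slot."""

    def __init__(self) -> None:
        self.cells: dict[int, tuple[Presence, Optional[int]]] = {}
        self.deferred: Optional[tuple[int, str]] = None  # (addr, return route)
        self.results: list[tuple[str, int]] = []

    def read(self, addr: int, ret: str) -> None:
        presence, data = self.cells.get(addr, (Presence.EMPTY, None))
        if presence is Presence.FULL:
            self.results.append((ret, data))
            return
        if self.deferred is not None:
            # Placeholder: the source only states the one-at-a-time limit.
            raise RuntimeError("a deferred read is already pending")
        self.cells[addr] = (Presence.WAITING, None)
        self.deferred = (addr, ret)

    def write(self, addr: int, data: int) -> None:
        self.cells[addr] = (Presence.FULL, data)
        if self.deferred is not None and self.deferred[0] == addr:
            # Satisfy the waiting reader via its return route.
            self.results.append((self.deferred[1], data))
            self.deferred = None

    def clear(self, addr: int) -> None:
        self.cells[addr] = (Presence.EMPTY, None)
        if self.deferred is not None and self.deferred[0] == addr:
            self.deferred = None  # cancel the pending read
```

A read that arrives before its write is answered when the write lands; a read cancelled by CLEAR is never answered.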
Atomic operations (RD_INC, RD_DEC, CMP_SW):
- Restricted to cell addresses < `ATOMIC_CELL_LIMIT` (256)
- Cell must be `FULL`; returns old value via return route
- CMP_SW: compares `token.flags` (expected) with current; swaps to `token.data` on match
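A minimal sketch of the atomic operations over a plain dict of FULL cells. The function names, the use of exceptions for out-of-range or non-FULL cells, and the 16-bit wraparound on increment are assumptions; the source only states the address limit, the FULL requirement, and that the old value is returned.

```python
ATOMIC_CELL_LIMIT = 256
MASK16 = 0xFFFF


def _check(cells: dict[int, int], addr: int) -> None:
    """Reject addresses outside the atomic range or cells that are not FULL."""
    if addr >= ATOMIC_CELL_LIMIT:
        raise ValueError(f"address {addr} is outside the atomic range")
    if addr not in cells:
        raise KeyError(f"cell {addr} is not FULL")


def rd_inc(cells: dict[int, int], addr: int) -> int:
    """Return the old value and store old + 1 (wraparound assumed)."""
    _check(cells, addr)
    old = cells[addr]
    cells[addr] = (old + 1) & MASK16
    return old


def cmp_sw(cells: dict[int, int], addr: int, expected: int, new: int) -> int:
    """Return the old value; store `new` only if old == expected."""
    _check(cells, addr)
    old = cells[addr]
    if old == expected:
        cells[addr] = new
    return old
```

In the token encoding described above, `expected` would come from `token.flags` and `new` from `token.data`.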
Memory Tiers:
- T1 (below tier_boundary): Per-SM I-structure cells with presence tracking, deferred reads, atomic ops. Default tier_boundary: 256.
- T0 (at/above tier_boundary): Shared raw storage across all SMs. No presence tracking. `list[Token]` shared by all SM instances.
- I-structure ops on T0 addresses are errors (logged and dropped)
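The tier split reduces to one comparison against `tier_boundary`. The sketch below uses string opcode names and a hypothetical `classify_access` helper; it does not model which operations are valid on T1 addresses.

```python
# Operations the document lists as valid on T0 addresses.
T0_OPS = {"READ", "WRITE", "EXEC"}


def classify_access(addr: int, op: str, tier_boundary: int = 256) -> str:
    """Return "T1", "T0", or "error" for an (address, op) pair."""
    if addr < tier_boundary:
        return "T1"  # per-SM I-structure cell
    # At or above the boundary: shared raw storage, limited op set.
    return "T0" if op in T0_OPS else "error"
```

For example, `classify_access(256, "RD_INC")` yields `"error"`, matching the rule that I-structure operations on T0 addresses are logged and dropped.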
Network Topology (emu/network.py)#
build_topology(env, pe_configs, sm_configs, fifo_capacity) -> System
Wiring contract:
- Every PE gets a `route_table` mapping `pe_id -> simpy.Store` for all PEs
- Every PE gets `sm_routes` mapping `sm_id -> simpy.Store` for all SMs
- Every SM gets a `route_table` mapping `pe_id -> simpy.Store` for all PEs
- Default is full-mesh connectivity (any PE can send to any PE or SM)
- If `PEConfig.allowed_pe_routes` or `allowed_sm_routes` is set, `build_topology` restricts routes at construction time
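The route restriction can be illustrated without SimPy by using strings in place of `simpy.Store` objects. The `build_routes` helper is hypothetical, and treating `None` as "unrestricted" and an empty set as "no routes" is an assumption about what "is set" means.

```python
from typing import Optional


def build_routes(
    pe_ids: list[int],
    sm_ids: list[int],
    allowed_pe_routes: Optional[set[int]] = None,
    allowed_sm_routes: Optional[set[int]] = None,
) -> tuple[dict[int, str], dict[int, str]]:
    """Return (route_table, sm_routes) for one PE; stores are placeholders."""
    route_table = {
        pe_id: f"pe_store:{pe_id}"
        for pe_id in pe_ids
        if allowed_pe_routes is None or pe_id in allowed_pe_routes
    }
    sm_routes = {
        sm_id: f"sm_store:{sm_id}"
        for sm_id in sm_ids
        if allowed_sm_routes is None or sm_id in allowed_sm_routes
    }
    return route_table, sm_routes
```

With no restrictions the result is the full mesh; passing `allowed_pe_routes={1}` leaves only PE 1 reachable from that PE.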
System API:
- `System.inject(token: Token)` -- route token by type: SMToken → target SM, PEToken → target PE (direct append, bypasses FIFO)
- `System.send(token: Token)` -- same routing as inject() but yields `env.timeout(1)` for 1-cycle delivery latency then `store.put()` (SimPy generator, respects FIFO backpressure)
- `System.load(tokens: list[Token])` -- spawns SimPy process that calls send() for each token in order
PEConfig (emu/types.py):
- `pe_id: int`, `iram: dict[int, Instruction] | None`, `frame_count: int = 8`, `frame_slots: int = 64`, `matchable_offsets: int = 8`, `lane_count: int = 4`
- `initial_frames: Optional[dict[int, list[FrameSlotValue]]]` -- pre-loaded frame data
- `initial_tag_store: Optional[dict[int, tuple[int, int]]]` -- pre-loaded act_id → (frame_id, lane) mappings
- `allowed_pe_routes: Optional[set[int]]` -- if set, restrict PE route_table to these PE IDs
- `allowed_sm_routes: Optional[set[int]]` -- if set, restrict PE sm_routes to these SM IDs
- `on_event: EventCallback | None` -- if set, PE fires `SimEvent` for every token receive, match, execute, emit, frame alloc/free, slot write, and rejection
SMConfig (emu/types.py):
- `sm_id: int`, `cell_count: int = 512`, `initial_cells: Optional[dict]`, `tier_boundary: int = 256`
- `on_event: EventCallback | None` -- if set, SM fires `SimEvent` for every token receive, cell write, deferred read/satisfy, and result send
- `tier_boundary` controls the T0/T1 split: addresses below are T1 (I-structure), at/above are T0 (shared raw storage)
- All SM instances share the same `t0_store: list[Token]` (wired by `build_topology`)
Simulation Events (emu/events.py)#
Frozen dataclass event types emitted by PE and SM when on_event callback is set.
Event types:
- `TokenReceived(time, component, token)` -- PE/SM received a token
- `Matched(time, component, left, right, act_id, offset, frame_id)` -- PE matched a dyadic pair
- `Executed(time, component, op, result, bool_out)` -- PE executed an ALU instruction
- `Emitted(time, component, token)` -- PE emitted an output token
- `IRAMWritten(time, component, offset, count)` -- PE wrote instructions to IRAM
- `FrameAllocated(time, component, act_id, frame_id, lane)` -- PE allocated a frame (lane indicates which matching lane was assigned)
- `FrameFreed(time, component, act_id, frame_id, lane, frame_freed)` -- PE freed a frame lane (frame_freed=True if physical frame returned to pool)
- `FrameSlotWritten(time, component, frame_id, slot, value)` -- PE wrote to a frame slot
- `TokenRejected(time, component, token, reason)` -- PE rejected a token (e.g., act_id not in tag store)
- `CellWritten(time, component, addr, old_pres, new_pres)` -- SM cell presence changed
- `DeferredRead(time, component, addr)` -- SM registered a deferred read
- `DeferredSatisfied(time, component, addr, data)` -- SM satisfied a deferred read
- `ResultSent(time, component, token)` -- SM sent a result token back
Union type: SimEvent = TokenReceived | Matched | Executed | Emitted | IRAMWritten | FrameAllocated | FrameFreed | FrameSlotWritten | TokenRejected | CellWritten | DeferredRead | DeferredSatisfied | ResultSent
Callback type: EventCallback = Callable[[SimEvent], None]
Invariants:
- `component` field is always `"pe:{id}"` or `"sm:{id}"` format
- All events have `time: float` matching `env.now` when the event occurred
- Events are emitted synchronously within the SimPy process step; no buffering
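A callback that satisfies the `EventCallback` shape can be as simple as a callable object that appends to a list. The sketch below uses a stand-in `Executed` dataclass mirroring the documented fields (with `op` as a plain string) and a hypothetical `EventLog` collector; neither is taken from `emu/events.py`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Executed:
    # Stand-in mirroring the documented field list; op is a str here.
    time: float
    component: str
    op: str
    result: int
    bool_out: bool


# Narrowed to one event type for the example; the real alias takes SimEvent.
EventCallback = Callable[[Executed], None]


class EventLog:
    """Hypothetical collector that checks the component-format invariant."""

    def __init__(self) -> None:
        self.events: list[Executed] = []

    def __call__(self, event: Executed) -> None:
        kind, _, ident = event.component.partition(":")
        if kind not in ("pe", "sm") or not ident.isdigit():
            raise ValueError(f"unexpected component: {event.component!r}")
        self.events.append(event)

    def by_component(self, component: str) -> list[Executed]:
        return [e for e in self.events if e.component == component]


log = EventLog()
callback: EventCallback = log
callback(Executed(time=3.0, component="pe:0", op="ADD", result=7, bool_out=False))
```

Because events are delivered synchronously, a collector like this sees them in emission order with no extra synchronisation inside the simulation thread.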
Monitor (monitor/)#
Interactive simulation monitor providing both CLI REPL and web UI for controlling and observing OR1 simulations.
Command/Result protocol (monitor/commands.py):
- Commands: `LoadCmd(source)`, `StepTickCmd()`, `StepEventCmd()`, `RunUntilCmd(until)`, `InjectCmd(token)`, `SendCmd(token)`, `ResetCmd(reload)`, `StopCmd()`
- Results: `GraphLoaded(ir_graph, snapshot)`, `StepResult(events, snapshot, sim_time, finished)`, `ErrorResult(message, errors)`
- `SimCommand` union: `LoadCmd | StepTickCmd | StepEventCmd | RunUntilCmd | InjectCmd | SendCmd | ResetCmd | StopCmd`
SimulationBackend (monitor/backend.py):
- Owns SimPy environment in a dedicated daemon thread
- `start()`/`stop()` lifecycle; `send_command(cmd, timeout) -> result`
- Wires `on_event` callback into all PEConfig/SMConfig during `LoadCmd` handling
- Reset with `reload=True` reloads last source; `reload=False` tears down and awaits new `LoadCmd`
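The command/result queue pattern can be sketched with the standard library alone. This is a toy under stated assumptions, not `SimulationBackend`: the `ToyBackend` class is invented, only two command types are handled, `StepResult` is reduced to two of its documented fields, and a counter stands in for the SimPy environment.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StepTickCmd:
    pass


@dataclass(frozen=True)
class StopCmd:
    pass


@dataclass(frozen=True)
class StepResult:
    # Reduced stand-in: the documented type also carries events and snapshot.
    sim_time: float
    finished: bool


class ToyBackend:
    """Worker thread that owns the simulation state and answers commands."""

    def __init__(self) -> None:
        self._commands: queue.Queue = queue.Queue()
        self._results: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._time = 0.0  # stands in for the SimPy environment clock

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._commands.put(StopCmd())
        self._thread.join()

    def send_command(self, cmd, timeout: float = 5.0):
        """Post a command and block until the worker replies."""
        self._commands.put(cmd)
        return self._results.get(timeout=timeout)

    def _loop(self) -> None:
        while True:
            cmd = self._commands.get()
            if isinstance(cmd, StopCmd):
                return
            if isinstance(cmd, StepTickCmd):
                self._time += 1.0
                self._results.put(StepResult(self._time, finished=False))
```

Keeping all simulation state on one thread means callers never touch it directly; they only exchange immutable command and result objects through the queues.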
StateSnapshot (monitor/snapshot.py):
- `capture(system) -> StateSnapshot` reads live PE/SM state into frozen dataclasses
- `StateSnapshot(sim_time, next_time, pes: dict[int, PESnapshot], sms: dict[int, SMSnapshot])`
- `PESnapshot(pe_id, iram, frames, tag_store, presence, port_store, match_data, free_frames, lane_count, input_queue, output_log)` -- frame-based PE state with 3D match storage (presence, port_store, match_data are all [frame_id][match_slot][lane]), tag_store mapping act_id → (frame_id, lane) tuples, and lane_count field
- `SMSnapshot(sm_id, cells: dict[int, SMCellSnapshot], deferred_read, t0_store, input_queue)`
WebSocket protocol (monitor/server.py):
- Client sends JSON commands: `{"type": "load", "source": "..."}`, `{"type": "step_tick"}`, etc.
- Server responds with JSON results containing event arrays and state snapshots
- `create_app(backend) -> FastAPI` factory function
CLI REPL (monitor/repl.py):
- `MonitorREPL(cmd.Cmd)` with commands: load, step, event, run, inject, send, reset, pe, sm, state, quit
Module Dependency Graph#
cm_inst.py defines ISA enums and instruction types (no dependencies). tokens.py imports from cm_inst.py and defines the token hierarchy. sm_mod.py is independent. emu/events.py imports from cm_inst, sm_mod, and tokens. The emu/ package imports from root-level modules but root-level modules never import from emu/. The asm/ package imports from both root-level modules and emu/types.py (for PEConfig/SMConfig), but neither root-level modules nor emu/ import from asm/. The dfgraph/ package imports from cm_inst, asm/ (ir, lower, resolve, place, allocate, errors, opcodes), and internally between its own modules. The monitor/ package imports from cm_inst, tokens, sm_mod, emu/ (events, types, network), asm/ (ir, codegen, run_pipeline), and dfgraph/ (categories). Neither root-level modules, emu/, nor asm/ import from dfgraph/ or monitor/.
cm_inst.py <-- tokens.py <-- emu/events.py <-- emu/types.py
| | | |
v v v v
emu/alu.py sm_mod.py emu/pe.py <--> emu/sm.py
\ /
emu/network.py
^
|
asm/ir.py <-- asm/opcodes.py asm/codegen.py
| | |
v v v
asm/lower.py asm/resolve.py asm/allocate.py
| |
| asm/place.py
| |
+--- dfgraph/pipeline.py ----------+
| |
| dfgraph/categories.py dfgraph/graph_json.py
| (cm_inst) (asm/ir, asm/opcodes)
| |
| dfgraph/server.py
| |
| dfgraph/frontend/
| |
+--- monitor/backend.py --------+ |
| (asm, emu) | |
| v |
+--- monitor/snapshot.py monitor/commands.py
| (emu/network) |
| v
+--- monitor/graph_json.py -----+
| (asm/ir, dfgraph/categories, emu/events)
| |
+--- monitor/server.py ---------+
| (FastAPI, WebSocket)
| |
+--- monitor/repl.py monitor/frontend/
(cmd.Cmd)