# Emulator Redesign Plan: Frame-Based PE Model

This document describes the changes required in the OR1 emulator code to match the frame-based PE redesign specified in `design-notes/pe-design.md` (sections 3-11) and `design-notes/architecture-overview.md` (token format section). It is scoped to the emulator (`emu/`, `tokens.py`, `cm_inst.py`) and immediate downstream consumers (`monitor/`). Assembler changes are out of scope.

## 1. Overview

The current emulator models a context/generation-based matching scheme with flat `matching_store[ctx][offset]` arrays, generation counters for ABA protection, and `Addr`-based output routing that constructs tokens at emit time. The target design replaces this with:

- **Frame-based matching**: a tag store maps `activation_id` to `frame_id`; presence/port metadata is per-frame, per-matchable-offset. Operands, constants, and destinations all live in frame slots (flat SRAM array).
- **Reversed pipeline**: IFETCH runs before MATCH, so the instruction word drives match behaviour.
- **Unified instruction format**: 16-bit `[type:1][opcode:5][mode:3][wide:1][fref:6]` replaces the separate `ALUInst`/`SMInst` dataclasses. Mode encodes output routing (INHERIT/CHANGE_TAG/SINK) and frame access pattern.
- **Pre-formed destinations**: output flit 1 is a 16-bit value read from a frame slot, not constructed from `Addr` fields at emit time.
- **New token types**: `FrameControlToken` (ALLOC/FREE) and `PELocalWriteToken` (IRAM write + frame slot write) replace `IRAMWriteToken`.
- **`ctx`/`gen` replaced by `act_id`**: 3-bit activation ID with ABA protection via tag store valid bits, not generation counters.

## 2. Token Hierarchy Changes (`tokens.py`)

### Current Hierarchy

```python
Token(target: int)
  CMToken(Token): offset: int, ctx: int, data: int
    DyadToken(CMToken): port: Port, gen: int, wide: bool
    MonadToken(CMToken): inline: bool
    IRAMWriteToken(CMToken): instructions: tuple[ALUInst | SMInst, ...]
  SMToken(Token): addr: int, op: MemOp, flags: Optional[int], data: Optional[int], ret: Optional[CMToken]
```

### Target Hierarchy

```python
Token(target: int)
  CMToken(Token): offset: int, act_id: int, data: int
    DyadToken(CMToken): port: Port, wide: bool
    MonadToken(CMToken): inline: bool
  FrameControlToken(Token): pe: int, act_id: int, op: FrameOp, payload: int
  PELocalWriteToken(Token): pe: int, act_id: int, region: int, slot: int, data: int
  SMToken(Token): addr: int, op: MemOp, flags: Optional[int], data: Optional[int], ret: Optional[CMToken]
```

### Specific Field Changes

| Change | Detail |
|--------|--------|
| `CMToken.ctx` renamed to `CMToken.act_id` | 3-bit activation ID (0-7) |
| `DyadToken.gen` removed | ABA protection via tag store valid bit, not generation counters |
| `IRAMWriteToken` removed | Replaced by `PELocalWriteToken` with `region=0` |
| `FrameControlToken` added | New token type for ALLOC (`op=0`) and FREE (`op=1`). `payload` carries return routing for ALLOC confirmation. |
| `PELocalWriteToken` added | Unified IRAM/frame write. `region=0`: IRAM write at `slot` address. `region=1`: frame write at `(act_id, slot)`. |
| `FrameOp` enum added | `ALLOC = 0`, `FREE = 1` |

### What Stays the Same

- `Token` base class (unchanged)
- `SMToken` (unchanged; `ret` field still holds a `CMToken` template)
- `Port` enum (unchanged)
- `MonadToken.inline` (unchanged)
- `DyadToken.wide` (unchanged)

### Notes

- `FrameControlToken` and `PELocalWriteToken` do NOT inherit from `CMToken` because they lack `offset` and `data` fields. They inherit directly from `Token`. The `target` field from `Token` serves as the PE destination (equivalent to the `pe` field in the bit layout; one of these is redundant and should be reconciled -- either use `target` as the PE ID or add a separate `pe` field).
- `CMToken.act_id` replaces `CMToken.ctx` everywhere. All downstream code referencing `token.ctx` must change to `token.act_id`.

## 3. ISA Changes (`cm_inst.py`)

### Current Instruction Types

```python
ALUInst(op: ALUOp, dest_l: Optional[Addr], dest_r: Optional[Addr], const: Optional[int], ctx_mode: int = 0)
SMInst(op: MemOp, sm_id: int, const: Optional[int] = None, ret: Optional[Addr] = None, ret_dyadic: bool = False)
Addr(a: int, port: Port, pe: Optional[int])
```

### Target Instruction Type

```python
@dataclass(frozen=True)
class Instruction:
    type: int              # 0 = CM (ALU), 1 = SM
    opcode: ALUOp | MemOp
    mode: int              # 0-7, see mode table
    wide: bool             # 16-bit vs 32-bit frame values
    fref: int              # frame slot base index (0-63)
```

This is a unified dataclass matching the 16-bit hardware instruction word `[type:1][opcode:5][mode:3][wide:1][fref:6]`.

### Mode Enum

```python
class Mode(IntEnum):
    INHERIT_DEST = 0        # frame[fref] = dest, no const
    INHERIT_CONST_DEST = 1  # frame[fref] = const, frame[fref+1] = dest
    INHERIT_FANOUT = 2      # frame[fref] = dest1, frame[fref+1] = dest2
    INHERIT_CONST_FAN = 3   # frame[fref] = const, frame[fref+1..+2] = dest1, dest2
    CHANGE_TAG = 4          # flit 1 from left operand, no const
    CHANGE_TAG_CONST = 5    # flit 1 from left operand, frame[fref] = const
    SINK = 6                # write result -> frame[fref], no output
    SINK_CONST_RMW = 7      # read frame[fref] as const, write result back
```

### Decode Equations (for PE emulator)

```python
output_enable = mode < 4       # modes 0-3
change_tag = mode in (4, 5)    # modes 4-5
sink = mode >= 6               # modes 6-7
has_const = mode & 1           # modes 1, 3, 5, 7
has_fanout = mode in (2, 3)    # modes 2-3
```

### What Happens to Existing Types

| Type | Disposition |
|------|-------------|
| `ALUOp` hierarchy (`ArithOp`, `LogicOp`, `RoutingOp`) | Stays. Opcode values used in `Instruction.opcode` for CM type. |
| `MemOp` | Stays. Used in `Instruction.opcode` for SM type. |
| `ALUInst` | Removed. Replaced by `Instruction`. |
| `SMInst` | Removed. Replaced by `Instruction` with `type=1`. SM-specific parameters (target SM, address, return routing) move to frame slots. |
| `Addr` | Stays as assembler concept in `asm/`. Not used in `Instruction` or at emulator level. Destinations are pre-formed flit 1 values in frame slots. |
| `ctx_mode` field on `ALUInst` | Removed. Output context is determined by mode (INHERIT uses frame dest, CHANGE_TAG uses left operand). |
| `RoutingOp.FREE_CTX` | Renamed to `RoutingOp.FREE_FRAME`. Triggers frame deallocation. |
| `is_monadic_alu()` | Stays. Still needed for the PE to determine whether a DyadToken at a monadic instruction bypasses matching. |

### New Addition: EXTRACT_TAG

A new `RoutingOp.EXTRACT_TAG` value for capturing runtime act_id + offset as a data value (packed flit 1). The ALU returns a packed 16-bit value encoding `[prefix][port][PE_id][offset][act_id]`.

## 4. PE Emulator Changes (`emu/pe.py`)

This is the largest change. The `ProcessingElement` class is substantially rewritten.

### Constructor Changes

**Remove:**

- `iram: dict[int, ALUInst | SMInst]` parameter type
- `ctx_slots: int` parameter
- `offsets: int` parameter
- `matching_store: list[list[MatchEntry]]` attribute
- `gen_counters: list[int]` attribute
- `_ctx_slots`, `_offsets` internal attributes

**Add:**

- `iram: dict[int, Instruction]` parameter type
- `frame_count: int = 4` parameter (number of physical frames)
- `frame_slots: int = 64` parameter (slots per frame)
- `matchable_offsets: int = 8` parameter (dyadic-capable offsets per frame)
- `frames: list[list[int]]` attribute -- `[frame_count][frame_slots]` SRAM
- `tag_store: dict[int, int]` attribute -- `act_id -> frame_id` mapping (models the 670 lookup; `None` or absent = invalid)
- `presence: list[list[bool]]` attribute -- `[frame_count][matchable_offsets]`
- `port_store: list[list[Port]]` attribute -- `[frame_count][matchable_offsets]`
- `free_frames: list[int]` attribute -- free frame ID pool

### Pipeline Reorder: IFETCH Before MATCH

**Current order** (`_process_token`):

1. Classify token type
2. Match (dyadic) or pass through (monadic)
3. Fetch instruction from IRAM
4. Execute (ALU or SM)
5. Emit output

**Target order** (`_process_token`):

1. Classify token type; side-path frame control and PE-local writes
2. IFETCH: read instruction from IRAM at `token.offset`
3. Resolve `act_id -> frame_id` via tag store (parallel with IFETCH in HW; sequential in emulator but within same cycle)
4. MATCH/FRAME: use instruction to drive match behaviour
   - Dyadic + presence set: read stored operand from frame, clear presence
   - Dyadic + presence clear: write operand to frame, set presence, consume token
   - Monadic: bypass matching; read constant from frame if `has_const`
5. EXECUTE: ALU or SM token construction
6. OUTPUT: read destination(s) from frame slots, emit tokens

### Side Paths (New)

Two new token types are handled before the main pipeline:

**`FrameControlToken` handling** (replaces nothing; new capability):

```python
def _handle_frame_control(self, token: FrameControlToken) -> None:
    if token.op == FrameOp.ALLOC:
        frame_id = self._alloc_frame()
        self.tag_store[token.act_id] = frame_id
        # Clear presence bits for the new frame
        self.presence[frame_id] = [False] * self.matchable_offsets
        # Optionally emit confirmation token
    elif token.op == FrameOp.FREE:
        frame_id = self.tag_store.pop(token.act_id, None)
        if frame_id is not None:
            self.free_frames.append(frame_id)
            self.presence[frame_id] = [False] * self.matchable_offsets
```

**`PELocalWriteToken` handling** (replaces `_handle_iram_write`):

```python
def _handle_pe_local_write(self, token: PELocalWriteToken) -> None:
    if token.region == 0:  # IRAM write
        self.iram[token.slot] = decode_instruction(token.data)
    elif token.region == 1:  # Frame write
        frame_id = self.tag_store.get(token.act_id)
        if frame_id is not None:
            self.frames[frame_id][token.slot] = token.data
```

### Method-Level Changes

| Current Method | Change |
|---------------|--------|
| `_run()` | Minimal change: add dispatch for `FrameControlToken`, `PELocalWriteToken` |
| `_process_token()` | Rewrite: IFETCH before MATCH, mode-driven frame access |
| `_handle_iram_write()` | Remove. Replaced by `_handle_pe_local_write()` |
| `_match_monadic()` | Remove. Monadic path integrated into MATCH/FRAME stage |
| `_match_dyadic()` | Rewrite as `_match_frame()`: uses tag store, presence bits, frame SRAM |
| `_fetch()` | Stays (same: `self.iram.get(offset)`) |
| `_is_monadic_instruction()` | Stays (needed to decide whether DyadToken bypasses match) |
| `_do_emit()` | Rewrite: mode-driven output routing from frame slots |
| `_build_and_emit_sm()` | Rewrite: SM parameters from frame slots, not from `SMInst` fields |
| `_deliver()` | Stays (unchanged) |
| `_output_mode()` | Rewrite: derive from `Instruction.mode` instead of inspecting `ALUInst` fields |
| `_make_output_token()` | Rewrite: read pre-formed flit 1 from frame slot, not construct from `Addr` |
| New: `_alloc_frame()` | Allocate next free frame_id from `free_frames` pool |
| New: `_free_frame()` | Release frame_id back to pool, clear presence |
| New: `_handle_frame_control()` | Process ALLOC/FREE tokens |
| New: `_handle_pe_local_write()` | Process IRAM/frame writes |
| New: `_read_frame_slot()` | Read `frames[frame_id][slot]` |
| New: `_write_frame_slot()` | Write `frames[frame_id][slot]` |

### Output Routing: Mode-Driven

**Current**: `_output_mode()` inspects `inst.op` (FREE_CTX, GATE, SW*) and `inst.dest_l`/`inst.dest_r` to determine SUPPRESS/SINGLE/DUAL/SWITCH. `_make_output_token()` constructs a `DyadToken` from `Addr` fields.

**Target**: the `Instruction.mode` field determines output behaviour:

```python
def _do_output(self, inst: Instruction, result: int, bool_out: bool,
               frame_id: int, left: int) -> None:
    if inst.mode >= 6:  # SINK
        # Write result back to frame[fref]
        self.frames[frame_id][inst.fref] = result & 0xFFFF
        return

    if inst.opcode == RoutingOp.FREE_FRAME:
        self._free_frame_by_inst(frame_id)
        return

    if inst.opcode == RoutingOp.GATE and not bool_out:
        return  # suppressed

    if inst.mode in (4, 5):  # CHANGE_TAG
        # flit 1 from left operand (pre-formed destination)
        flit1 = left
    else:  # INHERIT (modes 0-3)
        # Read destination from frame
        if inst.mode & 1:  # has_const: dest is at fref+1
            flit1 = self.frames[frame_id][inst.fref + 1]
        else:
            flit1 = self.frames[frame_id][inst.fref]

    # Decode flit1 to extract target PE, offset, act_id, port
    out_token = self._flit1_to_token(flit1, result)
    self._emit(out_token)

    if inst.mode in (2, 3):  # has_fanout: second destination
        if inst.mode == 3:  # const+fan: dest2 at fref+2
            flit1_2 = self.frames[frame_id][inst.fref + 2]
        else:  # fan: dest2 at fref+1
            flit1_2 = self.frames[frame_id][inst.fref + 1]
        out_token_2 = self._flit1_to_token(flit1_2, result)
        self._emit(out_token_2)
```

**`_flit1_to_token(flit1: int, data: int) -> CMToken`**: decodes a 16-bit pre-formed flit 1 value into a `DyadToken` or `MonadToken` based on its prefix bits. This replaces `_make_output_token()` which constructed tokens from `Addr` fields.

```python
def _flit1_to_token(self, flit1: int, data: int) -> CMToken:
    """Decode a pre-formed flit 1 value into a CMToken."""
    bit15 = (flit1 >> 15) & 1
    if bit15:
        # SM token -- should not appear as a CM destination
        raise ValueError("SM destination in CM output path")

    bit14 = (flit1 >> 14) & 1
    if bit14 == 0:
        # Dyadic wide: [0][0][port:1][PE:2][offset:8][act_id:3]
        port = Port((flit1 >> 13) & 1)
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 3) & 0xFF
        act_id = flit1 & 0x7
        return DyadToken(target=pe, offset=offset, act_id=act_id,
                         data=data, port=port, wide=False)

    bit13 = (flit1 >> 13) & 1
    if bit13 == 0:
        # Monadic normal: [0][1][0][PE:2][offset:8][act_id:3]
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 3) & 0xFF
        act_id = flit1 & 0x7
        return MonadToken(target=pe, offset=offset, act_id=act_id,
                          data=data, inline=False)

    # Misc bucket: [0][1][1][PE:2][sub:2][...]
    sub = (flit1 >> 9) & 0x3
    if sub == 2:
        # Monadic inline: [0][1][1][PE:2][10][offset:7][spare:2]
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 2) & 0x7F
        return MonadToken(target=pe, offset=offset, act_id=0,
                          data=0, inline=True)

    raise ValueError(f"Unexpected flit1 misc sub={sub} in output path")
```

### SWITCH Output (Branch/Switch Routing Ops)

For SWITCH operations (SWEQ, SWGT, SWGE, SWOF), the output routing uses the same mode-driven frame read for the taken path. The not-taken path emits a monadic inline token. The destination for the not-taken path comes from:

- Mode 2/3 (fan-out): the second destination frame slot
- The bool_out determines which destination is taken vs not-taken

### SM Operation Changes

SM operations (`Instruction.type == 1`) read SM parameters from frame slots instead of from `SMInst` fields:

- `frame[fref]`: SM target slot (packed SM_id + address)
- `frame[fref+1]`: return routing slot (pre-formed CM token flit 1), for ops that produce results

The PE constructs `SMToken` by unpacking frame slot contents:

```python
def _build_sm_token(self, inst: Instruction, frame_id: int,
                    left: int, right: int | None) -> SMToken:
    target_slot = self.frames[frame_id][inst.fref]
    sm_id = (target_slot >> 14) & 0x3
    addr = (target_slot >> 4) & 0x3FF  # tier 1: 10-bit addr

    ret = None
    if inst.mode & 1:  # has return routing in frame[fref+1]
        ret_flit1 = self.frames[frame_id][inst.fref + 1]
        ret = self._flit1_to_token(ret_flit1, data=0)

    return SMToken(target=sm_id, addr=addr, op=inst.opcode,
                   flags=..., data=..., ret=ret)
```

## 5. PE Config Changes (`emu/types.py`)

### Remove

- `MatchEntry` dataclass (entire class)
- `PEConfig.ctx_slots` field
- `PEConfig.offsets` field
- `PEConfig.gen_counters` field

### Change

- `PEConfig.iram` type: `dict[int, ALUInst | SMInst]` becomes `dict[int, Instruction]`

### Add

```python
@dataclass(frozen=True)
class PEConfig:
    pe_id: int
    iram: dict[int, Instruction]
    frame_count: int = 4
    frame_slots: int = 64
    matchable_offsets: int = 8
    initial_frames: Optional[dict[int, dict[int, int]]] = None
    # act_id -> {slot_index -> value} for pre-loaded frame contents
    initial_tag_store: Optional[dict[int, int]] = None
    # act_id -> frame_id mappings pre-loaded at init
    allowed_pe_routes: Optional[set[int]] = None
    allowed_sm_routes: Optional[set[int]] = None
    on_event: EventCallback | None = None
```

`initial_frames` replaces the old pattern of encoding constants and destinations in `ALUInst.const` / `ALUInst.dest_l` / `ALUInst.dest_r`. Frame contents are loaded before simulation starts or via `PELocalWriteToken` during simulation.

`initial_tag_store` replaces `gen_counters` for pre-configuring which act_ids are valid at simulation start.

### Keep

- `DeferredRead` dataclass (used by SM, not PE)
- `SMConfig` dataclass (unchanged)

## 6. ALU Changes (`emu/alu.py`)

Minimal changes. The ALU is a pure function that does not know about frames or tokens.

### Specific Changes

| Change | Detail |
|--------|--------|
| `RoutingOp.FREE_CTX` | Rename to `RoutingOp.FREE_FRAME` in the match case |
| Add `RoutingOp.EXTRACT_TAG` handling | Returns a packed flit 1 value. The ALU needs PE_id and act_id as additional inputs, or EXTRACT_TAG is handled in the PE before the ALU call. |

### Const Source

**Current**: `const` comes from `ALUInst.const` (instruction field).

**Target**: `const` comes from a frame slot (`frame[frame_id][inst.fref]` when `has_const`). This is transparent to the ALU -- the PE reads the constant from the frame and passes it to `execute()` the same way.

The `execute()` signature stays the same:

```python
def execute(op: ALUOp, left: int, right: int | None, const: int | None) -> tuple[int, bool]:
```

## 7. SM Changes (`emu/sm.py`)

Core I-structure semantics are unchanged. Minor adjustments only.

### Return Token Construction

**Current**: `_send_result()` uses `dataclasses.replace(return_route, data=data)` to set the data field on the pre-formed return route `CMToken`.

**Target**: same pattern, but the return route `CMToken` now has `act_id` instead of `ctx`, and `DyadToken` no longer has a `gen` field. Since the return route is constructed by the PE and embedded in the `SMToken.ret` field, the SM just uses `replace()` as before -- the field name change (`ctx` -> `act_id`) is transparent to the SM's `_send_result` logic because it operates on the token object generically.

### No Other Changes

- Cell states (`Presence`), deferred reads, atomics, T0/T1 tiers: all unchanged.
- `StructureMemory` constructor, `_run()` loop, all handlers: unchanged.

## 8. Network Changes (`emu/network.py`)

### Token Routing

**Current**: `_target_store()` routes `SMToken` to SM, `CMToken` to PE. `IRAMWriteToken` routes to PE (inherits from `CMToken`).

**Target**: `_target_store()` must also handle:

- `FrameControlToken` -> target PE (route on `token.target` or `token.pe`)
- `PELocalWriteToken` -> target PE (route on `token.target` or `token.pe`)

Since these new types do not inherit from `CMToken`, add explicit isinstance checks:

```python
def _target_store(self, token: Token) -> simpy.Store:
    if isinstance(token, SMToken):
        return self.sms[token.target].input_store
    if isinstance(token, (CMToken, FrameControlToken, PELocalWriteToken)):
        return self.pes[token.target].input_store
    raise TypeError(f"Unknown token type: {type(token).__name__}")
```

### `build_topology` Changes

- Pass `frame_count`, `frame_slots`, `matchable_offsets` from `PEConfig` to `ProcessingElement` constructor
- Apply `initial_frames` and `initial_tag_store` from `PEConfig` after PE construction
- Remove `gen_counters` application
- Remove `ctx_slots` and `offsets` parameters from PE constructor call

## 9. Event Changes (`emu/events.py`)

### Modified Events

**`Matched`**: replace `ctx: int` with `act_id: int`, add `frame_id: int`:

```python
@dataclass(frozen=True)
class Matched:
    time: float
    component: str
    left: int
    right: int
    act_id: int    # was: ctx
    frame_id: int  # new
    offset: int
```

### New Events

```python
@dataclass(frozen=True)
class FrameAllocated:
    time: float
    component: str
    act_id: int
    frame_id: int


@dataclass(frozen=True)
class FrameFreed:
    time: float
    component: str
    act_id: int
    frame_id: int


@dataclass(frozen=True)
class FrameSlotWritten:
    time: float
    component: str
    frame_id: int
    slot: int
    value: int


@dataclass(frozen=True)
class TokenRejected:
    time: float
    component: str
    token: Token
    reason: str  # e.g. "invalid_act_id", "invalid_iram_page"
```

### Updated Union

```python
SimEvent = (
    TokenReceived | Matched | Executed | Emitted | IRAMWritten
    | CellWritten | DeferredRead | DeferredSatisfied | ResultSent
    | FrameAllocated | FrameFreed | FrameSlotWritten | TokenRejected
)
```

### `IRAMWritten` Stays

`IRAMWritten` remains for IRAM writes via `PELocalWriteToken(region=0)`. The event semantics are the same; only the source token type changes.

## 10. Monitor/Dfgraph Downstream Impact

### `monitor/snapshot.py`

**`PESnapshot` changes:**

```python
@dataclass(frozen=True)
class PESnapshot:
    pe_id: int
    iram: dict[int, Instruction]              # was: dict[int, ALUInst | SMInst]
    frames: tuple[tuple[int, ...], ...]       # replaces matching_store
    tag_store: dict[int, int]                 # replaces gen_counters (act_id -> frame_id)
    presence: tuple[tuple[bool, ...], ...]    # new: per-frame presence bits
    port_store: tuple[tuple[int, ...], ...]   # new: per-frame port metadata
    free_frames: tuple[int, ...]              # new: free frame pool
    input_queue: tuple[Token, ...]
    output_log: tuple[Token, ...]
```

Remove: `matching_store: tuple[tuple[dict, ...], ...]`, `gen_counters: tuple[int, ...]`.

**`capture()` function**: rewrite PE snapshot capture to read `pe.frames`, `pe.tag_store`, `pe.presence`, `pe.port_store`, `pe.free_frames` instead of `pe.matching_store` and `pe.gen_counters`.

**`SMSnapshot`, `SMCellSnapshot`, `StateSnapshot`**: unchanged.

### `monitor/graph_json.py`

**`_serialise_pe_state()`**: replace `matching_store` and `gen_counters` serialisation with frame state serialisation:

```python
def _serialise_pe_state(pe_id: int, snapshot: StateSnapshot) -> dict:
    pe_snap = snapshot.pes.get(pe_id)
    if not pe_snap:
        return {}
    return {
        "iram": ...,                                  # same as before
        "frames": [...],                              # new: per-frame slot contents
        "tag_store": dict(pe_snap.tag_store),         # new
        "presence": [...],                            # new
        "free_frames": list(pe_snap.free_frames),     # new
        "input_queue_depth": len(pe_snap.input_queue),
        "output_count": len(pe_snap.output_log),
    }
```

Remove: `matching_store`, `gen_counters` keys from serialised output.

**`_serialise_event()`**: update `Matched` event serialisation to use `act_id` and `frame_id` instead of `ctx`. Add serialisation for new event types (`FrameAllocated`, `FrameFreed`, `FrameSlotWritten`, `TokenRejected`).

**`_serialise_node()`**: remove `ctx` field from node JSON (or rename to `act_id` if nodes carry activation information).

**`graph_to_monitor_json()`**: update `Matched` event overlay handler to use `event.act_id` instead of `event.ctx`. Add overlay handling for new event types.

## 11. Dependency Order

Modules should be changed in this order, based on import dependencies. Each step should leave the codebase in a testable state.

1. **`cm_inst.py`** -- no dependencies. Add `Instruction` dataclass, `Mode` enum, `FrameOp` enum. Rename `FREE_CTX` to `FREE_FRAME`. Add `EXTRACT_TAG`. Keep `ALUInst`, `SMInst`, `Addr` temporarily for backward compatibility.
2. **`tokens.py`** -- imports `cm_inst`. Rename `ctx` to `act_id` in `CMToken`. Remove `gen` from `DyadToken`. Add `FrameControlToken`, `PELocalWriteToken`. Remove `IRAMWriteToken`.
3. **`emu/events.py`** -- imports `cm_inst`, `tokens`, `sm_mod`. Update `Matched` event (ctx -> act_id, add frame_id). Add new event types. Update `SimEvent` union.
4. **`emu/types.py`** -- imports `cm_inst`. Remove `MatchEntry`. Update `PEConfig` (remove ctx_slots/offsets/gen_counters, add frame params, change iram type). Keep `DeferredRead`, `SMConfig` unchanged.
5. **`emu/alu.py`** -- imports `cm_inst`. Rename `FREE_CTX` to `FREE_FRAME` in match cases. Add `EXTRACT_TAG` handler if ALU-level.
6. **`emu/pe.py`** -- imports everything above. Full rewrite of `ProcessingElement`: constructor, pipeline, matching, output routing, side paths. This is the critical change.
7. **`emu/sm.py`** -- imports `cm_inst`, `emu/events`, `emu/types`, `sm_mod`, `tokens`. Minimal changes: field name `ctx` -> `act_id` in any token construction (but SM doesn't construct tokens directly; it uses `replace()` on the return route). Essentially no functional change.
8. **`emu/network.py`** -- imports `emu/*`. Update `_target_store()` for new token types. Update `build_topology()` for new PEConfig fields.
9. **`emu/__init__.py`** -- update exports for new event types.
10. **`monitor/snapshot.py`** -- update `PESnapshot`, `capture()`.
11. **`monitor/graph_json.py`** -- update PE state serialisation, event serialisation, event overlay handlers.

## 12. What Stays the Same

The following are explicitly unchanged to prevent scope creep:

- **SM I-structure semantics** (`emu/sm.py`): cell states, deferred reads, atomic operations, T0/T1 tiers, EXEC.
- **ALU pure function model** (`emu/alu.py`): `execute()` signature and dispatch. 16-bit masking. Signed comparison semantics.
- **SimPy-based simulation**: `simpy.Environment`, `simpy.Store`, process-per-token model.
- **Network topology**: full mesh, route tables, `System.inject()`, `System.send()`, `System.load()`.
- **SM cell model** (`sm_mod.py`): `Presence` enum, `SMCell` dataclass.
- **`Token` base class**: `target: int` field.
- **`SMToken`**: all fields unchanged. `ret` still holds a `CMToken`.
- **`Port` enum**: `L = 0`, `R = 1`.
- **`MonadToken.inline`**: inline monadic token concept unchanged.
- **`DyadToken.wide`**: wide value concept unchanged.
- **Event callback mechanism**: `on_event: EventCallback | None` pattern.
- **Test infrastructure**: pytest + hypothesis. Strategies will need updating for new types but the framework is unchanged.
- **Monitor backend** (`monitor/backend.py`): command/result protocol, threaded architecture. Only the wiring of `PEConfig` during `LoadCmd` changes.
- **Monitor REPL** (`monitor/repl.py`): commands stay the same; output formatting adapts to new snapshot structure.
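
The test-infrastructure point above notes that strategies need updating for the new types. One check that could accompany that update is a round trip through the 16-bit instruction word from section 3. The sketch below makes several assumptions: the fields are packed MSB-first in the order listed (`[type:1][opcode:5][mode:3][wide:1][fref:6]`), `encode_instruction` is a hypothetical helper, the `Instruction` class is a stand-in that keeps `opcode` as a raw 5-bit integer rather than an `ALUOp`/`MemOp` member, and this `decode_instruction` is only a guess at the function that `_handle_pe_local_write()` calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Stand-in for the section 3 dataclass; opcode kept as a raw 5-bit int."""
    type: int
    opcode: int
    mode: int
    wide: bool
    fref: int


def encode_instruction(inst: Instruction) -> int:
    """Pack fields into a 16-bit word (assumed MSB-first field order)."""
    return (
        (inst.type & 0x1) << 15
        | (inst.opcode & 0x1F) << 10
        | (inst.mode & 0x7) << 7
        | int(inst.wide) << 6
        | (inst.fref & 0x3F)
    )


def decode_instruction(word: int) -> Instruction:
    """Unpack a 16-bit word; mapping opcode to ALUOp/MemOp is left out."""
    return Instruction(
        type=(word >> 15) & 0x1,
        opcode=(word >> 10) & 0x1F,
        mode=(word >> 7) & 0x7,
        wide=bool((word >> 6) & 0x1),
        fref=word & 0x3F,
    )


# CM instruction, opcode 5, mode 3 (INHERIT_CONST_FAN), narrow, fref 12.
example = Instruction(type=0, opcode=5, mode=3, wide=False, fref=12)
word = encode_instruction(example)
```

A real decoder would additionally select `ALUOp` or `MemOp` for the opcode based on the `type` bit; the stand-in leaves that mapping out because the opcode value assignments are not given here.
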