# Emulator Redesign Plan: Frame-Based PE Model

This document describes the changes required in the OR1 emulator code to
match the frame-based PE redesign specified in `design-notes/pe-design.md`
(sections 3-11) and `design-notes/architecture-overview.md` (token format
section). It is scoped to the emulator (`emu/`, `tokens.py`, `cm_inst.py`)
and immediate downstream consumers (`monitor/`). Assembler changes are out
of scope.

## 1. Overview

The current emulator models a context/generation-based matching scheme with
flat `matching_store[ctx][offset]` arrays, generation counters for ABA
protection, and `Addr`-based output routing that constructs tokens at emit
time. The target design replaces this with:

- **Frame-based matching**: a tag store maps `activation_id` to `frame_id`;
  presence/port metadata is per-frame, per-matchable-offset. Operands,
  constants, and destinations all live in frame slots (flat SRAM array).
- **Reversed pipeline**: IFETCH runs before MATCH, so the instruction word
  drives match behaviour.
- **Unified instruction format**: 16-bit `[type:1][opcode:5][mode:3][wide:1][fref:6]`
  replaces the separate `ALUInst`/`SMInst` dataclasses. Mode encodes output
  routing (INHERIT/CHANGE_TAG/SINK) and frame access pattern.
- **Pre-formed destinations**: output flit 1 is a 16-bit value read from a
  frame slot, not constructed from `Addr` fields at emit time.
- **New token types**: `FrameControlToken` (ALLOC/FREE) and
  `PELocalWriteToken` (IRAM write + frame slot write) replace
  `IRAMWriteToken`.
- **`ctx`/`gen` replaced by `act_id`**: 3-bit activation ID with ABA
  protection via tag store valid bits, not generation counters.

## 2. Token Hierarchy Changes (`tokens.py`)

### Current Hierarchy

```python
Token(target: int)
  CMToken(Token):          offset: int, ctx: int, data: int
    DyadToken(CMToken):    port: Port, gen: int, wide: bool
    MonadToken(CMToken):   inline: bool
    IRAMWriteToken(CMToken): instructions: tuple[ALUInst | SMInst, ...]
  SMToken(Token):          addr: int, op: MemOp, flags: Optional[int],
                           data: Optional[int], ret: Optional[CMToken]
```

### Target Hierarchy

```python
Token(target: int)
  CMToken(Token):              offset: int, act_id: int, data: int
    DyadToken(CMToken):        port: Port, wide: bool
    MonadToken(CMToken):       inline: bool
  FrameControlToken(Token):    pe: int, act_id: int, op: FrameOp, payload: int
  PELocalWriteToken(Token):    pe: int, act_id: int, region: int,
                               slot: int, data: int
  SMToken(Token):              addr: int, op: MemOp, flags: Optional[int],
                               data: Optional[int], ret: Optional[CMToken]
```

### Specific Field Changes

| Change | Detail |
|--------|--------|
| `CMToken.ctx` renamed to `CMToken.act_id` | 3-bit activation ID (0-7) |
| `DyadToken.gen` removed | ABA protection via tag store valid bit, not generation counters |
| `IRAMWriteToken` removed | Replaced by `PELocalWriteToken` with `region=0` |
| `FrameControlToken` added | New token type for ALLOC (`op=0`) and FREE (`op=1`). `payload` carries return routing for ALLOC confirmation. |
| `PELocalWriteToken` added | Unified IRAM/frame write. `region=0`: IRAM write at `slot` address. `region=1`: frame write at `(act_id, slot)`. |
| `FrameOp` enum added | `ALLOC = 0`, `FREE = 1`. Defined in `cm_inst.py` per the dependency order in section 11. |

### What Stays the Same

- `Token` base class (unchanged)
- `SMToken` (unchanged; `ret` field still holds a `CMToken` template)
- `Port` enum (unchanged)
- `MonadToken.inline` (unchanged)
- `DyadToken.wide` (unchanged)

### Notes

- `FrameControlToken` and `PELocalWriteToken` do NOT inherit from `CMToken`
  because neither has an `offset` field (and `FrameControlToken` has no
  `data` field either). They inherit directly from `Token`. The `target`
  field from `Token` serves as the PE destination, which makes it
  equivalent to the `pe` field in the bit layout. The target hierarchy
  above lists both, so one is redundant and should be reconciled: either
  use `target` as the PE ID and drop `pe`, or keep `pe` as a separate field.
- `CMToken.act_id` replaces `CMToken.ctx` everywhere. All downstream code
  referencing `token.ctx` must change to `token.act_id`.
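The target hierarchy above is written as shorthand. A minimal sketch of how it might look as frozen dataclasses follows. It assumes `target` doubles as the PE ID (so the `pe` field is dropped) and that `Port` and `FrameOp` are `IntEnum`s; neither choice is confirmed by the design notes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Port(IntEnum):  # assumed to be an IntEnum
    L = 0
    R = 1


class FrameOp(IntEnum):
    ALLOC = 0
    FREE = 1


@dataclass(frozen=True)
class Token:
    target: int


@dataclass(frozen=True)
class CMToken(Token):
    offset: int
    act_id: int  # 3-bit activation ID (0-7); was `ctx`
    data: int


@dataclass(frozen=True)
class DyadToken(CMToken):
    port: Port
    wide: bool  # the `gen` field is gone


@dataclass(frozen=True)
class MonadToken(CMToken):
    inline: bool


@dataclass(frozen=True)
class FrameControlToken(Token):
    # Assumption: `target` is the PE ID, so there is no separate `pe` field.
    act_id: int
    op: FrameOp
    payload: int


@dataclass(frozen=True)
class PELocalWriteToken(Token):
    act_id: int
    region: int  # 0 = IRAM write, 1 = frame write
    slot: int
    data: int


alloc = FrameControlToken(target=0, act_id=3, op=FrameOp.ALLOC, payload=0)
operand = DyadToken(target=0, offset=5, act_id=3, data=42, port=Port.L, wide=False)
```

Because the two new token types sit beside `CMToken` rather than under it, any `isinstance(token, CMToken)` dispatch will no longer catch them.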

## 3. ISA Changes (`cm_inst.py`)

### Current Instruction Types

```python
ALUInst(op: ALUOp, dest_l: Optional[Addr], dest_r: Optional[Addr],
        const: Optional[int], ctx_mode: int = 0)

SMInst(op: MemOp, sm_id: int, const: Optional[int] = None,
       ret: Optional[Addr] = None, ret_dyadic: bool = False)

Addr(a: int, port: Port, pe: Optional[int])
```

### Target Instruction Type

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    type: int           # 0 = CM (ALU), 1 = SM
    opcode: ALUOp | MemOp
    mode: int           # 0-7, see mode table
    wide: bool          # 16-bit vs 32-bit frame values
    fref: int           # frame slot base index (0-63)
```

This is a unified dataclass matching the 16-bit hardware instruction word
`[type:1][opcode:5][mode:3][wide:1][fref:6]`.
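As an illustration of the word layout, the sketch below packs and unpacks the five fields. It assumes the fields are listed MSB first (so `type` is bit 15) and keeps `opcode` as a raw integer, since the mapping from 5-bit values to `ALUOp`/`MemOp` members is not given here. `RawInstruction`, `pack_word`, and `unpack_word` are hypothetical names.

```python
from typing import NamedTuple


class RawInstruction(NamedTuple):
    """Hypothetical raw-field view of the 16-bit instruction word."""
    type: int    # 1 bit
    opcode: int  # 5 bits, left as an int
    mode: int    # 3 bits
    wide: int    # 1 bit
    fref: int    # 6 bits


def pack_word(raw: RawInstruction) -> int:
    # Assumed bit order: fields listed MSB first, so `type` lands in bit 15.
    return (
        (raw.type & 0x1) << 15
        | (raw.opcode & 0x1F) << 10
        | (raw.mode & 0x7) << 7
        | (raw.wide & 0x1) << 6
        | (raw.fref & 0x3F)
    )


def unpack_word(word: int) -> RawInstruction:
    return RawInstruction(
        type=(word >> 15) & 0x1,
        opcode=(word >> 10) & 0x1F,
        mode=(word >> 7) & 0x7,
        wide=(word >> 6) & 0x1,
        fref=word & 0x3F,
    )


raw = RawInstruction(type=0, opcode=0b00101, mode=3, wide=0, fref=12)
word = pack_word(raw)
```

A round trip through `unpack_word(pack_word(raw))` returning `raw` is a cheap property to check with hypothesis once the real field order is confirmed.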

### Mode Enum

```python
from enum import IntEnum


class Mode(IntEnum):
    INHERIT_DEST       = 0  # frame[fref] = dest, no const
    INHERIT_CONST_DEST = 1  # frame[fref] = const, frame[fref+1] = dest
    INHERIT_FANOUT     = 2  # frame[fref] = dest1, frame[fref+1] = dest2
    INHERIT_CONST_FAN  = 3  # frame[fref] = const, frame[fref+1..+2] = dest1, dest2
    CHANGE_TAG         = 4  # flit 1 from left operand, no const
    CHANGE_TAG_CONST   = 5  # flit 1 from left operand, frame[fref] = const
    SINK               = 6  # write result -> frame[fref], no output
    SINK_CONST_RMW     = 7  # read frame[fref] as const, write result back
```

### Decode Equations (for PE emulator)

```python
output_enable = mode < 4          # modes 0-3
change_tag    = mode in (4, 5)    # modes 4-5
sink          = mode >= 6         # modes 6-7
has_const     = bool(mode & 1)    # modes 1, 3, 5, 7
has_fanout    = mode in (2, 3)    # modes 2-3
```
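The decode equations can be bundled into one helper and checked against the mode table. This is a sketch only; `ModeFlags` and `decode_mode` are hypothetical names, not part of the existing code.

```python
from typing import NamedTuple


class ModeFlags(NamedTuple):
    output_enable: bool
    change_tag: bool
    sink: bool
    has_const: bool
    has_fanout: bool


def decode_mode(mode: int) -> ModeFlags:
    if not 0 <= mode <= 7:
        raise ValueError(f"mode out of range: {mode}")
    return ModeFlags(
        output_enable=mode < 4,
        change_tag=mode in (4, 5),
        sink=mode >= 6,
        has_const=bool(mode & 1),
        has_fanout=mode in (2, 3),
    )


for mode in range(8):
    flags = decode_mode(mode)
    # Exactly one of the three routing classes is active for every mode.
    assert flags.output_enable + flags.change_tag + flags.sink == 1
```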

### What Happens to Existing Types

| Type | Disposition |
|------|-------------|
| `ALUOp` hierarchy (`ArithOp`, `LogicOp`, `RoutingOp`) | Stays. Opcode values used in `Instruction.opcode` for CM type. |
| `MemOp` | Stays. Used in `Instruction.opcode` for SM type. |
| `ALUInst` | Removed. Replaced by `Instruction`. |
| `SMInst` | Removed. Replaced by `Instruction` with `type=1`. SM-specific parameters (target SM, address, return routing) move to frame slots. |
| `Addr` | Stays as assembler concept in `asm/`. Not used in `Instruction` or at emulator level. Destinations are pre-formed flit 1 values in frame slots. |
| `ctx_mode` field on `ALUInst` | Removed. Output context is determined by mode (INHERIT uses frame dest, CHANGE_TAG uses left operand). |
| `RoutingOp.FREE_CTX` | Renamed to `RoutingOp.FREE_FRAME`. Triggers frame deallocation. |
| `is_monadic_alu()` | Stays. Still needed for the PE to determine whether a DyadToken at a monadic instruction bypasses matching. |

### New Addition: EXTRACT_TAG

A new `RoutingOp.EXTRACT_TAG` value captures the runtime `act_id` and
offset as a data value (a packed flit 1). The ALU returns a packed 16-bit
value encoding `[prefix][port][PE_id][offset][act_id]`.

## 4. PE Emulator Changes (`emu/pe.py`)

This is the largest change. The `ProcessingElement` class is substantially
rewritten.

### Constructor Changes

**Remove:**
- `iram: dict[int, ALUInst | SMInst]` parameter type
- `ctx_slots: int` parameter
- `offsets: int` parameter
- `matching_store: list[list[MatchEntry]]` attribute
- `gen_counters: list[int]` attribute
- `_ctx_slots`, `_offsets` internal attributes

**Add:**
- `iram: dict[int, Instruction]` parameter type
- `frame_count: int = 4` parameter (number of physical frames)
- `frame_slots: int = 64` parameter (slots per frame)
- `matchable_offsets: int = 8` parameter (dyadic-capable offsets per frame)
- `frames: list[list[int]]` attribute -- `[frame_count][frame_slots]` SRAM
- `tag_store: dict[int, int]` attribute -- `act_id -> frame_id` mapping
  (models the 670 lookup; `None` or absent = invalid)
- `presence: list[list[bool]]` attribute -- `[frame_count][matchable_offsets]`
- `port_store: list[list[Port]]` attribute -- `[frame_count][matchable_offsets]`
- `free_frames: list[int]` attribute -- free frame ID pool

### Pipeline Reorder: IFETCH Before MATCH

**Current order** (`_process_token`):
1. Classify token type
2. Match (dyadic) or pass through (monadic)
3. Fetch instruction from IRAM
4. Execute (ALU or SM)
5. Emit output

**Target order** (`_process_token`):
1. Classify token type; divert frame-control and PE-local write tokens to
   their side paths
2. IFETCH: read instruction from IRAM at `token.offset`
3. Resolve `act_id -> frame_id` via tag store (parallel with IFETCH in HW;
   sequential in emulator but within same cycle)
4. MATCH/FRAME: use instruction to drive match behaviour
   - Dyadic + presence set: read stored operand from frame, clear presence
   - Dyadic + presence clear: write operand to frame, set presence, consume token
   - Monadic: bypass matching; read constant from frame if `has_const`
5. EXECUTE: ALU or SM token construction
6. OUTPUT: read destination(s) from frame slots, emit tokens
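The dyadic branch of step 4 can be sketched for a single frame as follows. `FrameMatcher` is a hypothetical stand-in, not the PE class. How a matchable offset maps to an operand slot is not specified here, so both indices are passed explicitly, and using the stored port to order left/right is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Port(IntEnum):
    L = 0
    R = 1


class FrameMatcher:
    """Hypothetical stand-in for the MATCH/FRAME state of a single frame."""

    def __init__(self, frame_slots: int = 64, matchable_offsets: int = 8) -> None:
        self.slots = [0] * frame_slots
        self.presence = [False] * matchable_offsets
        self.port = [Port.L] * matchable_offsets

    def match(self, match_idx: int, slot: int, port: Port,
              data: int) -> Optional[tuple[int, int]]:
        """Return (left, right) on the second arrival, None on the first."""
        if not self.presence[match_idx]:
            # First operand: park it in the frame and consume the token.
            self.slots[slot] = data
            self.port[match_idx] = port
            self.presence[match_idx] = True
            return None
        # Second operand: pair it with the stored one and clear presence.
        stored = self.slots[slot]
        self.presence[match_idx] = False
        if self.port[match_idx] == Port.L:
            return stored, data
        return data, stored


m = FrameMatcher()
first = m.match(match_idx=2, slot=2, port=Port.R, data=7)
second = m.match(match_idx=2, slot=2, port=Port.L, data=5)
```

Here the right operand arrives first, so the second call returns the pair with the incoming value on the left.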

### Side Paths (New)

Two new token types are handled before the main pipeline:

**`FrameControlToken` handling** (replaces nothing; new capability):
```python
def _handle_frame_control(self, token: FrameControlToken) -> None:
    if token.op == FrameOp.ALLOC:
        frame_id = self._alloc_frame()
        self.tag_store[token.act_id] = frame_id
        # Clear presence bits for the new frame
        self.presence[frame_id] = [False] * self.matchable_offsets
        # Optionally emit confirmation token
    elif token.op == FrameOp.FREE:
        frame_id = self.tag_store.pop(token.act_id, None)
        if frame_id is not None:
            self.free_frames.append(frame_id)
            self.presence[frame_id] = [False] * self.matchable_offsets
```

**`PELocalWriteToken` handling** (replaces `_handle_iram_write`):
```python
def _handle_pe_local_write(self, token: PELocalWriteToken) -> None:
    if token.region == 0:  # IRAM write
        self.iram[token.slot] = decode_instruction(token.data)
    elif token.region == 1:  # Frame write
        frame_id = self.tag_store.get(token.act_id)
        if frame_id is not None:
            self.frames[frame_id][token.slot] = token.data
```

### Method-Level Changes

| Current Method | Change |
|---------------|--------|
| `_run()` | Minimal change: add dispatch for `FrameControlToken`, `PELocalWriteToken` |
| `_process_token()` | Rewrite: IFETCH before MATCH, mode-driven frame access |
| `_handle_iram_write()` | Remove. Replaced by `_handle_pe_local_write()` |
| `_match_monadic()` | Remove. Monadic path integrated into MATCH/FRAME stage |
| `_match_dyadic()` | Rewrite as `_match_frame()`: uses tag store, presence bits, frame SRAM |
| `_fetch()` | Stays (same: `self.iram.get(offset)`) |
| `_is_monadic_instruction()` | Stays (needed to decide whether DyadToken bypasses match) |
| `_do_emit()` | Rewrite: mode-driven output routing from frame slots |
| `_build_and_emit_sm()` | Rewrite: SM parameters from frame slots, not from `SMInst` fields |
| `_deliver()` | Stays (unchanged) |
| `_output_mode()` | Rewrite: derive from `Instruction.mode` instead of inspecting `ALUInst` fields |
| `_make_output_token()` | Rewrite: read pre-formed flit 1 from frame slot, not construct from `Addr` |
| New: `_alloc_frame()` | Allocate next free frame_id from `free_frames` pool |
| New: `_free_frame()` | Release frame_id back to pool, clear presence |
| New: `_handle_frame_control()` | Process ALLOC/FREE tokens |
| New: `_handle_pe_local_write()` | Process IRAM/frame writes |
| New: `_read_frame_slot()` | Read `frames[frame_id][slot]` |
| New: `_write_frame_slot()` | Write `frames[frame_id][slot]` |

### Output Routing: Mode-Driven

**Current**: `_output_mode()` inspects `inst.op` (FREE_CTX, GATE, SW*) and
`inst.dest_l`/`inst.dest_r` to determine SUPPRESS/SINGLE/DUAL/SWITCH.
`_make_output_token()` constructs a `DyadToken` from `Addr` fields.

**Target**: the `Instruction.mode` field determines output behaviour:

```python
def _do_output(self, inst: Instruction, result: int, bool_out: bool,
               frame_id: int, left: int) -> None:
    if inst.mode >= 6:  # SINK
        # Write result back to frame[fref]
        self.frames[frame_id][inst.fref] = result & 0xFFFF
        return

    if inst.opcode == RoutingOp.FREE_FRAME:
        self._free_frame(frame_id)
        return

    if inst.opcode == RoutingOp.GATE and not bool_out:
        return  # suppressed

    if inst.mode in (4, 5):  # CHANGE_TAG
        # flit 1 from left operand (pre-formed destination)
        flit1 = left
    else:  # INHERIT (modes 0-3)
        # Read destination from frame
        if inst.mode & 1:  # has_const: dest is at fref+1
            flit1 = self.frames[frame_id][inst.fref + 1]
        else:
            flit1 = self.frames[frame_id][inst.fref]

    # Decode flit1 to extract target PE, offset, act_id, port
    out_token = self._flit1_to_token(flit1, result)
    self._emit(out_token)

    if inst.mode in (2, 3):  # has_fanout: second destination
        if inst.mode == 3:  # const+fan: dest2 at fref+2
            flit1_2 = self.frames[frame_id][inst.fref + 2]
        else:  # fan: dest2 at fref+1
            flit1_2 = self.frames[frame_id][inst.fref + 1]
        out_token_2 = self._flit1_to_token(flit1_2, result)
        self._emit(out_token_2)
```

**`_flit1_to_token(flit1: int, data: int) -> CMToken`**: decodes a 16-bit
pre-formed flit 1 value into a `DyadToken` or `MonadToken` based on its
prefix bits. This replaces `_make_output_token()` which constructed tokens
from `Addr` fields.

```python
def _flit1_to_token(self, flit1: int, data: int) -> CMToken:
    """Decode a pre-formed flit 1 value into a CMToken."""
    bit15 = (flit1 >> 15) & 1
    if bit15:
        # SM token -- should not appear as a CM destination
        raise ValueError("SM destination in CM output path")

    bit14 = (flit1 >> 14) & 1
    if bit14 == 0:
        # Dyadic wide: [0][0][port:1][PE:2][offset:8][act_id:3]
        port = Port((flit1 >> 13) & 1)
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 3) & 0xFF
        act_id = flit1 & 0x7
        return DyadToken(target=pe, offset=offset, act_id=act_id,
                         data=data, port=port, wide=False)

    bit13 = (flit1 >> 13) & 1
    if bit13 == 0:
        # Monadic normal: [0][1][0][PE:2][offset:8][act_id:3]
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 3) & 0xFF
        act_id = flit1 & 0x7
        return MonadToken(target=pe, offset=offset, act_id=act_id,
                          data=data, inline=False)

    # Misc bucket: [0][1][1][PE:2][sub:2][...]
    sub = (flit1 >> 9) & 0x3
    if sub == 2:
        # Monadic inline: [0][1][1][PE:2][10][offset:7][spare:2]
        pe = (flit1 >> 11) & 0x3
        offset = (flit1 >> 2) & 0x7F
        return MonadToken(target=pe, offset=offset, act_id=0,
                          data=0, inline=True)

    raise ValueError(f"Unexpected flit1 misc sub={sub} in output path")
```
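For building test fixtures it may help to have the inverse of this decoder. The sketch below packs the three layouts named in the comments above. The function names are hypothetical, and the bit positions are taken from the decoder rather than from the architecture notes.

```python
def pack_dyadic_flit1(port: int, pe: int, offset: int, act_id: int) -> int:
    # [0][0][port:1][PE:2][offset:8][act_id:3]
    return ((port & 0x1) << 13 | (pe & 0x3) << 11
            | (offset & 0xFF) << 3 | (act_id & 0x7))


def pack_monadic_flit1(pe: int, offset: int, act_id: int) -> int:
    # [0][1][0][PE:2][offset:8][act_id:3]
    return (1 << 14 | (pe & 0x3) << 11
            | (offset & 0xFF) << 3 | (act_id & 0x7))


def pack_inline_flit1(pe: int, offset: int) -> int:
    # [0][1][1][PE:2][10][offset:7][spare:2]
    return (0b011 << 13 | (pe & 0x3) << 11 | 0b10 << 9
            | (offset & 0x7F) << 2)


def unpack_fields(flit1: int) -> tuple[int, int, int]:
    """Return (pe, offset, act_id) for the two non-inline layouts."""
    return (flit1 >> 11) & 0x3, (flit1 >> 3) & 0xFF, flit1 & 0x7


dyadic = pack_dyadic_flit1(port=1, pe=2, offset=0x21, act_id=5)
monadic = pack_monadic_flit1(pe=1, offset=0x10, act_id=2)
inline = pack_inline_flit1(pe=3, offset=0x7F)
```

Values produced this way can be written into `initial_frames` as destination slots.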

### SWITCH Output (Branch/Switch Routing Ops)

For SWITCH operations (SWEQ, SWGT, SWGE, SWOF), the taken path uses the
same mode-driven frame read as any other output. The not-taken path emits
a monadic inline token. The two destinations are resolved as follows:

- In modes 2/3 (fan-out), the second destination frame slot provides the
  other destination.
- `bool_out` determines which of the two destinations is taken and which
  is not-taken.

### SM Operation Changes

SM operations (`Instruction.type == 1`) read SM parameters from frame slots
instead of from `SMInst` fields:

- `frame[fref]`: SM target slot (packed SM_id + address)
- `frame[fref+1]`: return routing slot (pre-formed CM token flit 1), for
  ops that produce results

The PE constructs `SMToken` by unpacking frame slot contents:
```python
def _build_sm_token(self, inst: Instruction, frame_id: int,
                    left: int, right: int | None) -> SMToken:
    target_slot = self.frames[frame_id][inst.fref]
    sm_id = (target_slot >> 14) & 0x3
    addr = (target_slot >> 4) & 0x3FF  # tier 1: 10-bit addr

    ret = None
    if inst.mode & 1:  # has return routing in frame[fref+1]
        ret_flit1 = self.frames[frame_id][inst.fref + 1]
        ret = self._flit1_to_token(ret_flit1, data=0)

    return SMToken(target=sm_id, addr=addr, op=inst.opcode,
                   flags=..., data=..., ret=ret)
```

## 5. PE Config Changes (`emu/types.py`)

### Remove

- `MatchEntry` dataclass (entire class)
- `PEConfig.ctx_slots` field
- `PEConfig.offsets` field
- `PEConfig.gen_counters` field

### Change

- `PEConfig.iram` type: `dict[int, ALUInst | SMInst]` becomes
  `dict[int, Instruction]`

### Add

```python
@dataclass(frozen=True)
class PEConfig:
    pe_id: int
    iram: dict[int, Instruction]
    frame_count: int = 4
    frame_slots: int = 64
    matchable_offsets: int = 8
    initial_frames: Optional[dict[int, dict[int, int]]] = None
    # act_id -> {slot_index -> value} for pre-loaded frame contents
    initial_tag_store: Optional[dict[int, int]] = None
    # act_id -> frame_id mappings pre-loaded at init
    allowed_pe_routes: Optional[set[int]] = None
    allowed_sm_routes: Optional[set[int]] = None
    on_event: EventCallback | None = None
```

`initial_frames` replaces the old pattern of encoding constants and
destinations in `ALUInst.const` / `ALUInst.dest_l` / `ALUInst.dest_r`.
Frame contents are loaded before simulation starts or via
`PELocalWriteToken` during simulation.

`initial_tag_store` replaces `gen_counters` for pre-configuring which
act_ids are valid at simulation start.

### Keep

- `DeferredRead` dataclass (used by SM, not PE)
- `SMConfig` dataclass (unchanged)

## 6. ALU Changes (`emu/alu.py`)

Minimal changes. The ALU is a pure function that does not know about frames
or tokens.

### Specific Changes

| Change | Detail |
|--------|--------|
| `RoutingOp.FREE_CTX` | Rename to `RoutingOp.FREE_FRAME` in the match case |
| Add `RoutingOp.EXTRACT_TAG` handling | Returns a packed flit 1 value. The ALU needs PE_id and act_id as additional inputs, or EXTRACT_TAG is handled in the PE before the ALU call. |

### Const Source

**Current**: `const` comes from `ALUInst.const` (instruction field).

**Target**: `const` comes from a frame slot (`frames[frame_id][inst.fref]`
when `has_const`). This is transparent to the ALU -- the PE reads the
constant from the frame and passes it to `execute()` the same way.

The `execute()` signature stays the same:
```python
def execute(op: ALUOp, left: int, right: int | None, const: int | None) -> tuple[int, bool]:
```

## 7. SM Changes (`emu/sm.py`)

Core I-structure semantics are unchanged. Minor adjustments only.

### Return Token Construction

**Current**: `_send_result()` uses `dataclasses.replace(return_route, data=data)`
to set the data field on the pre-formed return route `CMToken`.

**Target**: same pattern. The return route `CMToken` now carries `act_id`
instead of `ctx`, and `DyadToken` no longer has a `gen` field, but the PE
constructs the return route and embeds it in `SMToken.ret`. The SM only
calls `replace()` to set `data`, so the `ctx` -> `act_id` rename is
transparent to `_send_result`.
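A small sketch of why the rename does not reach the SM: `dataclasses.replace` names only the field being changed. The token classes here are cut-down stand-ins, and `send_result` is a hypothetical free-function version of `_send_result`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    target: int


@dataclass(frozen=True)
class CMToken(Token):
    offset: int
    act_id: int
    data: int


@dataclass(frozen=True)
class MonadToken(CMToken):
    inline: bool


def send_result(return_route: CMToken, data: int) -> CMToken:
    # Only `data` is named, so renaming `ctx` to `act_id` does not touch this code.
    return replace(return_route, data=data)


route = MonadToken(target=1, offset=9, act_id=4, data=0, inline=False)
result = send_result(route, data=0xBEEF)
```

`replace()` returns a new instance of the same subclass, so the routing fields chosen by the PE pass through untouched.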

### No Other Changes

- Cell states (`Presence`), deferred reads, atomics, T0/T1 tiers: all
  unchanged.
- `StructureMemory` constructor, `_run()` loop, all handlers: unchanged.

## 8. Network Changes (`emu/network.py`)

### Token Routing

**Current**: `_target_store()` routes `SMToken` to SM, `CMToken` to PE.
`IRAMWriteToken` routes to PE (inherits from `CMToken`).

**Target**: `_target_store()` must also handle:
- `FrameControlToken` -> target PE (route on `token.target` or `token.pe`)
- `PELocalWriteToken` -> target PE (route on `token.target` or `token.pe`)

Since these new types do not inherit from `CMToken`, add explicit isinstance
checks:

```python
def _target_store(self, token: Token) -> simpy.Store:
    if isinstance(token, SMToken):
        return self.sms[token.target].input_store
    if isinstance(token, (CMToken, FrameControlToken, PELocalWriteToken)):
        return self.pes[token.target].input_store
    raise TypeError(f"Unknown token type: {type(token).__name__}")
```

### `build_topology` Changes

- Pass `frame_count`, `frame_slots`, `matchable_offsets` from `PEConfig`
  to `ProcessingElement` constructor
- Apply `initial_frames` and `initial_tag_store` from `PEConfig` after PE
  construction
- Remove `gen_counters` application
- Remove `ctx_slots` and `offsets` parameters from PE constructor call

## 9. Event Changes (`emu/events.py`)

### Modified Events

**`Matched`**: replace `ctx: int` with `act_id: int`, add `frame_id: int`:

```python
@dataclass(frozen=True)
class Matched:
    time: float
    component: str
    left: int
    right: int
    act_id: int     # was: ctx
    frame_id: int   # new
    offset: int
```

### New Events

```python
@dataclass(frozen=True)
class FrameAllocated:
    time: float
    component: str
    act_id: int
    frame_id: int

@dataclass(frozen=True)
class FrameFreed:
    time: float
    component: str
    act_id: int
    frame_id: int

@dataclass(frozen=True)
class FrameSlotWritten:
    time: float
    component: str
    frame_id: int
    slot: int
    value: int

@dataclass(frozen=True)
class TokenRejected:
    time: float
    component: str
    token: Token
    reason: str  # e.g. "invalid_act_id", "invalid_iram_page"
```

### Updated Union

```python
SimEvent = (
    TokenReceived | Matched | Executed | Emitted | IRAMWritten
    | CellWritten | DeferredRead | DeferredSatisfied | ResultSent
    | FrameAllocated | FrameFreed | FrameSlotWritten | TokenRejected
)
```

### `IRAMWritten` Stays

`IRAMWritten` remains for IRAM writes via `PELocalWriteToken(region=0)`.
The event semantics are the same; only the source token type changes.

## 10. Monitor/Dfgraph Downstream Impact

### `monitor/snapshot.py`

**`PESnapshot` changes:**

```python
@dataclass(frozen=True)
class PESnapshot:
    pe_id: int
    iram: dict[int, Instruction]          # was: dict[int, ALUInst | SMInst]
    frames: tuple[tuple[int, ...], ...]   # replaces matching_store
    tag_store: dict[int, int]             # replaces gen_counters (act_id -> frame_id)
    presence: tuple[tuple[bool, ...], ...]  # new: per-frame presence bits
    port_store: tuple[tuple[int, ...], ...]  # new: per-frame port metadata
    free_frames: tuple[int, ...]          # new: free frame pool
    input_queue: tuple[Token, ...]
    output_log: tuple[Token, ...]
```

Remove: `matching_store: tuple[tuple[dict, ...], ...]`,
`gen_counters: tuple[int, ...]`.

**`capture()` function**: rewrite PE snapshot capture to read `pe.frames`,
`pe.tag_store`, `pe.presence`, `pe.port_store`, `pe.free_frames` instead of
`pe.matching_store` and `pe.gen_counters`.

**`SMSnapshot`, `SMCellSnapshot`, `StateSnapshot`**: unchanged.

### `monitor/graph_json.py`

**`_serialise_pe_state()`**: replace `matching_store` and `gen_counters`
serialisation with frame state serialisation:

```python
def _serialise_pe_state(pe_id: int, snapshot: StateSnapshot) -> dict:
    pe_snap = snapshot.pes.get(pe_id)
    if pe_snap is None:
        return {}
    return {
        "iram": ...,  # same as before
        "frames": [...],  # new: per-frame slot contents
        "tag_store": dict(pe_snap.tag_store),  # new
        "presence": [...],  # new
        "free_frames": list(pe_snap.free_frames),  # new
        "input_queue_depth": len(pe_snap.input_queue),
        "output_count": len(pe_snap.output_log),
    }
```

Remove: `matching_store`, `gen_counters` keys from serialised output.

**`_serialise_event()`**: update `Matched` event serialisation to use
`act_id` and `frame_id` instead of `ctx`. Add serialisation for new event
types (`FrameAllocated`, `FrameFreed`, `FrameSlotWritten`, `TokenRejected`).

**`_serialise_node()`**: remove `ctx` field from node JSON (or rename to
`act_id` if nodes carry activation information).

**`graph_to_monitor_json()`**: update `Matched` event overlay handler to
use `event.act_id` instead of `event.ctx`. Add overlay handling for new
event types.
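One possible shape for serialising the new frame events is sketched below, using `dataclasses.asdict` plus a type discriminator. The `"type"` key and the `serialise_frame_event` name are assumptions; the existing `_serialise_event()` output format may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FrameAllocated:
    time: float
    component: str
    act_id: int
    frame_id: int


@dataclass(frozen=True)
class FrameSlotWritten:
    time: float
    component: str
    frame_id: int
    slot: int
    value: int


def serialise_frame_event(event: object) -> dict:
    # Hypothetical shape: dataclass fields plus a "type" discriminator key.
    return {"type": type(event).__name__, **asdict(event)}


payload = serialise_frame_event(
    FrameAllocated(time=3.0, component="pe0", act_id=5, frame_id=2)
)
```

`TokenRejected` carries a nested `Token`, which `asdict` would expand recursively, so it may need its own handling.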

## 11. Dependency Order

Modules should be changed in this order, based on import dependencies.
Each step should leave the codebase in a testable state.

1. **`cm_inst.py`** -- no dependencies. Add `Instruction` dataclass,
   `Mode` enum, `FrameOp` enum. Rename `FREE_CTX` to `FREE_FRAME`. Add
   `EXTRACT_TAG`. Keep `ALUInst`, `SMInst`, `Addr` temporarily for
   backward compatibility.

2. **`tokens.py`** -- imports `cm_inst`. Rename `ctx` to `act_id` in
   `CMToken`. Remove `gen` from `DyadToken`. Add `FrameControlToken`,
   `PELocalWriteToken`. Remove `IRAMWriteToken`.

3. **`emu/events.py`** -- imports `cm_inst`, `tokens`, `sm_mod`. Update
   `Matched` event (ctx -> act_id, add frame_id). Add new event types.
   Update `SimEvent` union.

4. **`emu/types.py`** -- imports `cm_inst`. Remove `MatchEntry`. Update
   `PEConfig` (remove ctx_slots/offsets/gen_counters, add frame params,
   change iram type). Keep `DeferredRead`, `SMConfig` unchanged.

5. **`emu/alu.py`** -- imports `cm_inst`. Rename `FREE_CTX` to
   `FREE_FRAME` in match cases. Add `EXTRACT_TAG` handler if ALU-level.

6. **`emu/pe.py`** -- imports everything above. Full rewrite of
   `ProcessingElement`: constructor, pipeline, matching, output routing,
   side paths. This is the critical change.

7. **`emu/sm.py`** -- imports `cm_inst`, `emu/events`, `emu/types`,
   `sm_mod`, `tokens`. Minimal changes: field name `ctx` -> `act_id` in
   any token construction (but SM doesn't construct tokens directly; it
   uses `replace()` on the return route). Essentially no functional change.

8. **`emu/network.py`** -- imports `emu/*`. Update `_target_store()` for
   new token types. Update `build_topology()` for new PEConfig fields.

9. **`emu/__init__.py`** -- update exports for new event types.

10. **`monitor/snapshot.py`** -- update `PESnapshot`, `capture()`.

11. **`monitor/graph_json.py`** -- update PE state serialisation, event
    serialisation, event overlay handlers.
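The ordering above can be checked mechanically against the import edges it lists. The sketch below covers steps 1 to 8; `sm_mod` is left out because it is not changed, and "imports `emu/*`" is expanded to the `emu` modules named in this plan, which is an assumption.

```python
# Import edges as stated in the list above; `sm_mod` is outside the change set.
ORDER = ["cm_inst", "tokens", "emu/events", "emu/types", "emu/alu",
         "emu/pe", "emu/sm", "emu/network"]
IMPORTS = {
    "cm_inst": [],
    "tokens": ["cm_inst"],
    "emu/events": ["cm_inst", "tokens"],
    "emu/types": ["cm_inst"],
    "emu/alu": ["cm_inst"],
    "emu/pe": ["cm_inst", "tokens", "emu/events", "emu/types", "emu/alu"],
    "emu/sm": ["cm_inst", "tokens", "emu/events", "emu/types"],
    "emu/network": ["emu/events", "emu/types", "emu/alu", "emu/pe", "emu/sm"],
}


def order_violations(order: list[str],
                     imports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (module, dependency) pairs where the dependency is changed later."""
    position = {name: i for i, name in enumerate(order)}
    return [
        (module, dep)
        for module in order
        for dep in imports[module]
        if position[dep] > position[module]
    ]


violations = order_violations(ORDER, IMPORTS)
```

An empty result means every module is changed after the modules it imports.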

## 12. What Stays the Same

The following are explicitly unchanged to prevent scope creep:

- **SM I-structure semantics** (`emu/sm.py`): cell states, deferred reads,
  atomic operations, T0/T1 tiers, EXEC.
- **ALU pure function model** (`emu/alu.py`): `execute()` signature and
  dispatch. 16-bit masking. Signed comparison semantics.
- **SimPy-based simulation**: `simpy.Environment`, `simpy.Store`,
  process-per-token model.
- **Network topology**: full mesh, route tables, `System.inject()`,
  `System.send()`, `System.load()`.
- **SM cell model** (`sm_mod.py`): `Presence` enum, `SMCell` dataclass.
- **`Token` base class**: `target: int` field.
- **`SMToken`**: all fields unchanged. `ret` still holds a `CMToken`.
- **`Port` enum**: `L = 0`, `R = 1`.
- **`MonadToken.inline`**: inline monadic token concept unchanged.
- **`DyadToken.wide`**: wide value concept unchanged.
- **Event callback mechanism**: `on_event: EventCallback | None` pattern.
- **Test infrastructure**: pytest + hypothesis. Strategies will need
  updating for new types but the framework is unchanged.
- **Monitor backend** (`monitor/backend.py`): command/result protocol,
  threaded architecture. Only the wiring of `PEConfig` during `LoadCmd`
  changes.
- **Monitor REPL** (`monitor/repl.py`): commands stay the same; output
  formatting adapts to new snapshot structure.