Dynamic Dataflow CPU: I/O & Bootstrap#

Covers IO as memory-mapped structure or raw memory, the bootstrap sequence via SM00 EXEC, and the path from microcontroller-assisted bring-up to self-hosted boot.

See architecture-overview.md for module taxonomy and token format. See sm-design.md for SM interface protocol, EXEC operation, and SM00 bootstrap details. See bus-architecture-and-width-decoupling.md for bus width rationale and IRAM write token format.

IO Model: Memory-Mapped Raw or Structure Memory#

How It Works#

An SM (typically SM00 at v0) maps IO devices into a reserved address range. From the CM's perspective, reading from a UART is identical to reading from any other SM cell:

READ from IO-mapped address:
  1. CM sends SM READ token to SM00, IO-mapped address range
  2. If IO device has data: cell is FULL, SM returns result immediately
  3. If IO device has no data: cell is EMPTY, SM defers read
  4. When IO device receives data: cell transitions to FULL,
     deferred read is satisfied, result token emitted

WRITE to IO-mapped address:
  1. CM sends SM WRITE token to SM00, IO-mapped address range
  2. SM forwards write data to the IO device

The deferred read/write aspects of I-structure semantics provide natural interrupt-free IO. The CM does not poll or busy-wait. It issues a READ, and the result arrives when data is available. This is the dataflow-native IO model: external events are just writes to SM cells that satisfy deferred reads.
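The READ and WRITE flows above can be sketched in a few lines of Python. This is a minimal model, not the SM00 implementation: the `IOCell` class, its method names, and the callback that stands in for a result token are all hypothetical. It also assumes that a READ of an IO cell consumes the data (the cell returns to EMPTY), which the text implies but does not state.

```python
from typing import Callable, Optional


class IOCell:
    """One presence-tracked IO cell with a single deferred-read slot (sketch)."""

    def __init__(self) -> None:
        self.value: Optional[int] = None  # None models the EMPTY state
        self.deferred: Optional[Callable[[int], None]] = None  # pending reader

    def read(self, reply: Callable[[int], None]) -> None:
        """CM-side READ: reply immediately if FULL, otherwise defer."""
        if self.value is not None:
            data, self.value = self.value, None  # assumption: READ consumes
            reply(data)
        else:
            self.deferred = reply  # no polling; the reply arrives later

    def device_write(self, data: int) -> None:
        """IO-side write: satisfy a deferred read, or leave the cell FULL."""
        if self.deferred is not None:
            reply, self.deferred = self.deferred, None
            reply(data)  # the result token is emitted here
        else:
            self.value = data


# Usage: the read is issued before any data exists and completes on arrival.
results = []
rx = IOCell()
rx.read(results.append)   # defers, nothing is returned yet
rx.device_write(0x41)     # UART byte arrives, deferred read is satisfied
```

The CM side never loops or checks a flag; the only trigger for the reply is the device write.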

IO Address Mapping#

IO devices occupy a contiguous range within SM00's address space. The mapping is configured at bootstrap (or hardwired for v0):

Example SM00 address map (indicative, not final):
  0x000 - 0x0FF:  I-structure cells (tier 1, presence-tracked)
  0x100 - 0x1FF:  IO devices (tier 0 or tier 1 memory semantics)
  0x200 - 0x3FF:  Bootstrap ROM (tier 0, read-only)

Tier boundary direction is a hardware design decision, not yet finalised. The current design intent places I-structure cells at low addresses (below the boundary) where they are directly addressable by 2-flit SM tokens and reachable by atomic operations (rd_inc, rd_dec, cas — 5-bit opcode tier, 256-cell range). T0 raw storage sits above the boundary; it does not need atomics and can use extended addressing when necessary.

The emulator currently uses tier_boundary=256 with T1 below and T0 at/above, but the exact mapping may change during physical build. See sm-design.md for the full tier model and encoding details.
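As a rough illustration, the indicative address map and the emulator's `tier_boundary=256` can be combined into a small classifier. The region names, the `classify` function, and the treatment of addresses beyond 0x3FF as T0 raw storage are assumptions made for this sketch; the real map is not final.

```python
from typing import Tuple

TIER_BOUNDARY = 0x100  # emulator's current tier_boundary=256

# (first address, last address, region name) from the indicative SM00 map
REGIONS = [
    (0x000, 0x0FF, "istructure"),
    (0x100, 0x1FF, "io"),
    (0x200, 0x3FF, "rom"),
]


def classify(addr: int) -> Tuple[str, bool]:
    """Return (region name, below_boundary) for an SM00 address."""
    below_boundary = addr < TIER_BOUNDARY  # T1 below, T0 at/above
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name, below_boundary
    return "t0_raw", below_boundary  # assumption: everything else is raw T0
```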

Within the IO range, specific addresses map to device registers:

Example IO register map (v0, UART only; addresses shown as offsets from the IO range base):
  0x000:  UART TX data (write)
  0x001:  UART RX data (read, defers if no byte available)
  0x002:  UART status (read, raw — no deferral)
  0x003:  UART config (write)
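The register map can be expressed as a lookup table. This sketch assumes the register addresses are offsets from the IO range base (0x100 in the indicative SM00 map); `IO_BASE`, `Reg`, and `lookup` are hypothetical names, not part of the design.

```python
from typing import NamedTuple

IO_BASE = 0x100  # assumption: start of the IO range in the example SM00 map


class Reg(NamedTuple):
    name: str
    readable: bool
    writable: bool
    defers: bool  # True if a READ defers when no data is available


UART_REGS = {
    0x0: Reg("uart_tx_data", readable=False, writable=True, defers=False),
    0x1: Reg("uart_rx_data", readable=True, writable=False, defers=True),
    0x2: Reg("uart_status", readable=True, writable=False, defers=False),
    0x3: Reg("uart_config", readable=False, writable=True, defers=False),
}


def lookup(addr: int) -> Reg:
    """Map an SM00 address in the IO range to its UART register entry."""
    offset = addr - IO_BASE
    if offset not in UART_REGS:
        raise KeyError(f"no IO register at address {addr:#05x}")
    return UART_REGS[offset]
```

Only the RX data register defers; the status register is a raw read, so a program that wants a non-blocking check reads status instead of RX data.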

Unsolicited Events#

When an external event occurs (e.g., UART receives a byte), the IO hardware writes the data into the corresponding SM cell. If a deferred read is pending on that cell, the SM satisfies it immediately and the requesting CM receives a result token with the data.

If no deferred read is pending, the data sits in the cell (FULL state) until a CM reads it. This is natural flow control: the IO device produces, the CM consumes, with I-structure semantics as the synchronization primitive.

Implications:

- IO devices are write sources into SM cells.
- Backpressure: if the CM hasn't issued a READ, the IO device can still write (cell becomes FULL). Under the current assumption, subsequent writes before a READ overwrite, so the SM cell is a 1-deep buffer; whether this should instead raise an error is listed under Open Design Questions. For deeper buffering, the IO hardware can use a range of cells as a circular buffer.
  - Possibly a read_len instruction is warranted, though it would require either a very restricted address space, additional setup, or a 3-flit token.
- Standard SMs never spontaneously generate tokens; they only satisfy pending deferred reads. The IO device's write triggers the deferred read satisfaction, which is when the result token is emitted.

Spontaneous Token Generation (interrupts, sort of)#

The deferred-read model works well when the CM knows in advance that it wants IO data (it issues a READ, the read defers, the IO device eventually satisfies it). But some IO patterns are genuinely unsolicited: interrupt-like events where external hardware needs to inject a token into the network without a prior READ request.

SM00 could be specialized with a dispatch register (or small dispatch table) that maps IO events to pre-formed token templates. When an IO device signals an event and no deferred read is pending:

1. IO device asserts an event line (directly wired or via address decoder)
2. SM00 reads the dispatch register for that event source
3. The dispatch register contains a pre-formed token template (flit 1 routing + flit 2 data source), similar to the SM return routing mechanism (see sm-and-token-format-discussion.md)
4. SM00 emits the token onto the network spontaneously

The dispatch register is loaded at bootstrap (or via SM WRITE to a reserved address range). It tells SM00 "when UART RX fires and nobody is waiting, send this token to this PE at this offset."
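A behavioural sketch of the dispatch mechanism follows. The class and method names, the integer event-source IDs, and the use of a plain 16-bit value for the flit 1 routing word are all assumptions; the actual token layout is defined elsewhere and is not modelled here.

```python
from typing import Callable, Dict, Tuple

Token = Tuple[int, int]  # (flit 1, flit 2), each treated as a 16-bit word


class Sm00Dispatch:
    """Maps IO event sources to pre-formed flit 1 routing words (sketch)."""

    def __init__(self, emit: Callable[[Token], None]) -> None:
        self.emit = emit                      # puts a token on the network
        self.templates: Dict[int, int] = {}   # event source -> flit 1

    def load(self, source: int, flit1: int) -> None:
        """Done at bootstrap, or via SM WRITE to a reserved address range."""
        self.templates[source] = flit1 & 0xFFFF

    def on_event(self, source: int, data: int, deferred_pending: bool) -> bool:
        """Handle an IO event; return True if a token was emitted spontaneously."""
        if deferred_pending:
            return False  # the normal deferred-read path delivers the data
        if source not in self.templates:
            return False  # no template loaded: data just sits in the cell
        self.emit((self.templates[source], data & 0xFFFF))
        return True
```

The template supplies only routing; the data flit is filled in from the event at emission time, which is why SM00 does not need to interpret the token it sends.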

This makes SM00 the only SM that can act as a token source rather than purely a token responder. All other SMs remain reactive.

Hardware cost: one additional register file (or a few reserved SM cells reinterpreted as dispatch entries) + event detection logic (edge detect on IO device status lines) + arbitration with normal SM operations. Estimated: 3-5 TTL chips beyond the base SM.

Alternative: always-pending deferred read. The compiler ensures a READ is always pending on the IO cell. As soon as one is satisfied, the handler re-issues a READ immediately (feedback loop in the dataflow graph). This avoids SM00 specialization entirely but has a significant resource cost: the current SM design supports only one deferred read at a time per SM instance. An always-pending IO read permanently occupies SM00's single deferred read slot, blocking all other deferred reads on SM00 (including other IO cells and any I-structure operations). If SM00 also serves as bootstrap SM and T0 shared storage, this is a real constraint.

The always-pending pattern works for a single IO source on a dedicated SM, but scales poorly. Multiple IO sources (UART + SPI + timer) would each need their own SM instance just to have a deferred read slot, which is wasteful.

The spontaneous emission model (dispatch registers) avoids this entirely. No deferred read slot consumed, SM00's normal memory operations remain unblocked. This tips the balance toward SM00 specialization for any system with more than trivial IO.

Peripheral controller with batch notification. A smarter peripheral controller (external hardware on SM00's address bus) manages its own buffering in a reserved cell range, similar to DMA/USART on STM32. The controller writes incoming data to a circular buffer of SM cells, tracks a write pointer internally, and writes a status/notification cell only at thresholds (half-complete, complete). The dataflow graph keeps one deferred read pending on the notification cell, processes a batch when it fires, and re-issues the read. This amortizes the single-deferred-read cost across many IO events and keeps SM00's deferred read slot occupied only during the inter-batch interval, not per-byte. Still consumes the slot, but the duty cycle is much lower.
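The batching behaviour can be sketched as below. `BatchController`, the `notify` callback (standing in for the write to the notification cell), and the even buffer size are assumptions for illustration only.

```python
from typing import Callable, List


class BatchController:
    """Circular buffer of cells with half/complete notifications (sketch)."""

    def __init__(self, size: int, notify: Callable[[str], None]) -> None:
        if size < 2 or size % 2:
            raise ValueError("size must be an even number >= 2")
        self.cells: List[int] = [0] * size  # reserved SM cell range
        self.size = size
        self.wr = 0                          # write pointer, kept internally
        self.notify = notify                 # models the notification-cell write

    def receive(self, byte: int) -> None:
        """Store one incoming byte; notify only at the two thresholds."""
        self.cells[self.wr] = byte
        self.wr = (self.wr + 1) % self.size
        if self.wr == self.size // 2:
            self.notify("half")       # first half of the buffer is ready
        elif self.wr == 0:
            self.notify("complete")   # second half is ready, pointer wrapped
```

With a buffer of N cells, the dataflow graph sees two notifications per N bytes rather than one deferred-read satisfaction per byte.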

Multi-slot deferred reads. The single-slot constraint that makes always-pending problematic could also be addressed by expanding the SM's deferred read storage to 2-4 entries using a small CAM. See sm-design.md "Multi-Slot Deferred Read CAM" section. Even 2 slots (one for IO, one for normal I-structure) would resolve the resource conflict without SM00 specialization. A 4-entry CAM covers multiple IO sources simultaneously. This is architecturally the cleanest option: no special cases, no spontaneous emission, just a slightly larger deferred read store that benefits all SMs uniformly.
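A small model of the multi-slot store makes the resource argument concrete. The class name, the use of a dictionary to stand in for CAM matching, the one-deferred-read-per-address rule, and the "reject when full" behaviour are assumptions; sm-design.md is the reference for the real design.

```python
from typing import Dict, Optional


class DeferredReadCam:
    """Address-matched deferred-read store with a fixed slot count (sketch)."""

    def __init__(self, slots: int = 2) -> None:
        self.slots = slots
        self.entries: Dict[int, object] = {}  # cell address -> return routing

    def defer(self, addr: int, return_routing: object) -> bool:
        """Record a deferred read; False if no slot is free (assumed policy)."""
        if addr in self.entries or len(self.entries) >= self.slots:
            return False
        self.entries[addr] = return_routing
        return True

    def match(self, addr: int) -> Optional[object]:
        """On a write to addr, release and return the matching entry, if any."""
        return self.entries.pop(addr, None)
```

With `slots=1` this reduces to the current single-slot behaviour, where an always-pending IO read blocks every other deferred read on the same SM.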

For v0 with a single UART at low baud rates, the always-pending pattern with a single deferred read slot is sufficient. The multi-slot CAM, peripheral controller, and/or spontaneous emission models are refinements for systems with multiple IO sources or higher throughput requirements.

Hardware#

IO hardware sits on SM00's internal data bus alongside the SRAM banks. The address decoder routes IO-range addresses to IO device registers instead of SRAM:

SM00 Internal Bus
       |
  [Address Decoder]
       |
       +---> addr < tier_boundary? --> [SRAM Banks] (I-structure cells)
       |
       +---> addr >= tier_boundary?
               |
               +---> IO range? --> [IO Device Registers]
               |                       |
               |                       +---> [UART chip (6850/16550/etc.)]
               |                       +---> [future: SPI, GPIO, timer]
               |
               +---> ROM range? --> [Bootstrap ROM]
               |
               +---> else --> [T0 raw SRAM]

The IO device registers behave like SM cells from the SM controller's perspective: they have presence state, support READ/WRITE, and can satisfy deferred reads. The difference is that the data source is an external device rather than SRAM.

Estimated additional hardware for IO: ~8-12 TTL chips (address decode, IO device interface, presence state for IO cells) + UART chip.

IRAM Writes#

IRAM writes use CM misc-bucket tokens (prefix 011+01). They are addressed to a specific PE and carry instruction word data. See bus-architecture-and-width-decoupling.md for the IRAM write token format and valid-bit protection protocol.

IRAM writes can originate from:

- An external microcontroller (development/prototyping)
- SM00's EXEC sequencer during bootstrap (reading pre-formed IRAM write tokens from ROM)
- A CM running a loader program (runtime code loading)

The network routes IRAM writes like any other CM token — bit[15]=0, PE_id in the appropriate flit 1 position. The target PE recognizes the 011+01 prefix and routes the token to the instruction memory write port.

Bootstrap Sequence#

Self-Hosted Bootstrap (via SM00 EXEC)#

On system reset:

1. SM00 is wired to the reset signal
2. SM00's sequencer triggers EXEC on a predetermined ROM base address
3. The ROM region contains pre-formed tokens stored as 2-cell entries:
   - IRAM write tokens (prefix 011+01) to load PE instruction memories
   - Seed tokens (dyadic wide, monadic normal) to start execution
4. SM00 reads each 2-cell entry and emits it as a 2-flit token on the bus
5. PEs receive IRAM writes and load their instruction memories
6. Seed tokens fire and execution begins

The program image in ROM is a flat sequence of pre-formed token pairs. The compiler outputs this format directly — each entry is (flit 1, flit 2) as stored in two consecutive SM cells. SM00's EXEC sequencer is just an address counter with a limit comparator; it does not interpret the tokens it emits.
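The sequencer's behaviour (an address counter plus a limit comparator) can be sketched as a generator. The function name and the assumption that the count is expressed in 2-cell entries rather than cells are illustrative; the source does not fix the unit.

```python
from typing import Iterator, Sequence, Tuple


def exec_sequence(cells: Sequence[int], base: int, count: int) -> Iterator[Tuple[int, int]]:
    """Yield (flit 1, flit 2) pairs from consecutive cells, uninterpreted."""
    addr = base
    limit = base + 2 * count  # assumption: count is in 2-cell entries
    while addr != limit:      # the limit comparator
        yield cells[addr], cells[addr + 1]
        addr += 2             # the address counter


# Usage with placeholder cell contents (not real token encodings):
rom = [0xAAAA, 0xBBBB, 0x0001, 0x0002, 0x0003, 0x0004]
emitted = list(exec_sequence(rom, base=2, count=2))
```

Nothing in the loop inspects the flit values, which matches the point that the sequencer does not interpret the tokens it emits.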

Development / Early Prototyping#

For Phase 0-2, an external microcontroller (RP2040, ESP32) acts as the bootstrap source. It is NOT part of the architecture — it is a test fixture.

The microcontroller:

1. Formats IRAM write tokens (prefix 011+01, 2 flits per instruction)
2. Injects flits into the 16-bit bus
3. Writes instruction words to each PE's instruction memory
4. Optionally writes initial SM contents via SM tokens (bit[15]=1)
5. Injects seed tokens to start execution
6. Releases the bus (goes high-impedance or disconnects)

This lets PE and SM hardware be tested without bootstrap ROM or EXEC sequencer existing. The microcontroller is the bootstrap, the debug interface, and the test harness all in one.
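On the host side, the injection order might be assembled as below before being handed to the microcontroller. This is a sketch under stated assumptions: `bringup_stream` and `is_sm_token` are hypothetical helpers, the only encoding detail used is bit 15 of flit 1 (set for SM tokens, clear for CM tokens, including IRAM writes and seed tokens), and the flit values in the usage lines are arbitrary placeholders.

```python
from typing import Iterable, List, Tuple

Token = Tuple[int, int]  # (flit 1, flit 2), each 16 bits wide


def is_sm_token(token: Token) -> bool:
    """SM tokens have bit 15 of flit 1 set; CM tokens have it clear."""
    return bool(token[0] & 0x8000)


def bringup_stream(
    iram_writes: Iterable[Token],
    sm_init: Iterable[Token],
    seeds: Iterable[Token],
) -> List[int]:
    """Flatten tokens into the flit order: IRAM writes, SM init, then seeds."""
    flits: List[int] = []
    groups = ((iram_writes, False), (sm_init, True), (seeds, False))
    for group, expect_sm in groups:
        for token in group:
            if is_sm_token(token) != expect_sm:
                raise ValueError(f"flit 1 {token[0]:#06x} has wrong bit 15")
            flits.extend(f & 0xFFFF for f in token)
    return flits


# Usage with placeholder values only:
stream = bringup_stream(
    iram_writes=[(0x1234, 0x1111)],
    sm_init=[(0x8001, 0x0005)],
    seeds=[(0x0100, 0x0000)],
)
```

Seed tokens go last so that no instruction fires before its PE's instruction memory has been written.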

Routing During Bootstrap#

During bootstrap, routing tables are not yet configured. The network uses fixed-address default routing (see network-and-communication.md):

- Each PE has a unique ID (EEPROM / DIP switches)
- Routing nodes forward by destination ID without consulting tables
- At v0 scale (shared bus), this holds trivially; every node sees every token
- SM00's tokens reach all PEs via default routing since SM00 is on the shared bus

Seed Token Injection#

The bootstrap ROM includes seed tokens after the IRAM write tokens. These are standard CM tokens addressed to the entry point(s) of the loaded program. SM00's EXEC sequencer emits them in order after the IRAM writes.

Open Design Questions#

- IO address range size: how many cells are reserved for IO in SM00? Depends on device count and register depth per device.
- IO cell buffering depth: single cell (1-deep) per IO register, or a range of cells for circular buffering? 1-deep is simplest; a circular buffer needs SM-side write pointer management.
- IO write-overwrite semantics: if an IO device writes to a FULL cell (previous data not yet consumed), overwrite or error? Overwrite is simpler but loses data. Error needs a signalling mechanism.
- Flash/EEPROM interface for ROM: SPI? Parallel? What storage device? ROM could be physical ROM chips on SM00's address bus, or flash accessed via a page register.
- Program image format: flat sequence of (flit1, flit2) token pairs in ROM. Needs a terminator or length so EXEC knows when to stop; as currently designed, the length is supplied as the EXEC count parameter.
- SM00 spontaneous token emission: SM00 could be specialised with a dispatch register that maps unsolicited IO events to pre-formed token templates for spontaneous emission (interrupt equivalent without prior READ). See the "Spontaneous Token Generation" section above. Not committed for v0; the always-pending deferred read pattern is sufficient for basic IO. The dispatch register mechanism is a future refinement for lower-latency interrupt response.