design-notes/io-and-bootstrap.md at main · nonbinary.computer/or1-design

OR-1 dataflow CPU sketch
Orual feat: rewrite ProcessingElement with frame-based matching, output routing, and unified instruction set 3w ago
65613978
# Dynamic Dataflow CPU: I/O & Bootstrap

Covers IO as memory-mapped structure or raw memory, the bootstrap sequence via SM00 EXEC, and the path from microcontroller-assisted bring-up to self-hosted boot.

See `architecture-overview.md` for module taxonomy and token format. See `sm-design.md` for SM interface protocol, EXEC operation, and SM00 bootstrap details. See `bus-architecture-and-width-decoupling.md` for bus width rationale and IRAM write token format.

## IO Model: Memory-Mapped Raw or Structure Memory

### How It Works

An SM (typically SM00 at v0) maps IO devices into a reserved address range. From the CM's perspective, reading from a UART is identical to reading from any other SM cell:

```
READ from IO-mapped address:
  1. CM sends SM READ token to SM00, IO-mapped address range
  2. If IO device has data: cell is FULL, SM returns result immediately
  3. If IO device has no data: cell is EMPTY, SM defers read
  4. When IO device receives data: cell transitions to FULL,
     deferred read is satisfied, result token emitted

WRITE to IO-mapped address:
  1. CM sends SM WRITE token to SM00, IO-mapped address range
  2. SM forwards write data to the IO device
```

The deferred read/write aspects of I-structure semantics provide natural interrupt-free IO. The CM does not poll or busy-wait. It issues a READ, and the result arrives when data is available. This is the dataflow-native IO model: external events are just writes to SM cells that satisfy deferred reads.
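
The READ path above can be sketched as a toy Python model. This is only an illustration, not the emulator's code: `Cell`, `IOMappedSM`, `device_write`, and the address `0x101` are hypothetical names chosen for the example, and whether a READ clears an IO cell is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cell:
    """One presence-tracked cell: EMPTY until written, then FULL."""
    full: bool = False
    value: int = 0
    # Return route of a READ that arrived while the cell was EMPTY.
    deferred: Optional[Callable[[int], None]] = None


class IOMappedSM:
    """Toy model of an SM whose cells may be backed by an IO device."""

    def __init__(self) -> None:
        self.cells: dict[int, Cell] = {}

    def read(self, addr: int, reply: Callable[[int], None]) -> None:
        cell = self.cells.setdefault(addr, Cell())
        if cell.full:
            reply(cell.value)        # FULL: result token emitted immediately
        else:
            cell.deferred = reply    # EMPTY: defer, the CM does not poll

    def device_write(self, addr: int, value: int) -> None:
        """Called by the IO side when external data arrives."""
        cell = self.cells.setdefault(addr, Cell())
        cell.full, cell.value = True, value
        if cell.deferred is not None:
            reply, cell.deferred = cell.deferred, None
            reply(value)             # deferred read satisfied


results: list[int] = []
sm = IOMappedSM()
sm.read(0x101, results.append)   # no byte yet: the read defers
sm.device_write(0x101, 0x41)     # byte arrives: result token emitted
```

The `reply` callback stands in for the return routing carried by the READ token; in hardware the SM would emit a result token rather than call a function.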

### IO Address Mapping

IO devices occupy a contiguous range within SM00's address space. The mapping is configured at bootstrap (or hardwired for v0):

```
Example SM00 address map (indicative, not final):
  0x000 - 0x0FF:  I-structure cells (tier 1, presence-tracked)
  0x100 - 0x1FF:  IO devices (tier 0 or tier 1 memory semantics)
  0x200 - 0x3FF:  Bootstrap ROM (tier 0, read-only)
```

**Tier boundary direction** is a hardware design decision, not yet finalised. The current design intent places I-structure cells at low addresses (below the boundary) where they are directly addressable by 2-flit SM tokens and reachable by atomic operations (`rd_inc`, `rd_dec`, `cas`: 5-bit opcode tier, 256-cell range). T0 raw storage sits above the boundary; it does not need atomics and can use extended addressing when necessary.

The emulator currently uses `tier_boundary=256` with T1 below and T0 at/above, but the exact mapping may change during physical build. See `sm-design.md` for the full tier model and encoding details.

Within the IO range, specific addresses map to device registers:

```
Example IO register map (v0, UART only):
  0x000:  UART TX data (write)
  0x001:  UART RX data (read, defers if no byte available)
  0x002:  UART status (read, raw; no deferral)
  0x003:  UART config (write)
```

### Unsolicited Events

When an external event occurs (e.g., UART receives a byte), the IO hardware writes the data into the corresponding SM cell. If a deferred read is pending on that cell, the SM satisfies it immediately and the requesting CM receives a result token with the data.

If no deferred read is pending, the data sits in the cell (FULL state) until a CM reads it. This is natural flow control: the IO device produces, the CM consumes, with I-structure semantics as the synchronization primitive.

Implications:
- IO devices are **write sources** into SM cells
- Backpressure: if the CM hasn't issued a READ, the IO device can still write (cell becomes FULL). Subsequent writes before a READ overwrite; the SM cell is a 1-deep buffer. For deeper buffering, the IO hardware can use a range of cells as a circular buffer.
    - Possibly a `read_len` instruction is warranted, though it would require either a very restricted address space, additional setup, or a 3-flit token.
- Standard SMs never spontaneously generate tokens; they only satisfy pending deferred reads. The IO device's write *triggers* the deferred read satisfaction, which is when the result token is emitted.
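
To make the buffering trade-off concrete, here is a small Python sketch contrasting a 1-deep cell with a range of cells used as a circular buffer. The class names are hypothetical, and two behaviours are assumptions made only for the example: a read returns the cell to EMPTY, and a full ring drops the incoming byte (the overwrite-or-error policy is still an open question).

```python
class OneDeepCell:
    """A single SM cell: a second write before a READ overwrites the first."""

    def __init__(self) -> None:
        self.full = False
        self.value = 0

    def device_write(self, value: int) -> None:
        self.full, self.value = True, value   # overwrite, nothing signalled


class CellRing:
    """IO hardware treating `depth` consecutive cells as a circular buffer."""

    def __init__(self, depth: int) -> None:
        self.cells = [None] * depth   # None models an EMPTY cell
        self.wr = 0                   # write pointer, owned by the IO hardware
        self.rd = 0                   # read pointer, owned by the consumer

    def device_write(self, value: int) -> bool:
        if self.cells[self.wr] is not None:
            return False              # ring full: byte dropped (assumed policy)
        self.cells[self.wr] = value
        self.wr = (self.wr + 1) % len(self.cells)
        return True

    def read(self):
        value = self.cells[self.rd]
        if value is not None:
            self.cells[self.rd] = None    # assumed: read empties the cell
            self.rd = (self.rd + 1) % len(self.cells)
        return value


cell = OneDeepCell()
cell.device_write(0x41)
cell.device_write(0x42)           # 0x41 is lost

ring = CellRing(depth=2)
accepted = [ring.device_write(b) for b in (0x41, 0x42, 0x43)]
```

With a depth of 2 the ring keeps the first two bytes in order and refuses the third, whereas the single cell silently retains only the most recent byte.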

### Spontaneous Token Generation (interrupts, sort of)

The deferred-read model works well when the CM knows in advance that it wants IO data (it issues a READ, the read defers, the IO device eventually satisfies it). But some IO patterns are genuinely unsolicited: an interrupt-like event in which external hardware needs to inject a token into the network without a prior READ request.

SM00 could be specialized with a **dispatch register** (or small dispatch table) that maps IO events to pre-formed token templates. When an IO device signals an event and no deferred read is pending:

1. IO device asserts an event line (directly wired or via address decoder)
2. SM00 reads the dispatch register for that event source
3. The dispatch register contains a pre-formed token template (flit 1 routing + flit 2 data source), similar to the SM return routing mechanism (see `sm-and-token-format-discussion.md`)
4. SM00 emits the token onto the network spontaneously

The dispatch register is loaded at bootstrap (or via SM WRITE to a reserved address range). It tells SM00 "when UART RX fires and nobody is waiting, send this token to this PE at this offset."

This makes SM00 the only SM that can act as a **token source** rather than purely a token responder. All other SMs remain reactive.
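
A minimal Python sketch of the dispatch lookup follows. It is illustrative only: `DispatchTable`, the event-source number `UART_RX`, and the flit 1 value `0x1234` are placeholders invented for the example, not real encodings, and the template is reduced to a single flit 1 word with the event data used directly as flit 2.

```python
from typing import Optional

Token = tuple[int, int]   # (flit 1, flit 2)

UART_RX = 0               # hypothetical event-source number


class DispatchTable:
    """Maps IO event sources to pre-formed flit 1 routing templates."""

    def __init__(self) -> None:
        self.templates: dict[int, int] = {}

    def load(self, source: int, flit1: int) -> None:
        """Done at bootstrap, or via SM WRITE to a reserved range."""
        self.templates[source] = flit1

    def on_event(self, source: int, data: int,
                 read_pending: bool) -> Optional[Token]:
        if read_pending:
            return None        # a deferred read is waiting: normal path
        flit1 = self.templates.get(source)
        if flit1 is None:
            return None        # no entry: data simply sits in the cell
        return (flit1, data)   # token emitted spontaneously


table = DispatchTable()
table.load(UART_RX, 0x1234)    # placeholder routing word
emitted = table.on_event(UART_RX, 0x41, read_pending=False)
```

The table only fires when nobody is waiting, which matches the step list above: a pending deferred read always takes priority over spontaneous emission.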

**Hardware cost:** one additional register file (or a few reserved SM cells reinterpreted as dispatch entries) + event detection logic (edge detect on IO device status lines) + arbitration with normal SM operations. Estimated: 3-5 TTL chips beyond the base SM.

**Alternative: always-pending deferred read.** The compiler ensures a READ is always pending on the IO cell. As soon as one is satisfied, the handler re-issues a READ immediately (feedback loop in the dataflow graph). This avoids SM00 specialization entirely but has a significant resource cost: the current SM design supports only **one deferred read at a time per SM instance**. An always-pending IO read permanently occupies SM00's single deferred read slot, blocking all other deferred reads on SM00 (including other IO cells and any I-structure operations). If SM00 also serves as bootstrap SM and T0 shared storage, this is a real constraint.

The always-pending pattern works for a single IO source on a dedicated SM, but scales poorly. Multiple IO sources (UART + SPI + timer) would each need their own SM instance just to have a deferred read slot, which is wasteful.

The spontaneous emission model (dispatch registers) avoids this entirely. No deferred read slot consumed, SM00's normal memory operations remain unblocked. This tips the balance toward SM00 specialization for any system with more than trivial IO.

**Peripheral controller with batch notification.** A smarter peripheral controller (external hardware on SM00's address bus) manages its own buffering in a reserved cell range, similar to DMA/USART on STM32. The controller writes incoming data to a circular buffer of SM cells, tracks a write pointer internally, and writes a status/notification cell only at thresholds (half-complete, complete). The dataflow graph keeps one deferred read pending on the notification cell, processes a batch when it fires, and re-issues the read. This amortizes the single-deferred-read cost across many IO events and keeps SM00's deferred read slot occupied only during the inter-batch interval, not per-byte. Still consumes the slot, but the duty cycle is much lower.
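
The batch-notification idea can be sketched in Python as below. The class name, the `HALF`/`COMPLETE` codes, and the list used to record notification writes are all hypothetical; the buffer size is assumed to be even and at least 2.

```python
HALF, COMPLETE = 1, 2   # hypothetical notification codes


class BatchingController:
    """Buffers bytes in a cell range; notifies only at thresholds."""

    def __init__(self, size: int) -> None:
        self.buf = [0] * size                 # stands in for the reserved SM cells
        self.wr = 0                           # write pointer, internal to the controller
        self.notifications: list[int] = []    # writes to the notification cell

    def on_byte(self, byte: int) -> None:
        self.buf[self.wr] = byte
        self.wr = (self.wr + 1) % len(self.buf)
        if self.wr == len(self.buf) // 2:
            self.notifications.append(HALF)       # first half is ready
        elif self.wr == 0:
            self.notifications.append(COMPLETE)   # second half is ready, pointer wrapped


ctrl = BatchingController(size=8)
for b in b"ABCDEFGH":
    ctrl.on_byte(b)
```

Eight incoming bytes produce two notification writes here, so the deferred read on the notification cell fires twice instead of eight times.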

**Multi-slot deferred reads.** The single-slot constraint that makes always-pending problematic could also be addressed by expanding the SM's deferred read storage to 2-4 entries using a small CAM. See `sm-design.md` "Multi-Slot Deferred Read CAM" section. Even 2 slots (one for IO, one for normal I-structure) would resolve the resource conflict without SM00 specialization. A 4-entry CAM covers multiple IO sources simultaneously. This is architecturally the cleanest option: no special cases, no spontaneous emission, just a slightly larger deferred read store that benefits all SMs uniformly.
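
The difference between one slot and two can be shown with a toy Python model. This is a sketch under assumptions: `DeferredReadSM`, the cell addresses, and the `"blocked"` outcome are invented for illustration, and what the real SM does when no slot is free is not specified here.

```python
class DeferredReadSM:
    """Toy SM with a fixed number of deferred-read slots (a small CAM)."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.full: dict[int, int] = {}   # FULL cells: address -> value
        self.pending: set[int] = set()   # addresses with a deferred read

    def read(self, addr: int) -> str:
        if addr in self.full:
            return "hit"                 # FULL: answered immediately
        if len(self.pending) < self.slots:
            self.pending.add(addr)
            return "deferred"
        return "blocked"                 # no free slot (assumed outcome)

    def write(self, addr: int, value: int) -> bool:
        """Fill a cell; return True if this satisfied a deferred read."""
        self.full[addr] = value
        if addr in self.pending:
            self.pending.remove(addr)
            return True
        return False


UART_RX, FLAG = 0x101, 0x010             # hypothetical cell addresses

single = DeferredReadSM(slots=1)
single_io = single.read(UART_RX)         # always-pending IO read takes the slot
single_other = single.read(FLAG)         # I-structure read has nowhere to wait

dual = DeferredReadSM(slots=2)
dual_io = dual.read(UART_RX)
dual_other = dual.read(FLAG)             # second slot absorbs it
```

With `slots=1` the always-pending UART read starves every other deferred read on the same SM; with `slots=2` both reads wait side by side, which is the resource conflict the CAM is meant to remove.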

For v0 with a single UART at low baud rates, the always-pending pattern with a single deferred read slot is sufficient. The multi-slot CAM, peripheral controller, and/or spontaneous emission models are refinements for systems with multiple IO sources or higher throughput requirements.

### Hardware

IO hardware sits on SM00's internal data bus alongside the SRAM banks. The address decoder routes IO-range addresses to IO device registers instead of SRAM:

```
SM00 Internal Bus
       |
  [Address Decoder]
       |
       +---> addr < tier_boundary? --> [SRAM Banks] (I-structure cells)
       |
       +---> addr >= tier_boundary?
               |
               +---> IO range? --> [IO Device Registers]
               |                       |
               |                       +---> [UART chip (6850/16550/etc.)]
               |                       +---> [future: SPI, GPIO, timer]
               |
               +---> ROM range? --> [Bootstrap ROM]
               |
               +---> else --> [T0 raw SRAM]
```

The IO device registers behave like SM cells from the SM controller's perspective: they have presence state, support READ/WRITE, and can satisfy deferred reads. The difference is that the data source is an external device rather than SRAM.
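
The decode tree in the diagram can be written out as a short Python function. The range constants are taken from the indicative address map and the emulator's `tier_boundary=256`, so they are examples rather than final values; the function name and the returned labels are hypothetical.

```python
TIER_BOUNDARY = 0x100              # emulator's tier_boundary=256
IO_RANGE = range(0x100, 0x200)     # from the indicative address map
ROM_RANGE = range(0x200, 0x400)    # from the indicative address map


def decode(addr: int) -> str:
    """Return which block an SM00 address selects, in diagram order."""
    if addr < TIER_BOUNDARY:
        return "sram_istructure"   # I-structure cells in the SRAM banks
    if addr in IO_RANGE:
        return "io_registers"      # UART and future devices
    if addr in ROM_RANGE:
        return "bootstrap_rom"
    return "t0_raw_sram"           # everything else above the boundary


selected = [decode(a) for a in (0x0FF, 0x100, 0x200, 0x400)]
```

In hardware these comparisons would be a handful of address-line decodes rather than sequential tests, but the priority order is the same.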

Estimated additional hardware for IO: ~8-12 TTL chips (address decode, IO device interface, presence state for IO cells) + UART chip.

---

## IRAM Writes

IRAM writes use CM misc-bucket tokens (prefix 011+01). They are addressed to a specific PE and carry instruction word data. See `bus-architecture-and-width-decoupling.md` for the IRAM write token format and valid-bit protection protocol.

IRAM writes can originate from:
- An external microcontroller (development/prototyping)
- SM00's EXEC sequencer during bootstrap (reading pre-formed IRAM write tokens from ROM)
- A CM running a loader program (runtime code loading)

The network routes IRAM writes like any other CM token: bit[15]=0, PE_id in the appropriate flit 1 position. The target PE recognizes the 011+01 prefix and routes the token to the instruction memory write port.

---

## Bootstrap Sequence

### Self-Hosted Bootstrap (via SM00 EXEC)

On system reset:

1. SM00 is wired to the reset signal
2. SM00's sequencer triggers EXEC on a predetermined ROM base address
3. The ROM region contains pre-formed tokens stored as 2-cell entries:
   - IRAM write tokens (prefix 011+01) to load PE instruction memories
   - Seed tokens (dyadic wide, monadic normal) to start execution
4. SM00 reads each 2-cell entry and emits it as a 2-flit token on the bus
5. PEs receive IRAM writes and load their instruction memories
6. Seed tokens fire and execution begins

The program image in ROM is a flat sequence of pre-formed token pairs. The compiler outputs this format directly: each entry is (flit 1, flit 2) as stored in two consecutive SM cells. SM00's EXEC sequencer is just an address counter with a limit comparator; it does not interpret the tokens it emits.
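
A counter-plus-comparator sequencer of this kind can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the EXEC count is given in 2-cell entries rather than in cells, which the text does not state.

```python
from typing import Iterator


def exec_sequence(rom: list[int], base: int,
                  count: int) -> Iterator[tuple[int, int]]:
    """Emit `count` 2-cell entries starting at `base` as (flit 1, flit 2)."""
    addr = base                   # address counter
    limit = base + 2 * count      # value held by the limit comparator
    while addr != limit:
        # Cell contents are passed through without being interpreted.
        yield rom[addr], rom[addr + 1]
        addr += 2


# Placeholder ROM contents, not real token encodings.
rom = [0xA1, 0xA2, 0xB1, 0xB2, 0xC1, 0xC2]
emitted = list(exec_sequence(rom, base=2, count=2))
```

Because the loop never inspects the flits, IRAM writes and seed tokens are handled identically; only their order in ROM matters.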

### Development / Early Prototyping

For Phase 0-2, an external microcontroller (RP2040, ESP32) acts as the bootstrap source. It is NOT part of the architecture; it is a test fixture.

The microcontroller:
1. Formats IRAM write tokens (prefix 011+01, 2 flits per instruction)
2. Injects flits into the 16-bit bus
3. Writes instruction words to each PE's instruction memory
4. Optionally writes initial SM contents via SM tokens (bit[15]=1)
5. Injects seed tokens to start execution
6. Releases the bus (goes high-impedance or disconnects)

This lets PE and SM hardware be tested without bootstrap ROM or EXEC sequencer existing. The microcontroller is the bootstrap, the debug interface, and the test harness all in one.

### Routing During Bootstrap

During bootstrap, routing tables are not yet configured. The network uses fixed-address default routing (see `network-and-communication.md`):

- Each PE has a unique ID (EEPROM / DIP switches)
- Routing nodes forward by destination ID without consulting tables
- At v0 scale (shared bus), this is trivially true; everything sees everything
- SM00's tokens reach all PEs via default routing since SM00 is on the shared bus

### Seed Token Injection

The bootstrap ROM includes seed tokens after the IRAM write tokens. These are standard CM tokens addressed to the entry point(s) of the loaded program. SM00's EXEC sequencer emits them in order after the IRAM writes.
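
The ROM layout this implies (IRAM writes first, seeds last, each token as two consecutive cells) can be sketched as a small Python helper. `build_image` is a hypothetical name, not the compiler's actual output routine, and the token values are placeholders rather than real encodings.

```python
Token = tuple[int, int]   # (flit 1, flit 2)


def build_image(iram_writes: list[Token], seeds: list[Token]) -> list[int]:
    """Flatten tokens into ROM cells: all IRAM writes, then all seeds."""
    image: list[int] = []
    for flit1, flit2 in iram_writes + seeds:
        image.extend((flit1, flit2))   # two consecutive cells per token
    return image


# Placeholder values only.
iram = [(0x6801, 0x0011), (0x6802, 0x0022)]
seed = [(0x0100, 0x0001)]
image = build_image(iram, seed)
entry_count = len(image) // 2          # candidate EXEC count, in entries
```

Keeping seeds at the end guarantees that every instruction memory is loaded before any token that could fire an instruction is emitted.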

---

## Open Design Questions

1. **IO address range size**: how many cells reserved for IO in SM00?
   Depends on device count and register depth per device.
2. **IO cell buffering depth**: single cell (1-deep) per IO register, or
   a range of cells for circular buffering? 1-deep is simplest; circular
   buffer needs SM-side write pointer management.
3. **IO write-overwrite semantics**: if an IO device writes to a FULL
   cell (previous data not yet consumed), overwrite or error? Overwrite
   is simpler but loses data. Error needs a signalling mechanism.
4. **Flash/EEPROM interface for ROM**: SPI? Parallel? What storage device?
   ROM could be physical ROM chips on SM00's address bus, or flash accessed
   via page register.
5. **Program image format**: flat sequence of (flit1, flit2) token pairs
   in ROM. Needs a terminator or length prefix so EXEC knows when to stop.
   Length is the EXEC count parameter.
6. **SM00 spontaneous token emission**: SM00 could be specialised
   with a dispatch register that maps unsolicited IO events to
   pre-formed token templates for spontaneous emission (interrupt
   equivalent without prior READ). See the "Spontaneous Token Generation"
   section above. Not committed for v0: the always-pending deferred read
   pattern is sufficient for basic IO. The dispatch register mechanism
   is a future refinement for lower-latency interrupt response.