Observe Module#
Multimodal capture and AI-powered analysis of desktop activity.
Observer Architecture#
Observers are independent capture agents that upload segments to solstone via the HTTP ingest API (/app/observer/ingest/<key>). Each observer runs as its own process with its own lifecycle — solstone core is the journal + processing engine.
| Observer | What it captures | Repo | Runs as |
|---|---|---|---|
| solstone-linux | Screen + audio on Linux | solstone-linux | systemd user service / standalone |
| solstone-macos | Screen + audio on macOS | solstone-macos | Native menu bar app |
| solstone-tmux | Tmux terminal sessions | solstone-tmux | systemd user service / standalone |
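The upload path used by every observer can be sketched with the Python standard library alone. This is a minimal illustration, not the actual `observer_client.py`: the base URL, the form field name `file`, and the helper names `ingest_url`, `encode_multipart`, and `upload_segment` are all assumptions; only the `/app/observer/ingest/<key>` route comes from the text above.

```python
import uuid
import urllib.request
from typing import Optional, Tuple
from urllib.parse import quote


def ingest_url(base_url: str, key: str) -> str:
    """Build the ingest endpoint for an observer key (route taken from the docs)."""
    return f"{base_url.rstrip('/')}/app/observer/ingest/{quote(key, safe='')}"


def encode_multipart(
    field: str, filename: str, payload: bytes, boundary: Optional[str] = None
) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    The field name is hypothetical; the real server may expect another one.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_segment(base_url: str, key: str, filename: str, payload: bytes) -> int:
    """POST one segment file and return the HTTP status code."""
    body, content_type = encode_multipart("file", filename, payload)
    request = urllib.request.Request(
        ingest_url(base_url, key),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Keeping URL construction and body encoding separate from the network call makes both easy to check without a running solstone instance.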
Managing observers#
# List all registered observers
sol observer list
# Register a new observer
sol observer create <name>
# Check observer status
sol observer status <name>
# Rename an observer
sol observer rename <old> <new>
# Revoke an observer's key
sol observer revoke <name>
Commands#
| Command | Purpose |
|---|---|
| sol observer | Screen and audio capture (auto-detects platform) |
| sol observe-linux | Screen and audio capture on Linux (direct) |
| sol transcribe | Audio transcription with faster-whisper |
| sol describe | Visual analysis of screen recordings |
| sol sense | Unified observation coordination |
Architecture#
Observers (standalone or built-in)
↓ HTTP multipart upload
Observer Ingest API (/app/observer/ingest/<key>)
↓
Raw media files (*.flac, *.webm, tmux_*.jsonl)
↓
sol sense (coordination)
├── sol transcribe → audio.jsonl
└── sol describe → screen.jsonl
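The coordination step in the diagram can be pictured as a small routing function. The sketch below assumes that `.flac` files go to transcription and `.webm` files go to description, which is inferred from the diagram rather than confirmed by the `sense.py` source; the function name `route` and the example filenames are hypothetical. The diagram shows no downstream handler for `tmux_*.jsonl`, so the sketch leaves those unrouted.

```python
from pathlib import PurePath
from typing import Optional


def route(filename: str) -> Optional[str]:
    """Pick the processing command for a raw media file, or None if no handler applies."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".flac":
        return "transcribe"  # assumed: audio goes to sol transcribe -> audio.jsonl
    if suffix == ".webm":
        return "describe"  # assumed: video goes to sol describe -> screen.jsonl
    return None  # e.g. tmux_*.jsonl, which the diagram does not route further
```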
Linux Observer State Machine#
The Linux observer operates in two modes based on desktop activity:
SCREENCAST ←→ IDLE
| Mode | Trigger | Captures |
|---|---|---|
| SCREENCAST | Screen active (not idle/locked/power-save) | Video + Audio |
| IDLE | Screen idle, locked, or power-save | Audio only (if threshold met) |
Segment boundaries are triggered by:
- Transitions between SCREENCAST and IDLE modes
- Mute state changes
- 5-minute window elapsed
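The mode table and the three boundary triggers can be expressed as a short sketch. It is illustrative only and does not reproduce `linux/observer.py`: the names `Mode`, `SegmentState`, `mode_for`, and `boundary_reason` are hypothetical, and the audio threshold that applies in IDLE mode is left out because its value is not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

SEGMENT_SECONDS = 5 * 60  # the 5-minute window from the list above


class Mode(Enum):
    SCREENCAST = "screencast"
    IDLE = "idle"


def mode_for(idle: bool, locked: bool, power_save: bool) -> Mode:
    """SCREENCAST only when the screen is neither idle, locked, nor in power-save."""
    return Mode.IDLE if (idle or locked or power_save) else Mode.SCREENCAST


@dataclass
class SegmentState:
    mode: Mode
    muted: bool
    started_at: float  # seconds, e.g. from time.monotonic()


def boundary_reason(
    state: SegmentState, mode: Mode, muted: bool, now: float
) -> Optional[str]:
    """Return why the current segment should end, or None to keep recording."""
    if mode is not state.mode:
        return "mode_change"
    if muted != state.muted:
        return "mute_change"
    if now - state.started_at >= SEGMENT_SECONDS:
        return "window_elapsed"
    return None
```

A capture loop would call `boundary_reason` on each tick and, whenever it returns a reason, close the current segment and start a new `SegmentState` with the observed mode and mute state.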
Key Components#
- observer.py — Unified entry point with platform detection
- linux/observer.py — Linux capture: audio + screencast + activity detection
- linux/screencast.py — XDG Portal screencast with PipeWire + GStreamer
- gnome/activity.py — GNOME-specific activity detection (idle, lock, power save)
- observer_client.py — HTTP upload client for observer → server communication
- sense.py — File watcher that dispatches transcription and description jobs
- transcribe.py — Audio transcription with faster-whisper and sentence-level embeddings
- describe.py — Vision analysis with Gemini, category-based prompts
- categories/ — Category-specific prompts for screen content (see SCREEN_CATEGORIES.md)
Standalone Observers#
Tmux capture is handled by the solstone-tmux package, which runs as its own systemd user service. See the solstone-tmux repo for setup instructions.
macOS capture is handled by the solstone-macos native Swift app. See the solstone-macos repo.
Both upload segments via the same HTTP ingest API used by the built-in Linux observer.
Output Formats#
See JOURNAL.md for detailed extract schemas:
- Audio transcripts: audio.jsonl with timestamps (speaker detection not included)
- Screen analysis: screen.jsonl with frame-by-frame categorization
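Both outputs are JSON Lines files, so they can be read one record at a time. The reader below is a generic sketch that makes no assumption about field names; the authoritative schemas are in JOURNAL.md. The helper name `iter_jsonl` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        yield json.loads(raw)
```

An open file object is itself an iterable of lines, so `iter_jsonl(open("audio.jsonl", encoding="utf-8"))` streams records without loading the whole file.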
Configuration#
Requires the journal directory at the project root. API keys for the transcription and vision services are configured in .env.
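A pre-flight check for these two requirements might look like the sketch below. It assumes the directory is literally named `journal` and that `.env` holds plain `KEY=VALUE` lines; the key name passed in is up to the caller, since the required key names are not listed here. The helpers `parse_env` and `check_setup` are hypothetical and not part of solstone.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_setup(root: Path, required_keys: List[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    if not (root / "journal").is_dir():  # directory name is an assumption
        problems.append("missing journal directory")
    env_path = root / ".env"
    env = parse_env(env_path.read_text()) if env_path.is_file() else {}
    for key in required_keys:
        if not env.get(key):
            problems.append(f"missing {key} in .env")
    return problems
```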