personal memory agent

Observe Module

Multimodal capture and AI-powered analysis of desktop activity.

Observer Architecture

Observers are independent capture agents that upload segments to solstone via the HTTP ingest API (/app/observer/ingest/<key>). Each observer runs as a separate process with its own lifecycle, while solstone core acts as the journal and processing engine.

| Observer | What it captures | Repo | Runs as |
|---|---|---|---|
| solstone-linux | Screen + audio on Linux | solstone-linux | systemd user service / standalone |
| solstone-macos | Screen + audio on macOS | solstone-macos | Native menu bar app |
| solstone-tmux | Tmux terminal sessions | solstone-tmux | systemd user service / standalone |
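To make the upload contract concrete, here is a minimal sketch of how an observer might build a multipart request for the ingest endpoint, using only the standard library. It builds the request without sending it. The POST method, the form field name `files`, and the base URL in the usage note are assumptions for illustration; only the `/app/observer/ingest/<key>` path comes from this page. The real client lives in observer_client.py and may differ.

```python
import urllib.request
import uuid


def build_ingest_request(base_url: str, key: str, files: dict[str, bytes]) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload to the ingest API.

    Assumption: the endpoint accepts POST with one form part per file under a
    hypothetical field name "files".
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for filename, payload in files.items():
        # Each part: boundary line, part headers, blank line, raw bytes, CRLF.
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n".encode()
        )
        parts.append(payload)
        parts.append(b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    url = f"{base_url.rstrip('/')}/app/observer/ingest/{key}"
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it would then be `urllib.request.urlopen(build_ingest_request("http://localhost:8000", key, {"segment.flac": data}))`, where the host and port are placeholders for wherever solstone is running.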

Managing observers

```shell
# List all registered observers
sol observer list

# Register a new observer
sol observer create <name>

# Check observer status
sol observer status <name>

# Rename an observer
sol observer rename <old> <new>

# Revoke an observer's key
sol observer revoke <name>
```
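The commands above manage a set of named observers, each holding a key that appears in its ingest URL. The toy model below is a hypothetical sketch of that lifecycle, not solstone's implementation: the in-memory storage, the `secrets.token_urlsafe` key format, and the choice that a key survives a rename are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ObserverRegistry:
    """Hypothetical in-memory model of observer names and their upload keys."""

    keys: dict[str, str] = field(default_factory=dict)  # observer name -> key
    revoked: set[str] = field(default_factory=set)      # keys that may no longer upload

    def create(self, name: str) -> str:
        if name in self.keys:
            raise ValueError(f"observer {name!r} already exists")
        key = secrets.token_urlsafe(24)  # assumed key format
        self.keys[name] = key
        return key

    def rename(self, old: str, new: str) -> None:
        if new in self.keys:
            raise ValueError(f"observer {new!r} already exists")
        # In this sketch the key moves with the observer, so uploads keep working.
        self.keys[new] = self.keys.pop(old)

    def revoke(self, name: str) -> None:
        # The observer stays listed, but its key is no longer accepted.
        self.revoked.add(self.keys[name])

    def is_authorized(self, key: str) -> bool:
        return key in self.keys.values() and key not in self.revoked
```

Keeping revoked keys in a separate set, rather than deleting the observer, lets a status command still report a revoked observer by name.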

Commands

| Command | Purpose |
|---|---|
| sol observer | Screen and audio capture (auto-detects platform) |
| sol observe-linux | Screen and audio capture on Linux (direct) |
| sol transcribe | Audio transcription with faster-whisper |
| sol describe | Visual analysis of screen recordings |
| sol sense | Observation coordination: watches for new media files and dispatches transcription and description jobs |

Architecture

```
Observers (standalone or built-in)
       ↓ HTTP multipart upload
Observer Ingest API (/app/observer/ingest/<key>)
       ↓
   Raw media files (*.flac, *.webm, tmux_*.jsonl)
       ↓
sol sense (coordination)
   ├── sol transcribe → audio.jsonl
   └── sol describe → screen.jsonl
```
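The dispatch step in the diagram amounts to routing each new raw file to a handler by its name. The sketch below assumes routing is purely by filename pattern, which is a simplification; the actual logic in sense.py may also consider segment state. The diagram shows no processing step for `tmux_*.jsonl`, so this sketch leaves those files unrouted.

```python
from fnmatch import fnmatch
from pathlib import PurePath
from typing import Optional

# Hypothetical routing table mirroring the diagram: filename pattern -> command.
ROUTES = [
    ("*.flac", "transcribe"),  # audio -> sol transcribe -> audio.jsonl
    ("*.webm", "describe"),    # screen recording -> sol describe -> screen.jsonl
]


def route(path: str) -> Optional[str]:
    """Return the handler for a newly written raw file, or None if nothing applies."""
    name = PurePath(path).name
    for pattern, handler in ROUTES:
        if fnmatch(name, pattern):
            return handler
    return None  # e.g. tmux_*.jsonl, which the diagram does not route
```

A watcher would call `route()` on each file-created event and launch the matching command.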

Linux Observer State Machine

The Linux observer operates in two modes based on desktop activity:

```
SCREENCAST  ←→  IDLE
```

| Mode | Trigger | Captures |
|---|---|---|
| SCREENCAST | Screen active (not idle, locked, or in power-save) | Video + audio |
| IDLE | Screen idle, locked, or in power-save | Audio only (if threshold met) |
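The mode table reduces to a single rule: any one of the three inactivity signals selects IDLE. A minimal sketch, assuming the activity detector exposes those signals as three booleans (the parameter names are hypothetical; gnome/activity.py may expose them differently):

```python
from enum import Enum


class Mode(Enum):
    SCREENCAST = "screencast"  # video + audio
    IDLE = "idle"              # audio only, if the threshold is met


def select_mode(idle: bool, locked: bool, power_save: bool) -> Mode:
    """Choose the capture mode from the three activity signals."""
    if idle or locked or power_save:
        return Mode.IDLE
    return Mode.SCREENCAST
```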

Segment boundaries are triggered by:

  • Transitions between SCREENCAST and IDLE modes
  • Mute state changes
  • Expiry of the 5-minute segment window
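The three boundary triggers can be checked together on every poll of the capture state. This is a hypothetical sketch: the `CaptureState` type, the string mode values, and the use of wall-clock seconds are assumptions, and only the three triggers and the 5-minute window come from this page.

```python
from dataclasses import dataclass

SEGMENT_WINDOW_SECONDS = 5 * 60  # the 5-minute segment window


@dataclass(frozen=True)
class CaptureState:
    mode: str    # "SCREENCAST" or "IDLE" (hypothetical representation)
    muted: bool


def should_cut_segment(prev: CaptureState, cur: CaptureState,
                       segment_start: float, now: float) -> bool:
    """Return True when any of the three segment-boundary triggers fires."""
    if prev.mode != cur.mode:    # SCREENCAST <-> IDLE transition
        return True
    if prev.muted != cur.muted:  # mute state changed
        return True
    return now - segment_start >= SEGMENT_WINDOW_SECONDS  # window expired
```

Checking the state-change triggers first means a segment never straddles a mode or mute change, even when the window still has time left.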

Key Components

  • observer.py — Unified entry point with platform detection
  • linux/observer.py — Linux capture: audio + screencast + activity detection
  • linux/screencast.py — XDG Portal screencast with PipeWire + GStreamer
  • gnome/activity.py — GNOME-specific activity detection (idle, lock, power save)
  • observer_client.py — HTTP upload client for observer → server communication
  • sense.py — File watcher that dispatches transcription and description jobs
  • transcribe.py — Audio transcription with faster-whisper and sentence-level embeddings
  • describe.py — Vision analysis with Gemini, category-based prompts
  • categories/ — Category-specific prompts for screen content (see SCREEN_CATEGORIES.md)
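The platform detection in observer.py can be pictured as a lookup on the operating system name. This sketch is hypothetical: it assumes detection uses `platform.system()` and that Linux is the only built-in backend, as the component list suggests. On macOS, capture comes from the separate solstone-macos app rather than this module.

```python
import platform
from typing import Optional

# Hypothetical mapping: OS name reported by platform.system() -> built-in backend.
BACKENDS = {"Linux": "linux"}  # linux/observer.py


def detect_backend(system: Optional[str] = None) -> str:
    """Pick the built-in capture backend for the current (or given) OS."""
    system = system or platform.system()
    try:
        return BACKENDS[system]
    except KeyError:
        raise RuntimeError(
            f"no built-in observer for {system!r}; use a standalone observer"
        ) from None
```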

Standalone Observers

Tmux capture is handled by the solstone-tmux package, which runs as its own systemd user service. See the solstone-tmux repo for setup instructions.

macOS capture is handled by the solstone-macos native Swift app. See the solstone-macos repo.

Both upload segments via the same HTTP ingest API used by the built-in Linux observer.

Output Formats

See JOURNAL.md for detailed extract schemas:

  • Audio transcripts: audio.jsonl with timestamps (speaker detection not included)
  • Screen analysis: screen.jsonl with frame-by-frame categorization
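Both extracts are JSON Lines files: one JSON object per line. A minimal reader, assuming blank lines may appear and should be skipped. The field names in the usage note (`start`, `text`) are placeholders; consult JOURNAL.md for the actual schema.

```python
import json
from typing import Iterator


def iter_jsonl(text: str) -> Iterator[dict]:
    """Yield one record per non-blank line of a JSONL document."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc
```

For a file on disk: `for record in iter_jsonl(Path("audio.jsonl").read_text()): print(record.get("text"))`, with `Path` imported from `pathlib` and `text` standing in for whatever field JOURNAL.md defines.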

Configuration

Requires the journal directory at the project root. API keys for the transcription and vision services are configured in .env.
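A .env file is a list of `KEY=VALUE` lines. The sketch below shows one common way to load it without overriding variables already set in the environment. This page does not say which variable names solstone expects or whether it uses a library such as python-dotenv, so the key `GEMINI_API_KEY` in the test is hypothetical.

```python
import os
from collections.abc import MutableMapping


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments; strips matching quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):  # tolerate shell-style "export KEY=..."
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def apply_env(values: dict[str, str],
              environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy parsed values into the environment; existing variables take precedence."""
    for key, value in values.items():
        environ.setdefault(key, value)
```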