solstone Journal Guide#

This document describes the layout of a journal directory where all captures, extracts, and insights are stored. Each dated YYYYMMDD folder is referred to as a day, and within each day captured content is organized into segments (timestamped duration folders). Each segment folder uses the format HHMMSS_LEN/ where HHMMSS is the start time and LEN is the duration in seconds. This folder name serves as the segment key, uniquely identifying the segment within a given day.
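The segment key encodes everything needed to place a segment in time. A minimal sketch of parsing it (the helper names here are illustrative, not part of solstone):

```python
from datetime import datetime, timedelta

def parse_segment_key(key: str) -> tuple[str, int]:
    """Split a segment key like '143022_300' into start time and duration."""
    start, length = key.rsplit("_", 1)
    return start, int(length)

def segment_window(day: str, key: str) -> tuple[datetime, datetime]:
    """Compute the absolute start/end of a segment from its day and key."""
    start, length = parse_segment_key(key)
    begin = datetime.strptime(day + start, "%Y%m%d%H%M%S")
    return begin, begin + timedelta(seconds=length)

begin, end = segment_window("20250119", "143022_300")
# begin = 2025-01-19 14:30:22, end = 2025-01-19 14:35:22
```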

The Three-Layer Architecture#

solstone transforms raw recordings into actionable understanding through a three-layer pipeline:

┌─────────────────────────────────────┐
│  LAYER 3: AGENT OUTPUTS             │  Narrative summaries
│  (Markdown files)                   │  "What it means"
│  - agents/*.md (daily outputs)      │
│  - *.md (segment outputs)           │
└─────────────────────────────────────┘
         ↑ synthesized from
┌─────────────────────────────────────┐
│  LAYER 2: EXTRACTS                  │  Structured data
│  (JSON/JSONL files)                 │  "What happened"
│  - audio.jsonl, *_audio.jsonl       │
│  - screen.jsonl, *_screen.jsonl     │
│  - events/*.jsonl (per-facet)       │
└─────────────────────────────────────┘
         ↑ derived from
┌─────────────────────────────────────┐
│  LAYER 1: CAPTURES                  │  Raw recordings
│  (Binary media files)               │  "What was recorded"
│  - *.flac/*.ogg/*.opus/*.wav (audio)│
│  - *.webm (video)                   │
└─────────────────────────────────────┘

Vocabulary Quick Reference#

Pipeline Layers

| Term | Definition | Examples |
| --- | --- | --- |
| Capture | Raw audio/video recording | *.flac, *.ogg, *.opus, *.wav, *.webm |
| Extract | Structured data from captures | *.jsonl |
| Agent Output | AI-generated narrative summary | agents/*.md, HHMMSS_LEN/*.md |

Organization

| Term | Definition | Examples |
| --- | --- | --- |
| Day | 24-hour activity directory | 20250119/ |
| Segment | 5-minute time window | 143022_300/ (14:30:22, 5 min) |
| Span | Sequential segment group | Import creating 3 segments |
| Facet | Project/context scope | #work, #personal |

Extracted Data

| Term | Definition | Examples |
| --- | --- | --- |
| Entity | Tracked person/project/concept | People, companies, tools |
| Occurrence | Time-based event | Meetings, messages, files |

Top-Level Directory Structure#

| Directory/File | Purpose |
| --- | --- |
| YYYYMMDD/ | Daily capture folders containing segments, extracts, and agent outputs |
| entities/ | Journal-level entity identity records (<id>/entity.json) |
| facets/ | Facet-specific data: entity relationships, todos, events, news, action logs |
| agents/ | Agent run logs in per-agent subdirectories (<name>/<id>.jsonl), day indexes (<day>.jsonl), and latest-run symlinks (<name>.log) |
| apps/ | App-specific storage (distinct from codebase apps/) |
| streams/ | Per-stream state files (<name>.json) tracking segment chains and sequence numbers |
| imports/ | Imported audio files and processing artifacts |
| tokens/ | Token usage logs from AI model calls, organized by day |
| indexer/ | Search index (journal.sqlite FTS5 database) |
| health/ | Service health logs (<service>.log files) |
| config/ | Configuration files and journal-level action logs |
| task_log.txt | Optional log of utility runs in [epoch]\tmessage format |
| summary.md | Journal-wide statistics summary (generated by sol journal-stats) |
| stats.json | Detailed journal statistics in JSON format (generated by sol journal-stats) |

Config directory#

  • config/journal.json – owner configuration for the journal (optional, see Owner configuration).
  • config/convey.json – Convey UI preferences (facet/app ordering, selected facet).
  • config/actions/ – journal-level action logs (see Action Logs).

Owner configuration#

The optional config/journal.json file allows customization of journal processing and presentation based on owner preferences. This file should be created at the journal root and contains personal settings that affect how the system processes and interprets journal data.

Identity configuration#

The identity block contains information about the journal owner that helps tools correctly identify the owner in transcripts, meetings, and other captured content:

{
  "identity": {
    "name": "Jeremie Miller",
    "preferred": "Jer",
    "pronouns": {
      "subject": "he",
      "object": "him",
      "possessive": "his",
      "reflexive": "himself"
    },
    "aliases": ["Jer", "jeremie"],
    "email_addresses": ["jer@example.com"],
    "timezone": "America/Los_Angeles"
  }
}

Fields:

  • name (string) – Full legal or formal name of the journal owner
  • preferred (string) – Preferred name or nickname to be used when addressing the owner
  • pronouns (object) – Structured pronoun set for template usage with fields:
    • subject – Subject pronoun (e.g., "he", "she", "they")
    • object – Object pronoun (e.g., "him", "her", "them")
    • possessive – Possessive adjective (e.g., "his", "her", "their")
    • reflexive – Reflexive pronoun (e.g., "himself", "herself", "themselves")
  • aliases (array of strings) – Alternative names, nicknames, or usernames that may appear in transcripts
  • email_addresses (array of strings) – Email addresses associated with the owner for participant detection
  • timezone (string) – IANA timezone identifier (e.g., "America/New_York", "Europe/London") for timestamp interpretation

This configuration helps meeting extraction identify the owner as a participant, enables personalized agent interactions, and ensures timestamps are interpreted correctly across the journal.

Convey configuration#

The convey block contains settings for the web application:

{
  "convey": {
    "password": "your-password-here"
  }
}

Fields:

  • password (string) – Password for accessing the convey web application. When set, owners must authenticate before accessing the journal interface.

UI Preferences: The separate config/convey.json file stores UI/UX personalization (facet/app ordering, selected facet). All fields optional:

{
  "facets": {"order": ["work", "personal"], "selected": "work"},
  "apps": {"order": ["home", "calendar", "todos"], "starred": ["home", "todos"]}
}
  • facets.order – Custom facet ordering.
  • facets.selected – Currently selected facet (auto-synced with browser).
  • apps.order – Custom app ordering in menu bar.
  • apps.starred – Apps to show in the quick-access starred section.

Retention configuration#

The retention block controls automatic cleanup of layer 1 raw media (audio recordings, video captures, screen diffs) while preserving all layer 2 extracts and layer 3 agent outputs. Three modes control when raw media is deleted:

  • "keep" – retain raw media indefinitely (default)
  • "days" – delete raw media after raw_media_days days, once the segment has finished processing
  • "processed" – delete raw media as soon as the segment has finished processing

{
  "retention": {
    "raw_media": "days",
    "raw_media_days": 30,
    "per_stream": {
      "plaud": {
        "raw_media": "days",
        "raw_media_days": 7
      },
      "archon": {
        "raw_media": "processed"
      }
    }
  }
}

Fields:

  • raw_media (string) – Retention mode: "keep", "days", or "processed". Default: "keep".
  • raw_media_days (integer or null) – Number of days to retain raw media when mode is "days". Required when raw_media is "days", ignored otherwise.
  • per_stream (object) – Per-stream overrides keyed by stream name. Each entry supports raw_media and raw_media_days. Omitted fields inherit from the global retention settings.

"Raw media" means layer 1 capture files only: audio files (.flac, .opus, .ogg, .m4a, .wav), video files (.webm, .mov, .mp4), and screen diffs (monitor_*_diff.png).

All layer 2 and layer 3 content is always preserved regardless of retention policy: transcripts (audio.jsonl, screen.jsonl), agent outputs (agents/*.md), speaker labels (agents/speaker_labels.json), facet events (events/*.jsonl), entity data, segment metadata (stream.json), and search index entries.

Raw media is never deleted from segments that haven't finished processing. A segment is considered complete only when all four checks pass:

  • No _active.jsonl files in agents/ (no running agents)
  • audio.jsonl (or *_audio.jsonl) exists if audio raw media was captured
  • screen.jsonl (or *_screen.jsonl) exists if video raw media was captured
  • agents/speaker_labels.json exists if voice embeddings (.npz) are present

Purged segments remain fully navigable in convey. Transcripts, entities, speaker labels, and summaries are all intact. The only difference is that audio/video playback is unavailable.

Environment variables#

The env block provides fallback values for environment variables. These are loaded at CLI startup and used when the corresponding variable is not set in the shell or .env file:

{
  "env": {
    "GOOGLE_API_KEY": "your-google-api-key",
    "ANTHROPIC_API_KEY": "your-anthropic-api-key",
    "OPENAI_API_KEY": "your-openai-api-key",
    "REVAI_ACCESS_TOKEN": "your-revai-token",
    "PLAUD_ACCESS_TOKEN": "your-plaud-token"
  }
}

Precedence order (highest to lowest):

  1. Shell environment variables
  2. .env file in project root
  3. Journal config env section

This allows storing API keys in the journal config as an alternative to .env, which can be useful when the journal is synced across machines or when you want to keep all configuration in one place.

Template usage examples#

The structured pronoun format enables proper pronoun usage in generated text and agent responses:

# In templates or generated text:
f"{identity.pronouns.subject} joined the meeting"  # "he joined the meeting"
f"I spoke with {identity.pronouns.object}"         # "I spoke with him"
f"That is {identity.pronouns.possessive} desk"     # "That is his desk"
f"{identity.pronouns.subject} did it {identity.pronouns.reflexive}"  # "he did it himself"

For complete documentation of the prompt template system including all variable categories, composition patterns, and how to add new variables, see PROMPT_TEMPLATES.md.

Transcribe configuration#

The transcribe block configures audio transcription settings for sol transcribe:

{
  "transcribe": {
    "backend": "whisper",
    "enrich": true,
    "preserve_all": false,
    "whisper": {
      "device": "auto",
      "model": "medium.en",
      "compute_type": "default"
    },
    "revai": {
      "model": "fusion"
    }
  }
}

Top-level fields:

  • backend (string) – STT backend to use: "whisper" (local processing) or "revai" (cloud with speaker diarization). Default: "whisper".
  • enrich (boolean) – Enable LLM enrichment for topic extraction and transcript correction. Default: true.
  • preserve_all (boolean) – Keep audio files even when no speech is detected. When false, silent recordings are deleted to save disk space. Default: false.

Whisper backend settings (transcribe.whisper):

  • device (string) – Device for inference: "auto" (detect GPU, fall back to CPU), "cpu", or "cuda". Default: "auto".
  • model (string) – Whisper model to use (e.g., "tiny.en", "base.en", "small.en", "medium.en", "large-v3-turbo", "distil-large-v3"). Default: "medium.en".
  • compute_type (string) – Compute precision: "default" (auto-select optimal for platform), "float32" (most compatible), "float16" (faster on CUDA GPUs), "int8" (fastest on CPU). Default: "default".

Rev.ai backend settings (transcribe.revai):

  • model (string) – Rev.ai transcriber model: "fusion" (best quality), "machine" (fast automated), or "low_cost". Default: "fusion".

Platform auto-detection (Whisper): When compute_type is "default", optimal settings are automatically selected:

  • CUDA GPU: Uses float16 for GPU-optimized inference
  • CPU (including Apple Silicon): Uses int8 for ~2x faster inference and significantly faster model loading

Voice embeddings (resemblyzer) also auto-detect the best device: MPS on Apple Silicon (~16x faster), CUDA when available, or CPU fallback.

CLI flags can override settings: --backend selects the backend, --cpu forces CPU mode with int8 (Whisper only), --model MODEL overrides the Whisper model.

Describe configuration#

The describe block configures screen analysis settings for sol describe:

{
  "describe": {
    "max_extractions": 20,
    "categories": {
      "code": {
        "importance": "high",
        "extraction": "Extract when viewing different repositories or files"
      },
      "gaming": {
        "importance": "ignore"
      }
    }
  }
}

Fields:

  • max_extractions (integer) – Maximum number of frames to run detailed content extraction on per video. The first qualified frame is always extracted regardless of this limit. When more frames are eligible, selection uses AI-based prioritization (falling back to random selection). Default: 20.
  • categories (object) – Per-category overrides for importance and extraction guidance.

Category overrides#

Each category (e.g., code, meeting, browsing) can have:

| Field | Values | Description |
| --- | --- | --- |
| importance | high, normal, low, ignore | Advisory priority hint for AI frame selection. high prioritizes these frames, low deprioritizes unless unique, ignore suggests skipping unless categorization seems wrong. Default: normal. |
| extraction | string | Custom guidance for when to extract content from this category. Overrides the default from the category's .json file. |

Importance levels are advisory hints passed to the AI selection process, not hard filters. The AI may still select frames from ignore categories if it determines the content is valuable or the categorization may be incorrect.

Providers configuration#

The providers block enables fine-grained control over which LLM provider and model is used for different contexts. This supports a tier-based system where you can specify capability levels (pro/flash/lite) rather than specific model names.

{
  "providers": {
    "default": {
      "provider": "google",
      "tier": 2
    },
    "contexts": {
      "observe.*": {"provider": "google", "tier": 3},
      "talent.system.*": {"tier": 1},
      "talent.system.meetings": {"provider": "anthropic", "disabled": true},
      "talent.entities.observer": {"tier": 2, "extract": false}
    },
    "models": {
      "google": {
        "1": "gemini-3-pro-preview",
        "2": "gemini-3-flash-preview",
        "3": "gemini-2.5-flash-lite"
      }
    }
  }
}

Tier system#

Tiers provide a provider-agnostic way to specify model capability levels:

| Tier | Name | Description |
| --- | --- | --- |
| 1 | pro | Highest capability, best for complex reasoning |
| 2 | flash | Balanced performance and cost (default) |
| 3 | lite | Fastest and cheapest, for simple tasks |

System defaults map tiers to models for each provider. See think/models.py for current tier-to-model mappings (PROVIDER_DEFAULTS constant).

If a requested tier is unavailable for a provider, the system falls back to more capable tiers (e.g., tier 3 → tier 2 → tier 1).
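The fallback toward more capable tiers can be sketched as follows (the function name is hypothetical; the model names are the examples from this document):

```python
def resolve_model(models: dict[str, str], tier: int) -> str:
    """Pick the model for a tier, walking 3 → 2 → 1 when a tier is missing."""
    for t in range(tier, 0, -1):          # requested tier first, then stronger
        if str(t) in models:              # JSON tier keys are strings
            return models[str(t)]
    raise KeyError(f"no model configured at or above tier {tier}")

google = {"1": "gemini-3-pro-preview", "2": "gemini-3-flash-preview"}
resolve_model(google, 3)   # tier 3 missing → falls back to tier 2
```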

Context matching#

Contexts are matched in order of specificity:

  1. Exact match – "talent.system.meetings" matches only that exact context
  2. Glob pattern – "observe.*" matches any context starting with observe.
  3. Default – Falls back to the default configuration
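The three-step resolution can be sketched with the standard library's glob matcher (a hypothetical helper; the real matching logic may differ in detail):

```python
from fnmatch import fnmatch

def match_context(context: str, providers: dict) -> dict:
    """Resolve a context to its config: exact match, then glob, then default."""
    contexts = providers.get("contexts", {})
    if context in contexts:                        # 1. exact match
        return contexts[context]
    for pattern, cfg in contexts.items():          # 2. glob pattern
        if "*" in pattern and fnmatch(context, pattern):
            return cfg
    return providers.get("default", {})            # 3. default

providers = {
    "default": {"provider": "google", "tier": 2},
    "contexts": {"observe.*": {"tier": 3},
                 "talent.system.meetings": {"provider": "anthropic"}},
}
match_context("observe.enrich", providers)   # → {"tier": 3}
```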

Context naming convention#

Talent configs (agents and generators) use the pattern talent.{source}.{name}:

  • System configs: talent.system.{name} (e.g., talent.system.meetings, talent.system.default)
  • App configs: talent.{app}.{name} (e.g., talent.entities.observer, talent.support.support)

Other contexts follow the pattern {module}.{feature}[.{operation}]:

  • Observe pipeline: observe.describe.frame, observe.enrich, observe.transcribe.gemini

Configuration options#

default – Global defaults applied when no context matches:

  • provider (string) – Provider name: "google", "openai", or "anthropic". Default: "google".
  • tier (integer) – Tier number (1-3). Default: 2 (flash).
  • model (string) – Explicit model name (overrides tier if specified).

contexts – Context-specific overrides. Each key is a context pattern, value is:

  • provider (string) – Override provider (optional, inherits from default).
  • tier (integer) – Tier number (optional).
  • model (string) – Explicit model name (optional, overrides tier).
  • disabled (boolean) – Disable this talent config (optional, talent contexts only).
  • extract (boolean) – Enable/disable event extraction for generators with occurrence/anticipation hooks (optional).

models – Per-provider tier overrides. Maps provider name to tier-model mappings:

{
  "google": {"1": "gemini-3-pro-preview", "2": "gemini-3-flash-preview"},
  "openai": {"2": "gpt-5-mini-custom"}
}

Note: Tier keys in JSON must be strings ("1", "2", "3") since JSON doesn't support integer keys.

Facet folders#

The facets/ directory provides a way to organize journal content by scope or focus area. Each facet represents a cohesive grouping of related activities, projects, or areas of interest.

Facet structure#

Each facet is organized as facets/<facet>/ where <facet> is a descriptive short unique name. When referencing facets in the system, use hashtags (e.g., #personal for the "Personal Life" facet, #ml_research for "Machine Learning Research"). Each facet folder contains:

  • facet.json – metadata file with facet title and description.
  • activities/ – configured activities and completed activity records (see Activity Records).
  • entities/ – entity relationships and detected entities (see Facet Entities).
  • todos/ – daily todo lists (see Facet-Scoped Todos).
  • events/ – extracted events per day (see Event extracts).
  • news/ – daily news and updates relevant to the facet (optional).
  • logs/ – action audit logs for tool calls (optional, see Action Logs).

Facet metadata#

The facet.json file contains basic information about the facet:

{
  "title": "Machine Learning Research",
  "description": "AI/ML research projects, experiments, and related activities",
  "color": "#4f46e5",
  "emoji": "🧠"
}

Optional fields:

  • color – hex color code for the facet card background in the web UI
  • emoji – emoji icon displayed in the top-left of the facet card
  • muted – boolean flag to mute/hide the facet from views (default: false)

Facet Entities#

Entities in solstone use a two-tier architecture with journal-level entities (canonical identity) and facet relationships (per-facet context). There are also detected entities (daily discoveries) that can be promoted to attached status.

Entity Storage Structure#

entities/
  └── {entity_id}/
      └── entity.json              # Journal-level entity (canonical identity)

facets/{facet}/
  └── entities/
      ├── YYYYMMDD.jsonl           # Daily detected entities
      └── {entity_id}/
          ├── entity.json          # Facet relationship
          ├── observations.jsonl   # Durable facts (optional)
          └── voiceprints.npz      # Voice recognition data (optional)

Journal-level entities (entities/<id>/entity.json) store the canonical identity: name, type, aliases (aka), and principal flag. These are shared across all facets.

Facet relationships (facets/<facet>/entities/<id>/entity.json) store per-facet context: description, timestamps, and custom fields specific to that facet.

Entity memory (observations, voiceprints) is stored alongside facet relationships.

Journal-Level Entities#

Journal entities represent the canonical identity record:

{
  "id": "alice_johnson",
  "name": "Alice Johnson",
  "type": "Person",
  "aka": ["Ali", "AJ"],
  "is_principal": false,
  "created_at": 1704067200000
}

Standard fields:

  • id (string) – Stable slug identifier derived from name via entity_slug() in think/entities/ (lowercase, underscores, e.g., "Alice Johnson" → "alice_johnson"). Used for folder paths, URLs, and tool references.
  • name (string) – Display name for the entity.
  • type (string) – Entity type (e.g., "Person", "Company", "Project", "Tool"). Types are flexible and owner-defined; must be alphanumeric with spaces, minimum 3 characters.
  • aka (array of strings) – Alternative names, nicknames, or acronyms. Used in audio transcription and fuzzy matching.
  • is_principal (boolean) – When true, identifies this entity as the journal owner. Auto-flagged when name/aka matches identity config.
  • blocked (boolean) – When true, entity is hidden from all facets and excluded from agent context.
  • created_at (integer) – Unix timestamp in milliseconds when entity was created.
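The slug derivation described for id can be approximated as below; this is a sketch, and the real entity_slug() in think/entities/ may normalize differently:

```python
import re

def entity_slug(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs to underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return slug.strip("_")

entity_slug("Alice Johnson")   # → "alice_johnson"
```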

Facet Relationships#

Facet relationships link journal entities to specific facets with context:

{
  "entity_id": "alice_johnson",
  "description": "Lead engineer on the API project",
  "attached_at": 1704067200000,
  "updated_at": 1704153600000,
  "last_seen": "20260115"
}

Relationship fields:

  • entity_id (string) – Links to the journal entity.
  • description (string) – Facet-specific description.
  • attached_at (integer) – Unix timestamp when attached to this facet.
  • updated_at (integer) – Unix timestamp of last modification.
  • last_seen (string) – Day (YYYYMMDD) when last mentioned in journal content.
  • detached (boolean) – When true, soft-deleted from this facet but data preserved.
  • Custom fields (any) – Additional facet-specific metadata (e.g., tier, status, priority).

Detected Entities#

Daily detection files (facets/<facet>/entities/YYYYMMDD.jsonl) contain entities automatically discovered by agents from journal content:

{"type": "Person", "name": "Charlie Brown", "description": "Mentioned in standup meeting"}
{"type": "Tool", "name": "React", "description": "Used in UI development work"}

Entity Lifecycle#

  1. Detection: Daily agents scan journal content and record entities in facets/<facet>/entities/YYYYMMDD.jsonl
  2. Aggregation: Review agent tracks detection frequency across recent days
  3. Promotion: Entities with 3+ detections are auto-promoted to attached, or owners manually promote via UI
  4. Persistence: Creates journal entity + facet relationship; remains active until detached
  5. Detachment: Sets detached: true on facet relationship, preserving all data
  6. Re-attachment: Clears detached flag, restoring the entity with preserved history
  7. Blocking: Sets blocked: true on journal entity and detaches from all facets

Cross-Facet Behavior#

The same entity can be attached to multiple facets with independent descriptions and timestamps. When loading entities across all facets, the alphabetically-first facet wins for duplicates during aggregation.
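The alphabetically-first rule amounts to a first-write-wins merge over sorted facet names. A sketch (helper name and record shapes are illustrative):

```python
def aggregate_entities(per_facet: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge facet entity relationships; for duplicate entity IDs the
    alphabetically-first facet's relationship wins."""
    merged: dict[str, dict] = {}
    for facet in sorted(per_facet):                # alphabetical facet order
        for rel in per_facet[facet]:
            merged.setdefault(rel["entity_id"], {**rel, "facet": facet})
    return merged

per_facet = {
    "work": [{"entity_id": "alice_johnson", "description": "Lead engineer"}],
    "personal": [{"entity_id": "alice_johnson", "description": "Neighbor"}],
}
aggregate_entities(per_facet)["alice_johnson"]["facet"]   # → "personal"
```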

Facet News#

The news/ directory provides a chronological record of news, updates, and external developments relevant to the facet. This allows tracking of industry news, research updates, regulatory changes, or any external information that impacts the facet's focus area.

News organization#

News files are organized by date as news/YYYYMMDD.md where each file contains the day's relevant news items. Only create files for days that have news to record—sparse population is expected.

News file format#

Each YYYYMMDD.md file is a markdown document with a consistent structure:

# 2025-01-18 News - Machine Learning Research

## OpenAI Announces New Model Architecture
**Source:** techcrunch.com | **Time:** 09:15
Summary of the announcement and its relevance to current research projects...

## Paper: "Efficient Attention Mechanisms in Transformers"
**Source:** arxiv.org | **Time:** 14:30
Key findings from the paper and potential applications...

## Google Research Updates Dataset License Terms
**Source:** blog.google | **Time:** 16:45
Changes to dataset licensing that may affect ongoing experiments...

News entry structure#

Each news entry should include:

  • Title – concise headline as a level 2 heading
  • Source – origin of the news (website, journal, etc.)
  • Time – optional time of publication or discovery (HH:MM format)
  • Summary – brief description focusing on relevance to the facet
  • Impact – optional notes on how this affects facet work

News metadata#

Optionally, a news.json file can be maintained at the root of the news directory to track metadata:

{
  "last_updated": "2025-01-18",
  "sources": ["arxiv.org", "techcrunch.com", "nature.com"],
  "auto_fetch": false,
  "keywords": ["transformer", "attention", "llm", "research"]
}

This allows for future automation of news gathering while maintaining manual curation quality.

Activity Records#

The activities/ directory within each facet stores both the configured activity types (activities.jsonl) and completed activity records organized by day ({day}.jsonl). Activity records represent completed spans of activity — periods where a specific activity type was continuously tracked across one or more recording segments.

File path pattern:

facets/personal/activities/activities.jsonl                        # Configured activity types
facets/personal/activities/20260209.jsonl                          # Completed records for the day
facets/work/activities/20260209.jsonl
facets/work/activities/20260209/coding_095809_303/session_review.md  # Generated output

Each day file contains one JSON object per line, where each record represents a completed activity span:

{"id": "coding_095809_303", "activity": "coding", "segments": ["095809_303", "100313_303", "100816_303", "101320_302"], "level_avg": 0.88, "description": "Developed extraction prompts using Claude Code and VS Code", "active_entities": ["Claude Code", "VS Code", "sunstone"], "created_at": 1770435619415}
{"id": "meeting_090953_303", "activity": "meeting", "segments": ["090953_303", "091457_303", "092001_304", "092506_304", "093010_304"], "level_avg": 1.0, "description": "Sprint planning meeting with the engineering team", "active_entities": ["Alice", "Bob"], "created_at": 1770435619420}

Record ID scheme#

Activity record IDs follow the format {activity_type}_{segment_key}, where segment_key is the segment in which the activity started. This ID is unique within a facet and day because, for a given facet, only one activity of a given type can start in a given segment.

Record fields#

  • id (string) – Unique identifier: {activity}_{start_segment_key} (e.g., coding_095809_303)
  • activity (string) – Activity type ID from the facet's configured activities
  • segments (array of strings) – Ordered list of segment keys where this activity was active
  • level_avg (float) – Average engagement level across all segments (high=1.0, medium=0.5, low=0.25)
  • description (string) – AI-synthesized description of the full activity span
  • active_entities (array of strings) – Merged and deduplicated entity names from all segments
  • created_at (integer) – Unix timestamp in milliseconds when the record was created

Lifecycle#

Activity records are created by the activities segment agent when it detects that an activity has ended:

  1. The activity_state agent tracks per-segment, per-facet activity states with continuity via since fields. Each entry includes an id field ({activity}_{since}) that uniquely identifies the activity span, and activity.live events are emitted for active entries.
  2. The activities agent runs after activity_state and compares previous vs. current segment states
  3. When an activity ends (explicitly, implicitly, or via timeout), the agent walks the segment chain to collect all data
  4. A record is written to the facet's day file with preliminary description
  5. An LLM synthesizes all per-segment descriptions into a unified narrative
  6. The record description is updated with the synthesized version

Segment flush: If no new segments arrive for an extended period (1 hour), the supervisor triggers sol dream --flush on the last segment. Agents that declare hook.flush: true (like activities) run with flush=True in their context, treating all remaining active activities as ended. This ensures activities are recorded promptly even when the owner stops working, and prevents cross-day data loss.

Records are written idempotently — duplicate IDs are skipped on re-runs.

Generated output#

Activity-scheduled agents (schedule: "activity") produce output that is stored alongside the activity records, organized by day and record ID:

facets/{facet}/activities/{day}/{activity_id}/{agent}.{ext}

For example, a session_review agent processing a coding activity would write to:

facets/work/activities/20260209/coding_095809_303/session_review.md

These output directories are only created when activity-scheduled agents run. The path is computed by get_activity_output_path() in think/activities.py and passed as output_path in the agent request. Output files are indexed for search via the facets/*/activities/*/*/*.md formatter pattern.

Facet-Scoped Todos#

Todos are organized by facet in facets/{facet}/todos/{day}.jsonl where each file stores todo items as JSON Lines. Todos belong to a specific facet (e.g., "personal", "work", "research") and are completely separated by scope.

File path pattern:

facets/personal/todos/20250110.jsonl
facets/work/todos/20250110.jsonl
facets/research/todos/20250112.jsonl

Each file contains one JSON object per line, with the line number (1-indexed) serving as the stable todo ID.

{"text": "Draft standup update"}
{"text": "Review PR #1234 for indexing tweaks", "time": "14:30"}
{"text": "Morning planning session notes", "completed": true}
{"text": "Cancel meeting with vendor", "cancelled": true}

Format Specification#

JSONL structure:

Each line is a JSON object with the following fields:

  • text (required) – Task description
  • time (optional) – Scheduled time in HH:MM format (e.g., "14:30")
  • completed (optional) – Set to true when task is done
  • cancelled (optional) – Set to true for soft-deleted tasks
  • created_at (optional) – Unix timestamp in milliseconds when todo was created
  • updated_at (optional) – Unix timestamp in milliseconds of last modification

Facet context:

  • Facet is determined by the file location, not inline tags
  • Each facet has its own independent todo list for each day
  • Work todos (facets/work/todos/) are completely separate from personal todos (facets/personal/todos/)

Rules:

  • Line number is the stable todo ID (1-indexed); todos are never removed, only cancelled
  • Append new todos at the end of the file to maintain stable line numbering
  • Mark completed items with "completed": true
  • Cancel items with "cancelled": true (soft delete preserves line numbers)

Tool Access: All todo operations require both day and facet parameters:

  • todo_list(day, facet) – view numbered checklist for a specific facet
  • todo_add(day, facet, text) – append new todo
  • todo_done(day, facet, line_number) – mark complete
  • todo_cancel(day, facet, line_number) – cancel entry (soft delete)
  • todo_upcoming(limit, facet=None) – view upcoming todos (optionally filtered by facet)

This facet-scoped structure provides true separation of concerns while enabling automated tools to manage tasks deterministically.

Action Logs#

Action logs record an audit trail of owner-initiated actions and agent tool calls. There are two types:

  • Journal-level logs (config/actions/) – actions not tied to a specific facet (settings changes, remote observer management)
  • Facet-scoped logs (facets/{facet}/logs/) – actions within a specific facet (todos, entities)

Journal Action Logs#

The config/actions/ directory records journal-level actions. Logs are organized by day as config/actions/YYYYMMDD.jsonl.

{
  "timestamp": "2025-12-16T07:33:05.135587+00:00",
  "source": "app",
  "actor": "settings",
  "action": "identity_update",
  "params": {
    "changed_fields": {"name": {"old": "John", "new": "John Doe"}}
  }
}

Facet Action Logs#

The logs/ directory within each facet records facet-scoped actions. Logs are organized by day as facets/{facet}/logs/YYYYMMDD.jsonl.

{
  "timestamp": "2025-12-16T07:33:05.135587+00:00",
  "source": "tool",
  "actor": "todos:todo",
  "action": "todo_add",
  "params": {
    "text": "Review project proposal"
  },
  "facet": "work",
  "agent_id": "1765870373972"
}

Log Entry Fields#

Both log types share the same structure:

  • timestamp – ISO 8601 timestamp of the action
  • source – Origin type: "app" for web UI, "tool" for agent tools
  • actor – App or tool name that performed the action
  • action – Action name (e.g., "todo_add", "identity_update")
  • params – Action-specific parameters
  • facet – Facet name (only present in facet-scoped logs)
  • agent_id – Agent ID (only present for agent tool actions)

These logs enable auditing, debugging, and potential rollback of automated actions.

Token Usage#

The tokens/ directory tracks token usage from all AI model calls across the system. Usage data is organized by day as tokens/YYYYMMDD.jsonl where each file contains JSON Lines entries for that day's API calls.

Token log format#

Each line in a token log file is a JSON object with the following structure:

{
  "timestamp": 1736812345000,
  "model": "gemini-2.5-flash",
  "context": "agent.default.20250113_143022",
  "segment": "143022_300",
  "usage": {
    "input_tokens": 1500,
    "output_tokens": 500,
    "total_tokens": 2000,
    "cached_tokens": 800,
    "reasoning_tokens": 200
  }
}

Required fields:

  • timestamp – Unix timestamp in milliseconds (13 digits)
  • model – Model identifier (e.g., "gemini-2.5-flash", "gpt-5", "claude-sonnet-4-5")
  • context – Calling context (e.g., "agent.name.agent_id" or "module.function:line")
  • usage – Token counts dictionary with normalized field names

Optional fields:

  • segment – Recording segment key (e.g., "143022_300") when token usage is attributable to a specific observation window

Usage fields (all optional depending on model capabilities):

  • input_tokens – Tokens in the prompt/input
  • output_tokens – Tokens in the response/output
  • total_tokens – Total tokens consumed
  • cached_tokens – Tokens served from cache (reduces cost)
  • reasoning_tokens – Tokens used for extended thinking/reasoning
  • requests – Number of API requests made (for batch operations)

The logging system normalizes provider-specific formats (OpenAI, Gemini, Anthropic) into this unified schema for consistent cost tracking and analysis across all models.
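A sketch of per-model cost aggregation over one day's token log, assuming the normalized schema above and falling back to input plus output when total_tokens is absent:

```python
import json
from collections import defaultdict
from pathlib import Path

def usage_by_model(token_log: Path) -> dict[str, int]:
    """Sum total tokens per model for one day's token log."""
    totals: dict[str, int] = defaultdict(int)
    for line in token_log.read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        usage = entry.get("usage", {})
        # Fall back to input + output when total_tokens is not reported.
        total = usage.get(
            "total_tokens",
            usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
        )
        totals[entry["model"]] += total
    return dict(totals)
```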

Agent Event Logs#

The agents/ directory stores event logs for all AI agent sessions managed by Cortex. Each agent session produces a JSONL file containing the complete event history.

Directory layout:

  • <name>/ – per-agent subdirectory (e.g., default/, entities--observer/)
  • <name>/<agent_id>_active.jsonl – currently running agent (renamed when complete)
  • <name>/<agent_id>.jsonl – completed agent session
  • <name>.log – symlink to the latest completed run for each agent name
  • <day>.jsonl – day index with one summary line per agent that completed on that day

The agent_id is a Unix timestamp in milliseconds that uniquely identifies the session.

Event format (JSONL):

Each line is a JSON object with an event field indicating the event type:

{"event": "start", "ts": 1755450767962, "name": "helper", "prompt": "Help me with...", "facet": "work"}
{"event": "text", "ts": 1755450768000, "content": "I'll help you with that."}
{"event": "tool_call", "ts": 1755450769000, "tool": "search", "params": {"query": "example"}}
{"event": "tool_result", "ts": 1755450770000, "tool": "search", "result": "..."}
{"event": "finish", "ts": 1755450771000, "result": "Here's what I found..."}

Common event types:

  • start – agent session started, includes name, prompt, and facet
  • text – streaming text output from the agent
  • tool_call – agent invoked a tool
  • tool_result – result returned from tool execution
  • error – error occurred during execution
  • finish – agent session completed, includes final result

See CORTEX.md for agent architecture and spawning details.
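The event types above make session logs easy to summarize. A sketch that condenses one session into a one-line-per-event trace (the function name is illustrative):

```python
import json
from pathlib import Path

def event_trace(event_log: Path) -> list[str]:
    """Condense an agent session's event log into a short trace."""
    trace = []
    for line in event_log.read_text().splitlines():
        ev = json.loads(line)
        kind = ev["event"]
        if kind == "start":
            trace.append(f"start {ev['name']}")
        elif kind in ("tool_call", "tool_result"):
            trace.append(f"{kind} {ev['tool']}")
        else:  # text, error, finish
            trace.append(kind)
    return trace
```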

App Storage#

The apps/ directory provides storage space for Convey apps to persist configuration, data, and artifacts specific to this journal. Each app has its own directory at apps/<app_name>/ where it can maintain app-specific state independent of the application codebase.

Apps typically use config.json for journal-specific settings and create subdirectories for data storage (e.g., cache/, data/, logs/). This is distinct from the app metadata file (apps/<app>/app.json in the codebase) which defines icon, label, and facet support across all journals. See APPS.md for storage utilities (get_app_storage_path, load_app_config, save_app_config).

Search Index#

The indexer/ directory contains the full-text search index built from journal content.

Files:

  • indexer/journal.sqlite – FTS5 SQLite database containing indexed chunks from agent outputs, events, entities, todos, and action logs

The indexer converts content to markdown chunks via the formatters framework, then indexes with metadata fields (day, facet, agent) for filtering. Raw audio/screen transcripts are formattable but not indexed — agent outputs provide more useful search results. Use get_journal_index() from think/indexer/journal.py to access the database programmatically.

Which content gets indexed is controlled by the FORMATTERS registry in think/formatters.py. Each entry maps a glob pattern to a formatter function and an indexed flag. The registry patterns must be specific enough to use as Path.glob() arguments from the journal root — adding a new content location requires a new entry.

Run sol indexer to rebuild the index from current journal content.
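For programmatic access, the documented route is get_journal_index(). For illustration only, a direct FTS5 query might look like the following; the table and column names here are assumptions, not the real schema:

```python
import sqlite3

def search_journal(db_path: str, query: str, limit: int = 10):
    """Run an FTS5 MATCH query against the journal index.

    The table name ("chunks") and its columns are assumptions made for
    this sketch; prefer get_journal_index() for the real schema.
    """
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            "SELECT day, facet, snippet(chunks, -1, '[', ']', '...', 8) "
            "FROM chunks WHERE chunks MATCH ? LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        con.close()
```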

Service Health#

The health/ directory contains log files for long-running services.

Files:

  • health/<service>.log – log output for each service (e.g., observe.log, cortex.log, convey.log)
  • health/retention.log – JSONL log of retention purge operations with timestamps, files deleted, bytes freed, and per-segment details

These logs are useful for debugging service issues. See DOCTOR.md for diagnostics and troubleshooting guidance.

Imported Audio#

The imports/ directory stores audio files imported via the import app, along with their processing artifacts. Each import is organized by detected timestamp:

imports/
  └── YYYYMMDD_HHMMSS/           # Import directory (detected or owner-specified timestamp)
      ├── import.json            # Import metadata and processing status
      ├── {original_filename}    # Original uploaded audio file
      ├── imported.json          # Processed transcript in standard format
      └── segments.json          # List of segment keys created for this import

Import metadata#

The import.json file tracks the import process:

{
  "original_filename": "meeting_recording.m4a",
  "upload_timestamp": 1755034698276,
  "upload_datetime": "2025-08-12T15:38:18.276000",
  "detection_result": {
    "day": "20250630",
    "time": "143256",
    "confidence": "high",
    "source": "Date/Time Original"
  },
  "detected_timestamp": "20250630_143256",
  "user_timestamp": "20250630_143256",
  "file_size": 13950943,
  "mime_type": "audio/x-m4a",
  "facet": "work",
  "processing_completed": "2025-08-12T15:41:42.970189"
}

Once processed, imports are linked into the appropriate day's segments via imported_audio.jsonl files that reference the original import location.


Day folder contents#

Within each day, captured content is organized into segments (timestamped duration folders). The folder name is the segment key, which uniquely identifies the segment within the day and follows this format:

  • HHMMSS_LEN/ – Start time and duration in seconds (e.g., 143022_300/ for a 5-minute segment starting at 14:30:22)

Each segment progresses through the three-layer pipeline: captures are recorded, extracts are generated, and agent outputs are synthesized.
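The segment key format can be parsed in a few lines (a sketch):

```python
from datetime import time

def parse_segment_key(key: str) -> tuple[time, int]:
    """Split a segment key like '143022_300' into start time and duration (s)."""
    hhmmss, length = key.rstrip("/").split("_")
    return time(int(hhmmss[:2]), int(hhmmss[2:4]), int(hhmmss[4:6])), int(length)
```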

Stream identity#

Every segment belongs to a stream — a named series of segments from a single source. Streams provide navigable chains linking each segment to its predecessor.

  • stream.json – Per-segment stream marker containing:
    • stream – stream name (e.g., "archon", "import.apple")
    • prev_day – day of the previous segment in this stream (null for first)
    • prev_segment – segment key of the predecessor (null for first)
    • seq – sequence number within the stream

Stream names follow the convention: {hostname} for local observers, {remote_name} for remotes, import.{type} for imports (e.g., import.apple, import.text). Global stream state is tracked in the top-level streams/ directory as {name}.json files.

Pre-stream segments (created before stream identity was added) have no stream.json and are handled gracefully as None throughout the pipeline.
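The prev_day/prev_segment markers make it straightforward to walk a stream back to its first segment. A sketch that also tolerates pre-stream segments with no marker:

```python
import json
from pathlib import Path

def stream_chain(journal: Path, day: str, segment: str) -> list[tuple[str, str]]:
    """Walk a stream backwards via stream.json markers, oldest first."""
    chain = []
    while day and segment:
        chain.append((day, segment))
        marker = journal / day / segment / "stream.json"
        if not marker.exists():  # pre-stream segment: no marker, stop here
            break
        data = json.loads(marker.read_text())
        day, segment = data.get("prev_day"), data.get("prev_segment")
    chain.reverse()
    return chain
```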

Layer 1: Captures#

Captures are the original binary media files recorded by observation tools.

Audio captures#

Audio files are initially written to the day root with the segment key prefix (Linux) or directly to segment folders (macOS):

  • Linux: HHMMSS_LEN_*.flac – audio files in day root (e.g., 143022_300_audio.flac)
  • macOS: HHMMSS_LEN/audio.m4a – audio files written directly to segment folder

After transcription, audio files are moved into their segment folder:

  • HHMMSS_LEN/*.flac, *.m4a, *.ogg, *.opus, or *.wav – audio files moved here after processing, preserving descriptive suffix (e.g., audio.flac, audio.m4a, imported_audio.opus)

Note: The descriptive portion after the segment key (e.g., _audio, _recording) is preserved when files are moved into segment directories. Processing tools match files by extension only, ignoring the descriptive suffix.

Screen captures#

Screen recordings use per-monitor files with position and connector/displayID in the filename:

  • Linux: HHMMSS_LEN_<position>_<connector>_screen.webm – screencast video files in day root (e.g., 143022_300_center_DP-3_screen.webm)
  • macOS: HHMMSS_LEN/<position>_<displayID>_screen.mov – video files written directly to segment folder (e.g., center_1_screen.mov)

After analysis, files are in their segment folder:

  • HHMMSS_LEN/<position>_<connector>_screen.webm or *.mov – video files (e.g., center_DP-3_screen.webm, center_1_screen.mov)

For multi-monitor setups, each monitor produces a separate file. Position labels include: center, left, right, top, bottom, and combinations like left-top.

Layer 2: Extracts#

Extracts are structured data files (JSON/JSONL) derived from captures through AI analysis.

Audio transcript extracts#

The transcript file (audio.jsonl) contains a metadata line followed by one JSON object per transcript segment.

Example transcript file:

{"raw": "audio.flac"}
{"start": "00:00:01", "source": "mic", "text": "So we need to finalize the authentication module today."}
{"start": "00:00:15", "source": "sys", "text": "I agree. Let's make sure we have proper unit tests."}

Metadata line (first line):

  • raw – path to processed audio file (required)
  • backend – STT backend used (e.g., "whisper", "revai")
  • model – model used for transcription (e.g., "medium.en", "revai-fusion")
  • device – device used for inference (e.g., "cuda", "cpu", "cloud")
  • compute_type – compute precision used (e.g., "float16", "int8", "api")
  • remote – remote name if transcribed from a remote source (optional)
  • imported – object with import metadata for external files (optional):
    • id – unique import identifier
    • facet – facet name for entity extraction
    • setting – contextual setting description

Transcript statements (subsequent lines):

  • start – timestamp in HH:MM:SS format (required)
  • text – transcribed text (required)
  • source – audio source: "mic" or "sys" (optional)
  • speaker – speaker identifier, numeric or string (optional, not currently populated)
  • corrected – LLM-corrected version of text (optional, added during enrichment)
  • description – tone or delivery description, e.g., "enthusiastic", "questioning" (optional, added during enrichment)
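The two-part layout above (metadata line, then statements) can be parsed as follows; best_text is an illustrative helper that prefers enriched text when present:

```python
import json
from pathlib import Path

def load_transcript(path: Path) -> tuple[dict, list[dict]]:
    """Split audio.jsonl into its metadata line and transcript statements."""
    records = [json.loads(l) for l in path.read_text().splitlines() if l.strip()]
    return records[0], records[1:]

def best_text(stmt: dict) -> str:
    """Prefer the LLM-corrected text when enrichment has added it."""
    return stmt.get("corrected", stmt["text"])
```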

Screen frame extracts#

Screen analysis files use per-monitor naming: <position>_<connector>_screen.jsonl (e.g., center_DP-3_screen.jsonl, left_HDMI-1_screen.jsonl). For single-monitor setups, the file is simply screen.jsonl. Each file contains one JSON object per qualified frame. Frames qualify when they show significant visual change (≥5% RMS difference) compared to the previous qualified frame.
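The ≥5% qualification rule can be sketched as a normalized RMS comparison between frames; the analyzer's exact metric may differ:

```python
import numpy as np

def qualifies(prev: np.ndarray, frame: np.ndarray, threshold: float = 0.05) -> bool:
    """Return True when the RMS pixel difference exceeds the threshold.

    Frames are uint8 arrays; RMS is normalized by the 0-255 range so the
    threshold reads as a fraction (0.05 == 5%). Illustrative sketch only.
    """
    diff = prev.astype(np.float32) - frame.astype(np.float32)
    rms = float(np.sqrt(np.mean(diff ** 2))) / 255.0
    return rms >= threshold
```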

Example frame record:

{
  "frame_id": 123,
  "timestamp": 45.67,
  "requests": [
    {"type": "describe", "model": "gemini-2.5-flash-lite", "duration": 0.5},
    {"type": "category", "category": "reading", "model": "gemini-3-flash", "duration": 1.2}
  ],
  "analysis": {
    "visual_description": "Documentation page showing API reference.",
    "primary": "reading",
    "secondary": "none",
    "overlap": true
  },
  "content": {
    "reading": "# API Reference\n\n## Authentication\n\nUse Bearer tokens..."
  }
}

Common fields:

  • frame_id – sequential frame number in the video
  • timestamp – time in seconds from video start
  • requests – list of vision API requests made for this frame (type: "describe" for initial, "category" for follow-ups)
  • analysis – categorization result with primary, secondary, overlap, and visual_description
  • content – object containing category-specific extracted content (see below)
  • error – present when processing failed after retries

Category-specific content (inside content object):

  • messaging – markdown content when frame contains chat/email apps
  • browsing – markdown content when frame contains web browsing
  • reading – markdown content when frame contains documents/articles
  • productivity – markdown content when frame contains spreadsheets/slides/calendars
  • meeting – JSON object when frame contains video conferencing, includes participant detection and bounding boxes

The vision analysis uses multi-stage conditional processing:

  1. Initial categorization determines content type (e.g., code, meeting, browsing, reading). See observe/categories/ for the full list of categories.
  2. Category-specific follow-up prompts are discovered from observe/categories/*.md files
  3. Follow-ups are triggered for categories that have extraction content in their .md file (currently: messaging, browsing, reading, productivity output markdown; meeting outputs JSON)

Event extracts#

Generator output processing extracts time-based events from the day's transcripts: meetings, messages, follow-ups, file activity, and more. Events are stored per facet in JSONL files at facets/{facet}/events/{day}.jsonl.

There are two types of events:

  • Occurrences – events that happened on the capture day (occurred: true)
  • Anticipations – future scheduled events extracted from calendar views (occurred: false)

Example event records:
{"type": "meeting", "start": "09:00:00", "end": "09:30:00", "title": "Team stand-up", "summary": "Status update with the engineering team", "work": true, "participants": ["Jeremie Miller", "Alice", "Bob"], "facet": "work", "agent": "meetings", "occurred": true, "source": "20250101/agents/meetings.md", "details": "Sprint planning discussion"}
{"type": "deadline", "date": "2025-01-15", "start": null, "end": null, "title": "Project milestone", "summary": "Q1 deliverable due", "work": true, "participants": [], "facet": "work", "agent": "schedule", "occurred": false, "source": "20250101/agents/schedule.md", "details": "Final review before release"}

Common fields:

  • type – event kind: meeting, message, file, followup, documentation, research, media, deadline, appointment, etc.
  • start and end – HH:MM:SS timestamps (or null for anticipations without specific times)
  • date – ISO date YYYY-MM-DD (anticipations only, indicates scheduled date)
  • title and summary – short text for display and search
  • facet – facet name the event belongs to (required)
  • agent – source generator type (e.g., "meetings", "schedule", "flow")
  • occurred – true for occurrences, false for anticipations
  • source – path to the output file that generated this event
  • work – boolean, work vs. personal classification
  • participants – optional list of people or entities involved
  • details – free-form string with additional context

This structure allows the indexer to collect and search events across all facets and days.
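Splitting a facet's daily events into occurrences and anticipations is a small filter (a sketch):

```python
import json
from pathlib import Path

def load_events(path: Path) -> tuple[list[dict], list[dict]]:
    """Split one facet's daily events into occurrences and anticipations."""
    occurred, anticipated = [], []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        (occurred if event.get("occurred") else anticipated).append(event)
    return occurred, anticipated
```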

Layer 3: Agent Outputs#

Agent outputs are AI-generated markdown files that provide human-readable narratives synthesized from captures and extracts.

Segment outputs#

After captures are processed, segment-level outputs are generated within each segment folder as HHMMSS_LEN/*.md files. Available segment output types are defined by templates in talent/ with "schedule": "segment" in their metadata JSON.

Daily outputs#

Post-processing generates day-level outputs in the agents/ directory, synthesizing content from all of the day's segments.

Generator discovery: Available generator types are discovered at runtime from:

  • talent/*.md – system generator templates (files with schedule field but no tools field)
  • apps/{app}/talent/*.md – app-specific generator templates

Each template is a .md file with JSON frontmatter containing metadata (title, description, schedule, output format). The schedule field is required and must be "segment" or "daily"; generators with a missing or invalid schedule are skipped. Use get_talent_configs(has_tools=False) from think/talent.py to retrieve all available generators, or pass schedule="daily" to filter by schedule.
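The filtering rules above can be sketched as follows; get_talent_configs presumably applies equivalent logic internally, and this helper is illustrative rather than the real API:

```python
def valid_generators(configs: dict[str, dict], schedule: str) -> dict[str, dict]:
    """Filter template metadata down to generators for one schedule.

    Mirrors the documented rules: the schedule must match exactly, and
    templates carrying a tools field are agents, not generators.
    """
    return {
        name: meta
        for name, meta in configs.items()
        if meta.get("schedule") == schedule and "tools" not in meta
    }
```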

Output naming:

  • System outputs: agents/{agent}.md (e.g., agents/flow.md, agents/meetings.md)
  • App outputs: agents/_{app}_{agent}.md (e.g., agents/_entities_observer.md)
  • JSON output: agents/{agent}.json when metadata specifies "output": "json"

Each generator type has a corresponding template file ({name}.md) that defines how the AI synthesizes extracts into narrative form.