
Provider Implementation Guide#

Guide for implementing new AI providers in the think module.

For a high-level overview of the think module, see THINK.md.

Required Exports#

Each provider module in think/providers/ must export three functions:

  • run_generate(): Synchronous text generation, returns GenerateResult
  • run_agenerate(): Asynchronous text generation, returns GenerateResult
  • run_cogitate(): Tool-calling execution

See think/providers/__init__.py for the canonical export list and think/providers/google.py as a reference implementation.

Each provider module must also define __all__ exporting these three functions.

API Key Handling#

API keys are configured in the env section of journal/config/journal.json. At process startup, setup_cli() loads these into os.environ. Providers read keys from os.environ — no .env files or dotenv are involved.

Naming convention: {PROVIDER}_API_KEY (e.g., GOOGLE_API_KEY, OPENAI_API_KEY)

Implementation pattern:

api_key = os.getenv("MYPROVIDER_API_KEY")
if not api_key:
    raise ValueError("MYPROVIDER_API_KEY not found in environment")

Client caching: Providers typically cache client instances as module-level singletons to enable connection reuse:

import os

_client = None

def _get_client():
    global _client
    if _client is None:
        api_key = os.getenv("MYPROVIDER_API_KEY")
        if not api_key:
            raise ValueError("MYPROVIDER_API_KEY not found in environment")
        _client = MyProviderClient(api_key=api_key)
    return _client

Settings app integration: Add your provider to PROVIDER_METADATA in think/providers/__init__.py with label and env_key fields. The settings UI dynamically builds provider dropdowns from the registry. Add corresponding API key UI fields in apps/settings/workspace.html for owner configuration.
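A registry entry might look like the following sketch. The dict shape mirrors the label and env_key fields named above; the provider name is illustrative and the real registry lives in think/providers/__init__.py:

```python
# Hypothetical PROVIDER_METADATA entry; "myprovider" is a placeholder.
PROVIDER_METADATA = {
    "myprovider": {
        "label": "MyProvider",            # shown in the settings dropdown
        "env_key": "MYPROVIDER_API_KEY",  # env var the key is loaded into
    },
}
```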

run_generate() / run_agenerate()#

These functions handle direct LLM text generation. The unified API in think/models.py routes requests to provider-specific implementations and handles token logging and JSON validation centrally.

Function signature:

from typing import Any, List, Optional, Union

from think.providers.shared import GenerateResult

def run_generate(
    contents: Union[str, List[Any]],
    model: str,
    temperature: float = 0.3,
    max_output_tokens: int = 8192 * 2,
    system_instruction: Optional[str] = None,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
    timeout_s: Optional[float] = None,
    **kwargs: Any,
) -> GenerateResult:

The run_agenerate() function has the same signature but is async.

Return type - GenerateResult:

class GenerateResult(TypedDict, total=False):
    text: Required[str]           # Response text
    usage: Optional[dict]         # Normalized usage dict
    finish_reason: Optional[str]  # Normalized: "stop", "max_tokens", etc.
    thinking: Optional[list]      # List of thinking block dicts

Parameter details:

  • contents: String, list of strings, or list with mixed content. For vision-capable providers (currently Google only), this can include PIL Image objects; other providers stringify non-text content.
  • model: Already resolved by routing; providers don't need to handle model selection.
  • max_output_tokens: Response token limit. Note: Google internally adds thinking_budget to this for total budget calculation.
  • system_instruction: System prompt. Providers handle this per their API (separate field, prepended message, etc.).
  • json_output: Request a JSON response. Google uses response_mime_type; Anthropic/OpenAI use a response format or system instruction.
  • thinking_budget: Token budget for reasoning/thinking. Must be > 0 to enable; None or 0 means no thinking. Google and Anthropic use this directly. OpenAI ignores thinking_budget; instead, reasoning effort is controlled via model name suffixes (e.g., "gpt-5.2-high"). Valid suffixes: -none, -low, -medium, -high, -xhigh. Without a suffix, reasoning_effort is omitted and OpenAI uses the model default. Note: run_cogitate() always enables thinking regardless of this parameter.
  • timeout_s: Request timeout in seconds. Convert to the provider's expected format (e.g., Google uses milliseconds internally).
  • **kwargs: Absorb unknown kwargs for forward compatibility. Provider-specific options (e.g., cached_content for Google) pass through here.

Key responsibilities:

  • Accept the common parameter set shown above
  • Return GenerateResult with text, usage, finish_reason, and thinking
  • Normalize finish_reason to standard values: "stop", "max_tokens", "safety", etc.
  • Handle provider-specific response parsing

Note: Token logging and JSON validation are handled by the wrapper in think/models.py, not by providers.

Important: Providers should gracefully ignore unsupported parameters rather than raising errors.

run_cogitate()#

Handles tool-calling execution.

async def run_cogitate(
    config: Dict[str, Any],
    on_event: Optional[Callable[[dict], None]] = None,
) -> str:

Config dict fields (see think/agents.py main_async() for routing logic):

  • prompt: User's input (required)
  • model: Model identifier
  • max_tokens: Output token limit
  • system_instruction: System instruction (journal.md for agents)
  • extra_context: Runtime context (facets, insights list, datetime) as first user message
  • user_instruction: Agent-specific prompt as second user message
  • tools: Optional list of allowed tool names
  • agent_id, name: Identity for logging and tool calls
  • session_id: CLI session ID for conversation continuation
  • chat_id: Chat ID for reverse lookup from agent to chat
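An illustrative config dict, with all values made up (including the tool names), might look like:

```python
# Hypothetical run_cogitate() config; every value here is illustrative.
config = {
    "prompt": "Summarize yesterday's meetings",
    "model": "myprovider-large",
    "max_tokens": 4096,
    "system_instruction": "journal.md contents",
    "extra_context": "facets, insights, datetime",
    "user_instruction": "agent-specific prompt",
    "tools": ["search", "read_file"],   # optional allowlist
    "agent_id": "agent-123",
    "name": "daily-briefing",
    "session_id": None,                 # set on continuation turns
    "chat_id": "chat-456",
}
```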

Event emission:

Providers must emit events via the on_event callback. See think/providers/shared.py for TypedDict definitions:

  • StartEvent: Agent run begins
  • ToolStartEvent: Tool invocation starts
  • ToolEndEvent: Tool invocation completes
  • ThinkingEvent: Reasoning/thinking content available
  • FinishEvent: Agent run completes successfully
  • ErrorEvent: Error occurs

Use JSONEventCallback from think/providers/shared.py to wrap the callback and auto-add timestamps.

Finish event format:

The finish event must include the result text and should include usage for token tracking:

callback.emit({
    "event": "finish",
    "result": final_text,
    "usage": usage_dict,  # Same format as token logging
    "ts": int(time.time() * 1000),
})

Error handling pattern:

All providers must follow this pattern to prevent duplicate error reporting:

import traceback

try:
    # ... agent logic ...
except Exception as exc:
    callback.emit({
        "event": "error",
        "error": str(exc),
        "trace": traceback.format_exc(),
    })
    setattr(exc, "_evented", True)  # Prevents duplicate reporting
    raise

Tool integration:

Invoke tools via sol call <module> <command> [args...]. Providers should route tool calls through the configured command path and honor config["tools"] allowlists when present.
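A minimal sketch of this routing follows. It assumes tools are identified as module.command for allowlist checks and that sol is on the PATH; the actual naming scheme and argument shape may differ:

```python
import subprocess
from typing import List, Optional

def call_tool(module: str, command: str, args: List[str],
              allowed: Optional[List[str]] = None) -> str:
    # Enforce the optional allowlist before shelling out (naming is assumed).
    name = f"{module}.{command}"
    if allowed is not None and name not in allowed:
        raise PermissionError(f"tool not in allowlist: {name}")
    # Route the call through the sol CLI and return its stdout.
    proc = subprocess.run(
        ["sol", "call", module, command, *args],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout
```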

Conversation continuation:

When session_id is provided, use the provider CLI's native resume mechanism:

session_id = config.get("session_id")
if session_id:
    cmd.extend(["--resume", session_id])

Each CLI tool manages its own session state internally. The session_id is returned from the CLI's init/finish event on the first interaction and reused for all subsequent continuations within the same chat.
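Capturing that session_id from the CLI's event stream could be sketched as follows; the one-event-per-line framing and the session_id field name are assumptions:

```python
import json
from typing import Optional

def extract_session_id(event_line: str) -> Optional[str]:
    # The CLI is assumed to stream JSON events one per line; init/finish
    # events are assumed to carry the session_id for later --resume calls.
    event = json.loads(event_line)
    if event.get("event") in ("init", "finish"):
        return event.get("session_id")
    return None
```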

Token Logging#

Token logging is handled centrally by the wrapper in think/models.py. Providers return usage data in their GenerateResult, and the wrapper calls log_token_usage().

Usage dict format:

Providers normalize usage into the unified schema defined by USAGE_KEYS in think/providers/shared.py. Each provider's _extract_usage() is responsible for mapping API-specific field names to these canonical keys. log_token_usage() passes through known keys — it does not re-normalize.

usage_dict = {
    "input_tokens": 1500,            # Required
    "output_tokens": 500,            # Required
    "total_tokens": 2000,            # Required (computed if missing)
    "cached_tokens": 800,            # Optional: cache hits
    "reasoning_tokens": 200,         # Optional: thinking/reasoning tokens
    "cache_creation_tokens": 100,    # Optional: cache creation cost
    "requests": 1,                   # Optional: request count
}

Key points:

  • Return usage in GenerateResult["usage"] - wrapper handles logging
  • For run_cogitate(), include usage in the finish event

Context & Routing#

Context strings determine provider and model selection. Providers receive already-resolved models, but understanding the system helps:

Context naming convention:

  • Talent configs (agents/generators): talent.{source}.{name} where source is system or app name
    • System: talent.system.meetings, talent.system.default
    • App: talent.entities.observer, talent.chat.helper
  • Other contexts: {module}.{feature}[.{operation}]
    • Examples: observe.describe.frame, app.chat.title

Dynamic discovery: All context metadata (tier/label/group) is defined in prompt .md files via YAML frontmatter:

  • Prompt files: Listed in PROMPT_PATHS in think/models.py - add context, tier, label, group fields
  • Categories: observe/categories/*.md - add tier, label, group fields
  • System talent: talent/*.md - add tier, label, group fields in frontmatter
  • App talent: apps/*/talent/*.md - add tier, label, group fields in frontmatter

All contexts are discovered at runtime. Use get_context_registry() to get the complete context map.

Resolution (handled by think/models.py resolve_provider(context, agent_type)):

  1. Exact match in journal.json providers.contexts
  2. Glob pattern match (fnmatch) with specificity ranking
  3. Dynamic context registry (discovered prompts, categories, talent configs)
  4. Type-specific default (from providers.generate or providers.cogitate)
  5. System defaults from TYPE_DEFAULTS

Providers don't implement routing - they receive the resolved model.
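To illustrate step 2, a specificity-ranked glob match could be sketched as follows; the ranking heuristic shown here is an assumption, not the actual resolve_provider() logic:

```python
from fnmatch import fnmatch
from typing import List, Optional

def best_glob_match(context: str, patterns: List[str]) -> Optional[str]:
    matches = [p for p in patterns if fnmatch(context, p)]
    if not matches:
        return None
    # Specificity proxy: fewest wildcards first, then the longest pattern.
    return min(matches, key=lambda p: (p.count("*"), -len(p)))
```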

Configuration#

Provider configuration lives in journal.json under the providers key.

Structure:

providers:
  generate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  cogitate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  contexts:
    <context-pattern>:
      provider: <provider-name>
      model: <explicit-model>  # OR
      tier: <1|2|3>            # tier-based resolution
  models:
    <provider-name>:
      "<tier>": "<model-override>"

The generate section controls text generation (analysis, extraction, transcription). The cogitate section controls tool-calling agents (interactive chat, daily briefings). Each section has its own provider, tier, and backup provider.

Tier system:

  • 1 = PRO (most capable)
  • 2 = FLASH (balanced)
  • 3 = LITE (fast/cheap)

See tests/fixtures/journal/config/journal.json for a complete example and think/models.py PROVIDER_DEFAULTS for tier-to-model mappings.
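For orientation, a journal.json fragment following the structure above could look like this; the provider names match existing providers, but the model names and context patterns are illustrative:

```json
{
  "providers": {
    "generate": {"provider": "google", "tier": 2, "backup": "openai"},
    "cogitate": {"provider": "anthropic", "tier": 1},
    "contexts": {
      "observe.describe.*": {"provider": "google", "tier": 3},
      "talent.system.meetings": {"provider": "openai", "model": "gpt-5.2-high"}
    },
    "models": {
      "google": {"2": "gemini-2.5-flash"}
    }
  }
}
```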

Testing#

Required test coverage:

  1. Unit tests in tests/test_<provider>.py:

    • Mock API responses
    • Test parameter handling
    • Test error cases
  2. Integration tests in tests/integration/test_<provider>_backend.py:

    • Live API calls (require API keys)
    • End-to-end generation
    • Token usage verification

See existing test files for patterns:

  • tests/test_google.py, tests/test_openai.py, tests/test_anthropic.py
  • tests/integration/test_google_backend.py, etc.

Run integration tests with: make test-integration
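A unit test in the spirit of item 1 might mock the client like this; the patched client surface and the fake_run_generate() helper are hypothetical, not a real provider or SDK:

```python
from unittest.mock import MagicMock

def fake_run_generate(client, contents, model):
    # Stand-in for a provider's run_generate() internals: one API call,
    # then normalization into a GenerateResult-shaped dict.
    resp = client.generate(contents=contents, model=model)
    return {"text": resp.text, "finish_reason": "stop"}

def test_generate_returns_text():
    client = MagicMock()
    client.generate.return_value = MagicMock(text="hello")
    result = fake_run_generate(client, "hi", "myprovider-large")
    assert result["text"] == "hello"
    client.generate.assert_called_once()
```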

Batch Processing#

The Batch class in think/batch.py automatically works with all providers via the unified agenerate() API in think/models.py. No provider-specific batch implementation is needed - just ensure your run_agenerate() works correctly.

OpenAI-Compatible Providers#

For providers with OpenAI-compatible APIs (e.g., DigitalOcean, Azure OpenAI, local LLMs), you can leverage the OpenAI SDK with a custom base URL:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MYPROVIDER_API_KEY"),
    base_url="https://api.myprovider.com/v1",
)

This allows reusing much of the OpenAI provider's patterns for request/response handling.

Checklist for New Providers#

Core implementation:

  1. Create think/providers/<name>.py with __all__ = ["run_generate", "run_agenerate", "run_cogitate"]
  2. Implement run_generate(), run_agenerate(), run_cogitate() following signatures above
  3. Import GenerateResult from think.providers.shared and return it from generate functions

Model constants in think/models.py:

  4. Add model constants using the pattern {PROVIDER}_{TIER} (e.g., DO_LLAMA_70B, DO_MISTRAL_NEMO)
    • Existing examples: GEMINI_FLASH, GPT_5, CLAUDE_SONNET_4
  5. Add provider tier mappings to the PROVIDER_DEFAULTS dict
  6. Update get_model_provider() to detect your models by prefix (critical for cost tracking)

Registry:

  7. Add the provider to PROVIDER_REGISTRY in think/providers/__init__.py
  8. Add a routing case in think/agents.py main_async() (around line 331)

Settings UI:

  9. Add the provider to PROVIDER_METADATA in think/providers/__init__.py with label and env_key
  10. Add an API key UI field in apps/settings/workspace.html

Testing:

  11. Create unit tests in tests/test_<name>.py
  12. Create integration tests in tests/integration/test_<name>_backend.py
  13. Add test contexts to tests/fixtures/journal/config/journal.json

Documentation:

  14. Update the think/providers/__init__.py docstring
  15. Update the docs/THINK.md providers table
  16. Update docs/CORTEX.md valid provider values