# Provider Implementation Guide

Guide for implementing new AI providers in the think module.

For a high-level overview of the think module, see [THINK.md](THINK.md).

## Required Exports

Each provider module in `think/providers/` must export three functions:

| Function | Purpose |
|----------|---------|
| `run_generate()` | Synchronous text generation, returns `GenerateResult` |
| `run_agenerate()` | Asynchronous text generation, returns `GenerateResult` |
| `run_cogitate()` | Tool-calling execution |

See `think/providers/__init__.py` for the canonical export list and `think/providers/google.py` as a reference implementation.

Each provider module must also define `__all__` exporting these three functions.

## API Key Handling

API keys are configured in the `env` section of `journal/config/journal.json`. At process startup, `setup_cli()` loads these into `os.environ`. Providers read keys from `os.environ` — no `.env` files or `dotenv` are involved.

**Naming convention:** `{PROVIDER}_API_KEY` (e.g., `GOOGLE_API_KEY`, `OPENAI_API_KEY`)

**Implementation pattern:**
```python
api_key = os.getenv("MYPROVIDER_API_KEY")
if not api_key:
    raise ValueError("MYPROVIDER_API_KEY not found in environment")
```

**Client caching:** Providers typically cache client instances as module-level singletons to enable connection reuse:
```python
_client = None

def _get_client():
    global _client
    if _client is None:
        api_key = os.getenv("MYPROVIDER_API_KEY")
        if not api_key:
            raise ValueError("MYPROVIDER_API_KEY not found in environment")
        _client = MyProviderClient(api_key=api_key)
    return _client
```

**Settings app integration:** Add your provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key` fields. The settings UI dynamically builds provider dropdowns from the registry.
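As a sketch, a registry entry might look like the following. The `label` and `env_key` field names come from the contract described above; the `myprovider` key, the label text, and the dict-of-dicts shape are illustrative assumptions, not the module's actual contents:

```python
# Hypothetical PROVIDER_METADATA entry — only the label/env_key fields
# follow the documented contract; everything else is illustrative.
PROVIDER_METADATA = {
    "myprovider": {
        "label": "My Provider",           # display name for the settings dropdown
        "env_key": "MYPROVIDER_API_KEY",  # env var holding the API key
    },
}
```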
Add corresponding API key UI fields in `apps/settings/workspace.html` for owner configuration.

## run_generate() / run_agenerate()

These functions handle direct LLM text generation. The unified API in `think/models.py` routes requests to provider-specific implementations and handles token logging and JSON validation centrally.

**Function signature:**
```python
from think.providers.shared import GenerateResult

def run_generate(
    contents: Union[str, List[Any]],
    model: str,
    temperature: float = 0.3,
    max_output_tokens: int = 8192 * 2,
    system_instruction: Optional[str] = None,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
    timeout_s: Optional[float] = None,
    **kwargs: Any,
) -> GenerateResult:
```

The `run_agenerate()` function has the same signature but is `async`.

**Return type - GenerateResult:**
```python
class GenerateResult(TypedDict, total=False):
    text: Required[str]           # Response text
    usage: Optional[dict]         # Normalized usage dict
    finish_reason: Optional[str]  # Normalized: "stop", "max_tokens", etc.
    thinking: Optional[list]      # List of thinking block dicts
```

**Parameter details:**

| Parameter | Notes |
|-----------|-------|
| `contents` | String, list of strings, or list with mixed content. For vision-capable providers (currently Google only), can include PIL Image objects. Other providers stringify non-text content. |
| `model` | Already resolved by routing; providers don't need to handle model selection. |
| `max_output_tokens` | Response token limit. Note: Google internally adds `thinking_budget` to this for total budget calculation. |
| `system_instruction` | System prompt. Providers handle this per their API (separate field, prepended message, etc.). |
| `json_output` | Request JSON response. Google uses `response_mime_type`; Anthropic/OpenAI use response format or system instruction. |
| `thinking_budget` | Token budget for reasoning/thinking. Must be `> 0` to enable; `None` or `0` means no thinking. Google and Anthropic use this directly. OpenAI ignores `thinking_budget` — instead, reasoning effort is controlled via model name suffixes (e.g., `"gpt-5.2-high"`). Valid suffixes: `-none`, `-low`, `-medium`, `-high`, `-xhigh`. Without a suffix, `reasoning_effort` is omitted and OpenAI uses the model default. Note: `run_cogitate()` always enables thinking regardless of this parameter. |
| `timeout_s` | Request timeout in seconds. Convert to the provider's expected format (e.g., Google uses milliseconds internally). |
| `**kwargs` | Absorb unknown kwargs for forward compatibility. Provider-specific options (e.g., `cached_content` for Google) pass through here. |

**Key responsibilities:**
- Accept the common parameter set shown above
- Return `GenerateResult` with text, usage, finish_reason, and thinking
- Normalize `finish_reason` to standard values: `"stop"`, `"max_tokens"`, `"safety"`, etc.
- Handle provider-specific response parsing

**Note:** Token logging and JSON validation are handled by the wrapper in `think/models.py`, not by providers.

**Important:** Providers should gracefully ignore unsupported parameters rather than raising errors.

## run_cogitate()

Handles tool-calling execution.
```python
async def run_cogitate(
    config: Dict[str, Any],
    on_event: Optional[Callable[[dict], None]] = None,
) -> str:
```

**Config dict fields** (see `think/agents.py` `main_async()` for routing logic):
- `prompt`: User's input (required)
- `model`: Model identifier
- `max_tokens`: Output token limit
- `system_instruction`: System instruction (journal.md for agents)
- `extra_context`: Runtime context (facets, insights list, datetime) as first user message
- `user_instruction`: Agent-specific prompt as second user message
- `tools`: Optional list of allowed tool names
- `use_id`, `name`: Identity for logging and tool calls
- `session_id`: CLI session ID for conversation continuation
- `chat_id`: Chat ID for reverse lookup from agent to chat

**Event emission:**

Providers must emit events via the `on_event` callback. See `think/providers/shared.py` for TypedDict definitions:

| Event | When |
|-------|------|
| `StartEvent` | Agent run begins |
| `ToolStartEvent` | Tool invocation starts |
| `ToolEndEvent` | Tool invocation completes |
| `ThinkingEvent` | Reasoning/thinking content available |
| `FinishEvent` | Agent run completes successfully |
| `ErrorEvent` | Error occurs |

Use `JSONEventCallback` from `think/providers/shared.py` to wrap the callback and auto-add timestamps.

**Finish event format:**

The `finish` event must include the result text and should include usage for token tracking:
```python
callback.emit({
    "event": "finish",
    "result": final_text,
    "usage": usage_dict,  # Same format as token logging
    "ts": int(time.time() * 1000),
})
```

**Error handling pattern:**

All providers must follow this pattern to prevent duplicate error reporting:
```python
try:
    # ... agent logic ...
except Exception as exc:
    callback.emit({
        "event": "error",
        "error": str(exc),
        "trace": traceback.format_exc(),
    })
    setattr(exc, "_evented", True)  # Prevents duplicate reporting
    raise
```

**Tool integration:**

Invoke tools via `sol call <module> <command> [args...]` commands. Providers should route tool calls through the configured command path and honor `config["tools"]` allowlists when present.

**Conversation continuation:**

When `session_id` is provided, use the provider CLI's native resume mechanism:
```python
session_id = config.get("session_id")
if session_id:
    cmd.extend(["--resume", session_id])
```

Each CLI tool manages its own session state internally. The `session_id` is returned from the CLI's init/finish event on the first interaction and reused for all subsequent continuations within the same chat.

## Token Logging

Token logging is handled centrally by the wrapper in `think/models.py`. Providers return usage data in their `GenerateResult`, and the wrapper calls `log_token_usage()`.

**Usage dict format:**

Providers normalize usage into the unified schema defined by `USAGE_KEYS` in `think/providers/shared.py`. Each provider's `_extract_usage()` is responsible for mapping API-specific field names to these canonical keys. `log_token_usage()` passes through known keys — it does **not** re-normalize.
```python
usage_dict = {
    "input_tokens": 1500,          # Required
    "output_tokens": 500,          # Required
    "total_tokens": 2000,          # Required (computed if missing)
    "cached_tokens": 800,          # Optional: cache hits
    "reasoning_tokens": 200,       # Optional: thinking/reasoning tokens
    "cache_creation_tokens": 100,  # Optional: cache creation cost
    "requests": 1,                 # Optional: request count
}
```

**Key points:**
- Return usage in `GenerateResult["usage"]`; the wrapper handles logging
- For `run_cogitate()`, include usage in the `finish` event

## Context & Routing

Context strings determine provider and model selection. Providers receive already-resolved models, but understanding the system helps:

**Context naming convention:**
- Talent configs (agents/generators): `talent.{source}.{name}` where source is `system` or an app name
  - System: `talent.system.meetings`, `talent.system.default`
  - App: `talent.entities.observer`, `talent.chat.helper`
- Other contexts: `{module}.{feature}[.{operation}]`
  - Examples: `observe.describe.frame`, `app.chat.title`

**Dynamic discovery:** All context metadata (tier/label/group) is defined in prompt `.md` files via YAML frontmatter:
- Prompt files: Listed in `PROMPT_PATHS` in `think/models.py` — add `context`, `tier`, `label`, `group` fields
- Categories: `observe/categories/*.md` — add `tier`, `label`, `group` fields
- System talent: `talent/*.md` — add `tier`, `label`, `group` fields in frontmatter
- App talent: `apps/*/talent/*.md` — add `tier`, `label`, `group` fields in frontmatter

All contexts are discovered at runtime. Use `get_context_registry()` to get the complete context map.

**Resolution** (handled by `think/models.py` `resolve_provider(context, agent_type)`):
1. Exact match in journal.json `providers.contexts`
2. Glob pattern match (fnmatch) with specificity ranking
3. Dynamic context registry (discovered prompts, categories, talent configs)
4. Type-specific default (from `providers.generate` or `providers.cogitate`)
5. System defaults from `TYPE_DEFAULTS`

Providers don't implement routing; they receive the resolved model.

## Configuration

Provider configuration lives in `journal.json` under the `providers` key.

**Structure:**
```
providers:
  generate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  cogitate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  contexts:
    <context-pattern>:
      provider: <provider-name>
      model: <explicit-model>   # OR
      tier: <1|2|3>             # tier-based resolution
  models:
    <provider-name>:
      "<tier>": "<model-override>"
```

The `generate` section controls text generation (analysis, extraction, transcription).
The `cogitate` section controls tool-calling agents (interactive chat, daily briefings).
Each section has its own provider, tier, and backup provider.

**Tier system:**
- 1 = PRO (most capable)
- 2 = FLASH (balanced)
- 3 = LITE (fast/cheap)

See `tests/fixtures/journal/config/journal.json` for a complete example and `think/models.py` `PROVIDER_DEFAULTS` for tier-to-model mappings.

## Testing

**Required test coverage:**

1. **Unit tests** in `tests/test_<provider>.py`:
   - Mock API responses
   - Test parameter handling
   - Test error cases

2. **Integration tests** in `tests/integration/test_<provider>_backend.py`:
   - Live API calls (require API keys)
   - End-to-end generation
   - Token usage verification

See existing test files for patterns:
- `tests/test_google.py`, `tests/test_openai.py`, `tests/test_anthropic.py`
- `tests/integration/test_google_backend.py`, etc.
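The mock-response unit-test pattern can be sketched as follows. To keep the sketch self-contained, `run_generate` here is an inline stand-in for your provider module's function (in a real test you would import `think.providers.<name>` and patch its client hook instead); all names are illustrative:

```python
# Self-contained sketch of the mock-API unit-test pattern (pytest style).
from unittest.mock import MagicMock

def run_generate(contents, model, client):
    # Inline stand-in for a provider's run_generate: call the SDK client
    # and normalize the response into a GenerateResult-shaped dict.
    resp = client.generate(contents=contents, model=model)
    return {
        "text": resp.text,
        "usage": {"input_tokens": resp.input_tokens,
                  "output_tokens": resp.output_tokens},
        "finish_reason": "stop" if resp.stop_reason == "end" else resp.stop_reason,
    }

def test_run_generate_normalizes_response():
    client = MagicMock()
    client.generate.return_value = MagicMock(
        text="hello", input_tokens=3, output_tokens=2, stop_reason="end"
    )
    result = run_generate("hi", model="test-model", client=client)
    assert result["text"] == "hello"
    assert result["finish_reason"] == "stop"
    client.generate.assert_called_once()
```

The point of the pattern is that the SDK is never touched: the mock fixes the raw response shape, and the assertions pin down the normalized `GenerateResult` contract.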
Run integration tests with: `make test-integration`

## Batch Processing

The `Batch` class in `think/batch.py` automatically works with all providers via the unified `agenerate()` API in `think/models.py`. No provider-specific batch implementation is needed — just ensure your `run_agenerate()` works correctly.

## OpenAI-Compatible Providers

For providers with OpenAI-compatible APIs (e.g., DigitalOcean, Azure OpenAI, local LLMs), you can leverage the OpenAI SDK with a custom base URL:

```python
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MYPROVIDER_API_KEY"),
    base_url="https://api.myprovider.com/v1",
)
```

This allows reusing much of the OpenAI provider's patterns for request/response handling.

The Ollama provider (`think/providers/ollama.py`) takes a different approach — it uses Ollama's native `/api/chat` endpoint directly via `httpx` for reliable thinking control. See the Ollama section below.

## Ollama (Local) Provider

The `ollama` provider connects to a local Ollama instance via the native `/api/chat` endpoint (not the OpenAI-compatible endpoint, which silently ignores the `think` parameter on models like Qwen3.5). Key differences from cloud providers:

- **No API key required.** `validate_key()` checks Ollama reachability instead of key validity.
- **Model prefix convention:** Models use the `ollama-local/` prefix (e.g., `ollama-local/qwen3.5:9b`). The prefix is stripped before sending requests to the Ollama API.
- **Thinking support:** Controlled via Ollama's `think` parameter, mapped from `thinking_budget`. Budget > 0 enables thinking; `None` or `0` disables it.
- **Cogitate via OpenCode CLI.** `run_cogitate()` uses the OpenCode CLI (`opencode run --format json`) as a subprocess, following the same CLIRunner pattern as the other providers.
  Requires the OpenCode CLI installed and configured with a user-level `.opencode/opencode.json` that registers the local Ollama instance as a provider. Do not place this config in the project root — it belongs in the user's config directory.
- **Base URL:** Reads the `OLLAMA_BASE_URL` env var, defaults to `http://localhost:11434`.

## Checklist for New Providers

**Core implementation:**
1. Create `think/providers/<name>.py` with `__all__ = ["run_generate", "run_agenerate", "run_cogitate"]`
2. Implement `run_generate()`, `run_agenerate()`, `run_cogitate()` following the signatures above
3. Import `GenerateResult` from `think.providers.shared` and return it from the generate functions

**Model constants** in `think/models.py`:
4. Add model constants using the pattern `{PROVIDER}_{TIER}` (e.g., `DO_LLAMA_70B`, `DO_MISTRAL_NEMO`)
   - Existing examples: `GEMINI_FLASH`, `GPT_5`, `CLAUDE_SONNET_4`
5. Add provider tier mappings to the `PROVIDER_DEFAULTS` dict
6. Update `get_model_provider()` to detect your models by prefix (critical for cost tracking)

**Registry:**
7. Add the provider to `PROVIDER_REGISTRY` in `think/providers/__init__.py`
8. Add a routing case in `think/agents.py` `main_async()` (around line 331)

**Settings UI:**
9. Add the provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key`
10. Add an API key UI field in `apps/settings/workspace.html`

**Testing:**
11. Create unit tests in `tests/test_<name>.py`
12. Create integration tests in `tests/integration/test_<name>_backend.py`
13. Add test contexts to `tests/fixtures/journal/config/journal.json`

**Documentation:**
14. Update the `think/providers/__init__.py` docstring
15. Update the `docs/THINK.md` providers table
16. Update the `docs/CORTEX.md` valid provider values