# Provider Implementation Guide

Guide for implementing new AI providers in the think module.

For a high-level overview of the think module, see [THINK.md](THINK.md).

## Required Exports

Each provider module in `think/providers/` must export three functions:

| Function | Purpose |
|----------|---------|
| `run_generate()` | Synchronous text generation, returns `GenerateResult` |
| `run_agenerate()` | Asynchronous text generation, returns `GenerateResult` |
| `run_cogitate()` | Tool-calling execution |

See `think/providers/__init__.py` for the canonical export list and `think/providers/google.py` as a reference implementation.

Each provider module must also define `__all__` exporting these three functions.

## API Key Handling

API keys are configured in the `env` section of `journal/config/journal.json`. At process startup, `setup_cli()` loads these into `os.environ`. Providers read keys from `os.environ` — no `.env` files or `dotenv` are involved.

**Naming convention:** `{PROVIDER}_API_KEY` (e.g., `GOOGLE_API_KEY`, `OPENAI_API_KEY`)

**Implementation pattern:**
```python
api_key = os.getenv("MYPROVIDER_API_KEY")
if not api_key:
    raise ValueError("MYPROVIDER_API_KEY not found in environment")
```

**Client caching:** Providers typically cache client instances as module-level singletons to enable connection reuse:
```python
_client = None

def _get_client():
    global _client
    if _client is None:
        api_key = os.getenv("MYPROVIDER_API_KEY")
        if not api_key:
            raise ValueError("MYPROVIDER_API_KEY not found in environment")
        _client = MyProviderClient(api_key=api_key)
    return _client
```

**Settings app integration:** Add your provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key` fields. The settings UI dynamically builds provider dropdowns from the registry. Add corresponding API key UI fields in `apps/settings/workspace.html` for owner configuration.
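
As a sketch, a registry entry might look like the following. The `label` and `env_key` field names come from this guide; the exact dict shape of the real `PROVIDER_METADATA` may differ, so check `think/providers/__init__.py` before copying this:

```python
# Hypothetical PROVIDER_METADATA entry for a provider named "myprovider".
# Only the "label" and "env_key" fields are documented; anything else
# here is an assumption.
PROVIDER_METADATA = {
    "myprovider": {
        "label": "My Provider",              # shown in the settings dropdown
        "env_key": "MYPROVIDER_API_KEY",     # env var holding the API key
    },
}
```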

## run_generate() / run_agenerate()

These functions handle direct LLM text generation. The unified API in `think/models.py` routes requests to provider-specific implementations and handles token logging and JSON validation centrally.

**Function signature:**
```python
from think.providers.shared import GenerateResult

def run_generate(
    contents: Union[str, List[Any]],
    model: str,
    temperature: float = 0.3,
    max_output_tokens: int = 8192 * 2,
    system_instruction: Optional[str] = None,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
    timeout_s: Optional[float] = None,
    **kwargs: Any,
) -> GenerateResult:
```

The `run_agenerate()` function has the same signature but is `async`.

**Return type - GenerateResult:**
```python
class GenerateResult(TypedDict, total=False):
    text: Required[str]            # Response text
    usage: Optional[dict]          # Normalized usage dict
    finish_reason: Optional[str]   # Normalized: "stop", "max_tokens", etc.
    thinking: Optional[list]       # List of thinking block dicts
```

**Parameter details:**

| Parameter | Notes |
|-----------|-------|
| `contents` | String, list of strings, or list with mixed content. For vision-capable providers (currently Google only), can include PIL Image objects. Other providers stringify non-text content. |
| `model` | Already resolved by routing - providers don't need to handle model selection. |
| `max_output_tokens` | Response token limit. Note: Google internally adds `thinking_budget` to this for total budget calculation. |
| `system_instruction` | System prompt. Providers handle this per their API (separate field, prepended message, etc.). |
| `json_output` | Request JSON response. Google uses `response_mime_type`, Anthropic/OpenAI use response format or system instruction. |
| `thinking_budget` | Token budget for reasoning/thinking. Must be `> 0` to enable; `None` or `0` means no thinking. Google and Anthropic use this directly. OpenAI ignores `thinking_budget` — instead, reasoning effort is controlled via model name suffixes (e.g., `"gpt-5.2-high"`). Valid suffixes: `-none`, `-low`, `-medium`, `-high`, `-xhigh`. Without a suffix, `reasoning_effort` is omitted and OpenAI uses the model default. Note: `run_cogitate()` always enables thinking regardless of this parameter. |
| `timeout_s` | Request timeout in seconds. Convert to provider's expected format (e.g., Google uses milliseconds internally). |
| `**kwargs` | Absorb unknown kwargs for forward compatibility. Provider-specific options (e.g., `cached_content` for Google) pass through here. |

**Key responsibilities:**
- Accept the common parameter set shown above
- Return `GenerateResult` with text, usage, finish_reason, and thinking
- Normalize `finish_reason` to standard values: `"stop"`, `"max_tokens"`, `"safety"`, etc.
- Handle provider-specific response parsing

**Note:** Token logging and JSON validation are handled by the wrapper in `think/models.py`, not by providers.

**Important:** Providers should gracefully ignore unsupported parameters rather than raising errors.
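
A minimal skeleton of these responsibilities might look like the following. This is a sketch, not a real provider: the API call is stubbed with a fake raw response, and its field names (`stop_reason`, `prompt_tokens`, `completion_tokens`) are invented stand-ins for whatever your provider's SDK actually returns:

```python
from typing import Any, List, Optional, Union

def run_generate(
    contents: Union[str, List[Any]],
    model: str,
    temperature: float = 0.3,
    max_output_tokens: int = 8192 * 2,
    system_instruction: Optional[str] = None,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
    timeout_s: Optional[float] = None,
    **kwargs: Any,
) -> dict:
    # 1. Normalize contents: a text-only provider stringifies non-text parts.
    if isinstance(contents, list):
        prompt = "\n".join(str(part) for part in contents)
    else:
        prompt = contents

    # 2. Call the provider API. Stubbed here with a fake raw response; the
    #    field names are illustrative, not any real SDK's schema.
    raw = {
        "text": f"echo: {prompt}",
        "stop_reason": "end_turn",
        "usage": {"prompt_tokens": 10, "completion_tokens": 5},
    }

    # 3. Normalize into the GenerateResult shape, including finish_reason.
    finish_map = {"end_turn": "stop", "length": "max_tokens"}
    input_t = raw["usage"]["prompt_tokens"]
    output_t = raw["usage"]["completion_tokens"]
    return {
        "text": raw["text"],
        "usage": {
            "input_tokens": input_t,
            "output_tokens": output_t,
            "total_tokens": input_t + output_t,
        },
        "finish_reason": finish_map.get(raw["stop_reason"], raw["stop_reason"]),
        "thinking": None,  # populate when the API returns thinking blocks
    }
```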

## run_cogitate()

Handles tool-calling execution.

```python
async def run_cogitate(
    config: Dict[str, Any],
    on_event: Optional[Callable[[dict], None]] = None,
) -> str:
```

**Config dict fields** (see `think/agents.py` `main_async()` for routing logic):
- `prompt`: User's input (required)
- `model`: Model identifier
- `max_tokens`: Output token limit
- `system_instruction`: System instruction (journal.md for agents)
- `extra_context`: Runtime context (facets, insights list, datetime) as first user message
- `user_instruction`: Agent-specific prompt as second user message
- `tools`: Optional list of allowed tool names
- `use_id`, `name`: Identity for logging and tool calls
- `session_id`: CLI session ID for conversation continuation
- `chat_id`: Chat ID for reverse lookup from agent to chat
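
Put together, a config dict might look like this. All values are illustrative — the model name, tool names, and IDs are made up for the example:

```python
# Illustrative run_cogitate() config; every value here is a placeholder.
config = {
    "prompt": "Summarize today's meetings",          # required
    "model": "example-model",
    "max_tokens": 4096,
    "system_instruction": "...contents of journal.md...",
    "extra_context": "facets: work\ndatetime: 2025-01-15T09:00",
    "user_instruction": "Focus on action items.",
    "tools": ["read", "search"],                     # optional allowlist
    "use_id": "agent:briefing",
    "name": "briefing",
    "session_id": None,                              # set on continuation turns
    "chat_id": "chat-123",
}
```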

**Event emission:**

Providers must emit events via the `on_event` callback. See `think/providers/shared.py` for TypedDict definitions:

| Event | When |
|-------|------|
| `StartEvent` | Agent run begins |
| `ToolStartEvent` | Tool invocation starts |
| `ToolEndEvent` | Tool invocation completes |
| `ThinkingEvent` | Reasoning/thinking content available |
| `FinishEvent` | Agent run completes successfully |
| `ErrorEvent` | Error occurs |

Use `JSONEventCallback` from `think/providers/shared.py` to wrap the callback and auto-add timestamps.

**Finish event format:**

The `finish` event must include the result text and should include usage for token tracking:
```python
callback.emit({
    "event": "finish",
    "result": final_text,
    "usage": usage_dict,  # Same format as token logging
    "ts": int(time.time() * 1000),
})
```

**Error handling pattern:**

All providers must follow this pattern to prevent duplicate error reporting:
```python
try:
    # ... agent logic ...
except Exception as exc:
    callback.emit({
        "event": "error",
        "error": str(exc),
        "trace": traceback.format_exc(),
    })
    setattr(exc, "_evented", True)  # Prevents duplicate reporting
    raise
```

**Tool integration:**

Invoke tools via `sol call <module> <command> [args...]` commands. Providers should route tool calls through the configured command path and honor `config["tools"]` allowlists when present.
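
Honoring the allowlist can be sketched as a simple filter over whatever tools the provider advertises. The tool names below are illustrative:

```python
# Hedged sketch: expose only the tools permitted by config["tools"].
# A missing allowlist (None) means all available tools are exposed.
def allowed_tools(available: list[str], config: dict) -> list[str]:
    allowlist = config.get("tools")
    if allowlist is None:
        return available
    permitted = set(allowlist)
    return [name for name in available if name in permitted]
```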

**Conversation continuation:**

When `session_id` is provided, use the provider CLI's native resume mechanism:
```python
session_id = config.get("session_id")
if session_id:
    cmd.extend(["--resume", session_id])
```

Each CLI tool manages its own session state internally. The `session_id` is returned from the CLI's init/finish event on the first interaction and reused for all subsequent continuations within the same chat.
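
Capturing that ID from the CLI's JSON event stream might look like the sketch below. The event field names (`type`, `session_id`) are assumptions for illustration, not the documented schema of any particular CLI:

```python
import json

# Hedged sketch: scan newline-delimited JSON events for a session ID.
# The "type"/"session_id" field names are invented for this example.
def extract_session_id(stdout_lines: list[str]) -> "str | None":
    for line in stdout_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("type") in ("init", "finish") and "session_id" in event:
            return event["session_id"]
    return None
```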

## Token Logging

Token logging is handled centrally by the wrapper in `think/models.py`. Providers return usage data in their `GenerateResult`, and the wrapper calls `log_token_usage()`.

**Usage dict format:**

Providers normalize usage into the unified schema defined by `USAGE_KEYS` in `think/providers/shared.py`. Each provider's `_extract_usage()` is responsible for mapping API-specific field names to these canonical keys. `log_token_usage()` passes through known keys — it does **not** re-normalize.

```python
usage_dict = {
    "input_tokens": 1500,           # Required
    "output_tokens": 500,           # Required
    "total_tokens": 2000,           # Required (computed if missing)
    "cached_tokens": 800,           # Optional: cache hits
    "reasoning_tokens": 200,        # Optional: thinking/reasoning tokens
    "cache_creation_tokens": 100,   # Optional: cache creation cost
    "requests": 1,                  # Optional: request count
}
```

**Key points:**
- Return usage in `GenerateResult["usage"]` - wrapper handles logging
- For `run_cogitate()`, include usage in the `finish` event

## Context & Routing

Context strings determine provider and model selection. Providers receive already-resolved models, but understanding the system helps:

**Context naming convention:**
- Talent configs (agents/generators): `talent.{source}.{name}` where source is `system` or an app name
  - System: `talent.system.meetings`, `talent.system.default`
  - App: `talent.entities.observer`, `talent.chat.helper`
- Other contexts: `{module}.{feature}[.{operation}]`
  - Examples: `observe.describe.frame`, `app.chat.title`

**Dynamic discovery:** All context metadata (tier/label/group) is defined in prompt .md files via YAML frontmatter:
- Prompt files: Listed in `PROMPT_PATHS` in `think/models.py` - add `context`, `tier`, `label`, `group` fields
- Categories: `observe/categories/*.md` - add `tier`, `label`, `group` fields
- System talent: `talent/*.md` - add `tier`, `label`, `group` fields in frontmatter
- App talent: `apps/*/talent/*.md` - add `tier`, `label`, `group` fields in frontmatter

All contexts are discovered at runtime. Use `get_context_registry()` to get the complete context map.

**Resolution** (handled by `think/models.py` `resolve_provider(context, agent_type)`):
1. Exact match in journal.json `providers.contexts`
2. Glob pattern match (fnmatch) with specificity ranking
3. Dynamic context registry (discovered prompts, categories, talent configs)
4. Type-specific default (from `providers.generate` or `providers.cogitate`)
5. System defaults from `TYPE_DEFAULTS`

Providers don't implement routing - they receive the resolved model.
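
The first two resolution steps can be sketched as follows. This is a simplified illustration: the real specificity ranking in `think/models.py` may differ from the "longest pattern wins" proxy used here:

```python
import fnmatch

# Hedged sketch of steps 1-2 of context resolution: exact match first,
# then glob matching with a crude longest-pattern specificity rule.
def match_context(context: str, patterns: dict) -> "dict | None":
    if context in patterns:                      # step 1: exact match
        return patterns[context]
    candidates = [p for p in patterns if fnmatch.fnmatch(context, p)]
    if not candidates:
        return None
    best = max(candidates, key=len)              # simplified specificity
    return patterns[best]
```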

## Configuration

Provider configuration lives in `journal.json` under the `providers` key.

**Structure:**
```
providers:
  generate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  cogitate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  contexts:
    <context-pattern>:
      provider: <provider-name>
      model: <explicit-model>   # OR
      tier: <1|2|3>             # tier-based resolution
  models:
    <provider-name>:
      "<tier>": "<model-override>"
```

The `generate` section controls text generation (analysis, extraction, transcription).
The `cogitate` section controls tool-calling agents (interactive chat, daily briefings).
Each section has its own provider, tier, and backup provider.

**Tier system:**
- 1 = PRO (most capable)
- 2 = FLASH (balanced)
- 3 = LITE (fast/cheap)

See `tests/fixtures/journal/config/journal.json` for a complete example and `think/models.py` `PROVIDER_DEFAULTS` for tier-to-model mappings.
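
For concreteness, a fragment following that structure might look like this (provider selections and the model override are illustrative, not recommendations):

```json
{
  "providers": {
    "generate": {"provider": "google", "tier": 2, "backup": "openai"},
    "cogitate": {"provider": "anthropic", "tier": 1},
    "contexts": {
      "observe.describe.*": {"provider": "google", "tier": 3},
      "talent.system.meetings": {"provider": "openai", "model": "gpt-5.2-high"}
    },
    "models": {
      "google": {"2": "example-flash-model"}
    }
  }
}
```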

## Testing

**Required test coverage:**

1. **Unit tests** in `tests/test_<provider>.py`:
   - Mock API responses
   - Test parameter handling
   - Test error cases

2. **Integration tests** in `tests/integration/test_<provider>_backend.py`:
   - Live API calls (require API keys)
   - End-to-end generation
   - Token usage verification

See existing test files for patterns:
- `tests/test_google.py`, `tests/test_openai.py`, `tests/test_anthropic.py`
- `tests/integration/test_google_backend.py`, etc.

Run integration tests with: `make test-integration`

## Batch Processing

The `Batch` class in `think/batch.py` automatically works with all providers via the unified `agenerate()` API in `think/models.py`. No provider-specific batch implementation is needed - just ensure your `run_agenerate()` works correctly.
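
The property batching relies on is that `run_agenerate()` calls can be awaited concurrently. The sketch below demonstrates this with a stub in place of a real provider call:

```python
import asyncio

# Hedged sketch: run_agenerate here is a stand-in stub, not the real API.
async def run_agenerate(contents: str, model: str, **kwargs) -> dict:
    await asyncio.sleep(0)          # placeholder for the actual API call
    return {"text": f"echo: {contents}"}

async def main() -> list:
    prompts = ["a", "b", "c"]
    # gather() awaits all requests concurrently and preserves order.
    return await asyncio.gather(
        *(run_agenerate(p, model="test-model") for p in prompts)
    )

results = asyncio.run(main())
```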

## OpenAI-Compatible Providers

For providers with OpenAI-compatible APIs (e.g., DigitalOcean, Azure OpenAI, local LLMs), you can leverage the OpenAI SDK with a custom base URL:

```python
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MYPROVIDER_API_KEY"),
    base_url="https://api.myprovider.com/v1",
)
```

This allows reusing much of the OpenAI provider's patterns for request/response handling.

The Ollama provider (`think/providers/ollama.py`) takes a different approach — it uses Ollama's native `/api/chat` endpoint directly via `httpx` for reliable thinking control. See the Ollama section below.

## Ollama (Local) Provider

The `ollama` provider connects to a local Ollama instance via the native `/api/chat` endpoint (not the OpenAI-compatible endpoint, which silently ignores the `think` parameter on models like Qwen3.5). Key differences from cloud providers:

- **No API key required.** `validate_key()` checks Ollama reachability instead of key validity.
- **Model prefix convention:** Models use the `ollama-local/` prefix (e.g., `ollama-local/qwen3.5:9b`). The prefix is stripped before sending requests to the Ollama API.
- **Thinking support:** Controlled via Ollama's `think` parameter, mapped from `thinking_budget`. Budget > 0 enables thinking; None or 0 disables it.
- **Cogitate via OpenCode CLI.** `run_cogitate()` uses the OpenCode CLI (`opencode run --format json`) as a subprocess, following the same CLIRunner pattern as the other providers. Requires OpenCode CLI installed and configured with a user-level `.opencode/opencode.json` that registers the local Ollama instance as a provider. Do not place this config in the project root — it belongs in the user's config directory.
- **Base URL:** Reads `OLLAMA_BASE_URL` env var, defaults to `http://localhost:11434`.
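
The prefix stripping, `think` mapping, and base URL handling above can be sketched as a pure request-builder (the actual `httpx` call is omitted; the model name is illustrative):

```python
import os

# Hedged sketch: build a native /api/chat request for local Ollama.
# The payload fields mirror the conventions described above; verify
# against think/providers/ollama.py before relying on them.
def build_chat_request(model: str, prompt: str,
                       thinking_budget: "int | None" = None) -> tuple:
    base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    payload = {
        # Strip the routing prefix before talking to Ollama.
        "model": model.removeprefix("ollama-local/"),
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Map thinking_budget > 0 onto Ollama's boolean `think` flag.
        "think": bool(thinking_budget and thinking_budget > 0),
    }
    return f"{base}/api/chat", payload
```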

## Checklist for New Providers

**Core implementation:**
1. Create `think/providers/<name>.py` with `__all__ = ["run_generate", "run_agenerate", "run_cogitate"]`
2. Implement `run_generate()`, `run_agenerate()`, `run_cogitate()` following the signatures above
3. Import `GenerateResult` from `think.providers.shared` and return it from generate functions

**Model constants** in `think/models.py`:
4. Add model constants using the pattern `{PROVIDER}_{TIER}` (e.g., `DO_LLAMA_70B`, `DO_MISTRAL_NEMO`)
   - Existing examples: `GEMINI_FLASH`, `GPT_5`, `CLAUDE_SONNET_4`
5. Add provider tier mappings to the `PROVIDER_DEFAULTS` dict
6. Update `get_model_provider()` to detect your models by prefix (critical for cost tracking)

**Registry:**
7. Add the provider to `PROVIDER_REGISTRY` in `think/providers/__init__.py`
8. Add a routing case in `think/agents.py` `main_async()` (around line 331)

**Settings UI:**
9. Add the provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key`
10. Add an API key UI field in `apps/settings/workspace.html`

**Testing:**
11. Create unit tests in `tests/test_<name>.py`
12. Create integration tests in `tests/integration/test_<name>_backend.py`
13. Add test contexts to `tests/fixtures/journal/config/journal.json`

**Documentation:**
14. Update the `think/providers/__init__.py` docstring
15. Update the `docs/THINK.md` providers table
16. Update `docs/CORTEX.md` valid provider values