# Provider Implementation Guide

Guide for implementing new AI providers in the think module.

For a high-level overview of the think module, see [THINK.md](THINK.md).

## Required Exports

Each provider module in `think/providers/` must export three functions:

| Function | Purpose |
|----------|---------|
| `run_generate()` | Synchronous text generation, returns `GenerateResult` |
| `run_agenerate()` | Asynchronous text generation, returns `GenerateResult` |
| `run_cogitate()` | Tool-calling execution |

See `think/providers/__init__.py` for the canonical export list and `think/providers/google.py` as a reference implementation.

Each provider module must also define `__all__` exporting these three functions.

## API Key Handling

API keys are configured in the `env` section of `journal/config/journal.json`. At process startup, `setup_cli()` loads these into `os.environ`. Providers read keys from `os.environ` — no `.env` files or `dotenv` are involved.

**Naming convention:** `{PROVIDER}_API_KEY` (e.g., `GOOGLE_API_KEY`, `OPENAI_API_KEY`)

**Implementation pattern:**
```python
api_key = os.getenv("MYPROVIDER_API_KEY")
if not api_key:
    raise ValueError("MYPROVIDER_API_KEY not found in environment")
```

**Client caching:** Providers typically cache client instances as module-level singletons to enable connection reuse:
```python
_client = None

def _get_client():
    global _client
    if _client is None:
        api_key = os.getenv("MYPROVIDER_API_KEY")
        if not api_key:
            raise ValueError("MYPROVIDER_API_KEY not found in environment")
        _client = MyProviderClient(api_key=api_key)
    return _client
```

**Settings app integration:** Add your provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key` fields. The settings UI dynamically builds provider dropdowns from the registry.
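As a sketch, a registry entry might look like the following. The `label` and `env_key` field names come from the contract described above; the `myprovider` key, the label text, and the dict-of-dicts shape are illustrative assumptions, not the module's actual contents:

```python
# Hypothetical PROVIDER_METADATA entry — only the label/env_key fields
# follow the documented contract; everything else is illustrative.
PROVIDER_METADATA = {
    "myprovider": {
        "label": "My Provider",           # display name for the settings dropdown
        "env_key": "MYPROVIDER_API_KEY",  # env var holding the API key
    },
}
```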
Add corresponding API key UI fields in `apps/settings/workspace.html` for owner configuration.

## run_generate() / run_agenerate()

These functions handle direct LLM text generation. The unified API in `think/models.py` routes requests to provider-specific implementations and handles token logging and JSON validation centrally.

**Function signature:**
```python
from think.providers.shared import GenerateResult

def run_generate(
    contents: Union[str, List[Any]],
    model: str,
    temperature: float = 0.3,
    max_output_tokens: int = 8192 * 2,
    system_instruction: Optional[str] = None,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
    timeout_s: Optional[float] = None,
    **kwargs: Any,
) -> GenerateResult:
```

The `run_agenerate()` function has the same signature but is `async`.

**Return type - GenerateResult:**
```python
class GenerateResult(TypedDict, total=False):
    text: Required[str]           # Response text
    usage: Optional[dict]         # Normalized usage dict
    finish_reason: Optional[str]  # Normalized: "stop", "max_tokens", etc.
    thinking: Optional[list]      # List of thinking block dicts
```

**Parameter details:**

| Parameter | Notes |
|-----------|-------|
| `contents` | String, list of strings, or list with mixed content. For vision-capable providers (currently Google only), can include PIL Image objects. Other providers stringify non-text content. |
| `model` | Already resolved by routing; providers don't need to handle model selection. |
| `max_output_tokens` | Response token limit. Note: Google internally adds `thinking_budget` to this for total budget calculation. |
| `system_instruction` | System prompt. Providers handle this per their API (separate field, prepended message, etc.). |
| `json_output` | Request JSON response. Google uses `response_mime_type`; Anthropic/OpenAI use response format or system instruction. |
| `thinking_budget` | Token budget for reasoning/thinking. Must be `> 0` to enable; `None` or `0` means no thinking. Google and Anthropic use this directly. OpenAI ignores `thinking_budget` — instead, reasoning effort is controlled via model name suffixes (e.g., `"gpt-5.2-high"`). Valid suffixes: `-none`, `-low`, `-medium`, `-high`, `-xhigh`. Without a suffix, `reasoning_effort` is omitted and OpenAI uses the model default. Note: `run_cogitate()` always enables thinking regardless of this parameter. |
| `timeout_s` | Request timeout in seconds. Convert to the provider's expected format (e.g., Google uses milliseconds internally). |
| `**kwargs` | Absorb unknown kwargs for forward compatibility. Provider-specific options (e.g., `cached_content` for Google) pass through here. |

**Key responsibilities:**
- Accept the common parameter set shown above
- Return `GenerateResult` with text, usage, finish_reason, and thinking
- Normalize `finish_reason` to standard values: `"stop"`, `"max_tokens"`, `"safety"`, etc.
- Handle provider-specific response parsing

**Note:** Token logging and JSON validation are handled by the wrapper in `think/models.py`, not by providers.

**Important:** Providers should gracefully ignore unsupported parameters rather than raising errors.

## run_cogitate()

Handles tool-calling execution.
```python
async def run_cogitate(
    config: Dict[str, Any],
    on_event: Optional[Callable[[dict], None]] = None,
) -> str:
```

**Config dict fields** (see `think/agents.py` `main_async()` for routing logic):
- `prompt`: User's input (required)
- `model`: Model identifier
- `max_tokens`: Output token limit
- `system_instruction`: System instruction (journal.md for agents)
- `extra_context`: Runtime context (facets, insights list, datetime) as first user message
- `user_instruction`: Agent-specific prompt as second user message
- `tools`: Optional list of allowed tool names
- `use_id`, `name`: Identity for logging and tool calls
- `session_id`: CLI session ID for conversation continuation
- `chat_id`: Chat ID for reverse lookup from agent to chat

**Event emission:**

Providers must emit events via the `on_event` callback. See `think/providers/shared.py` for TypedDict definitions:

| Event | When |
|-------|------|
| `StartEvent` | Agent run begins |
| `ToolStartEvent` | Tool invocation starts |
| `ToolEndEvent` | Tool invocation completes |
| `ThinkingEvent` | Reasoning/thinking content available |
| `FinishEvent` | Agent run completes successfully |
| `ErrorEvent` | Error occurs |

Use `JSONEventCallback` from `think/providers/shared.py` to wrap the callback and auto-add timestamps.

**Finish event format:**

The `finish` event must include the result text and should include usage for token tracking:
```python
callback.emit({
    "event": "finish",
    "result": final_text,
    "usage": usage_dict,  # Same format as token logging
    "ts": int(time.time() * 1000),
})
```

**Error handling pattern:**

All providers must follow this pattern to prevent duplicate error reporting:
```python
try:
    # ... agent logic ...
except Exception as exc:
    callback.emit({
        "event": "error",
        "error": str(exc),
        "trace": traceback.format_exc(),
    })
    setattr(exc, "_evented", True)  # Prevents duplicate reporting
    raise
```

**Tool integration:**

Invoke tools via `sol call <module> <command> [args...]` commands. Providers should route tool calls through the configured command path and honor `config["tools"]` allowlists when present.

**Conversation continuation:**

When `session_id` is provided, use the provider CLI's native resume mechanism:
```python
session_id = config.get("session_id")
if session_id:
    cmd.extend(["--resume", session_id])
```

Each CLI tool manages its own session state internally. The `session_id` is returned from the CLI's init/finish event on the first interaction and reused for all subsequent continuations within the same chat.

## Token Logging

Token logging is handled centrally by the wrapper in `think/models.py`. Providers return usage data in their `GenerateResult`, and the wrapper calls `log_token_usage()`.

**Usage dict format:**

Providers normalize usage into the unified schema defined by `USAGE_KEYS` in `think/providers/shared.py`. Each provider's `_extract_usage()` is responsible for mapping API-specific field names to these canonical keys. `log_token_usage()` passes through known keys — it does **not** re-normalize.
```python
usage_dict = {
    "input_tokens": 1500,          # Required
    "output_tokens": 500,          # Required
    "total_tokens": 2000,          # Required (computed if missing)
    "cached_tokens": 800,          # Optional: cache hits
    "reasoning_tokens": 200,       # Optional: thinking/reasoning tokens
    "cache_creation_tokens": 100,  # Optional: cache creation cost
    "requests": 1,                 # Optional: request count
}
```

**Key points:**
- Return usage in `GenerateResult["usage"]`; the wrapper handles logging
- For `run_cogitate()`, include usage in the `finish` event

## Context & Routing

Context strings determine provider and model selection. Providers receive already-resolved models, but understanding the system helps:

**Context naming convention:**
- Talent configs (agents/generators): `talent.{source}.{name}` where source is `system` or an app name
  - System: `talent.system.meetings`, `talent.system.default`
  - App: `talent.entities.observer`, `talent.chat.helper`
- Other contexts: `{module}.{feature}[.{operation}]`
  - Examples: `observe.describe.frame`, `app.chat.title`

**Dynamic discovery:** All context metadata (tier/label/group) is defined in prompt `.md` files via YAML frontmatter:
- Prompt files: Listed in `PROMPT_PATHS` in `think/models.py` — add `context`, `tier`, `label`, `group` fields
- Categories: `observe/categories/*.md` — add `tier`, `label`, `group` fields
- System talent: `talent/*.md` — add `tier`, `label`, `group` fields in frontmatter
- App talent: `apps/*/talent/*.md` — add `tier`, `label`, `group` fields in frontmatter

All contexts are discovered at runtime. Use `get_context_registry()` to get the complete context map.

**Resolution** (handled by `think/models.py` `resolve_provider(context, agent_type)`):
1. Exact match in journal.json `providers.contexts`
2. Glob pattern match (fnmatch) with specificity ranking
3. Dynamic context registry (discovered prompts, categories, talent configs)
4. Type-specific default (from `providers.generate` or `providers.cogitate`)
5. System defaults from `TYPE_DEFAULTS`

Providers don't implement routing; they receive the resolved model.

## Configuration

Provider configuration lives in `journal.json` under the `providers` key.

**Structure:**
```
providers:
  generate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  cogitate:
    provider: <provider-name>
    tier: <1|2|3>
    backup: <provider-name>
  contexts:
    <context-pattern>:
      provider: <provider-name>
      model: <explicit-model>   # OR
      tier: <1|2|3>             # tier-based resolution
  models:
    <provider-name>:
      "<tier>": "<model-override>"
```

The `generate` section controls text generation (analysis, extraction, transcription).
The `cogitate` section controls tool-calling agents (interactive chat, daily briefings).
Each section has its own provider, tier, and backup provider.

**Tier system:**
- 1 = PRO (most capable)
- 2 = FLASH (balanced)
- 3 = LITE (fast/cheap)

See `tests/fixtures/journal/config/journal.json` for a complete example and `think/models.py` `PROVIDER_DEFAULTS` for tier-to-model mappings.

## Testing

**Required test coverage:**

1. **Unit tests** in `tests/test_<provider>.py`:
   - Mock API responses
   - Test parameter handling
   - Test error cases

2. **Integration tests** in `tests/integration/test_<provider>_backend.py`:
   - Live API calls (require API keys)
   - End-to-end generation
   - Token usage verification

See existing test files for patterns:
- `tests/test_google.py`, `tests/test_openai.py`, `tests/test_anthropic.py`
- `tests/integration/test_google_backend.py`, etc.
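The mock-response unit-test pattern can be sketched as follows. To keep the sketch self-contained, `run_generate` here is an inline stand-in for your provider module's function (in a real test you would import `think.providers.<name>` and patch its client hook instead); all names are illustrative:

```python
# Self-contained sketch of the mock-API unit-test pattern (pytest style).
from unittest.mock import MagicMock

def run_generate(contents, model, client):
    # Inline stand-in for a provider's run_generate: call the SDK client
    # and normalize the response into a GenerateResult-shaped dict.
    resp = client.generate(contents=contents, model=model)
    return {
        "text": resp.text,
        "usage": {"input_tokens": resp.input_tokens,
                  "output_tokens": resp.output_tokens},
        "finish_reason": "stop" if resp.stop_reason == "end" else resp.stop_reason,
    }

def test_run_generate_normalizes_response():
    client = MagicMock()
    client.generate.return_value = MagicMock(
        text="hello", input_tokens=3, output_tokens=2, stop_reason="end"
    )
    result = run_generate("hi", model="test-model", client=client)
    assert result["text"] == "hello"
    assert result["finish_reason"] == "stop"
    client.generate.assert_called_once()
```

The point of the pattern is that the SDK is never touched: the mock fixes the raw response shape, and the assertions pin down the normalized `GenerateResult` contract.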
Run integration tests with: `make test-integration`

## Batch Processing

The `Batch` class in `think/batch.py` automatically works with all providers via the unified `agenerate()` API in `think/models.py`. No provider-specific batch implementation is needed — just ensure your `run_agenerate()` works correctly.

## OpenAI-Compatible Providers

For providers with OpenAI-compatible APIs (e.g., DigitalOcean, Azure OpenAI, local LLMs), you can leverage the OpenAI SDK with a custom base URL:

```python
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MYPROVIDER_API_KEY"),
    base_url="https://api.myprovider.com/v1",
)
```

This allows reusing much of the OpenAI provider's patterns for request/response handling.

The Ollama provider (`think/providers/ollama.py`) takes a different approach — it uses Ollama's native `/api/chat` endpoint directly via `httpx` for reliable thinking control. See the Ollama section below.

## Ollama (Local) Provider

The `ollama` provider connects to a local Ollama instance via the native `/api/chat` endpoint (not the OpenAI-compatible endpoint, which silently ignores the `think` parameter on models like Qwen3.5). Key differences from cloud providers:

- **No API key required.** `validate_key()` checks Ollama reachability instead of key validity.
- **Model prefix convention:** Models use the `ollama-local/` prefix (e.g., `ollama-local/qwen3.5:9b`). The prefix is stripped before sending requests to the Ollama API.
- **Thinking support:** Controlled via Ollama's `think` parameter, mapped from `thinking_budget`. Budget > 0 enables thinking; `None` or `0` disables it.
- **Cogitate via OpenCode CLI.** `run_cogitate()` uses the OpenCode CLI (`opencode run --format json`) as a subprocess, following the same CLIRunner pattern as the other providers.
  Requires the OpenCode CLI installed and configured with a user-level `.opencode/opencode.json` that registers the local Ollama instance as a provider. Do not place this config in the project root — it belongs in the user's config directory.
- **Base URL:** Reads the `OLLAMA_BASE_URL` env var, defaults to `http://localhost:11434`.

## Checklist for New Providers

**Core implementation:**
1. Create `think/providers/<name>.py` with `__all__ = ["run_generate", "run_agenerate", "run_cogitate"]`
2. Implement `run_generate()`, `run_agenerate()`, `run_cogitate()` following the signatures above
3. Import `GenerateResult` from `think.providers.shared` and return it from the generate functions

**Model constants** in `think/models.py`:
4. Add model constants using the pattern `{PROVIDER}_{TIER}` (e.g., `DO_LLAMA_70B`, `DO_MISTRAL_NEMO`)
   - Existing examples: `GEMINI_FLASH`, `GPT_5`, `CLAUDE_SONNET_4`
5. Add provider tier mappings to the `PROVIDER_DEFAULTS` dict
6. Update `get_model_provider()` to detect your models by prefix (critical for cost tracking)

**Registry:**
7. Add the provider to `PROVIDER_REGISTRY` in `think/providers/__init__.py`
8. Add a routing case in `think/agents.py` `main_async()` (around line 331)

**Settings UI:**
9. Add the provider to `PROVIDER_METADATA` in `think/providers/__init__.py` with `label` and `env_key`
10. Add an API key UI field in `apps/settings/workspace.html`

**Testing:**
11. Create unit tests in `tests/test_<name>.py`
12. Create integration tests in `tests/integration/test_<name>_backend.py`
13. Add test contexts to `tests/fixtures/journal/config/journal.json`

**Documentation:**
14. Update the `think/providers/__init__.py` docstring
15. Update the `docs/THINK.md` providers table
16. Update the `docs/CORTEX.md` valid provider values