# Twisted

Twisted is a monorepo for a Tangled mobile client and a Tap-backed indexing API.

## Projects

- `apps/twisted`: Ionic Vue client for browsing Tangled repos, profiles, issues, PRs, and search
- `packages/api`: Go service for ingest, search, read-through indexing, and activity cache
- `docs`: project docs, ADRs, and operational references

## Architecture

The app still reads canonical repo and profile data from Tangled and AT Protocol APIs.
The API adds:

1. network-wide search over indexed Tangled content
2. index-backed summaries that are hard to derive from public APIs alone

The backend now targets PostgreSQL for both local and remote deployments.

## Development

Install JS dependencies once:

```bash
pnpm install
```

Default local database URL:

```bash
postgresql://localhost/${USER}_dev?sslmode=disable
```

That matches the Postgres.app-style local workflow and also matches the repo's
`docker-compose.dev.yaml` if you want disposable local Postgres and Tap
containers instead.

Start the local database:

```bash
just db-up
```

Run the mobile app:

```bash
pnpm dev
```

Run the API against local Postgres:

```bash
just api-dev
```

Run the indexer against local Postgres:

```bash
just api-run-indexer
```

Use `just api-dev sqlite` or `just api-run-indexer sqlite` only for the
temporary SQLite rollback path.

If you want the app to call the local API, put this in `apps/twisted/.env.local`:

```bash
VITE_TWISTER_API_BASE_URL=http://localhost:8080
```

Run the API smoke checks from the repo root:

```bash
uv run --project packages/scripts/api twister-api-smoke
```

If `ADMIN_AUTH_TOKEN` is present, the smoke script also checks admin status.
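
The admin gating described above can be sketched like this. It is illustrative only: `plan_checks` is a hypothetical helper, the endpoint paths come from the API reference, and the real script lives in `packages/scripts/api`.

```python
def plan_checks(env: dict) -> list[str]:
    """Return the endpoint paths a smoke pass would cover (illustrative)."""
    # Baseline checks that need no credentials.
    checks = ["/healthz", "/readyz", "/search?q=test"]
    # Admin status is only exercised when a token is configured.
    if env.get("ADMIN_AUTH_TOKEN"):
        checks.append("/admin/status")
    return checks


print(plan_checks({}))
print(plan_checks({"ADMIN_AUTH_TOKEN": "example"}))
```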

## Deployment

Production deployment now uses Coolify plus a separate Coolify-managed
PostgreSQL instance. The backend services are defined in
`docker-compose.prod.yaml`.

See [`docs/reference/deployment-walkthrough.md`](docs/reference/deployment-walkthrough.md)
for the full setup, bootstrap, backup, and cutover flow.

## Attributions

This project relies heavily on the work of the
[Tangled team](https://tangled.org/tangled.org) and the infrastructure made
available by [microcosm](https://microcosm.blue), especially Lightrail and
Constellation.

## Reference

- [`reference/api.md`](reference/api.md) — API runtime, config, and data model
- [`reference/deployment-walkthrough.md`](reference/deployment-walkthrough.md) — Coolify + Postgres deploy guide
- [`reference/metrics.md`](reference/metrics.md) — rollout checks for API, indexer, and Postgres
- [`reference/resync.md`](reference/resync.md) — backfill and rebuild recovery playbook
- [`reference/app.md`](reference/app.md) — Ionic Vue mobile app
- [`reference/lexicons.md`](reference/lexicons.md) — Tangled AT Protocol record types

## Specs

- [`specs/data-sources.md`](specs/data-sources.md) — upstream systems and API sources
- [`specs/search.md`](specs/search.md) — current search/indexing direction
- [`specs/app-features.md`](specs/app-features.md) — remaining mobile app features

## ADRs

- [`adr/storage.md`](adr/storage.md) — accepted Postgres + Coolify storage decision
- [`adr/pg.md`](adr/pg.md) — PostgreSQL research and tradeoffs
- [`adr/turso.md`](adr/turso.md) — superseded Turso research kept for history

## Roadmap

- [`roadmap.md`](roadmap.md) — current milestone list

docs/adr/storage.md

---
title: ADR - Choose PostgreSQL And Coolify For Search Storage
updated: 2026-03-26
status: accepted
---

## Decision

Twisted will use PostgreSQL as the primary database backend for search,
indexing, queue state, and activity cache. The production deploy target is a
Coolify application for `api`, `indexer`, and `tap` plus a separate
Coolify-managed PostgreSQL instance.

## Why

- PostgreSQL is the better fit for long-running multi-service deployment.
- Coolify gives the project a straightforward Git-to-deploy path with built-in
  Traefik and a managed database resource.
- The current service shape already wants two long-lived processes writing to
  one shared database.
- A local PostgreSQL workflow keeps development closer to production than the
  old Turso split.

## Consequences

### Positive

- one mainstream database for local and remote environments
- simpler production backups and restore story
- easier operational model for `api` and `indexer`
- no dependency on Turso-specific SQLite extension behavior

### Negative

- the search layer must move off SQLite FTS5
- ranking and snippet behavior will change
- SQLite remains only as a temporary rollback path during migration

## Search Shape

Keyword search will use PostgreSQL full-text search:

- weighted `tsvector`
- `websearch_to_tsquery('simple', ...)`
- `ts_rank_cd`
- `ts_headline`

The HTTP response shape stays stable, but exact scores and snippets are not
expected to match the previous FTS5 implementation.
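
The bullets above imply a query of roughly this shape. This is a sketch only: the `documents` table name comes from the API reference, while `search_vector` and the other column names are assumptions, not the actual implementation.

```sql
-- Hypothetical keyword query (column names and constants assumed).
SELECT id,
       ts_rank_cd(search_vector, query) AS score,
       ts_headline('simple', body, query) AS snippet
FROM documents,
     websearch_to_tsquery('simple', 'tangled repo') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 20;
```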

## Migration Plan

1. add PostgreSQL connection/config support and local defaults
2. add a primary PostgreSQL migration set
3. move search and store implementations to PostgreSQL
4. deploy `api`, `indexer`, and `tap` from `docker-compose.prod.yaml`
5. rebuild data through `backfill`, `enrich`, and `reindex`
6. cut traffic over only after smoke checks pass

## Explicit Non-Goal

The default migration does not include a Turso-to-PostgreSQL data import. The
serving dataset should be rebuilt from authoritative upstream sources.

## Related Records

- `docs/adr/pg.md` remains the background research for this decision
- `docs/adr/turso.md` is retained as superseded historical context

docs/adr/turso.md

---
title: ADR Research - Turso For Production Search
updated: 2026-03-26
status: superseded
---

## Status

This research record is kept for history. It no longer describes the active
deployment direction.

## Historical Summary

Turso was originally attractive because Twisted already used SQLite-style
queries, migrations, and FTS5 search behavior. It offered the shortest path
from local file-backed development to a remotely hosted production database.

## Why It Was Superseded

The project has now chosen PostgreSQL plus Coolify instead.

Main reasons:

- Twisted now runs as multiple long-lived services against one shared database
- the project wants standard production operations and restore tooling
- local and production environments should converge on one database family
- the cost of carrying Turso-specific behavior forward outweighed the cost of
  migrating

## What Still Matters From This Research

- rebuilding the dataset from upstream sources remains the safer default than
  promoting an experimental local database
- embedded-replica ideas were interesting but were not a fit for the Go stack
  the project kept
- search behavior changes must be treated as product-visible, not just as a
  storage swap

## Current Source Of Truth

See `docs/adr/storage.md` for the accepted storage decision.

docs/reference/api.md

---
title: API Service Reference
updated: 2026-03-26
---

Twisted is a Go service that indexes Tangled content, serves search, and caches
recent activity. It uses PostgreSQL for the primary runtime and retains a
temporary local SQLite fallback behind `--local`.

## Runtime Modes

| Command | Purpose |
| --- | --- |
| `api` | HTTP API server |
| `indexer` | Tap consumer and index writer |
| `backfill` | register repos with Tap |
| `enrich` | fill missing repo names, handles, and web URLs |
| `reindex` | re-upsert documents and finalize the search index |
| `healthcheck` | one-shot config and process probe |

## HTTP API

- `GET /healthz` — liveness probe
- `GET /readyz` — readiness probe, checks database reachability
- `GET /search` — keyword search
- `GET /documents/{id}` — fetch one indexed document
- `GET /admin/status` — cursor and queue state when admin routes are enabled
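
For example, a client can build a keyword search request like this. The base URL assumes a locally running server on `:8080`; `q` is the documented query parameter.

```python
from urllib.parse import urlencode

# Local API server (assumes the default :8080 bind address).
base = "http://localhost:8080"

# Build GET /search?q=... for a keyword query.
url = f"{base}/search?{urlencode({'q': 'tangled repo'})}"
print(url)  # → http://localhost:8080/search?q=tangled+repo
```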

The API also serves the built-in search/docs site from `/` and `/docs*`.

## Search

Keyword search is implemented with PostgreSQL full-text search.

- weighted fields: title, author handle, repo name, summary, body, tags
- query parser: `websearch_to_tsquery('simple', ...)`
- ranking: `ts_rank_cd`
- snippets: `ts_headline`

Response shape stays the same as the previous FTS5 API. Ranking and snippet
details are allowed to differ from the SQLite-era implementation.

## Database

Primary backend: PostgreSQL.

Main tables:

- `documents`
- `sync_state`
- `identity_handles`
- `record_state`
- `indexing_jobs`
- `indexing_audit`
- `jetstream_events`

`documents` stores a generated weighted `tsvector` column plus a GIN index for
keyword search.
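
Such a column and index can look roughly like the following DDL. The `search_vector` name, the field choices, and the weights are assumptions for illustration, not the repo's actual migration.

```sql
-- Hypothetical generated column plus GIN index (names and weights assumed).
ALTER TABLE documents
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(author_handle, '')), 'B') ||
    setweight(to_tsvector('simple', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX documents_search_vector_idx
  ON documents USING GIN (search_vector);
```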

## Configuration

Primary env vars:

- `DATABASE_URL`
- `HTTP_BIND_ADDR`
- `INDEXER_HEALTH_ADDR`
- `TAP_URL`
- `TAP_AUTH_PASSWORD`
- `INDEXED_COLLECTIONS`
- `READ_THROUGH_MODE`
- `READ_THROUGH_COLLECTIONS`
- `READ_THROUGH_MAX_ATTEMPTS`
- `ENABLE_ADMIN_ENDPOINTS`
- `ADMIN_AUTH_TOKEN`

Default local database URL:

```sh
postgresql://localhost/${USER}_dev?sslmode=disable
```
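
The `${USER}` placeholder expands to your shell user, so each developer gets a per-user database name. A quick illustration (not project code; the `alice` fallback is made up):

```python
import os

# Mirror the documented default: a per-user local database name.
user = os.environ.get("USER", "alice")
db_url = f"postgresql://localhost/{user}_dev?sslmode=disable"
print(db_url)
```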
82818383-If `ADMIN_AUTH_TOKEN` is present in the environment, the smoke script can also
8484-verify `GET /admin/status`.
8585-8686-### Static Content
8282+`--local` is deprecated and switches to the legacy SQLite fallback at
8383+`packages/api/twister-dev.db`.
87848888-The API also serves a search site with live search and API documentation at `/` and `/docs*`, built with Alpine.js (no build step, embedded in `internal/view/`).
8585+## Local Operation
89869090-## Database
8787+Start local Postgres with the repo compose file:
91889292-Turso (libSQL) with the following tables:
8989+```sh
9090+just db-up
9191+just api-dev
9292+just api-run-indexer
9393+```
93949494-**documents** — Core search index. Each record gets a stable ID of `did|collection|rkey`. Stores title, body, summary, metadata (repo name, author handle, web URL, language, tags), and timestamps. Soft-deleted via `deleted_at`.
9595+The dev compose file also runs a local Tap instance at `ws://localhost:2480/channel`.
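Once the dev loop is up, a quick reachability check against the default ports (a sketch — the ports are assumptions if you overrode the bind addresses):

```shell
# Probe the API readiness endpoint and the indexer health endpoint.
api_ok=$(curl -fsS -o /dev/null "http://localhost:8080/readyz" && echo yes || echo no)
idx_ok=$(curl -fsS -o /dev/null "http://localhost:9090/health" && echo yes || echo no)
echo "api ready: $api_ok, indexer healthy: $idx_ok"
```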
95969696-**documents_fts** — FTS5 virtual table for full-text search over title, body, summary, repo name, author handle, and tags. Uses `unicode61` tokenizer with tuned BM25 weights (title weighted highest at 2.5, then author handle at 2.0, summary at 1.5).
9797-9898-**sync_state** — Cursor tracking for the Tap consumer. Stores consumer name, current cursor, high water mark, and last update time. Enables crash-safe resume.
9999-100100-**identity_handles** — DID-to-handle cache. Updated from Tap identity events and XRPC lookups.
101101-102102-**record_state** — Issue and PR state cache (open/closed/merged). Keyed by subject AT-URI.
103103-104104-**indexing_jobs** — Durable read-through/admin queue with status, lease owner,
105105-lease expiry, retry counters, and terminal states (`failed`, `dead_letter`).
106106-107107-**indexing_audit** — Append-only record of enqueue decisions, retries, skips,
108108-completions, and dead letters.
109109-110110-**document_embeddings** — Vector storage (768-dim F32_BLOB with DiskANN cosine index). Schema ready but not yet populated.
111111-112112-**embedding_jobs** — Async embedding job queue. Schema ready but worker not yet active.
113113-114114-## Indexing Pipeline
115115-116116-The indexer connects to Tap via WebSocket, consuming AT Protocol record events in real-time. For each event:
117117-118118-1. Filter against the configured collection allowlist (supports wildcards like `sh.tangled.*`)
119119-2. Route to the appropriate normalizer based on collection
120120-3. Normalize into a document (extract title, body, summary, metadata)
121121-4. Optionally enrich via XRPC (resolve author handle, repo name, web URL)
122122-5. Upsert into the database (auto-syncs FTS)
123123-6. Persist the Tap cursor and then acknowledge the event
124124-125125-The indexer resumes from its last cursor on restart and replays idempotently.
126126-It logs status every 30 seconds and uses exponential backoff (1s–5s) for
127127-transient failures.
128128-129129-Read-through indexing is `missing` by default. Only allowed collections can be
130130-queued, detail reads queue single focal records, and bulk list handlers no
131131-longer enqueue whole collections.
132132-133133-## Record Normalizers
134134-135135-Each AT Protocol collection has a dedicated normalizer that extracts searchable content:
136136-137137-| Collection | Record Type | Searchable | Content |
138138-| ------------------------------- | ------------- | ------------------------ | --------------------------- |
139139-| `sh.tangled.repo` | repo | Yes (if named) | Name, description, topics |
140140-| `sh.tangled.repo.issue` | issue | Yes | Title, body, repo reference |
141141-| `sh.tangled.repo.pull` | pull | Yes | Title, body, target branch |
142142-| `sh.tangled.repo.issue.comment` | issue_comment | Yes (if has body) | Comment body |
143143-| `sh.tangled.repo.pull.comment` | pull_comment | Yes (if has body) | Comment body |
144144-| `sh.tangled.string` | string | Yes (if has content) | Filename, contents |
145145-| `sh.tangled.actor.profile` | profile | Yes (if has description) | Profile description |
146146-| `sh.tangled.graph.follow` | follow | No | Graph edge only |
147147-148148-State records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) update the `record_state` table rather than creating documents.
149149-150150-## XRPC Client
151151-152152-The built-in XRPC client provides typed access to AT Protocol endpoints with caching (1-hour TTL for DID docs and repo names):
153153-154154-- DID resolution via PLC Directory (`did:plc:`) or `.well-known/did.json` (`did:web:`)
155155-- Identity resolution (PDS endpoint + handle from DID document)
156156-- Record fetching (`com.atproto.repo.getRecord`, `com.atproto.repo.listRecords`)
157157-- Repo name resolution from `sh.tangled.repo` records
158158-- Web URL construction for Tangled entities
159159-160160-## Backfill
161161-162162-The backfill command now defaults to `--source lightrail`: it calls
163163-`com.atproto.sync.listReposByCollection`, dedupes returned DIDs, and batch
164164-submits them to Tap. `--source graph` keeps the older seed-file follow and
165165-collaborator crawl for targeted fallback runs.
166166-167167-## Configuration
168168-169169-All configuration is via environment variables (with `.env` file support):
170170-171171-| Variable | Default | Purpose |
172172-| -------------------------- | ----------------------- | ----------------------------------------------- |
173173-| `TURSO_DATABASE_URL` | — | Database connection (required unless `--local`) |
174174-| `TURSO_AUTH_TOKEN` | — | Auth token (required for remote) |
175175-| `TAP_URL` | — | Tap WebSocket URL |
176176-| `TAP_AUTH_PASSWORD` | — | Tap admin password |
177177-| `INDEXED_COLLECTIONS` | all | Collection allowlist (CSV, supports wildcards) |
178178-| `READ_THROUGH_MODE` | missing | `off`, `missing`, or `broad` |
179179-| `READ_THROUGH_COLLECTIONS` | `INDEXED_COLLECTIONS` | Read-through allowlist |
180180-| `READ_THROUGH_MAX_ATTEMPTS`| 5 | Retries before `dead_letter` |
181181-| `HTTP_BIND_ADDR` | `:8080` | API server bind address |
182182-| `INDEXER_HEALTH_ADDR` | `:9090` | Indexer health probe address |
183183-| `LOG_LEVEL` | info | debug/info/warn/error |
184184-| `LOG_FORMAT` | json | json or text |
185185-| `ENABLE_ADMIN_ENDPOINTS` | false | Enable admin routes |
186186-| `ADMIN_AUTH_TOKEN` | — | Bearer token for admin |
187187-| `ENABLE_INGEST_ENRICHMENT` | true | XRPC enrichment at ingest time |
188188-| `PLC_DIRECTORY_URL` | `https://plc.directory` | PLC Directory |
189189-| `XRPC_TIMEOUT` | 15s | XRPC HTTP timeout |
190190-191191-Recommended production practice is to use explicit search-relevant collection
192192-lists for `INDEXED_COLLECTIONS` and `READ_THROUGH_COLLECTIONS`, not
193193-`sh.tangled.*`, and to leave `sh.tangled.graph.follow` out of both.
9797+Use `just api-dev sqlite` only when you need the temporary SQLite rollback path.
1949819599## Deployment
196100197197-Deployed on Railway with three services:
198198-199199-- **api** — HTTP server (port 8080, health at `/readyz`)
200200-- **indexer** — Tap consumer (health at `:9090/health`)
201201-- **tap** — Tap instance (external dependency)
202202-203203-All services share the same Turso database. The API and indexer are separate deployments of the same binary with different subcommands.
204204-205205-## Experimental Local DB
206206-207207-The local development database lives at `packages/api/twister-dev.db` when the
208208-API runs with `--local`.
209209-210210-Operational rules:
211211-212212-1. Stop the API before backup or restore.
213213-2. Copy `twister-dev.db` and any matching `-wal` or `-shm` files together.
214214-3. Prefer restore-or-rebuild over repair if the file becomes suspect.
215215-4. Let the DB grow during active experiments, then compact or delete it later.
101101+Production uses:
216102217217-Useful local inspection:
103103+- Coolify Application with `docker-compose.prod.yaml`
104104+- separate Coolify-managed PostgreSQL resource
105105+- private Tap service from the pinned Indigo image
106106+- built-in Coolify Traefik for the public `api` domain
218107219219-```sh
220220-cd packages/api
221221-du -h twister-dev.db*
222222-ls -lh twister-dev.db*
223223-```
108108+See `docs/reference/deployment-walkthrough.md` for the full production flow.
+77-95
docs/reference/deployment-walkthrough.md
···11# Deployment Walkthrough
2233-This repo maps cleanly to Railway, but only for the backend pieces.
33+Twisted deploys to Coolify as one Compose application with three services:
4455-- Deploy `packages/api` to Railway as two services: `api` and `indexer`.
66-- Keep the Ionic + Capacitor app on your machine or in CI for native builds.
77-- Point the mobile app at the Railway `api` service with
88- `VITE_TWISTER_API_BASE_URL`.
55+- `api`: public HTTP service
66+- `indexer`: private Tap consumer
77+- `tap`: private Indigo Tap service
981010-## What Railway Should Host
99+PostgreSQL is a separate Coolify-managed resource.
11101212-Railway is a good home for the Go services in this repo:
1111+## Files
13121414-- `api`: serves HTTP routes, docs, search, proxies, and readiness checks
1515-- `indexer`: consumes Tap, writes into Turso, and exposes its own health endpoint
1616-Railway is not the place that ships the native iOS or Android app. You still
1717-build, sign, and distribute the Capacitor shells separately.
1313+- production compose: `docker-compose.prod.yaml`
1414+- local dev compose: `docker-compose.dev.yaml`
1515+- app image build: `packages/api/Dockerfile`
1616+- Tap image: `ghcr.io/bluesky-social/indigo/tap:sha-4f47add43060c27e8a37d9d76482ecddf001fcd8`
18171918## Prerequisites
20192121-Before you start, have these ready:
2020+- Coolify access
2121+- one Coolify PostgreSQL resource
2222+- this repo connected to Coolify
2323+- explicit `INDEXED_COLLECTIONS` and `READ_THROUGH_COLLECTIONS`
2424+- one shared Tap admin password
22252323-- a Railway account and the Railway CLI
2424-- a Turso database URL and auth token
2525-- a Tap URL and Tap auth password
2626-From this machine:
2626+## Provision PostgreSQL
27272828-```sh
2929-cd /Users/owais/Projects/Twisted
3030-railway login
3131-```
2828+Create the PostgreSQL resource first.
32293333-## Create The Railway Project
3434-3535-In the Railway dashboard, create one empty project with two empty services:
3636-3737-- `api`
3838-- `indexer`
3939-Then link this repo to that project:
3030+- keep the generated connection string in Coolify secrets as `DATABASE_URL`
3131+- use PostgreSQL backups from the database resource
3232+- point both `api` and `indexer` at the same database
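Before the first deploy, it is worth verifying the connection string from outside the app. A sketch, assuming `DATABASE_URL` is exported in the current shell and `psql` is installed:

```shell
# Run a trivial query through the Coolify-provided connection string.
db_ok=$(psql "$DATABASE_URL" -Atc 'select 1' 2>/dev/null || echo fail)
if [ "$db_ok" = "1" ]; then
  echo "database reachable"
else
  echo "database not reachable"
fi
```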
40334141-```sh
4242-cd /Users/owais/Projects/Twisted
4343-railway link
4444-```
3434+## Create The Coolify App
45354646-## Configure Service Shape
3636+In Coolify:
47374848-Both services should deploy from the same local path:
3838+1. create a new Application
3939+2. choose the Docker Compose build pack
4040+3. point it at this repo
4141+4. set base directory to `/`
4242+5. set compose file location to `/docker-compose.prod.yaml`
49435050-- path: `packages/api`
5151-- build source: `packages/api/Dockerfile`
5252-Set the service start commands in Railway:
5353-- `api`: `twister api`
5454-- `indexer`: `twister indexer`
5555-The checked-in Dockerfile already builds the `twister` binary.
4444+Do not add your own Traefik container. Coolify already provides the proxy.
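It can help to validate the compose file locally before pointing Coolify at it. A sketch, assuming Docker is installed on your machine:

```shell
# Parse the production compose file and list its services (api, indexer, tap).
services=$(docker compose -f docker-compose.prod.yaml config --services 2>/dev/null \
  || echo "docker compose not available here")
echo "$services"
```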
56455757-## Set Variables
4646+## Set Environment Variables
58475959-Use shared variables for values both services need:
4848+Shared:
60496161-- `TURSO_DATABASE_URL`
6262-- `TURSO_AUTH_TOKEN`
5050+- `DATABASE_URL`
5151+- `INDEXED_COLLECTIONS`
6352- `LOG_LEVEL=info`
6453- `LOG_FORMAT=json`
6565-Set these on `api`:
6666-- `HTTP_BIND_ADDR=0.0.0.0:${{ PORT }}`
5454+- `TAP_AUTH_PASSWORD=<required>`
5555+5656+`api`:
5757+5858+- `HTTP_BIND_ADDR=:8080`
6759- `SEARCH_DEFAULT_LIMIT=20`
6860- `SEARCH_MAX_LIMIT=100`
6961- `READ_THROUGH_MODE=missing`
7070-- `READ_THROUGH_COLLECTIONS=<explicit search collection CSV>`
6262+- `READ_THROUGH_COLLECTIONS=<explicit CSV>`
7163- `READ_THROUGH_MAX_ATTEMPTS=5`
7264- `ENABLE_ADMIN_ENDPOINTS=false`
7373-- `ADMIN_AUTH_TOKEN=<set this if admin routes are enabled>`
7474-Set these on `indexer`:
7575-- `INDEXER_HEALTH_ADDR=0.0.0.0:${{ PORT }}`
7676-- `TAP_URL=<your Tap URL>`
7777-- `TAP_AUTH_PASSWORD=<your Tap password>`
7878-- `INDEXED_COLLECTIONS=<matching explicit search collection CSV>`
7979-- `ENABLE_INGEST_ENRICHMENT=true`
8080-Do not use `sh.tangled.*` for those allowlists. Match the Lightrail-backed
8181-search collection set and leave `sh.tangled.graph.follow` out.
8282-Optional OAuth variables for a Railway-hosted web client metadata endpoint:
8383-- `OAUTH_CLIENT_ID`
8484-- `OAUTH_REDIRECT_URIS`
8585-The `${{ PORT }}` reference matters. Railway health checks run against the
8686-service port it injects, so the process must listen on that port.
6565+- `ADMIN_AUTH_TOKEN=<optional>`
6666+- `OAUTH_CLIENT_ID=<optional>`
6767+- `OAUTH_REDIRECT_URIS=<optional CSV>`
87688888-## Deploy From This Machine
6969+`indexer`:
89709090-From the repo root, deploy `packages/api` into each Railway service:
7171+- `INDEXER_HEALTH_ADDR=:9090`
7272+- `TAP_URL=ws://tap:2480/channel`
7373+- `ENABLE_INGEST_ENRICHMENT=true`
91749292-```sh
9393-cd /Users/owais/Projects/Twisted
9494-railway up packages/api --path-as-root --service api
9595-railway up packages/api --path-as-root --service indexer
9696-```
7575+`tap`:
97769898-`--path-as-root` is important in this monorepo. It makes `packages/api` the
9999-deployment root instead of archiving the whole repo.
7777+- `TAP_COLLECTION_FILTERS=<optional explicit CSV>`
7878+- optional persistent volume override if you do not want the default `/data`
10079101101-## Configure Health Checks
8080+Use explicit search collections. Do not use `sh.tangled.*` in production.
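One possible explicit allowlist, built from the indexed Tangled collections. The exact set is a deployment decision; the rules that matter are no wildcards and no `sh.tangled.graph.follow`.

```shell
# Hypothetical explicit CSV; trim to the collections you actually serve.
INDEXED_COLLECTIONS="sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.string,sh.tangled.actor.profile"
READ_THROUGH_COLLECTIONS="$INDEXED_COLLECTIONS"
echo "$INDEXED_COLLECTIONS"
```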
10281103103-Set the health check path in Railway for each service:
8282+## Domains And Health Checks
10483105105-- `api`: `/readyz`
106106-- `indexer`: `/health`
107107-`/readyz` is the better API check because it verifies database reachability.
8484+Expose only `api` publicly.
10885109109-## First Bootstrap
8686+- assign the domain in Coolify to the `api` service
8787+- if `api` stays on `:8080`, include that internal port in the Coolify mapping
8888+- configure readiness checks against `GET /readyz`
8989+- keep `indexer` and `tap` private
9090+- monitor `indexer` with `GET /health`
11091111111-A fresh environment is not search-ready just because the services booted.
9292+## First Bootstrap
11293113113-1. Deploy `api`.
114114-2. Deploy `indexer`.
115115-3. Confirm the `api` domain returns `200` from `/readyz`.
116116-4. Confirm the `indexer` returns `200` from `/health`.
117117-5. Run the initial backfill against the same Turso and Tap environment.
118118-Use Railway shell so the command runs inside the live `indexer` environment:
9494+1. deploy `tap`
9595+2. deploy `api`
9696+3. deploy `indexer`
9797+4. confirm `api` returns `200` from `/readyz`
9898+5. confirm `indexer` returns `200` from `/health`
9999+6. confirm `indexer` can reach `ws://tap:2480/channel`
100100+7. open a Coolify terminal in the `indexer` service and run:
119101120102```sh
121121-cd /Users/owais/Projects/Twisted
122122-railway link # Select indexer service if prompted
123123-railway shell
124124-twister backfill --source lightrail
103103+twister backfill
104104+twister enrich
105105+twister reindex
125106```
126107127127-Do not call the environment ready until that first backfill has completed.
108108+This rebuilds the serving dataset from authoritative sources. Do not import the
109109+old Turso data as the default migration path.
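After the bootstrap finishes, a spot check that search actually serves results is a good final gate. A sketch — `API_BASE` and the query term are placeholders, not fixed values:

```shell
# Hit the public search endpoint with a term you expect to be indexed.
API_BASE="${API_BASE:-https://your-api-domain}"
result=$(curl -fsS "$API_BASE/search?q=repo" 2>/dev/null || echo unreachable)
echo "$result" | head -c 200
```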
128110129129-## Point The App At Railway
111111+## Point The App At Coolify
130112131131-For local app builds, set the Railway API URL in `apps/twisted/.env`:
113113+For local app builds:
132114133115```sh
134116VITE_TWISTER_API_BASE_URL=https://<your-api-domain>
135117```
136118137137-Then build or run the app as usual:
119119+Then run the app normally with `pnpm --dir apps/twisted dev` or `build`.
120120+121121+## Rollback Notes
138122139139-```sh
140140-pnpm --dir apps/twisted dev
141141-pnpm --dir apps/twisted build
142142-pnpm --dir apps/twisted exec cap sync
143143-```
123123+- keep the SQLite `--local` path only as a temporary development fallback
124124+- rollback production by restoring PostgreSQL and redeploying the prior app
125125+- treat PostgreSQL restore as the database rollback primitive
+47-79
docs/reference/metrics.md
···11# Metrics To Watch
2233-Use this after deploying the Lightrail-backed backfill flow and detail-only
44-read-through changes.
33+Use this after rolling out the Coolify + PostgreSQL deployment.
5466-## Goal
55+## Goals
7687Confirm that:
981010-- the API stops creating broad read-through churn during browse traffic
1111-- the indexer still keeps search current through Tap
1212-- bootstrap backfills become cheaper and more predictable
99+- `api` stays stable under browse-heavy traffic
1010+- `indexer` keeps search current through Tap
1111+- PostgreSQL handles ingest, queue churn, and activity writes without backlog
13121414-## Railway
1313+## Coolify Application
15141616-Watch both `api` and `indexer` for 24 to 48 hours after deploy.
1515+Watch both services for 24 to 48 hours after deploy.
17161818-### API service
1717+### API
19182019Expected direction:
21202222-- lower average CPU
2323-- fewer latency spikes on browse-heavy endpoints
2424-- lower memory churn from fewer queued background jobs
2121+- lower latency spikes
2222+- stable memory
2323+- flat restart count
25242625Useful checks:
27262828-- CPU usage before and after deploy
2929-- memory usage before and after deploy
3030-- request latency for browse-heavy periods
2727+- request latency
2828+- CPU and memory
2929+- `/readyz` failures
3130- restart count
32313333-If this change is helping, the API should look flatter under normal browsing,
3434-especially when clients hit repo lists, issue lists, pull lists, or follows.
3535-3636-### Indexer service
3232+### Indexer
37333834Expected direction:
39354040-- similar steady-state load during normal Tap ingest
4141-- shorter, more deliberate spikes only when `twister backfill` is run
3636+- steady-state load during Tap ingest
3737+- bounded spikes during `backfill`, `enrich`, and `reindex`
3838+- `/health` stays green outside deploy windows
42394340Useful checks:
44414545-- CPU during normal operation
4646-- CPU during `twister backfill --source lightrail`
4747-- memory during backfill
4242+- CPU and memory
4843- restart count
4444+- Tap reconnect frequency
4545+- queue drain time after backfill
49465050-The indexer may still spike during an initial bootstrap. That is expected. The
5151-important change is that the API should stop causing constant incidental work.
5252-5353-## Turso
5454-5555-This is where the clearest savings should show up.
4747+## PostgreSQL
56485749Expected direction:
58505959-- fewer write operations
6060-- fewer row updates in indexing job tables
6161-- lower write amplification from browse traffic
5151+- predictable connection count
5252+- stable write latency during ingest
5353+- no long-lived lock buildup in `indexing_jobs`
5454+- bounded table growth in `jetstream_events` and `indexing_audit`
62556356Useful checks:
64576565-- total row writes
6666-- total queries
6767-- write-heavy windows during normal app usage
6868-- latency on write statements if you have it
6969-7070-The main reduction should come from no longer enqueueing whole list responses
7171-into `indexing_jobs` during browse requests.
5858+- connections
5959+- disk growth
6060+- slow queries
6161+- write latency
6262+- backup duration and restore confidence
72637373-## Twister Admin Signals
6464+## Admin Signals
74657575-If admin endpoints are enabled, compare these before and after deploy:
6666+If admin routes are enabled, compare:
76677768- `read_through.pending`
7869- `read_through.processing`
7970- `read_through.failed`
8071- `read_through.dead_letter`
8181-- `read_through.last_processed_at`
7272+- `tap.cursor`
7373+- `jetstream.cursor`
82748383-Healthy post-change behavior:
7575+Healthy behavior:
84768577- pending stays near zero most of the time
8686-- processing only bumps when detail pages fetch missing records
8787-- failed and dead-letter counts grow slowly, not continuously
8888-8989-Relevant endpoint:
9090-9191-```sh
9292-curl -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" http://<api-host>/admin/status
9393-```
9494-9595-## What To Compare
9696-9797-Use the same day-of-week and similar traffic windows if possible.
9898-9999-Good comparisons:
100100-101101-- 24 hours before deploy vs 24 hours after deploy
102102-- one browse-heavy period before vs after
103103-- one bootstrap backfill run before vs after
7878+- processing drains after bursts
7979+- failed and dead-letter stay small and explainable
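A sketch for watching the queue depth from a shell, assuming admin routes are enabled and `jq` is installed. The JSON field path is an assumption inferred from the status keys listed in `docs/reference/resync.md`.

```shell
# Pull read_through.pending out of /admin/status.
API_BASE="${API_BASE:-http://localhost:8080}"
pending=$(curl -fsS -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" \
    "$API_BASE/admin/status" 2>/dev/null | jq -r '.read_through.pending' 2>/dev/null)
echo "read_through.pending=${pending:-unknown}"
```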
1048010581## Success Signals
10682107107-Treat the rollout as successful if most of these are true:
108108-109109-- API CPU is lower or less spiky under normal browsing
110110-- Turso writes drop during browse-heavy traffic
111111-- read-through queue counts stay close to zero most of the time
112112-- backfill runs complete with fewer upstream calls and cleaner batching
113113-- search freshness still tracks Tap ingest without visible regressions
8383+- `/readyz` and `/health` remain consistently green
8484+- search freshness tracks Tap ingest
8585+- backfill and enrich jobs complete without manual cleanup
8686+- PostgreSQL latency stays stable during bootstrap and normal use
1148711588## Failure Signals
11689117117-Investigate if you see any of these:
118118-119119-- search misses rise after deploy
120120-- detail pages repeatedly enqueue the same records
121121-- `read_through.pending` grows and does not drain
122122-- indexer CPU stays elevated long after a bootstrap run
123123-- Turso writes do not drop despite the handler changes
124124-125125-If that happens, inspect Tap coverage first, then spot-check whether operators
126126-ran `twister backfill --source lightrail` for the environment.
9090+- queue counts rise and do not drain
9191+- `/readyz` flips during normal browse traffic
9292+- search misses rise after cutover
9393+- PostgreSQL write latency climbs during normal ingest
9494+- restores or backups are failing or taking too long
+54-117
docs/reference/resync.md
···33updated: 2026-03-26
44---
5566-Twister's search index has three recovery paths. Choose based on what broke.
77-88-| Situation | Recovery path |
99-| ----------------------------------------------------- | -------------------------------------------- |
1010-| FTS index corrupted or drifted from stored documents | `twister reindex` |
1111-| Documents missing — never received via Tap | `twister backfill` + let the indexer consume |
1212-| Documents missing — received but fields empty/wrong | `twister enrich` |
1313-| Full index loss — DB dropped or migrated | backfill then reindex then enrich |
1414-| Tap cursor too far ahead — events skipped after a gap | cursor reset via `sync_state` table |
1515-1616----
66+Twisted has three recovery tools. Choose based on what broke.
1771818-## Paths Overview
1919-2020-**Tap** is the authoritative ingest and backfill path. Documents reach the index
2121-when the `indexer` consumes events from Tap. Completeness depends on which DIDs
2222-Tap is tracking.
2323-2424-**Read-through indexing** now runs in `missing` mode by default: when the API
2525-fetches a record that is absent or stale, and the collection is allowed, it
2626-enqueues a background job. Bulk list reads no longer enqueue entire collections.
2727-2828-**JetStream** feeds only the activity cache (`/activity`). It does not contribute
2929-to the search index.
3030-3131----
88+| Situation | Recovery path |
99+| --- | --- |
1010+| Search results wrong but documents exist | `twister reindex` |
1111+| Documents missing because Tap never delivered them | `twister backfill` |
1212+| Documents exist but derived metadata is empty or stale | `twister enrich` |
1313+| Full database loss or migration to a fresh PostgreSQL instance | backfill, enrich, reindex |
32143315## Commands
34163517### `twister indexer`
36183737-Runs the Tap consumer. Must be running continuously for real-time indexing.
3838-Persists cursor to `sync_state` table under consumer name `indexer-tap-v1`.
1919+Runs the Tap consumer continuously. Persists its cursor in `sync_state`.
39204021### `twister backfill`
41224242-Defaults to `--source lightrail`: discovers DIDs from
4343-`com.atproto.sync.listReposByCollection` and submits them to Tap in batches.
4444-Use `--source graph` only for targeted fallback seeding from handles or DIDs.
2323+Default source is `lightrail`. Use graph mode only for targeted fallback.
45244625```sh
4747-# full-network dry-run first
4826twister backfill --dry-run
4949-5050-# full-network bootstrap
5127twister backfill
5252-5353-# targeted fallback
5454-twister backfill --source graph --seeds seeds.txt --max-hops 2 \
5555- --concurrency 5 --batch-size 10 --batch-delay 1s
2828+twister backfill --source graph --seeds seeds.txt --max-hops 2
5629```
57305858-Safe to re-run. Discovery deduplicates and `repos/add` is treated as idempotent.
3131+Safe to rerun. Discovery is deduplicated and Tap registration is treated as
3232+idempotent.
59336034### `twister reindex`
61356262-Re-upserts stored documents into the FTS table and runs `optimize`. Does not
6363-re-fetch from upstream — only re-processes what is already in the DB.
3636+Re-upserts stored documents so PostgreSQL recomputes search state from the
3737+canonical `documents` rows.
64386539```sh
6666-twister reindex # all documents
4040+twister reindex
6741twister reindex --collection sh.tangled.repo
6842twister reindex --did did:plc:abc123
6969-twister reindex --dry-run # preview without writing
4343+twister reindex --dry-run
7044```
71457272-Run this when: FTS results are stale after a schema migration, after a bulk
7373-document import, or whenever search quality seems inconsistent with stored data.
7474-7546### `twister enrich`
76477777-Resolves missing `author_handle`, `repo_name`, and `web_url` via XRPC for
7878-documents already in the DB.
4848+Fills missing `author_handle`, `repo_name`, and `web_url`.
79498050```sh
8181-twister enrich # all documents
5151+twister enrich
8252twister enrich --collection sh.tangled.repo.issue
8353twister enrich --did did:plc:abc123
8454twister enrich --dry-run
8555```
8686-8787-Run this when: search results show documents with empty author handles, or
8888-after deploying enrichment logic changes.
8989-9090----
91569257## Scenario Playbooks
93589494-### FTS index out of sync
5959+### Search drift
95609696-Documents exist in the DB but search returns wrong/stale results.
6161+If search results look stale but the document rows are present:
97629863```sh
9999-twister reindex --dry-run # confirm scope
100100-twister reindex # re-upsert + FTS optimize
6464+twister reindex --dry-run
6565+twister reindex
10166```
10267103103-Verify with `GET /search?q=<known-term>`.
6868+### Missing documents
10469105105-### Documents missing from search
7070+If a record is fetchable through the API but not searchable:
10671107107-Fetch a known record directly. If it returns from `/actors/{handle}/repos/{repo}`
108108-but does not appear in `/search`, the document was never indexed.
109109-110110-1. Check if the DID is tracked by Tap. If not, run `backfill`:
111111-112112- ```sh
113113- twister backfill --source graph --seeds <handle-or-did> --max-hops 0
114114- ```
115115-116116-2. Once Tap is tracking the DID, the `indexer` will deliver historical events.
117117- Monitor progress via `GET /admin/status` and inspect backlog or failures with
118118- `GET /admin/indexing/jobs` and `GET /admin/indexing/audit`.
7272+1. make sure Tap is tracking the DID
7373+2. run targeted `backfill` if needed
7474+3. let `indexer` drain
7575+4. re-run `enrich` if metadata is still incomplete
11976120120-3. If you need the record indexed immediately, fetch the detail endpoint through
121121- the API or enqueue it explicitly with `POST /admin/indexing/enqueue`.
7777+### Metadata gaps
12278123123-### Enrichment gaps
124124-125125-Documents appear in search but `author_handle` or `repo_name` is empty.
7979+If `author_handle` or `repo_name` is empty:
1268012781```sh
128128-twister enrich --dry-run # preview what would be resolved
129129-twister enrich # apply
130130-twister reindex # re-sync FTS after field updates
8282+twister enrich --dry-run
8383+twister enrich
8484+twister reindex
13185```
13286133133-### Full index recovery
8787+### Full PostgreSQL rebuild
13488135135-Use this sequence after a DB drop, migration to a new Turso database, or other
136136-full-loss event.
8989+Use this after restoring to a fresh database or moving to a new PostgreSQL
9090+instance.
13791138138-1. Confirm migrations ran: `twister api --local` performs `store.Migrate` on startup.
139139-2. Register repos with Tap:
9292+1. start `api` once so migrations run
9393+2. start `indexer`
9494+3. run `twister backfill`
9595+4. run `twister enrich`
9696+5. run `twister reindex`
9797+6. verify `/readyz`, `/health`, and smoke checks
14098141141- ```sh
142142- twister backfill --dry-run
143143- twister backfill
144144- ```
145145-146146-3. Start the indexer and let it consume: `twister indexer`
147147-4. Once backfill is complete, enrich fields and re-sync FTS:
148148-149149- ```sh
150150- twister enrich
151151- twister reindex
152152- ```
153153-154154-5. Verify: `GET /admin/status` for cursor progress, `GET /readyz` for DB health.
9999+This sequence is also the default migration path from the old Turso-backed deployment.
155100156101### Tap cursor reset
157102158158-If the indexer cursor is ahead of what Tap will deliver (e.g., after a Tap
159159-instance reset), events will be skipped until the cursor catches up.
160160-161161-To reset the cursor and reprocess from the beginning of Tap's retention window:
103103+If the Tap cursor is ahead of the retained event window:
162104163105```sql
164106DELETE FROM sync_state WHERE consumer_name = 'indexer-tap-v1';
165107```
166108167167-Then restart the `indexer`. It will start from the head of the stream and
168168-process all events Tap delivers.
169169-170170-> **Note:** This does not cause duplicate documents — `UpsertDocument` is
171171-> idempotent. It may reprocess a large backlog depending on Tap retention.
109109+Then restart the `indexer`. Replay is safe because document upserts are idempotent, though it may reprocess a large backlog depending on Tap retention.
172110173173----
111111+## Status Checks
174112175175-## Checking Status
176176-177177-With `ENABLE_ADMIN_ENDPOINTS=true`:
113113+With admin routes enabled:
178114179115```sh
180116curl -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" \
181117 http://localhost:8080/admin/status
182118```
183119184184-Response includes:
120120+Watch:
185121186186-- `tap.cursor` and `tap.updated_at`
187187-- `jetstream.cursor` and `jetstream.updated_at`
122122+- `tap.cursor`
123123+- `jetstream.cursor`
188124- `documents`
189189-- `read_through.pending`, `processing`, `completed`, `failed`, `dead_letter`
190190-- `read_through.oldest_pending_age_s` and `oldest_running_age_s`
191191-- `read_through.last_completed_at` and `last_processed_at`
125125+- `read_through.pending`
126126+- `read_through.processing`
127127+- `read_through.failed`
128128+- `read_through.dead_letter`
+4-4
docs/roadmap.md
···11---
22title: Roadmap
33-updated: 2026-03-25
33+updated: 2026-03-26
44---
5566## API: Search Stabilization
7788Highest priority. This work blocks further investment in search quality and broader discovery features.
991010-- [x] Stabilize local development and experimentation around a local `file:` database
1010+- [x] Stabilize local development around PostgreSQL, with SQLite kept only as a rollback path
1111- [x] Document backup, restore, and disk-growth procedures for the experimental local DB
1212- [x] Research production backend options: PostgreSQL, Turso remote/libSQL, and Turso embedded replicas
1313- [x] Write a production storage decision record with workload and operational tradeoffs, using `docs/adr/pg.md` and `docs/adr/turso.md`
···33333434Completed on [2026-03-25](../CHANGELOG.md#2026-03-25)
35353636-## API: FTS5 Search Quality
3636+## API: Keyword Search Quality
37373838Improve keyword search quality without external dependencies.
39394040**Depends on:** API: Search Stabilization
41414242- [ ] Synonym expansion at query time (e.g. "repo" matches "repository")
4343-- [ ] Stemming tokenizer (porter or unicode61+porter)
4343+- [ ] Stemming and parser tuning for PostgreSQL full-text search
4444- [ ] Prefix search support for autocomplete
4545- [ ] Field weight tuning based on real query patterns
4646- [ ] Recency boost for recently updated content
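Prefix search, for example, maps onto tsquery's `:*` operator (an illustrative query against the indexed `documents` table; real filters and weights will differ):

```sql
-- Match documents whose search vector contains a lexeme starting with "tang".
SELECT id, title
FROM documents
WHERE search_vector @@ to_tsquery('simple', 'tang:*')
LIMIT 10;
```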
+35-239
docs/specs/search.md
···11---
22title: Search
33-updated: 2026-03-25
33+updated: 2026-03-26
44---
5566-> Warning: this document is pretty long. Look at the roadmap and ADR summaries for a
77-> high-level overview, or jump to the relevant sections.
88-99-Search now has two phases:
1010-1111-1. Stabilize indexing and activity caching so search is cheap and reliable.
1212-2. Enhance keyword search quality with FTS5 features once the base pipeline is stable.
66+Twisted search is now operationally centered on PostgreSQL, Tap ingest, and a
77+small set of rebuild tools.
1381414-## Immediate Priority
99+## Current State
15101616-The current highest-priority search work is operational, not ranking:
1717-1818-- Stabilize experimentation around a local `file:` database workflow.
1919-- Add cURL smoke tests for search, document fetches, indexing, and activity reads.
2020-- Enqueue background indexing when the API fetches records that are not yet searchable.
2121-- Cache recent JetStream activity server-side with a persisted 24-hour cursor.
2222-2323-Production storage is Turso cloud. The reasoning is recorded in `docs/adr/storage.md`, with the comparison inputs in `docs/adr/pg.md` and `docs/adr/turso.md`.
2424-2525-These tasks block further work on search quality improvements.
2626-2727-## Planning Decisions
2828-2929-### Why This Comes First
3030-3131-Search quality is currently constrained more by ingestion cost and freshness gaps than by ranking quality.
3232-The next iteration should make Twister cheaper to operate, resilient across restarts, and able to backfill misses on demand before any new semantic or hybrid work.
3333-3434-### Resolved Questions
3535-3636-#### Local-Only Storage
3737-3838-Twister can already run against a local `file:` database. That is useful for stabilizing development and experimentation while the indexing model is still changing. It should not automatically be treated as the final production architecture.
3939-4040-The production storage question remains open and should compare at least:
4141-4242-- PostgreSQL with native full-text search and conventional operational tooling
4343-- Turso remote/libSQL
4444-- Turso with embedded replicas or similar local-read, remote-sync patterns
4545-4646-That comparison has been completed, and the current production choice is Turso.
4747-4848-#### Tangled First-Commit Timestamp
4949-5050-The first Tangled commit timestamp is useful as a lower-bound hint for one-time experiments, but it should not become the default replay cursor.
5151-JetStream has to default to recent history (< 72 hours from now is what's possible) so bootstrap cost stays bounded.
5252-5353-#### Tap Versus JetStream
5454-5555-Tap remains the authoritative indexing and bulk backfill path. JetStream should power only a bounded recent-activity cache.
5656-Read-through API indexing closes gaps when a user fetches a record before Tap has delivered it.
1111+- primary storage: PostgreSQL
1212+- local default URL: `postgresql://localhost/${USER}_dev?sslmode=disable`
1313+- production deploy target: Coolify application plus managed PostgreSQL
1414+- legacy fallback: local SQLite behind `--local`
57155816## Goals
59176060-- Reduce search-related reads and writes enough that remote Turso cost is no longer the dominant constraint.
6161-- Keep indexed content fresh enough for browsing and search without requiring a full-network rebuild after routine restarts.
6262-- Serve recent activity cheaply from a local cache.
6363-- Add a smoke-test layer that verifies search and indexing behavior end to end.
6464-6565-## Current Search Mode
6666-6767-### Keyword Search (Implemented)
6868-6969-Full-text search is powered by SQLite FTS5 with BM25 scoring. Queries match title, body, summary, repo name, author handle, and tags. Results are ranked with field-specific weights and snippets highlight matches with `<mark>` tags.
7070-7171-## Stabilization Plan
7272-7373-### Storage
7474-7575-Twister should use a local `file:` database to stabilize experimentation and reduce the messiness of iteration while the indexing pipeline is being hardened. Production storage should remain explicitly undecided until the project compares PostgreSQL and Turso-based options against the final workload.
7676-7777-Requirements:
7878-7979-- keep local-file mode as the simplest path for development and experimentation
8080-- document what assumptions the local path makes about single-host or shared-disk execution
8181-- document backup, restore, and disk-growth procedures
8282-- produce a production storage decision record comparing PostgreSQL and Turso options, starting from `docs/adr/pg.md` and `docs/adr/turso.md`
8383-8484-Evaluation criteria for the production decision:
8585-8686-- write-heavy ingestion behavior
8787-- FTS quality and indexing ergonomics
8888-- operational complexity and backup story
8989-- latency for reads and writes
9090-- failure recovery and restore workflow
9191-- support for future semantic search requirements
9292-9393-Acceptance:
9494-9595-- local development no longer depends on remote Turso for routine experimentation
9696-- the production backend choice is documented with explicit tradeoffs
9797-- the chosen production backend has a migration path from the experimental local setup
9898-9999-The concrete local DB operating procedure lives in `docs/reference/api.md`.
100100-The production migration path is documented in `docs/adr/storage.md`.
101101-102102-### Read-Through Indexing
103103-104104-When the API fetches a repo, issue, PR, profile, or similar detail record
105105-directly from upstream, it should enqueue background indexing work only when
106106-that record is missing or stale. Tap remains the primary ingest path;
107107-read-through indexing only closes gaps.
108108-109109-Requirements:
110110-111111-- add a durable job table for on-demand indexing
112112-- deduplicate jobs by stable document identity
113113-- reuse the existing normalization and upsert path
114114-- trigger jobs from detail handlers that already fetch upstream records
115115-- do not enqueue whole collections from list or browse handlers
116116-117117-Acceptance:
118118-119119-- a fetched-but-missing record becomes searchable shortly after the first successful API read
120120-- repeated page views do not create unbounded duplicate work
121121-- queue state and terminal failures are inspectable through admin endpoints
122122-- failures are visible through logs and smoke tests
123123-124124-### Activity Cache
125125-126126-JetStream should back a recent-activity cache, not the main search index. The server should persist a timestamp cursor, seed it to `now - ~24h` on first boot, rewind slightly on reconnect, and expire old events aggressively.
127127-128128-Requirements:
129129-130130-- add a dedicated activity cache table
131131-- persist a separate JetStream consumer cursor
132132-- seed missing cursors to recent history, not full history
133133-- keep retention bounded by age and row count
134134-135135-Acceptance:
136136-137137-- common activity reads can be served from the cache
138138-- restarts resume from the stored timestamp cursor
139139-- reconnects are idempotent and tolerate a short rewind window
1818+- keep search fresh through Tap ingest and targeted backfill
1919+- preserve the current `/search` API contract
2020+- make local development and production use the same database family
2121+- keep rebuild and recovery workflows simple enough to rehearse
14022141141-### Smoke Tests
2323+## Keyword Search
14224143143-Twister needs cURL-based smoke tests covering:
2525+Implemented with PostgreSQL full-text search:
14426145145-- `GET /healthz`
146146-- `GET /readyz`
147147-- `GET /search`
148148-- `GET /documents/{id}`
149149-- one fetch path that should enqueue indexing
150150-- one activity endpoint backed by the cache
2727+- weighted `tsvector` over title, author handle, repo name, summary, body, tags
2828+- `websearch_to_tsquery('simple', ...)`
2929+- `ts_rank_cd`
3030+- `ts_headline`
15131152152-Acceptance:
3232+Result scores and snippets may differ from the old SQLite FTS5 implementation.
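The query shape behind those pieces looks roughly like this (a sketch assuming the `search_vector` generated column from the migrations; the query text and limit are illustrative):

```sql
SELECT id,
       title,
       ts_rank_cd(search_vector, q) AS score,
       ts_headline('simple', COALESCE(body, ''), q,
                   'StartSel=<mark>, StopSel=</mark>') AS body_snippet
FROM documents,
     websearch_to_tsquery('simple', 'tangled vue') AS q
WHERE search_vector @@ q
  AND deleted_at IS NULL
ORDER BY score DESC
LIMIT 20;
```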
15333154154-- one local command can verify the critical API surface
155155-- the same scripts can run against staging or production by changing the base URL
3434+## Ingest Model
15635157157-## Operational Model
3636+1. Tap is the authoritative indexing path.
3737+2. Read-through indexing fills misses from detail fetches.
3838+3. JetStream powers only the bounded activity cache.
3939+4. `backfill`, `enrich`, and `reindex` rebuild the serving dataset.
15840159159-1. Tap ingests the authoritative search corpus.
160160-2. Direct API reads enqueue background indexing for misses.
161161-3. JetStream fills only the recent-activity cache.
162162-4. Smoke tests guard the critical paths.
163163-5. FTS5 quality improvements (synonyms, stemming, prefix search) follow once the base pipeline is stable.
4141+## Operational Rules
16442165165-## Backfill Strategy
4343+- use explicit collection allowlists in production
4444+- do not import Turso data as the default migration path
4545+- treat PostgreSQL backups and restore drills as part of normal operations
4646+- keep the SQLite path only until the PostgreSQL rollout has proven stable
16647167167-- Search index backfill should continue to use Tap admin backfill, firehose-driven repo sync, or repo export based resync.
168168-- Activity cache bootstrap should use a recent JetStream timestamp cursor, defaulting to `now - 24h`.
169169-- A manual cursor override can exist for one-time replay experiments, but it should not be the default startup path.
4848+## Next Search Work
17049171171-## API Contract
172172-173173-**`GET /search`** — Unified endpoint, routes by `mode` parameter.
174174-175175-### Parameters
176176-177177-| Param | Required | Default | Description |
178178-| ------------ | -------- | ------- | ------------------------------------- |
179179-| `q` | Yes | — | Query string |
180180-| `mode` | No | keyword | keyword |
181181-| `limit` | No | 20 | Results per page (1–100) |
182182-| `offset` | No | 0 | Pagination offset |
183183-| `collection` | No | — | Filter by collection NSID |
184184-| `type` | No | — | Filter by record type |
185185-| `author` | No | — | Filter by handle or DID |
186186-| `repo` | No | — | Filter by repo name or DID |
187187-| `language` | No | — | Filter by primary language |
188188-| `from` | No | — | Created after (ISO 8601) |
189189-| `to` | No | — | Created before (ISO 8601) |
190190-| `state` | No | — | Issue/PR state (open, closed, merged) |
191191-192192-### Response
193193-194194-```json
195195-{
196196- "query": "tangled vue",
197197- "mode": "keyword",
198198- "total": 42,
199199- "limit": 20,
200200- "offset": 0,
201201- "results": [
202202- {
203203- "id": "did:plc:abc|sh.tangled.repo|my-repo",
204204- "collection": "sh.tangled.repo",
205205- "record_type": "repo",
206206- "title": "my-repo",
207207- "summary": "A Vue component library",
208208- "body_snippet": "...building <mark>Vue</mark> components for <mark>Tangled</mark>...",
209209- "score": 4.82,
210210- "matched_by": ["keyword"],
211211- "repo_name": "my-repo",
212212- "author_handle": "alice.bsky.social",
213213- "did": "did:plc:abc",
214214- "at_uri": "at://did:plc:abc/sh.tangled.repo/my-repo",
215215- "web_url": "https://tangled.sh/alice.bsky.social/my-repo",
216216- "created_at": "2026-01-15T10:00:00Z",
217217- "updated_at": "2026-03-20T14:30:00Z"
218218- }
219219- ]
220220-}
221221-```
222222-223223-## Search Strategy
224224-225225-Indexing via Tap is useful but has proven unreliable for maintaining complete, up-to-date coverage. The approach:
226226-227227-1. **Keyword search is the foundation.** It works now and covers the primary use case — finding repos, issues, and people by name or content.
228228-229229-2. **Constellation supplements search results.** Star counts and follower counts from Constellation can be used as ranking signals without needing to index interaction records ourselves.
230230-231231-3. **Read-through indexing closes freshness gaps.** If a user can fetch a record, the system should be able to make it searchable shortly after.
232232-233233-4. **JetStream is for recent activity, not authoritative indexing.** Use it to power the cached feed, not to replace Tap or repo re-sync.
234234-235235-5. **FTS5 enhancements are the next quality step.** Synonym expansion, stemming, and prefix search improve discovery without external dependencies.
236236-237237-6. **Graceful degradation.** The mobile app treats the search API as optional. If Twister is unavailable, handle-based direct browsing still works. Search results link into the same browsing screens.
238238-239239-## Quality Improvements (Planned)
240240-241241-- Synonym expansion at query time (e.g. "repo" matches "repository")
242242-- Stemming tokenizer (porter or unicode61+porter)
243243-- Prefix search support for autocomplete
244244-- Field weight tuning based on real query patterns
245245-- Recency boost for recently updated content
246246-- Collection-aware ranking
247247-- Star count as a ranking signal
248248-- State filtering defaults
249249-- Better snippet generation
250250-- Relevance test fixtures
251251-252252-## Mobile Integration
253253-254254-The app calls the search API from the Explore tab. Results are displayed in segmented views (repos, users, issues/PRs).
255255-Each result links to the corresponding browsing screen (repo detail, profile, issue detail).
256256-257257-When the search API is unavailable, the Explore tab shows an appropriate state rather than breaking.
258258-The Home tab's handle-based browsing is fully independent of search.
5050+- synonym expansion
5151+- stemming and better tokenizer choices
5252+- field weight tuning from real queries
5353+- recency boosts
5454+- relevance fixtures that assert behavior, not exact score strings
+11-2
justfile
···4141api-build:
4242 just --justfile packages/api/justfile build
43434444-# Run API. Usage: just api-dev [mode], mode: local|remote (default local)
4444+# Run API. Usage: just api-dev [mode], mode: local|remote|sqlite (default local)
4545api-dev mode="local":
4646 just --justfile packages/api/justfile run-api {{mode}}
47474848-# Run indexer. Usage: just api-run-indexer [mode], mode: local|remote (default local)
4848+# Run indexer. Usage: just api-run-indexer [mode], mode: local|remote|sqlite (default local)
4949api-run-indexer mode="local":
5050 just --justfile packages/api/justfile run-indexer {{mode}}
5151+5252+db-up:
5353+ docker compose -f docker-compose.dev.yaml up -d postgres tap
5454+5555+db-down:
5656+ docker compose -f docker-compose.dev.yaml down
5757+5858+db-psql:
5959+ psql "postgresql://localhost/${USER:-postgres}_dev?sslmode=disable"
51605261api-test:
5362 just --justfile packages/api/justfile test
+34-117
packages/api/doc.go
···11// Twister is the Tap-backed indexing and search API for Tangled.
22//
33-// It proxies upstream AT Protocol services such as knots, PDS endpoints,
44-// Bluesky, Constellation, and Jetstream so the app can use a single origin.
55-//
63// Requirements
74//
85// - Go 1.25+
99-// - A Turso database, or local SQLite for development
66+// - PostgreSQL for the normal local and production workflow
107//
118// Running locally
129//
1313-// cd packages/api
1414-// go run . api --local
1515-//
1616-// The local API listens on :8080 by default and uses packages/api/twister-dev.db.
1717-// Logs are printed as text when --local is set.
1818-//
1919-// # API smoke tests
2020-//
2121-// Smoke checks live in packages/scripts/api/. From the repo root:
2222-//
2323-// uv run --project packages/scripts/api twister-api-smoke
2424-//
2525-// Optional base URL override:
2626-//
2727-// TWISTER_API_BASE_URL=http://localhost:8080 \
2828-// uv run --project packages/scripts/api twister-api-smoke
1010+// cd /path/to/Twisted
1111+// just db-up
1212+// just api-dev
1313+// just api-run-indexer
2914//
3030-// # Experimental local DB operations
1515+// The default local database URL is:
3116//
3232-// The experimental local database lives at packages/api/twister-dev.db when
3333-// you run Twister with --local. Treat it as disposable unless you explicitly
3434-// back it up.
1717+// postgresql://localhost/${USER}_dev?sslmode=disable
3518//
3636-// Backup:
1919+// That matches a Postgres.app-style setup and the repo's dev compose file.
3720//
3838-// 1. Stop the Twister process using the local DB.
3939-// 2. Copy the database file and any SQLite sidecar files if they exist.
4040-//
4141-// Example:
2121+// # Legacy fallback
4222//
4343-// cd packages/api
4444-// mkdir -p backups
4545-// timestamp="$(date +%Y%m%d-%H%M%S)"
4646-// cp twister-dev.db "backups/twister-dev-${timestamp}.db"
4747-// test -f twister-dev.db-wal && cp twister-dev.db-wal "backups/twister-dev-${timestamp}.db-wal"
4848-// test -f twister-dev.db-shm && cp twister-dev.db-shm "backups/twister-dev-${timestamp}.db-shm"
2323+// `--local` is deprecated and switches the service to the temporary SQLite
2424+// fallback at packages/api/twister-dev.db.
4925//
5050-// Restore:
2626+// go run . api --local
5127//
5252-// 1. Stop the Twister process.
5353-// 2. Move the current local DB aside if you want to keep it.
5454-// 3. Copy the backup file back to twister-dev.db.
5555-// 4. Restore matching -wal and -shm files only if they came from the same set.
2828+// Smoke checks
5629//
5757-// Example:
3030+// uv run --project packages/scripts/api twister-api-smoke
5831//
5959-// cd packages/api
6060-// mv twister-dev.db "twister-dev.db.broken.$(date +%Y%m%d-%H%M%S)" 2>/dev/null || true
6161-// cp backups/twister-dev-YYYYMMDD-HHMMSS.db twister-dev.db
3232+// Optional base URL override:
6233//
6363-// Disk growth:
3434+// TWISTER_API_BASE_URL=http://localhost:8080 \
3535+// uv run --project packages/scripts/api twister-api-smoke
6436//
6565-// The local DB grows because of indexed documents, FTS tables, activity cache
6666-// rows, and repeated backfill or reindex runs.
3737+// Environment variables
6738//
6868-// Recommended operating procedure:
6969-//
7070-// 1. Check file growth periodically.
7171-// 2. Delete and rebuild the DB freely when the dataset is no longer useful.
7272-// 3. Run VACUUM only when you intentionally want to compact a long-lived DB.
7373-// 4. Keep old backups out of the repo and rotate them manually.
7474-//
7575-// Inspection commands:
7676-//
7777-// cd packages/api
7878-// du -h twister-dev.db*
7979-// ls -lh twister-dev.db*
8080-//
8181-// Failure recovery: prefer restore-or-rebuild over manual repair if the
8282-// experimental DB becomes
8383-// suspicious or inconsistent. It is a developer convenience database, not the
8484-// source of truth.
8585-//
8686-// # Environment variables
8787-//
8888-// Copy .env.example to .env in the repo root or packages/api/. The server loads
8989-// .env, ../.env, and ../../.env automatically.
9090-//
9191-// - TURSO_DATABASE_URL: Turso/libSQL connection URL, required unless --local
9292-// - TURSO_AUTH_TOKEN: auth token, required for non-file URLs
9393-// - HTTP_BIND_ADDR: default :8080
9494-// - LOG_LEVEL: debug, info, warn, or error; default info
9595-// - LOG_FORMAT: json or text; default json
9696-// - SEARCH_DEFAULT_LIMIT: default 20
9797-// - SEARCH_MAX_LIMIT: default 100
3939+// - DATABASE_URL: primary database connection URL
4040+// - HTTP_BIND_ADDR: API bind address, default :8080
4141+// - INDEXER_HEALTH_ADDR: indexer health bind address, default :9090
4242+// - LOG_LEVEL: debug, info, warn, or error
4343+// - LOG_FORMAT: json or text
4444+// - TAP_URL: Tap WebSocket URL, default ws://localhost:2480/channel in local indexer runs
4545+// - TAP_AUTH_PASSWORD: Tap admin password, default twisted-dev in local indexer runs
4646+// - INDEXED_COLLECTIONS: comma-separated AT collections to index
4747+// - READ_THROUGH_MODE: off or missing; default missing
4848+// - READ_THROUGH_COLLECTIONS: read-through allowlist
4949+// - READ_THROUGH_MAX_ATTEMPTS: retries before dead_letter
9850// - ENABLE_ADMIN_ENDPOINTS: default false
9999-// - ADMIN_AUTH_TOKEN: bearer token for admin endpoints
100100-// - CONSTELLATION_URL: default https://constellation.microcosm.blue
101101-// - CONSTELLATION_USER_AGENT: user-agent sent to Constellation
102102-// - TAP_URL: Tap firehose URL, indexer only
103103-// - TAP_AUTH_PASSWORD: Tap auth password, indexer only
104104-// - INDEXED_COLLECTIONS: comma-separated AT collections to index
105105-// - READ_THROUGH_MODE: off, missing, or broad; default missing
106106-// - READ_THROUGH_COLLECTIONS: read-through allowlist, default INDEXED_COLLECTIONS
107107-// - READ_THROUGH_MAX_ATTEMPTS: max retries before dead_letter, default 5
5151+// - ADMIN_AUTH_TOKEN: bearer token for admin routes
10852//
10953// CLI commands
11054//
···11357// twister backfill
11458// twister reindex
11559// twister enrich
116116-//
117117-// Enrich:
6060+// twister healthcheck
11861//
119119-// Resolves missing author_handle, repo_name, and web_url fields on documents
120120-// already in the database.
6262+// # Deployment
12163//
122122-// twister enrich --local
123123-// twister enrich --local --collection sh.tangled.repo
124124-// twister enrich --local --did did:plc:abc123
125125-// twister enrich --local --dry-run
126126-//
127127-// Flags: --collection, --did, --document, --dry-run, --concurrency (default 5).
128128-//
129129-// Proxy endpoints
130130-//
131131-// - GET /proxy/knot/{host}/{nsid} -> https://{host}/xrpc/{nsid}
132132-// - GET /proxy/pds/{host}/{nsid} -> https://{host}/xrpc/{nsid}
133133-// - GET /proxy/bsky/{nsid} -> https://public.api.bsky.app/xrpc/{nsid}
134134-// - GET /identity/resolve -> https://bsky.social/xrpc/com.atproto.identity.resolveHandle
135135-// - GET /identity/did/{did} -> https://plc.directory/{did} or /.well-known/did.json
136136-// - GET /backlinks/count -> Constellation getBacklinksCount, cached
137137-// - WS /activity/stream -> wss://jetstream2.us-east.bsky.network/subscribe
138138-//
139139-// # Admin endpoints
140140-//
141141-// Admin routes require ENABLE_ADMIN_ENDPOINTS=true. If ADMIN_AUTH_TOKEN is set,
142142-// requests must send Authorization: Bearer <ADMIN_AUTH_TOKEN>.
143143-//
144144-// - GET /admin/status: cursor state, queue counts, oldest ages, last activity
145145-// - GET /admin/indexing/jobs: inspect queue rows by status, source, or document
146146-// - GET /admin/indexing/audit: inspect append-only indexing audit rows
147147-// - POST /admin/indexing/enqueue: queue one explicit record for indexing
148148-// - POST /admin/reindex: re-sync all or filtered documents into the FTS index
6464+// Production uses Coolify for the `api`, `indexer`, and `tap` services plus a
6565+// separate Coolify-managed PostgreSQL resource. See docs/reference/deployment-walkthrough.md.
14966package main
···11-// Package reindex re-syncs documents to the FTS index from stored fields.
11+// Package reindex re-syncs documents to the search index from stored fields.
22// It is used by the `twister reindex` CLI command and the POST /admin/reindex endpoint.
33package reindex
44···4141}
42424343// Run reindexes documents matching opts.
4444-// It re-upserts each document (which re-syncs the FTS virtual table) and then
4545-// runs an FTS optimize pass to merge Tantivy/FTS5 segments.
4444+// It re-upserts each document and then runs any backend-specific search index
4545+// optimization step.
4646func (r *Runner) Run(ctx context.Context, opts Options) (*Result, error) {
4747 filter := store.DocumentFilter{
4848 Collection: opts.Collection,
···103103 }
104104105105 if !opts.DryRun {
106106- r.log.Info("reindex: optimizing fts index")
107107- if err := r.store.OptimizeFTS(ctx); err != nil {
108108- r.log.Error("reindex: fts optimize failed", slog.String("error", err.Error()))
106106+ r.log.Info("reindex: finalizing search index")
107107+ if err := r.store.OptimizeSearchIndex(ctx); err != nil {
108108+ r.log.Error("reindex: search index finalize failed", slog.String("error", err.Error()))
109109 result.Errors++
110110 }
111111 }
···11+CREATE TABLE IF NOT EXISTS documents (
22+ id TEXT PRIMARY KEY,
33+ did TEXT NOT NULL,
44+ collection TEXT NOT NULL,
55+ rkey TEXT NOT NULL,
66+ at_uri TEXT NOT NULL,
77+ cid TEXT NOT NULL,
88+ record_type TEXT NOT NULL,
99+ title TEXT,
1010+ body TEXT,
1111+ summary TEXT,
1212+ repo_did TEXT,
1313+ repo_name TEXT,
1414+ author_handle TEXT,
1515+ tags_json TEXT,
1616+ language TEXT,
1717+ created_at TEXT,
1818+ updated_at TEXT,
1919+ indexed_at TEXT NOT NULL,
2020+ web_url TEXT DEFAULT '',
2121+ deleted_at TEXT,
2222+ search_vector TSVECTOR GENERATED ALWAYS AS (
2323+ setweight(to_tsvector('simple', COALESCE(title, '')), 'A') ||
2424+ setweight(to_tsvector('simple', COALESCE(author_handle, '')), 'A') ||
2525+ setweight(to_tsvector('simple', COALESCE(repo_name, '')), 'B') ||
2626+ setweight(to_tsvector('simple', COALESCE(summary, '')), 'B') ||
2727+ setweight(to_tsvector('simple', COALESCE(body, '')), 'C') ||
2828+ setweight(to_tsvector('simple', COALESCE(tags_json, '')), 'D')
2929+ ) STORED
3030+);
3131+3232+CREATE INDEX IF NOT EXISTS idx_documents_did ON documents(did);
3333+CREATE INDEX IF NOT EXISTS idx_documents_collection ON documents(collection);
3434+CREATE INDEX IF NOT EXISTS idx_documents_record_type ON documents(record_type);
3535+CREATE INDEX IF NOT EXISTS idx_documents_repo_did ON documents(repo_did);
3636+CREATE INDEX IF NOT EXISTS idx_documents_created_at ON documents(created_at);
3737+CREATE INDEX IF NOT EXISTS idx_documents_deleted_at ON documents(deleted_at);
3838+CREATE INDEX IF NOT EXISTS idx_documents_search_vector ON documents USING GIN(search_vector);
3939+4040+CREATE TABLE IF NOT EXISTS sync_state (
4141+ consumer_name TEXT PRIMARY KEY,
4242+ cursor TEXT NOT NULL,
4343+ high_water_mark TEXT,
4444+ updated_at TEXT NOT NULL
4545+);
4646+4747+CREATE TABLE IF NOT EXISTS identity_handles (
4848+ did TEXT PRIMARY KEY,
4949+ handle TEXT NOT NULL,
5050+ is_active BOOLEAN NOT NULL DEFAULT TRUE,
5151+ status TEXT,
5252+ updated_at TEXT NOT NULL
5353+);
5454+5555+CREATE INDEX IF NOT EXISTS idx_identity_handles_handle ON identity_handles(handle);
5656+5757+CREATE TABLE IF NOT EXISTS record_state (
5858+ subject_uri TEXT PRIMARY KEY,
5959+ state TEXT NOT NULL,
6060+ updated_at TEXT NOT NULL
6161+);
···11+CREATE TABLE IF NOT EXISTS indexing_jobs (
22+ document_id TEXT PRIMARY KEY,
33+ did TEXT NOT NULL,
44+ collection TEXT NOT NULL,
55+ rkey TEXT NOT NULL,
66+ cid TEXT NOT NULL,
77+ record_json TEXT NOT NULL,
88+ source TEXT NOT NULL DEFAULT 'read_through',
99+ status TEXT NOT NULL,
1010+ attempts INTEGER NOT NULL DEFAULT 0,
1111+ last_error TEXT,
1212+ scheduled_at TEXT NOT NULL,
1313+ updated_at TEXT NOT NULL,
1414+ lease_owner TEXT DEFAULT '',
1515+ lease_expires_at TEXT DEFAULT '',
1616+ completed_at TEXT DEFAULT ''
1717+);
1818+1919+CREATE INDEX IF NOT EXISTS idx_indexing_jobs_status_scheduled
2020+ ON indexing_jobs(status, scheduled_at, updated_at);
2121+2222+CREATE INDEX IF NOT EXISTS idx_indexing_jobs_claim
2323+ ON indexing_jobs(status, scheduled_at, lease_expires_at, updated_at);
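-- Illustrative claim sketch for the lease columns above: take one due job
-- without blocking other workers. The worker id, lease window, and the
-- assumption that timestamps are ISO-8601 TEXT are all hypothetical here.
--
-- UPDATE indexing_jobs
-- SET status = 'processing',
--     lease_owner = 'worker-1',
--     lease_expires_at = to_char(now() + interval '60 seconds', 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
--     attempts = attempts + 1
-- WHERE document_id IN (
--     SELECT document_id
--     FROM indexing_jobs
--     WHERE status = 'pending'
--       AND scheduled_at <= to_char(now(), 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
--     ORDER BY scheduled_at
--     LIMIT 1
--     FOR UPDATE SKIP LOCKED
-- );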
···11+CREATE TABLE IF NOT EXISTS jetstream_events (
22+ id BIGSERIAL PRIMARY KEY,
33+ time_us BIGINT NOT NULL,
44+ did TEXT NOT NULL,
55+ kind TEXT NOT NULL,
66+ collection TEXT,
77+ rkey TEXT,
88+ operation TEXT,
99+ payload TEXT NOT NULL,
1010+ received_at TEXT NOT NULL
1111+);
1212+1313+CREATE INDEX IF NOT EXISTS idx_jetstream_events_time_us
1414+ ON jetstream_events(time_us DESC);
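-- Retention sketch (a runtime job, not part of this migration): drop cached
-- events older than 24 hours, assuming time_us is Jetstream's microsecond
-- epoch timestamp.
--
-- DELETE FROM jetstream_events
-- WHERE time_us < (EXTRACT(EPOCH FROM now() - interval '24 hours') * 1000000)::BIGINT;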
···11+CREATE TABLE IF NOT EXISTS indexing_audit (
22+ id BIGSERIAL PRIMARY KEY,
33+ source TEXT NOT NULL,
44+ document_id TEXT NOT NULL,
55+ collection TEXT NOT NULL,
66+ cid TEXT NOT NULL,
77+ decision TEXT NOT NULL,
88+ attempt INTEGER NOT NULL DEFAULT 0,
99+ error TEXT,
1010+ created_at TEXT NOT NULL
1111+);
1212+1313+CREATE INDEX IF NOT EXISTS idx_indexing_audit_created
1414+ ON indexing_audit(created_at DESC);
1515+1616+CREATE INDEX IF NOT EXISTS idx_indexing_audit_document
1717+ ON indexing_audit(document_id, created_at DESC);