# Twisted

Twisted is a monorepo for a Tangled mobile client and a Tap-backed indexing API.

## Projects

- `apps/twisted`: Ionic Vue client for browsing Tangled repos, profiles, issues, PRs, and search
- `packages/api`: Go service for ingest, search, read-through indexing, and activity cache
- `docs`: project docs, ADRs, and operational references

## Architecture

The app still reads canonical repo and profile data from Tangled and AT Protocol APIs.
The API adds:

1. network-wide search over indexed Tangled content
2. index-backed summaries that are hard to derive from public APIs alone

The backend now targets PostgreSQL for both local and remote deployments.

## Development

Install JS dependencies once:

```bash
pnpm install
```

Default local database URL:

```bash
postgresql://localhost/${USER}_dev?sslmode=disable
```

That matches the Postgres.app-style local workflow and also matches the repo's
`docker-compose.dev.yaml` if you want disposable local Postgres and Tap
containers instead.

Start the local database:

```bash
just db-up
```

Run the mobile app:

```bash
pnpm dev
```

Run the API against local Postgres:

```bash
just api-dev
```

Run the indexer against local Postgres:

```bash
just api-run-indexer
```

Use `just api-dev sqlite` or `just api-run-indexer sqlite` only for the
temporary SQLite rollback path.

If you want the app to call the local API, put this in `apps/twisted/.env.local`:

```bash
VITE_TWISTER_API_BASE_URL=http://localhost:8080
```

Run the API smoke checks from the repo root:

```bash
uv run --project packages/scripts/api twister-api-smoke
```

If `ADMIN_AUTH_TOKEN` is present, the smoke script also checks admin status.
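
The admin gating described above can be sketched like this. It is illustrative only: `plan_checks` is a hypothetical helper, the endpoint paths come from the API reference, and the real script lives in `packages/scripts/api`.

```python
def plan_checks(env: dict) -> list[str]:
    """Return the endpoint paths a smoke pass would cover (illustrative)."""
    # Baseline checks that need no credentials.
    checks = ["/healthz", "/readyz", "/search?q=test"]
    # Admin status is only exercised when a token is configured.
    if env.get("ADMIN_AUTH_TOKEN"):
        checks.append("/admin/status")
    return checks


print(plan_checks({}))
print(plan_checks({"ADMIN_AUTH_TOKEN": "example"}))
```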

## Deployment

Production deployment now uses Coolify plus a separate Coolify-managed
PostgreSQL instance. The backend services are defined in
`docker-compose.prod.yaml`.

See [`docs/reference/deployment-walkthrough.md`](docs/reference/deployment-walkthrough.md)
for the full setup, bootstrap, backup, and cutover flow.

## Attributions

This project relies heavily on the work of the
[Tangled team](https://tangled.org/tangled.org) and the infrastructure made
available by [microcosm](https://microcosm.blue), especially Lightrail and
Constellation.

## Reference

- [`reference/api.md`](reference/api.md) — API runtime, config, and data model
- [`reference/deployment-walkthrough.md`](reference/deployment-walkthrough.md) — Coolify + Postgres deploy guide
- [`reference/metrics.md`](reference/metrics.md) — rollout checks for API, indexer, and Postgres
- [`reference/resync.md`](reference/resync.md) — backfill and rebuild recovery playbook
- [`reference/app.md`](reference/app.md) — Ionic Vue mobile app
- [`reference/lexicons.md`](reference/lexicons.md) — Tangled AT Protocol record types

## Specs

- [`specs/data-sources.md`](specs/data-sources.md) — upstream systems and API sources
- [`specs/search.md`](specs/search.md) — current search/indexing direction
- [`specs/app-features.md`](specs/app-features.md) — remaining mobile app features

## ADRs

- [`adr/storage.md`](adr/storage.md) — accepted Postgres + Coolify storage decision
- [`adr/pg.md`](adr/pg.md) — PostgreSQL research and tradeoffs
- [`adr/turso.md`](adr/turso.md) — superseded Turso research kept for history

## Roadmap

- [`roadmap.md`](roadmap.md) — current milestone list

docs/adr/storage.md

---
title: ADR - Choose PostgreSQL And Coolify For Search Storage
updated: 2026-03-26
status: accepted
---

## Decision

Twisted will use PostgreSQL as the primary database backend for search,
indexing, queue state, and activity cache. The production deploy target is a
Coolify application for `api`, `indexer`, and `tap` plus a separate
Coolify-managed PostgreSQL instance.

## Why

- PostgreSQL is the better fit for long-running multi-service deployment.
- Coolify gives the project a straightforward Git-to-deploy path with built-in
  Traefik and a managed database resource.
- The current service shape already wants two long-lived processes writing to
  one shared database.
- A local PostgreSQL workflow keeps development closer to production than the
  old Turso split.

## Consequences

### Positive

- one mainstream database for local and remote environments
- simpler production backups and restore story
- easier operational model for `api` and `indexer`
- no dependency on Turso-specific SQLite extension behavior

### Negative

- the search layer must move off SQLite FTS5
- ranking and snippet behavior will change
- SQLite remains only as a temporary rollback path during migration

## Search Shape

Keyword search will use PostgreSQL full-text search:

- weighted `tsvector`
- `websearch_to_tsquery('simple', ...)`
- `ts_rank_cd`
- `ts_headline`

The HTTP response shape stays stable, but exact scores and snippets are not
expected to match the previous FTS5 implementation.
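
The bullets above imply a query of roughly this shape. This is a sketch only: the `documents` table name comes from the API reference, while `search_vector` and the other column names are assumptions, not the actual implementation.

```sql
-- Hypothetical keyword query (column names and constants assumed).
SELECT id,
       ts_rank_cd(search_vector, query) AS score,
       ts_headline('simple', body, query) AS snippet
FROM documents,
     websearch_to_tsquery('simple', 'tangled repo') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 20;
```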

## Migration Plan

1. add PostgreSQL connection/config support and local defaults
2. add a primary PostgreSQL migration set
3. move search and store implementations to PostgreSQL
4. deploy `api`, `indexer`, and `tap` from `docker-compose.prod.yaml`
5. rebuild data through `backfill`, `enrich`, and `reindex`
6. cut traffic over only after smoke checks pass

## Explicit Non-Goal

The default migration does not include a Turso-to-PostgreSQL data import. The
serving dataset should be rebuilt from authoritative upstream sources.

## Related Records

- `docs/adr/pg.md` remains the background research for this decision
- `docs/adr/turso.md` is retained as superseded historical context

docs/adr/turso.md

---
title: ADR Research - Turso For Production Search
updated: 2026-03-26
status: superseded
---

## Status

This research record is kept for history. It no longer describes the active
deployment direction.

## Historical Summary

Turso was originally attractive because Twisted already used SQLite-style
queries, migrations, and FTS5 search behavior. It offered the shortest path
from local file-backed development to a remotely hosted production database.

## Why It Was Superseded

The project has now chosen PostgreSQL plus Coolify instead.

Main reasons:

- Twisted now runs as multiple long-lived services against one shared database
- the project wants standard production operations and restore tooling
- local and production environments should converge on one database family
- the cost of carrying Turso-specific behavior forward outweighed the cost of
  migrating

## What Still Matters From This Research

- rebuilding the dataset from upstream sources remains the safer default than
  promoting an experimental local database
- embedded-replica ideas were interesting but were not a fit for the Go stack
  the project kept
- search behavior changes must be treated as product-visible, not just as a
  storage swap

## Current Source Of Truth

See `docs/adr/storage.md` for the accepted storage decision.

docs/reference/api.md

---
title: API Service Reference
updated: 2026-03-26
---

Twisted is a Go service that indexes Tangled content, serves search, and caches
recent activity. It uses PostgreSQL for the primary runtime and retains a
temporary local SQLite fallback behind `--local`.

## Runtime Modes

| Command | Purpose |
| --- | --- |
| `api` | HTTP API server |
| `indexer` | Tap consumer and index writer |
| `backfill` | register repos with Tap |
| `enrich` | fill missing repo names, handles, and web URLs |
| `reindex` | re-upsert documents and finalize the search index |
| `healthcheck` | one-shot config and process probe |

## HTTP API

- `GET /healthz` — liveness probe
- `GET /readyz` — readiness probe, checks database reachability
- `GET /search` — keyword search
- `GET /documents/{id}` — fetch one indexed document
- `GET /admin/status` — cursor and queue state when admin routes are enabled
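
For example, a client can build a keyword search request like this. The base URL assumes a locally running server on `:8080`; `q` is the documented query parameter.

```python
from urllib.parse import urlencode

# Local API server (assumes the default :8080 bind address).
base = "http://localhost:8080"

# Build GET /search?q=... for a keyword query.
url = f"{base}/search?{urlencode({'q': 'tangled repo'})}"
print(url)  # → http://localhost:8080/search?q=tangled+repo
```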

The API also serves the built-in search/docs site from `/` and `/docs*`.

## Search

Keyword search is implemented with PostgreSQL full-text search.

- weighted fields: title, author handle, repo name, summary, body, tags
- query parser: `websearch_to_tsquery('simple', ...)`
- ranking: `ts_rank_cd`
- snippets: `ts_headline`

Response shape stays the same as the previous FTS5 API. Ranking and snippet
details are allowed to differ from the SQLite-era implementation.

## Database

Primary backend: PostgreSQL.

Main tables:

- `documents`
- `sync_state`
- `identity_handles`
- `record_state`
- `indexing_jobs`
- `indexing_audit`
- `jetstream_events`

`documents` stores a generated weighted `tsvector` column plus a GIN index for
keyword search.
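
Such a column and index can look roughly like the following DDL. The `search_vector` name, the field choices, and the weights are assumptions for illustration, not the repo's actual migration.

```sql
-- Hypothetical generated column plus GIN index (names and weights assumed).
ALTER TABLE documents
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(author_handle, '')), 'B') ||
    setweight(to_tsvector('simple', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX documents_search_vector_idx
  ON documents USING GIN (search_vector);
```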

## Configuration

Primary env vars:

- `DATABASE_URL`
- `HTTP_BIND_ADDR`
- `INDEXER_HEALTH_ADDR`
- `TAP_URL`
- `TAP_AUTH_PASSWORD`
- `INDEXED_COLLECTIONS`
- `READ_THROUGH_MODE`
- `READ_THROUGH_COLLECTIONS`
- `READ_THROUGH_MAX_ATTEMPTS`
- `ENABLE_ADMIN_ENDPOINTS`
- `ADMIN_AUTH_TOKEN`

Default local database URL:

```sh
postgresql://localhost/${USER}_dev?sslmode=disable
```
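
The `${USER}` placeholder expands to your shell user, so each developer gets a per-user database name. A quick illustration (not project code; the `alice` fallback is made up):

```python
import os

# Mirror the documented default: a per-user local database name.
user = os.environ.get("USER", "alice")
db_url = f"postgresql://localhost/{user}_dev?sslmode=disable"
print(db_url)
```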
82818383-If `ADMIN_AUTH_TOKEN` is present in the environment, the smoke script can also
8484-verify `GET /admin/status`.
8585-8686-### Static Content
8282+`--local` is deprecated and switches to the legacy SQLite fallback at
8383+`packages/api/twister-dev.db`.
87848888-The API also serves a search site with live search and API documentation at `/` and `/docs*`, built with Alpine.js (no build step, embedded in `internal/view/`).
8585+## Local Operation
89869090-## Database
8787+Start local Postgres with the repo compose file:
91889292-Turso (libSQL) with the following tables:
8989+```sh
9090+just db-up
9191+just api-dev
9292+just api-run-indexer
9393+```
93949494-**documents** — Core search index. Each record gets a stable ID of `did|collection|rkey`. Stores title, body, summary, metadata (repo name, author handle, web URL, language, tags), and timestamps. Soft-deleted via `deleted_at`.
9595+The dev compose file also runs a local Tap instance at `ws://localhost:2480/channel`.
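Once the dev loop is up, a quick reachability check against the default ports (a sketch — the ports are assumptions if you overrode the bind addresses):

```shell
# Probe the API readiness endpoint and the indexer health endpoint.
api_ok=$(curl -fsS -o /dev/null "http://localhost:8080/readyz" && echo yes || echo no)
idx_ok=$(curl -fsS -o /dev/null "http://localhost:9090/health" && echo yes || echo no)
echo "api ready: $api_ok, indexer healthy: $idx_ok"
```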
95969696-**documents_fts** — FTS5 virtual table for full-text search over title, body, summary, repo name, author handle, and tags. Uses `unicode61` tokenizer with tuned BM25 weights (title weighted highest at 2.5, then author handle at 2.0, summary at 1.5).
9797-9898-**sync_state** — Cursor tracking for the Tap consumer. Stores consumer name, current cursor, high water mark, and last update time. Enables crash-safe resume.
9999-100100-**identity_handles** — DID-to-handle cache. Updated from Tap identity events and XRPC lookups.
101101-102102-**record_state** — Issue and PR state cache (open/closed/merged). Keyed by subject AT-URI.
103103-104104-**indexing_jobs** — Durable read-through/admin queue with status, lease owner,
105105-lease expiry, retry counters, and terminal states (`failed`, `dead_letter`).
106106-107107-**indexing_audit** — Append-only record of enqueue decisions, retries, skips,
108108-completions, and dead letters.
109109-110110-**document_embeddings** — Vector storage (768-dim F32_BLOB with DiskANN cosine index). Schema ready but not yet populated.
111111-112112-**embedding_jobs** — Async embedding job queue. Schema ready but worker not yet active.
113113-114114-## Indexing Pipeline
115115-116116-The indexer connects to Tap via WebSocket, consuming AT Protocol record events in real-time. For each event:
117117-118118-1. Filter against the configured collection allowlist (supports wildcards like `sh.tangled.*`)
119119-2. Route to the appropriate normalizer based on collection
120120-3. Normalize into a document (extract title, body, summary, metadata)
121121-4. Optionally enrich via XRPC (resolve author handle, repo name, web URL)
122122-5. Upsert into the database (auto-syncs FTS)
123123-6. Persist the Tap cursor and then acknowledge the event
124124-125125-The indexer resumes from its last cursor on restart and replays idempotently.
126126-It logs status every 30 seconds and uses exponential backoff (1s–5s) for
127127-transient failures.
128128-129129-Read-through indexing is `missing` by default. Only allowed collections can be
130130-queued, detail reads queue single focal records, and bulk list handlers no
131131-longer enqueue whole collections.
132132-133133-## Record Normalizers
134134-135135-Each AT Protocol collection has a dedicated normalizer that extracts searchable content:
136136-137137-| Collection | Record Type | Searchable | Content |
138138-| ------------------------------- | ------------- | ------------------------ | --------------------------- |
139139-| `sh.tangled.repo` | repo | Yes (if named) | Name, description, topics |
140140-| `sh.tangled.repo.issue` | issue | Yes | Title, body, repo reference |
141141-| `sh.tangled.repo.pull` | pull | Yes | Title, body, target branch |
142142-| `sh.tangled.repo.issue.comment` | issue_comment | Yes (if has body) | Comment body |
143143-| `sh.tangled.repo.pull.comment` | pull_comment | Yes (if has body) | Comment body |
144144-| `sh.tangled.string` | string | Yes (if has content) | Filename, contents |
145145-| `sh.tangled.actor.profile` | profile | Yes (if has description) | Profile description |
146146-| `sh.tangled.graph.follow` | follow | No | Graph edge only |
147147-148148-State records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) update the `record_state` table rather than creating documents.
149149-150150-## XRPC Client
151151-152152-The built-in XRPC client provides typed access to AT Protocol endpoints with caching (1-hour TTL for DID docs and repo names):
153153-154154-- DID resolution via PLC Directory (`did:plc:`) or `.well-known/did.json` (`did:web:`)
155155-- Identity resolution (PDS endpoint + handle from DID document)
156156-- Record fetching (`com.atproto.repo.getRecord`, `com.atproto.repo.listRecords`)
157157-- Repo name resolution from `sh.tangled.repo` records
158158-- Web URL construction for Tangled entities
159159-160160-## Backfill
161161-162162-The backfill command now defaults to `--source lightrail`: it calls
163163-`com.atproto.sync.listReposByCollection`, dedupes returned DIDs, and batch
164164-submits them to Tap. `--source graph` keeps the older seed-file follow and
165165-collaborator crawl for targeted fallback runs.
166166-167167-## Configuration
168168-169169-All configuration is via environment variables (with `.env` file support):
170170-171171-| Variable | Default | Purpose |
172172-| -------------------------- | ----------------------- | ----------------------------------------------- |
173173-| `TURSO_DATABASE_URL` | — | Database connection (required unless `--local`) |
174174-| `TURSO_AUTH_TOKEN` | — | Auth token (required for remote) |
175175-| `TAP_URL` | — | Tap WebSocket URL |
176176-| `TAP_AUTH_PASSWORD` | — | Tap admin password |
177177-| `INDEXED_COLLECTIONS` | all | Collection allowlist (CSV, supports wildcards) |
178178-| `READ_THROUGH_MODE` | missing | `off`, `missing`, or `broad` |
179179-| `READ_THROUGH_COLLECTIONS` | `INDEXED_COLLECTIONS` | Read-through allowlist |
180180-| `READ_THROUGH_MAX_ATTEMPTS`| 5 | Retries before `dead_letter` |
181181-| `HTTP_BIND_ADDR` | `:8080` | API server bind address |
182182-| `INDEXER_HEALTH_ADDR` | `:9090` | Indexer health probe address |
183183-| `LOG_LEVEL` | info | debug/info/warn/error |
184184-| `LOG_FORMAT` | json | json or text |
185185-| `ENABLE_ADMIN_ENDPOINTS` | false | Enable admin routes |
186186-| `ADMIN_AUTH_TOKEN` | — | Bearer token for admin |
187187-| `ENABLE_INGEST_ENRICHMENT` | true | XRPC enrichment at ingest time |
188188-| `PLC_DIRECTORY_URL` | `https://plc.directory` | PLC Directory |
189189-| `XRPC_TIMEOUT` | 15s | XRPC HTTP timeout |
190190-191191-Recommended production practice is to use explicit search-relevant collection
192192-lists for `INDEXED_COLLECTIONS` and `READ_THROUGH_COLLECTIONS`, not
193193-`sh.tangled.*`, and to leave `sh.tangled.graph.follow` out of both.
9797+Use `just api-dev sqlite` only when you need the temporary SQLite rollback path.
1949819599## Deployment
196100197197-Deployed on Railway with three services:
198198-199199-- **api** — HTTP server (port 8080, health at `/readyz`)
200200-- **indexer** — Tap consumer (health at `:9090/health`)
201201-- **tap** — Tap instance (external dependency)
202202-203203-All services share the same Turso database. The API and indexer are separate deployments of the same binary with different subcommands.
204204-205205-## Experimental Local DB
206206-207207-The local development database lives at `packages/api/twister-dev.db` when the
208208-API runs with `--local`.
209209-210210-Operational rules:
211211-212212-1. Stop the API before backup or restore.
213213-2. Copy `twister-dev.db` and any matching `-wal` or `-shm` files together.
214214-3. Prefer restore-or-rebuild over repair if the file becomes suspect.
215215-4. Let the DB grow during active experiments, then compact or delete it later.
101101+Production uses:
216102217217-Useful local inspection:
103103+- Coolify Application with `docker-compose.prod.yaml`
104104+- separate Coolify-managed PostgreSQL resource
105105+- private Tap service from the pinned Indigo image
106106+- built-in Coolify Traefik for the public `api` domain
218107219219-```sh
220220-cd packages/api
221221-du -h twister-dev.db*
222222-ls -lh twister-dev.db*
223223-```
108108+See `docs/reference/deployment-walkthrough.md` for the full production flow.
+77-95
docs/reference/deployment-walkthrough.md
···11# Deployment Walkthrough
2233-This repo maps cleanly to Railway, but only for the backend pieces.
33+Twisted deploys to Coolify as one Compose application with three services:
4455-- Deploy `packages/api` to Railway as two services: `api` and `indexer`.
66-- Keep the Ionic + Capacitor app on your machine or in CI for native builds.
77-- Point the mobile app at the Railway `api` service with
88- `VITE_TWISTER_API_BASE_URL`.
55+- `api`: public HTTP service
66+- `indexer`: private Tap consumer
77+- `tap`: private Indigo Tap service
981010-## What Railway Should Host
99+PostgreSQL is a separate Coolify-managed resource.
11101212-Railway is a good home for the Go services in this repo:
1111+## Files
13121414-- `api`: serves HTTP routes, docs, search, proxies, and readiness checks
1515-- `indexer`: consumes Tap, writes into Turso, and exposes its own health endpoint
1616-Railway is not the place that ships the native iOS or Android app. You still
1717-build, sign, and distribute the Capacitor shells separately.
1313+- production compose: `docker-compose.prod.yaml`
1414+- local dev compose: `docker-compose.dev.yaml`
1515+- app image build: `packages/api/Dockerfile`
1616+- Tap image: `ghcr.io/bluesky-social/indigo/tap:sha-4f47add43060c27e8a37d9d76482ecddf001fcd8`
18171918## Prerequisites
20192121-Before you start, have these ready:
2020+- Coolify access
2121+- one Coolify PostgreSQL resource
2222+- this repo connected to Coolify
2323+- explicit `INDEXED_COLLECTIONS` and `READ_THROUGH_COLLECTIONS`
2424+- one shared Tap admin password
22252323-- a Railway account and the Railway CLI
2424-- a Turso database URL and auth token
2525-- a Tap URL and Tap auth password
2626-From this machine:
2626+## Provision PostgreSQL
27272828-```sh
2929-cd /Users/owais/Projects/Twisted
3030-railway login
3131-```
2828+Create the PostgreSQL resource first.
32293333-## Create The Railway Project
3434-3535-In the Railway dashboard, create one empty project with two empty services:
3636-3737-- `api`
3838-- `indexer`
3939-Then link this repo to that project:
3030+- keep the generated connection string in Coolify secrets as `DATABASE_URL`
3131+- use PostgreSQL backups from the database resource
3232+- point both `api` and `indexer` at the same database
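Before the first deploy, it is worth verifying the connection string from outside the app. A sketch, assuming `DATABASE_URL` is exported in the current shell and `psql` is installed:

```shell
# Run a trivial query through the Coolify-provided connection string.
db_ok=$(psql "$DATABASE_URL" -Atc 'select 1' 2>/dev/null || echo fail)
if [ "$db_ok" = "1" ]; then
  echo "database reachable"
else
  echo "database not reachable"
fi
```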
40334141-```sh
4242-cd /Users/owais/Projects/Twisted
4343-railway link
4444-```
3434+## Create The Coolify App
45354646-## Configure Service Shape
3636+In Coolify:
47374848-Both services should deploy from the same local path:
3838+1. create a new Application
3939+2. choose the Docker Compose build pack
4040+3. point it at this repo
4141+4. set base directory to `/`
4242+5. set compose file location to `/docker-compose.prod.yaml`
49435050-- path: `packages/api`
5151-- build source: `packages/api/Dockerfile`
5252-Set the service start commands in Railway:
5353-- `api`: `twister api`
5454-- `indexer`: `twister indexer`
5555-The checked-in Dockerfile already builds the `twister` binary.
4444+Do not add your own Traefik container. Coolify already provides the proxy.
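It can help to validate the compose file locally before pointing Coolify at it. A sketch, assuming Docker is installed on your machine:

```shell
# Parse the production compose file and list its services (api, indexer, tap).
services=$(docker compose -f docker-compose.prod.yaml config --services 2>/dev/null \
  || echo "docker compose not available here")
echo "$services"
```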
56455757-## Set Variables
4646+## Set Environment Variables
58475959-Use shared variables for values both services need:
4848+Shared:
60496161-- `TURSO_DATABASE_URL`
6262-- `TURSO_AUTH_TOKEN`
5050+- `DATABASE_URL`
5151+- `INDEXED_COLLECTIONS`
6352- `LOG_LEVEL=info`
6453- `LOG_FORMAT=json`
6565-Set these on `api`:
6666-- `HTTP_BIND_ADDR=0.0.0.0:${{ PORT }}`
5454+- `TAP_AUTH_PASSWORD=<required>`
5555+5656+`api`:
5757+5858+- `HTTP_BIND_ADDR=:8080`
6759- `SEARCH_DEFAULT_LIMIT=20`
6860- `SEARCH_MAX_LIMIT=100`
6961- `READ_THROUGH_MODE=missing`
7070-- `READ_THROUGH_COLLECTIONS=<explicit search collection CSV>`
6262+- `READ_THROUGH_COLLECTIONS=<explicit CSV>`
7163- `READ_THROUGH_MAX_ATTEMPTS=5`
7264- `ENABLE_ADMIN_ENDPOINTS=false`
7373-- `ADMIN_AUTH_TOKEN=<set this if admin routes are enabled>`
7474-Set these on `indexer`:
7575-- `INDEXER_HEALTH_ADDR=0.0.0.0:${{ PORT }}`
7676-- `TAP_URL=<your Tap URL>`
7777-- `TAP_AUTH_PASSWORD=<your Tap password>`
7878-- `INDEXED_COLLECTIONS=<matching explicit search collection CSV>`
7979-- `ENABLE_INGEST_ENRICHMENT=true`
8080-Do not use `sh.tangled.*` for those allowlists. Match the Lightrail-backed
8181-search collection set and leave `sh.tangled.graph.follow` out.
8282-Optional OAuth variables for a Railway-hosted web client metadata endpoint:
8383-- `OAUTH_CLIENT_ID`
8484-- `OAUTH_REDIRECT_URIS`
8585-The `${{ PORT }}` reference matters. Railway health checks run against the
8686-service port it injects, so the process must listen on that port.
6565+- `ADMIN_AUTH_TOKEN=<optional>`
6666+- `OAUTH_CLIENT_ID=<optional>`
6767+- `OAUTH_REDIRECT_URIS=<optional CSV>`
87688888-## Deploy From This Machine
6969+`indexer`:
89709090-From the repo root, deploy `packages/api` into each Railway service:
7171+- `INDEXER_HEALTH_ADDR=:9090`
7272+- `TAP_URL=ws://tap:2480/channel`
7373+- `ENABLE_INGEST_ENRICHMENT=true`
91749292-```sh
9393-cd /Users/owais/Projects/Twisted
9494-railway up packages/api --path-as-root --service api
9595-railway up packages/api --path-as-root --service indexer
9696-```
7575+`tap`:
97769898-`--path-as-root` is important in this monorepo. It makes `packages/api` the
9999-deployment root instead of archiving the whole repo.
7777+- `TAP_COLLECTION_FILTERS=<optional explicit CSV>`
7878+- optional persistent volume override if you do not want the default `/data`
10079101101-## Configure Health Checks
8080+Use explicit search collections. Do not use `sh.tangled.*` in production.
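One possible explicit allowlist, built from the indexed Tangled collections. The exact set is a deployment decision; the rules that matter are no wildcards and no `sh.tangled.graph.follow`.

```shell
# Hypothetical explicit CSV; trim to the collections you actually serve.
INDEXED_COLLECTIONS="sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.string,sh.tangled.actor.profile"
READ_THROUGH_COLLECTIONS="$INDEXED_COLLECTIONS"
echo "$INDEXED_COLLECTIONS"
```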
10281103103-Set the health check path in Railway for each service:
8282+## Domains And Health Checks
10483105105-- `api`: `/readyz`
106106-- `indexer`: `/health`
107107-`/readyz` is the better API check because it verifies database reachability.
8484+Expose only `api` publicly.
10885109109-## First Bootstrap
8686+- assign the domain in Coolify to the `api` service
8787+- if `api` stays on `:8080`, include that internal port in the Coolify mapping
8888+- configure readiness checks against `GET /readyz`
8989+- keep `indexer` and `tap` private
9090+- monitor `indexer` with `GET /health`
11091111111-A fresh environment is not search-ready just because the services booted.
9292+## First Bootstrap
11293113113-1. Deploy `api`.
114114-2. Deploy `indexer`.
115115-3. Confirm the `api` domain returns `200` from `/readyz`.
116116-4. Confirm the `indexer` returns `200` from `/health`.
117117-5. Run the initial backfill against the same Turso and Tap environment.
118118-Use Railway shell so the command runs inside the live `indexer` environment:
9494+1. deploy `tap`
9595+2. deploy `api`
9696+3. deploy `indexer`
9797+4. confirm `api` returns `200` from `/readyz`
9898+5. confirm `indexer` returns `200` from `/health`
9999+6. confirm `indexer` can reach `ws://tap:2480/channel`
100100+7. open a Coolify terminal in the `indexer` service and run:
119101120102```sh
121121-cd /Users/owais/Projects/Twisted
122122-railway link # Select indexer service if prompted
123123-railway shell
124124-twister backfill --source lightrail
103103+twister backfill
104104+twister enrich
105105+twister reindex
125106```
126107127127-Do not call the environment ready until that first backfill has completed.
108108+This rebuilds the serving dataset from authoritative sources. Do not import the
109109+old Turso data as the default migration path.
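After the bootstrap finishes, a spot check that search actually serves results is a good final gate. A sketch — `API_BASE` and the query term are placeholders, not fixed values:

```shell
# Hit the public search endpoint with a term you expect to be indexed.
API_BASE="${API_BASE:-https://your-api-domain}"
result=$(curl -fsS "$API_BASE/search?q=repo" 2>/dev/null || echo unreachable)
echo "$result" | head -c 200
```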
128110129129-## Point The App At Railway
111111+## Point The App At Coolify
130112131131-For local app builds, set the Railway API URL in `apps/twisted/.env`:
113113+For local app builds:
132114133115```sh
134116VITE_TWISTER_API_BASE_URL=https://<your-api-domain>
135117```
136118137137-Then build or run the app as usual:
119119+Then run the app normally with `pnpm --dir apps/twisted dev` or `build`.
120120+121121+## Rollback Notes
138122139139-```sh
140140-pnpm --dir apps/twisted dev
141141-pnpm --dir apps/twisted build
142142-pnpm --dir apps/twisted exec cap sync
143143-```
123123+- keep the SQLite `--local` path only as a temporary development fallback
124124+- rollback production by restoring PostgreSQL and redeploying the prior app
125125+- treat PostgreSQL restore as the database rollback primitive
+47-79
docs/reference/metrics.md
···11# Metrics To Watch
2233-Use this after deploying the Lightrail-backed backfill flow and detail-only
44-read-through changes.
33+Use this after rolling out the Coolify + PostgreSQL deployment.
5466-## Goal
55+## Goals
7687Confirm that:
981010-- the API stops creating broad read-through churn during browse traffic
1111-- the indexer still keeps search current through Tap
1212-- bootstrap backfills become cheaper and more predictable
99+- `api` stays stable under browse-heavy traffic
1010+- `indexer` keeps search current through Tap
1111+- PostgreSQL handles ingest, queue churn, and activity writes without backlog
13121414-## Railway
1313+## Coolify Application
15141616-Watch both `api` and `indexer` for 24 to 48 hours after deploy.
1515+Watch both services for 24 to 48 hours after deploy.
17161818-### API service
1717+### API
19182019Expected direction:
21202222-- lower average CPU
2323-- fewer latency spikes on browse-heavy endpoints
2424-- lower memory churn from fewer queued background jobs
2121+- lower latency spikes
2222+- stable memory
2323+- flat restart count
25242625Useful checks:
27262828-- CPU usage before and after deploy
2929-- memory usage before and after deploy
3030-- request latency for browse-heavy periods
2727+- request latency
2828+- CPU and memory
2929+- `/readyz` failures
3130- restart count
32313333-If this change is helping, the API should look flatter under normal browsing,
3434-especially when clients hit repo lists, issue lists, pull lists, or follows.
3535-3636-### Indexer service
3232+### Indexer
37333834Expected direction:
39354040-- similar steady-state load during normal Tap ingest
4141-- shorter, more deliberate spikes only when `twister backfill` is run
3636+- steady-state load during Tap ingest
3737+- bounded spikes during `backfill`, `enrich`, and `reindex`
3838+- `/health` stays green outside deploy windows
42394340Useful checks:
44414545-- CPU during normal operation
4646-- CPU during `twister backfill --source lightrail`
4747-- memory during backfill
4242+- CPU and memory
4843- restart count
4444+- Tap reconnect frequency
4545+- queue drain time after backfill
49465050-The indexer may still spike during an initial bootstrap. That is expected. The
5151-important change is that the API should stop causing constant incidental work.
5252-5353-## Turso
5454-5555-This is where the clearest savings should show up.
4747+## PostgreSQL
56485749Expected direction:
58505959-- fewer write operations
6060-- fewer row updates in indexing job tables
6161-- lower write amplification from browse traffic
5151+- predictable connection count
5252+- stable write latency during ingest
5353+- no long-lived lock buildup in `indexing_jobs`
5454+- bounded table growth in `jetstream_events` and `indexing_audit`
62556356Useful checks:
64576565-- total row writes
6666-- total queries
6767-- write-heavy windows during normal app usage
6868-- latency on write statements if you have it
6969-7070-The main reduction should come from no longer enqueueing whole list responses
7171-into `indexing_jobs` during browse requests.
5858+- connections
5959+- disk growth
6060+- slow queries
6161+- write latency
6262+- backup duration and restore confidence
72637373-## Twister Admin Signals
6464+## Admin Signals
74657575-If admin endpoints are enabled, compare these before and after deploy:
6666+If admin routes are enabled, compare:
76677768- `read_through.pending`
7869- `read_through.processing`
7970- `read_through.failed`
8071- `read_through.dead_letter`
8181-- `read_through.last_processed_at`
7272+- `tap.cursor`
7373+- `jetstream.cursor`
82748383-Healthy post-change behavior:
7575+Healthy behavior:
84768577- pending stays near zero most of the time
8686-- processing only bumps when detail pages fetch missing records
8787-- failed and dead-letter counts grow slowly, not continuously
8888-8989-Relevant endpoint:
9090-9191-```sh
9292-curl -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" http://<api-host>/admin/status
9393-```
9494-9595-## What To Compare
9696-9797-Use the same day-of-week and similar traffic windows if possible.
9898-9999-Good comparisons:
100100-101101-- 24 hours before deploy vs 24 hours after deploy
102102-- one browse-heavy period before vs after
103103-- one bootstrap backfill run before vs after
7878+- processing drains after bursts
7979+- failed and dead-letter stay small and explainable
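A sketch for watching the queue depth from a shell, assuming admin routes are enabled and `jq` is installed. The JSON field path is an assumption inferred from the status keys listed in `docs/reference/resync.md`.

```shell
# Pull read_through.pending out of /admin/status.
API_BASE="${API_BASE:-http://localhost:8080}"
pending=$(curl -fsS -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" \
    "$API_BASE/admin/status" 2>/dev/null | jq -r '.read_through.pending' 2>/dev/null)
echo "read_through.pending=${pending:-unknown}"
```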
1048010581## Success Signals
10682107107-Treat the rollout as successful if most of these are true:
108108-109109-- API CPU is lower or less spiky under normal browsing
110110-- Turso writes drop during browse-heavy traffic
111111-- read-through queue counts stay close to zero most of the time
112112-- backfill runs complete with fewer upstream calls and cleaner batching
113113-- search freshness still tracks Tap ingest without visible regressions
8383+- `/readyz` and `/health` remain consistently green
8484+- search freshness tracks Tap ingest
8585+- backfill and enrich jobs complete without manual cleanup
8686+- PostgreSQL latency stays stable during bootstrap and normal use
1148711588## Failure Signals
11689117117-Investigate if you see any of these:
118118-119119-- search misses rise after deploy
120120-- detail pages repeatedly enqueue the same records
121121-- `read_through.pending` grows and does not drain
122122-- indexer CPU stays elevated long after a bootstrap run
123123-- Turso writes do not drop despite the handler changes
124124-125125-If that happens, inspect Tap coverage first, then spot-check whether operators
126126-ran `twister backfill --source lightrail` for the environment.
9090+- queue counts rise and do not drain
9191+- `/readyz` flips during normal browse traffic
9292+- search misses rise after cutover
9393+- PostgreSQL write latency climbs during normal ingest
9494+- restores or backups are failing or taking too long
+54-117
docs/reference/resync.md
···33updated: 2026-03-26
44---
5566-Twister's search index has three recovery paths. Choose based on what broke.
77-88-| Situation | Recovery path |
99-| ----------------------------------------------------- | -------------------------------------------- |
1010-| FTS index corrupted or drifted from stored documents | `twister reindex` |
1111-| Documents missing — never received via Tap | `twister backfill` + let the indexer consume |
1212-| Documents missing — received but fields empty/wrong | `twister enrich` |
1313-| Full index loss — DB dropped or migrated | backfill then reindex then enrich |
1414-| Tap cursor too far ahead — events skipped after a gap | cursor reset via `sync_state` table |
1515-1616----
66+Twisted has three recovery tools. Choose based on what broke.
1771818-## Paths Overview
1919-2020-**Tap** is the authoritative ingest and backfill path. Documents reach the index
2121-when the `indexer` consumes events from Tap. Completeness depends on which DIDs
2222-Tap is tracking.
2323-2424-**Read-through indexing** now runs in `missing` mode by default: when the API
2525-fetches a record that is absent or stale, and the collection is allowed, it
2626-enqueues a background job. Bulk list reads no longer enqueue entire collections.
2727-2828-**JetStream** feeds only the activity cache (`/activity`). It does not contribute
2929-to the search index.
3030-3131----
88+| Situation | Recovery path |
99+| --- | --- |
1010+| Search results wrong but documents exist | `twister reindex` |
1111+| Documents missing because Tap never delivered them | `twister backfill` |
1212+| Documents exist but derived metadata is empty or stale | `twister enrich` |
1313+| Full database loss or migration to a fresh PostgreSQL instance | backfill, enrich, reindex |
32143315## Commands
34163517### `twister indexer`
36183737-Runs the Tap consumer. Must be running continuously for real-time indexing.
3838-Persists cursor to `sync_state` table under consumer name `indexer-tap-v1`.
1919+Runs the Tap consumer continuously. Persists its cursor in `sync_state`.
39204021### `twister backfill`
41224242-Defaults to `--source lightrail`: discovers DIDs from
4343-`com.atproto.sync.listReposByCollection` and submits them to Tap in batches.
4444-Use `--source graph` only for targeted fallback seeding from handles or DIDs.
2323+Default source is `lightrail`. Use graph mode only for targeted fallback.
45244625```sh
4747-# full-network dry-run first
4826twister backfill --dry-run
4949-5050-# full-network bootstrap
5127twister backfill
5252-5353-# targeted fallback
5454-twister backfill --source graph --seeds seeds.txt --max-hops 2 \
5555- --concurrency 5 --batch-size 10 --batch-delay 1s
2828+twister backfill --source graph --seeds seeds.txt --max-hops 2
5629```
57305858-Safe to re-run. Discovery deduplicates and `repos/add` is treated as idempotent.
3131+Safe to rerun. Discovery is deduplicated and Tap registration is treated as
3232+idempotent.
59336034### `twister reindex`
61356262-Re-upserts stored documents into the FTS table and runs `optimize`. Does not
6363-re-fetch from upstream — only re-processes what is already in the DB.
3636+Re-upserts stored documents so PostgreSQL recomputes search state from the
3737+canonical `documents` rows.
64386539```sh
6666-twister reindex # all documents
4040+twister reindex
6741twister reindex --collection sh.tangled.repo
6842twister reindex --did did:plc:abc123
6969-twister reindex --dry-run # preview without writing
4343+twister reindex --dry-run
7044```
71457272-Run this when: FTS results are stale after a schema migration, after a bulk
7373-document import, or whenever search quality seems inconsistent with stored data.
7474-7546### `twister enrich`
76477777-Resolves missing `author_handle`, `repo_name`, and `web_url` via XRPC for
7878-documents already in the DB.
4848+Fills missing `author_handle`, `repo_name`, and `web_url`.
79498050```sh
8181-twister enrich # all documents
5151+twister enrich
8252twister enrich --collection sh.tangled.repo.issue
8353twister enrich --did did:plc:abc123
8454twister enrich --dry-run
8555```
8686-8787-Run this when: search results show documents with empty author handles, or
8888-after deploying enrichment logic changes.
8989-9090----
91569257## Scenario Playbooks
93589494-### FTS index out of sync
5959+### Search drift
95609696-Documents exist in the DB but search returns wrong/stale results.
6161+If search results look stale but the document rows are present:
97629863```sh
9999-twister reindex --dry-run # confirm scope
100100-twister reindex # re-upsert + FTS optimize
6464+twister reindex --dry-run
6565+twister reindex
10166```
10267103103-Verify with `GET /search?q=<known-term>`.
6868+### Missing documents
10469105105-### Documents missing from search
7070+If a record is fetchable through the API but not searchable:
10671107107-Fetch a known record directly. If it returns from `/actors/{handle}/repos/{repo}`
108108-but does not appear in `/search`, the document was never indexed.
109109-110110-1. Check if the DID is tracked by Tap. If not, run `backfill`:
111111-112112- ```sh
113113- twister backfill --source graph --seeds <handle-or-did> --max-hops 0
114114- ```
115115-116116-2. Once Tap is tracking the DID, the `indexer` will deliver historical events.
117117- Monitor progress via `GET /admin/status` and inspect backlog or failures with
118118- `GET /admin/indexing/jobs` and `GET /admin/indexing/audit`.
7272+1. make sure Tap is tracking the DID
7373+2. run targeted `backfill` if needed
7474+3. let `indexer` drain
7575+4. re-run `enrich` if metadata is still incomplete
11976120120-3. If you need the record indexed immediately, fetch the detail endpoint through
121121- the API or enqueue it explicitly with `POST /admin/indexing/enqueue`.
7777+### Metadata gaps
12278123123-### Enrichment gaps
124124-125125-Documents appear in search but `author_handle` or `repo_name` is empty.
7979+If `author_handle` or `repo_name` is empty:
1268012781```sh
128128-twister enrich --dry-run # preview what would be resolved
129129-twister enrich # apply
130130-twister reindex # re-sync FTS after field updates
8282+twister enrich --dry-run
8383+twister enrich
8484+twister reindex
13185```
13286133133-### Full index recovery
8787+### Full PostgreSQL rebuild
13488135135-Use this sequence after a DB drop, migration to a new Turso database, or other
136136-full-loss event.
8989+Use this after restoring to a fresh database or moving to a new PostgreSQL
9090+instance.
13791138138-1. Confirm migrations ran: `twister api --local` performs `store.Migrate` on startup.
139139-2. Register repos with Tap:
9292+1. start `api` once so migrations run
9393+2. start `indexer`
9494+3. run `twister backfill`
9595+4. run `twister enrich`
9696+5. run `twister reindex`
9797+6. verify `/readyz`, `/health`, and smoke checks
14098141141- ```sh
142142- twister backfill --dry-run
143143- twister backfill
144144- ```
145145-146146-3. Start the indexer and let it consume: `twister indexer`
147147-4. Once backfill is complete, enrich fields and re-sync FTS:
148148-149149- ```sh
150150- twister enrich
151151- twister reindex
152152- ```
153153-154154-5. Verify: `GET /admin/status` for cursor progress, `GET /readyz` for DB health.
9999+This sequence is also the default migration path from the old Turso-backed deployment.
155100156101### Tap cursor reset
157102158158-If the indexer cursor is ahead of what Tap will deliver (e.g., after a Tap
159159-instance reset), events will be skipped until the cursor catches up.
160160-161161-To reset the cursor and reprocess from the beginning of Tap's retention window:
103103+If the Tap cursor is ahead of the retained event window:
162104163105```sql
164106DELETE FROM sync_state WHERE consumer_name = 'indexer-tap-v1';
165107```
166108167167-Then restart the `indexer`. It will start from the head of the stream and
168168-process all events Tap delivers.
169169-170170-> **Note:** This does not cause duplicate documents — `UpsertDocument` is
171171-> idempotent. It may reprocess a large backlog depending on Tap retention.
109109+Then restart the `indexer`. Replay is safe because document upserts are idempotent, though it may reprocess a large backlog depending on Tap retention.
172110173173----
111111+## Status Checks
174112175175-## Checking Status
176176-177177-With `ENABLE_ADMIN_ENDPOINTS=true`:
113113+With admin routes enabled:
178114179115```sh
180116curl -H "Authorization: Bearer $ADMIN_AUTH_TOKEN" \
181117 http://localhost:8080/admin/status
182118```
183119184184-Response includes:
120120+Watch:
185121186186-- `tap.cursor` and `tap.updated_at`
187187-- `jetstream.cursor` and `jetstream.updated_at`
122122+- `tap.cursor`
123123+- `jetstream.cursor`
188124- `documents`
189189-- `read_through.pending`, `processing`, `completed`, `failed`, `dead_letter`
190190-- `read_through.oldest_pending_age_s` and `oldest_running_age_s`
191191-- `read_through.last_completed_at` and `last_processed_at`
125125+- `read_through.pending`
126126+- `read_through.processing`
127127+- `read_through.failed`
128128+- `read_through.dead_letter`
+4-4
docs/roadmap.md
···11---
22title: Roadmap
33-updated: 2026-03-25
33+updated: 2026-03-26
44---
5566## API: Search Stabilization
7788Highest priority. This work blocks further investment in search quality and broader discovery features.
991010-- [x] Stabilize local development and experimentation around a local `file:` database
1010+- [x] Stabilize local development around PostgreSQL, with SQLite kept only as a rollback path
1111- [x] Document backup, restore, and disk-growth procedures for the experimental local DB
1212- [x] Research production backend options: PostgreSQL, Turso remote/libSQL, and Turso embedded replicas
1313- [x] Write a production storage decision record with workload and operational tradeoffs, using `docs/adr/pg.md` and `docs/adr/turso.md`
···33333434Completed on [2026-03-25](../CHANGELOG.md#2026-03-25)
35353636-## API: FTS5 Search Quality
3636+## API: Keyword Search Quality
37373838Improve keyword search quality without external dependencies.
39394040**Depends on:** API: Search Stabilization
41414242- [ ] Synonym expansion at query time (e.g. "repo" matches "repository")
4343-- [ ] Stemming tokenizer (porter or unicode61+porter)
4343+- [ ] Stemming and parser tuning for PostgreSQL full-text search
4444- [ ] Prefix search support for autocomplete
4545- [ ] Field weight tuning based on real query patterns
4646- [ ] Recency boost for recently updated content
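Prefix search, for example, maps onto tsquery's `:*` operator (an illustrative query against the indexed `documents` table; real filters and weights will differ):

```sql
-- Match documents whose search vector contains a lexeme starting with "tang".
SELECT id, title
FROM documents
WHERE search_vector @@ to_tsquery('simple', 'tang:*')
LIMIT 10;
```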
+35-239
docs/specs/search.md
···11---
22title: Search
33-updated: 2026-03-25
33+updated: 2026-03-26
44---
5566-> Warning: this document is pretty long. Look at the roadmap and ADR summaries for a
77-> high-level overview, or jump to the relevant sections.
88-99-Search now has two phases:
1010-1111-1. Stabilize indexing and activity caching so search is cheap and reliable.
1212-2. Enhance keyword search quality with FTS5 features once the base pipeline is stable.
66+Twisted search is now operationally centered on PostgreSQL, Tap ingest, and a
77+small set of rebuild tools.
1381414-## Immediate Priority
99+## Current State
15101616-The current highest-priority search work is operational, not ranking:
1717-1818-- Stabilize experimentation around a local `file:` database workflow.
1919-- Add cURL smoke tests for search, document fetches, indexing, and activity reads.
2020-- Enqueue background indexing when the API fetches records that are not yet searchable.
2121-- Cache recent JetStream activity server-side with a persisted 24-hour cursor.
2222-2323-Production storage is Turso cloud. The reasoning is recorded in `docs/adr/storage.md`, with the comparison inputs in `docs/adr/pg.md` and `docs/adr/turso.md`.
2424-2525-These tasks block further work on search quality improvements.
2626-2727-## Planning Decisions
2828-2929-### Why This Comes First
3030-3131-Search quality is currently constrained more by ingestion cost and freshness gaps than by ranking quality.
3232-The next iteration should make Twister cheaper to operate, resilient across restarts, and able to backfill misses on demand before any new semantic or hybrid work.
3333-3434-### Resolved Questions
3535-3636-#### Local-Only Storage
3737-3838-Twister can already run against a local `file:` database. That is useful for stabilizing development and experimentation while the indexing model is still changing. It should not automatically be treated as the final production architecture.
3939-4040-The production storage question remains open and should compare at least:
4141-4242-- PostgreSQL with native full-text search and conventional operational tooling
4343-- Turso remote/libSQL
4444-- Turso with embedded replicas or similar local-read, remote-sync patterns
4545-4646-That comparison has been completed, and the current production choice is Turso.
4747-4848-#### Tangled First-Commit Timestamp
4949-5050-The first Tangled commit timestamp is useful as a lower-bound hint for one-time experiments, but it should not become the default replay cursor.
5151-JetStream has to default to recent history (< 72 hours from now is what's possible) so bootstrap cost stays bounded.
5252-5353-#### Tap Versus JetStream
5454-5555-Tap remains the authoritative indexing and bulk backfill path. JetStream should power only a bounded recent-activity cache.
5656-Read-through API indexing closes gaps when a user fetches a record before Tap has delivered it.
1111+- primary storage: PostgreSQL
1212+- local default URL: `postgresql://localhost/${USER}_dev?sslmode=disable`
1313+- production deploy target: Coolify application plus managed PostgreSQL
1414+- legacy fallback: local SQLite behind `--local`
57155816## Goals
59176060-- Reduce search-related reads and writes enough that remote Turso cost is no longer the dominant constraint.
6161-- Keep indexed content fresh enough for browsing and search without requiring a full-network rebuild after routine restarts.
6262-- Serve recent activity cheaply from a local cache.
6363-- Add a smoke-test layer that verifies search and indexing behavior end to end.
6464-6565-## Current Search Mode
6666-6767-### Keyword Search (Implemented)
6868-6969-Full-text search is powered by SQLite FTS5 with BM25 scoring. Queries match title, body, summary, repo name, author handle, and tags. Results are ranked with field-specific weights and snippets highlight matches with `<mark>` tags.
7070-7171-## Stabilization Plan
7272-7373-### Storage
7474-7575-Twister should use a local `file:` database to stabilize experimentation and reduce the messiness of iteration while the indexing pipeline is being hardened. Production storage should remain explicitly undecided until the project compares PostgreSQL and Turso-based options against the final workload.
7676-7777-Requirements:
7878-7979-- keep local-file mode as the simplest path for development and experimentation
8080-- document what assumptions the local path makes about single-host or shared-disk execution
8181-- document backup, restore, and disk-growth procedures
8282-- produce a production storage decision record comparing PostgreSQL and Turso options, starting from `docs/adr/pg.md` and `docs/adr/turso.md`
8383-8484-Evaluation criteria for the production decision:
8585-8686-- write-heavy ingestion behavior
8787-- FTS quality and indexing ergonomics
8888-- operational complexity and backup story
8989-- latency for reads and writes
9090-- failure recovery and restore workflow
9191-- support for future semantic search requirements
9292-9393-Acceptance:
9494-9595-- local development no longer depends on remote Turso for routine experimentation
9696-- the production backend choice is documented with explicit tradeoffs
9797-- the chosen production backend has a migration path from the experimental local setup
9898-9999-The concrete local DB operating procedure lives in `docs/reference/api.md`.
100100-The production migration path is documented in `docs/adr/storage.md`.
101101-102102-### Read-Through Indexing
103103-104104-When the API fetches a repo, issue, PR, profile, or similar detail record
105105-directly from upstream, it should enqueue background indexing work only when
106106-that record is missing or stale. Tap remains the primary ingest path;
107107-read-through indexing only closes gaps.
108108-109109-Requirements:
110110-111111-- add a durable job table for on-demand indexing
112112-- deduplicate jobs by stable document identity
113113-- reuse the existing normalization and upsert path
114114-- trigger jobs from detail handlers that already fetch upstream records
115115-- do not enqueue whole collections from list or browse handlers
116116-117117-Acceptance:
118118-119119-- a fetched-but-missing record becomes searchable shortly after the first successful API read
120120-- repeated page views do not create unbounded duplicate work
121121-- queue state and terminal failures are inspectable through admin endpoints
122122-- failures are visible through logs and smoke tests
123123-124124-### Activity Cache
125125-126126-JetStream should back a recent-activity cache, not the main search index. The server should persist a timestamp cursor, seed it to `now - ~24h` on first boot, rewind slightly on reconnect, and expire old events aggressively.
127127-128128-Requirements:
129129-130130-- add a dedicated activity cache table
131131-- persist a separate JetStream consumer cursor
132132-- seed missing cursors to recent history, not full history
133133-- keep retention bounded by age and row count
134134-135135-Acceptance:
136136-137137-- common activity reads can be served from the cache
138138-- restarts resume from the stored timestamp cursor
139139-- reconnects are idempotent and tolerate a short rewind window
1818+- keep search fresh through Tap ingest and targeted backfill
1919+- preserve the current `/search` API contract
2020+- make local development and production use the same database family
2121+- keep rebuild and recovery workflows simple enough to rehearse
14022141141-### Smoke Tests
2323+## Keyword Search
14224143143-Twister needs cURL-based smoke tests covering:
2525+Implemented with PostgreSQL full-text search:
14426145145-- `GET /healthz`
146146-- `GET /readyz`
147147-- `GET /search`
148148-- `GET /documents/{id}`
149149-- one fetch path that should enqueue indexing
150150-- one activity endpoint backed by the cache
2727+- weighted `tsvector` over title, author handle, repo name, summary, body, tags
2828+- `websearch_to_tsquery('simple', ...)`
2929+- `ts_rank_cd`
3030+- `ts_headline`
15131152152-Acceptance:
3232+Result scores and snippets may differ from the old SQLite FTS5 implementation.
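The query shape behind those pieces looks roughly like this (a sketch assuming the `search_vector` generated column from the migrations; the query text and limit are illustrative):

```sql
SELECT id,
       title,
       ts_rank_cd(search_vector, q) AS score,
       ts_headline('simple', COALESCE(body, ''), q,
                   'StartSel=<mark>, StopSel=</mark>') AS body_snippet
FROM documents,
     websearch_to_tsquery('simple', 'tangled vue') AS q
WHERE search_vector @@ q
  AND deleted_at IS NULL
ORDER BY score DESC
LIMIT 20;
```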
15333154154-- one local command can verify the critical API surface
155155-- the same scripts can run against staging or production by changing the base URL
3434+## Ingest Model
15635157157-## Operational Model
3636+1. Tap is the authoritative indexing path.
3737+2. Read-through indexing fills misses from detail fetches.
3838+3. JetStream powers only the bounded activity cache.
3939+4. `backfill`, `enrich`, and `reindex` rebuild the serving dataset.
15840159159-1. Tap ingests the authoritative search corpus.
160160-2. Direct API reads enqueue background indexing for misses.
161161-3. JetStream fills only the recent-activity cache.
162162-4. Smoke tests guard the critical paths.
163163-5. FTS5 quality improvements (synonyms, stemming, prefix search) follow once the base pipeline is stable.
4141+## Operational Rules
16442165165-## Backfill Strategy
4343+- use explicit collection allowlists in production
4444+- do not import Turso data as the default migration path
4545+- treat PostgreSQL backups and restore drills as part of normal operations
4646+- keep the SQLite path only until the PostgreSQL rollout has proven stable
16647167167-- Search index backfill should continue to use Tap admin backfill, firehose-driven repo sync, or repo export based resync.
168168-- Activity cache bootstrap should use a recent JetStream timestamp cursor, defaulting to `now - 24h`.
169169-- A manual cursor override can exist for one-time replay experiments, but it should not be the default startup path.
4848+## Next Search Work
17049171171-## API Contract
172172-173173-**`GET /search`** — Unified endpoint, routes by `mode` parameter.
174174-175175-### Parameters
176176-177177-| Param | Required | Default | Description |
178178-| ------------ | -------- | ------- | ------------------------------------- |
179179-| `q` | Yes | — | Query string |
180180-| `mode` | No | keyword | keyword |
181181-| `limit` | No | 20 | Results per page (1–100) |
182182-| `offset` | No | 0 | Pagination offset |
183183-| `collection` | No | — | Filter by collection NSID |
184184-| `type` | No | — | Filter by record type |
185185-| `author` | No | — | Filter by handle or DID |
186186-| `repo` | No | — | Filter by repo name or DID |
187187-| `language` | No | — | Filter by primary language |
188188-| `from` | No | — | Created after (ISO 8601) |
189189-| `to` | No | — | Created before (ISO 8601) |
190190-| `state` | No | — | Issue/PR state (open, closed, merged) |
191191-192192-### Response
193193-194194-```json
195195-{
196196- "query": "tangled vue",
197197- "mode": "keyword",
198198- "total": 42,
199199- "limit": 20,
200200- "offset": 0,
201201- "results": [
202202- {
203203- "id": "did:plc:abc|sh.tangled.repo|my-repo",
204204- "collection": "sh.tangled.repo",
205205- "record_type": "repo",
206206- "title": "my-repo",
207207- "summary": "A Vue component library",
208208- "body_snippet": "...building <mark>Vue</mark> components for <mark>Tangled</mark>...",
209209- "score": 4.82,
210210- "matched_by": ["keyword"],
211211- "repo_name": "my-repo",
212212- "author_handle": "alice.bsky.social",
213213- "did": "did:plc:abc",
214214- "at_uri": "at://did:plc:abc/sh.tangled.repo/my-repo",
215215- "web_url": "https://tangled.sh/alice.bsky.social/my-repo",
216216- "created_at": "2026-01-15T10:00:00Z",
217217- "updated_at": "2026-03-20T14:30:00Z"
218218- }
219219- ]
220220-}
221221-```
222222-223223-## Search Strategy
224224-225225-Indexing via Tap is useful but has proven unreliable for maintaining complete, up-to-date coverage. The approach:
226226-227227-1. **Keyword search is the foundation.** It works now and covers the primary use case — finding repos, issues, and people by name or content.
228228-229229-2. **Constellation supplements search results.** Star counts and follower counts from Constellation can be used as ranking signals without needing to index interaction records ourselves.
230230-231231-3. **Read-through indexing closes freshness gaps.** If a user can fetch a record, the system should be able to make it searchable shortly after.
232232-233233-4. **JetStream is for recent activity, not authoritative indexing.** Use it to power the cached feed, not to replace Tap or repo re-sync.
234234-235235-5. **FTS5 enhancements are the next quality step.** Synonym expansion, stemming, and prefix search improve discovery without external dependencies.
236236-237237-6. **Graceful degradation.** The mobile app treats the search API as optional. If Twister is unavailable, handle-based direct browsing still works. Search results link into the same browsing screens.
238238-239239-## Quality Improvements (Planned)
240240-241241-- Synonym expansion at query time (e.g. "repo" matches "repository")
242242-- Stemming tokenizer (porter or unicode61+porter)
243243-- Prefix search support for autocomplete
244244-- Field weight tuning based on real query patterns
245245-- Recency boost for recently updated content
246246-- Collection-aware ranking
247247-- Star count as a ranking signal
248248-- State filtering defaults
249249-- Better snippet generation
250250-- Relevance test fixtures
251251-252252-## Mobile Integration
253253-254254-The app calls the search API from the Explore tab. Results are displayed in segmented views (repos, users, issues/PRs).
255255-Each result links to the corresponding browsing screen (repo detail, profile, issue detail).
256256-257257-When the search API is unavailable, the Explore tab shows an appropriate state rather than breaking.
258258-The Home tab's handle-based browsing is fully independent of search.
5050+- synonym expansion
5151+- stemming and better tokenizer choices
5252+- field weight tuning from real queries
5353+- recency boosts
5454+- relevance fixtures that assert behavior, not exact score strings
+11-2
justfile
···4141api-build:
4242 just --justfile packages/api/justfile build
43434444-# Run API. Usage: just api-dev [mode], mode: local|remote (default local)
4444+# Run API. Usage: just api-dev [mode], mode: local|remote|sqlite (default local)
4545api-dev mode="local":
4646 just --justfile packages/api/justfile run-api {{mode}}
47474848-# Run indexer. Usage: just api-run-indexer [mode], mode: local|remote (default local)
4848+# Run indexer. Usage: just api-run-indexer [mode], mode: local|remote|sqlite (default local)
4949api-run-indexer mode="local":
5050 just --justfile packages/api/justfile run-indexer {{mode}}
5151+5252+db-up:
5353+ docker compose -f docker-compose.dev.yaml up -d postgres tap
5454+5555+db-down:
5656+ docker compose -f docker-compose.dev.yaml down
5757+5858+db-psql:
5959+ psql "postgresql://localhost/${USER:-postgres}_dev?sslmode=disable"
51605261api-test:
5362 just --justfile packages/api/justfile test
+34-117
packages/api/doc.go
···11// Twister is the Tap-backed indexing and search API for Tangled.
22//
33-// It proxies upstream AT Protocol services such as knots, PDS endpoints,
44-// Bluesky, Constellation, and Jetstream so the app can use a single origin.
55-//
63// Requirements
74//
85// - Go 1.25+
99-// - A Turso database, or local SQLite for development
66+// - PostgreSQL for the normal local and production workflow
107//
118// Running locally
129//
1313-// cd packages/api
1414-// go run . api --local
1515-//
1616-// The local API listens on :8080 by default and uses packages/api/twister-dev.db.
1717-// Logs are printed as text when --local is set.
1818-//
1919-// # API smoke tests
2020-//
2121-// Smoke checks live in packages/scripts/api/. From the repo root:
2222-//
2323-// uv run --project packages/scripts/api twister-api-smoke
2424-//
2525-// Optional base URL override:
2626-//
2727-// TWISTER_API_BASE_URL=http://localhost:8080 \
2828-// uv run --project packages/scripts/api twister-api-smoke
1010+// cd /path/to/Twisted
1111+// just db-up
1212+// just api-dev
1313+// just api-run-indexer
2914//
3030-// # Experimental local DB operations
1515+// The default local database URL is:
3116//
3232-// The experimental local database lives at packages/api/twister-dev.db when
3333-// you run Twister with --local. Treat it as disposable unless you explicitly
3434-// back it up.
1717+// postgresql://localhost/${USER}_dev?sslmode=disable
3518//
3636-// Backup:
1919+// That matches a Postgres.app-style setup and the repo's dev compose file.
3720//
3838-// 1. Stop the Twister process using the local DB.
3939-// 2. Copy the database file and any SQLite sidecar files if they exist.
4040-//
4141-// Example:
2121+// # Legacy fallback
4222//
4343-// cd packages/api
4444-// mkdir -p backups
4545-// timestamp="$(date +%Y%m%d-%H%M%S)"
4646-// cp twister-dev.db "backups/twister-dev-${timestamp}.db"
4747-// test -f twister-dev.db-wal && cp twister-dev.db-wal "backups/twister-dev-${timestamp}.db-wal"
4848-// test -f twister-dev.db-shm && cp twister-dev.db-shm "backups/twister-dev-${timestamp}.db-shm"
2323+// `--local` is deprecated and switches the service to the temporary SQLite
2424+// fallback at packages/api/twister-dev.db.
4925//
5050-// Restore:
2626+// go run . api --local
5127//
5252-// 1. Stop the Twister process.
5353-// 2. Move the current local DB aside if you want to keep it.
5454-// 3. Copy the backup file back to twister-dev.db.
5555-// 4. Restore matching -wal and -shm files only if they came from the same set.
2828+// Smoke checks
5629//
5757-// Example:
3030+// uv run --project packages/scripts/api twister-api-smoke
5831//
5959-// cd packages/api
6060-// mv twister-dev.db "twister-dev.db.broken.$(date +%Y%m%d-%H%M%S)" 2>/dev/null || true
6161-// cp backups/twister-dev-YYYYMMDD-HHMMSS.db twister-dev.db
3232+// Optional base URL override:
6233//
6363-// Disk growth:
3434+// TWISTER_API_BASE_URL=http://localhost:8080 \
3535+// uv run --project packages/scripts/api twister-api-smoke
6436//
6565-// The local DB grows because of indexed documents, FTS tables, activity cache
6666-// rows, and repeated backfill or reindex runs.
3737+// Environment variables
6738//
6868-// Recommended operating procedure:
6969-//
7070-// 1. Check file growth periodically.
7171-// 2. Delete and rebuild the DB freely when the dataset is no longer useful.
7272-// 3. Run VACUUM only when you intentionally want to compact a long-lived DB.
7373-// 4. Keep old backups out of the repo and rotate them manually.
7474-//
7575-// Inspection commands:
7676-//
7777-// cd packages/api
7878-// du -h twister-dev.db*
7979-// ls -lh twister-dev.db*
8080-//
8181-// Failure recovery: prefer restore-or-rebuild over manual repair if the
8282-// experimental DB becomes
8383-// suspicious or inconsistent. It is a developer convenience database, not the
8484-// source of truth.
8585-//
8686-// # Environment variables
8787-//
8888-// Copy .env.example to .env in the repo root or packages/api/. The server loads
8989-// .env, ../.env, and ../../.env automatically.
9090-//
9191-// - TURSO_DATABASE_URL: Turso/libSQL connection URL, required unless --local
9292-// - TURSO_AUTH_TOKEN: auth token, required for non-file URLs
9393-// - HTTP_BIND_ADDR: default :8080
9494-// - LOG_LEVEL: debug, info, warn, or error; default info
9595-// - LOG_FORMAT: json or text; default json
9696-// - SEARCH_DEFAULT_LIMIT: default 20
9797-// - SEARCH_MAX_LIMIT: default 100
3939+// - DATABASE_URL: primary database connection URL
4040+// - HTTP_BIND_ADDR: API bind address, default :8080
4141+// - INDEXER_HEALTH_ADDR: indexer health bind address, default :9090
4242+// - LOG_LEVEL: debug, info, warn, or error
4343+// - LOG_FORMAT: json or text
4444+// - TAP_URL: Tap WebSocket URL, default ws://localhost:2480/channel in local indexer runs
4545+// - TAP_AUTH_PASSWORD: Tap admin password, default twisted-dev in local indexer runs
4646+// - INDEXED_COLLECTIONS: comma-separated AT collections to index
4747+// - READ_THROUGH_MODE: off or missing; default missing
4848+// - READ_THROUGH_COLLECTIONS: read-through allowlist
4949+// - READ_THROUGH_MAX_ATTEMPTS: retries before dead_letter
9850// - ENABLE_ADMIN_ENDPOINTS: default false
9999-// - ADMIN_AUTH_TOKEN: bearer token for admin endpoints
100100-// - CONSTELLATION_URL: default https://constellation.microcosm.blue
101101-// - CONSTELLATION_USER_AGENT: user-agent sent to Constellation
102102-// - TAP_URL: Tap firehose URL, indexer only
103103-// - TAP_AUTH_PASSWORD: Tap auth password, indexer only
104104-// - INDEXED_COLLECTIONS: comma-separated AT collections to index
105105-// - READ_THROUGH_MODE: off, missing, or broad; default missing
106106-// - READ_THROUGH_COLLECTIONS: read-through allowlist, default INDEXED_COLLECTIONS
107107-// - READ_THROUGH_MAX_ATTEMPTS: max retries before dead_letter, default 5
5151+// - ADMIN_AUTH_TOKEN: bearer token for admin routes
10852//
10953// CLI commands
11054//
···11357// twister backfill
11458// twister reindex
11559// twister enrich
116116-//
117117-// Enrich:
6060+// twister healthcheck
11861//
119119-// Resolves missing author_handle, repo_name, and web_url fields on documents
120120-// already in the database.
6262+// # Deployment
12163//
122122-// twister enrich --local
123123-// twister enrich --local --collection sh.tangled.repo
124124-// twister enrich --local --did did:plc:abc123
125125-// twister enrich --local --dry-run
126126-//
127127-// Flags: --collection, --did, --document, --dry-run, --concurrency (default 5).
128128-//
129129-// Proxy endpoints
130130-//
131131-// - GET /proxy/knot/{host}/{nsid} -> https://{host}/xrpc/{nsid}
132132-// - GET /proxy/pds/{host}/{nsid} -> https://{host}/xrpc/{nsid}
133133-// - GET /proxy/bsky/{nsid} -> https://public.api.bsky.app/xrpc/{nsid}
134134-// - GET /identity/resolve -> https://bsky.social/xrpc/com.atproto.identity.resolveHandle
135135-// - GET /identity/did/{did} -> https://plc.directory/{did} or /.well-known/did.json
136136-// - GET /backlinks/count -> Constellation getBacklinksCount, cached
137137-// - WS /activity/stream -> wss://jetstream2.us-east.bsky.network/subscribe
138138-//
139139-// # Admin endpoints
140140-//
141141-// Admin routes require ENABLE_ADMIN_ENDPOINTS=true. If ADMIN_AUTH_TOKEN is set,
142142-// requests must send Authorization: Bearer <ADMIN_AUTH_TOKEN>.
143143-//
144144-// - GET /admin/status: cursor state, queue counts, oldest ages, last activity
145145-// - GET /admin/indexing/jobs: inspect queue rows by status, source, or document
146146-// - GET /admin/indexing/audit: inspect append-only indexing audit rows
147147-// - POST /admin/indexing/enqueue: queue one explicit record for indexing
148148-// - POST /admin/reindex: re-sync all or filtered documents into the FTS index
6464+// Production uses Coolify for the `api`, `indexer`, and `tap` services plus a
6565+// separate Coolify-managed PostgreSQL resource. See docs/reference/deployment-walkthrough.md.
14966package main
···11-// Package reindex re-syncs documents to the FTS index from stored fields.
11+// Package reindex re-syncs documents to the search index from stored fields.
22// It is used by the `twister reindex` CLI command and the POST /admin/reindex endpoint.
33package reindex
44···4141}
42424343// Run reindexes documents matching opts.
4444-// It re-upserts each document (which re-syncs the FTS virtual table) and then
4545-// runs an FTS optimize pass to merge Tantivy/FTS5 segments.
4444+// It re-upserts each document and then runs any backend-specific search index
4545+// optimization step.
4646func (r *Runner) Run(ctx context.Context, opts Options) (*Result, error) {
4747 filter := store.DocumentFilter{
4848 Collection: opts.Collection,
···103103 }
104104105105 if !opts.DryRun {
106106- r.log.Info("reindex: optimizing fts index")
107107- if err := r.store.OptimizeFTS(ctx); err != nil {
108108- r.log.Error("reindex: fts optimize failed", slog.String("error", err.Error()))
106106+ r.log.Info("reindex: finalizing search index")
107107+ if err := r.store.OptimizeSearchIndex(ctx); err != nil {
108108+ r.log.Error("reindex: search index finalize failed", slog.String("error", err.Error()))
109109 result.Errors++
110110 }
111111 }
···11+CREATE TABLE IF NOT EXISTS documents (
22+ id TEXT PRIMARY KEY,
33+ did TEXT NOT NULL,
44+ collection TEXT NOT NULL,
55+ rkey TEXT NOT NULL,
66+ at_uri TEXT NOT NULL,
77+ cid TEXT NOT NULL,
88+ record_type TEXT NOT NULL,
99+ title TEXT,
1010+ body TEXT,
1111+ summary TEXT,
1212+ repo_did TEXT,
1313+ repo_name TEXT,
1414+ author_handle TEXT,
1515+ tags_json TEXT,
1616+ language TEXT,
1717+ created_at TEXT,
1818+ updated_at TEXT,
1919+ indexed_at TEXT NOT NULL,
2020+ web_url TEXT DEFAULT '',
2121+ deleted_at TEXT,
2222+ search_vector TSVECTOR GENERATED ALWAYS AS (
2323+ setweight(to_tsvector('simple', COALESCE(title, '')), 'A') ||
2424+ setweight(to_tsvector('simple', COALESCE(author_handle, '')), 'A') ||
2525+ setweight(to_tsvector('simple', COALESCE(repo_name, '')), 'B') ||
2626+ setweight(to_tsvector('simple', COALESCE(summary, '')), 'B') ||
2727+ setweight(to_tsvector('simple', COALESCE(body, '')), 'C') ||
2828+ setweight(to_tsvector('simple', COALESCE(tags_json, '')), 'D')
2929+ ) STORED
3030+);
3131+3232+CREATE INDEX IF NOT EXISTS idx_documents_did ON documents(did);
3333+CREATE INDEX IF NOT EXISTS idx_documents_collection ON documents(collection);
3434+CREATE INDEX IF NOT EXISTS idx_documents_record_type ON documents(record_type);
3535+CREATE INDEX IF NOT EXISTS idx_documents_repo_did ON documents(repo_did);
3636+CREATE INDEX IF NOT EXISTS idx_documents_created_at ON documents(created_at);
3737+CREATE INDEX IF NOT EXISTS idx_documents_deleted_at ON documents(deleted_at);
3838+CREATE INDEX IF NOT EXISTS idx_documents_search_vector ON documents USING GIN(search_vector);
3939+4040+CREATE TABLE IF NOT EXISTS sync_state (
4141+ consumer_name TEXT PRIMARY KEY,
4242+ cursor TEXT NOT NULL,
4343+ high_water_mark TEXT,
4444+ updated_at TEXT NOT NULL
4545+);
4646+4747+CREATE TABLE IF NOT EXISTS identity_handles (
4848+ did TEXT PRIMARY KEY,
4949+ handle TEXT NOT NULL,
5050+ is_active BOOLEAN NOT NULL DEFAULT TRUE,
5151+ status TEXT,
5252+ updated_at TEXT NOT NULL
5353+);
5454+5555+CREATE INDEX IF NOT EXISTS idx_identity_handles_handle ON identity_handles(handle);
5656+5757+CREATE TABLE IF NOT EXISTS record_state (
5858+ subject_uri TEXT PRIMARY KEY,
5959+ state TEXT NOT NULL,
6060+ updated_at TEXT NOT NULL
6161+);
···11+CREATE TABLE IF NOT EXISTS indexing_jobs (
22+ document_id TEXT PRIMARY KEY,
33+ did TEXT NOT NULL,
44+ collection TEXT NOT NULL,
55+ rkey TEXT NOT NULL,
66+ cid TEXT NOT NULL,
77+ record_json TEXT NOT NULL,
88+ source TEXT NOT NULL DEFAULT 'read_through',
99+ status TEXT NOT NULL,
1010+ attempts INTEGER NOT NULL DEFAULT 0,
1111+ last_error TEXT,
1212+ scheduled_at TEXT NOT NULL,
1313+ updated_at TEXT NOT NULL,
1414+ lease_owner TEXT DEFAULT '',
1515+ lease_expires_at TEXT DEFAULT '',
1616+ completed_at TEXT DEFAULT ''
1717+);
1818+1919+CREATE INDEX IF NOT EXISTS idx_indexing_jobs_status_scheduled
2020+ ON indexing_jobs(status, scheduled_at, updated_at);
2121+2222+CREATE INDEX IF NOT EXISTS idx_indexing_jobs_claim
2323+ ON indexing_jobs(status, scheduled_at, lease_expires_at, updated_at);
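-- Illustrative claim sketch for the lease columns above: take one due job
-- without blocking other workers. The worker id, lease window, and the
-- assumption that timestamps are ISO-8601 TEXT are all hypothetical here.
--
-- UPDATE indexing_jobs
-- SET status = 'processing',
--     lease_owner = 'worker-1',
--     lease_expires_at = to_char(now() + interval '60 seconds', 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
--     attempts = attempts + 1
-- WHERE document_id IN (
--     SELECT document_id
--     FROM indexing_jobs
--     WHERE status = 'pending'
--       AND scheduled_at <= to_char(now(), 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
--     ORDER BY scheduled_at
--     LIMIT 1
--     FOR UPDATE SKIP LOCKED
-- );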
···11+CREATE TABLE IF NOT EXISTS jetstream_events (
22+ id BIGSERIAL PRIMARY KEY,
33+ time_us BIGINT NOT NULL,
44+ did TEXT NOT NULL,
55+ kind TEXT NOT NULL,
66+ collection TEXT,
77+ rkey TEXT,
88+ operation TEXT,
99+ payload TEXT NOT NULL,
1010+ received_at TEXT NOT NULL
1111+);
1212+1313+CREATE INDEX IF NOT EXISTS idx_jetstream_events_time_us
1414+ ON jetstream_events(time_us DESC);
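-- Retention sketch (a runtime job, not part of this migration): drop cached
-- events older than 24 hours, assuming time_us is Jetstream's microsecond
-- epoch timestamp.
--
-- DELETE FROM jetstream_events
-- WHERE time_us < (EXTRACT(EPOCH FROM now() - interval '24 hours') * 1000000)::BIGINT;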
···11+CREATE TABLE IF NOT EXISTS indexing_audit (
22+ id BIGSERIAL PRIMARY KEY,
33+ source TEXT NOT NULL,
44+ document_id TEXT NOT NULL,
55+ collection TEXT NOT NULL,
66+ cid TEXT NOT NULL,
77+ decision TEXT NOT NULL,
88+ attempt INTEGER NOT NULL DEFAULT 0,
99+ error TEXT,
1010+ created_at TEXT NOT NULL
1111+);
1212+1313+CREATE INDEX IF NOT EXISTS idx_indexing_audit_created
1414+ ON indexing_audit(created_at DESC);
1515+1616+CREATE INDEX IF NOT EXISTS idx_indexing_audit_document
1717+ ON indexing_audit(document_id, created_at DESC);