 # Twisted Documentation

-Documentation is organized by project:
+## Reference

-- [`app/`](app/) for the Ionic/Vue client
-- [`api/`](api/) for the Go Tap/index/search service
+Completed work — architecture, APIs, and data models as built.

-## Quick Links
+- [`reference/api.md`](reference/api.md) — Go search API service
+- [`reference/app.md`](reference/app.md) — Ionic Vue mobile app
+- [`reference/lexicons.md`](reference/lexicons.md) — Tangled AT Protocol record types

-- App spec index: [`app/specs/README.md`](app/specs/README.md)
-- App task index: [`app/tasks/phase-6.md`](app/tasks/phase-6.md)
-- API spec index: [`api/specs/README.md`](api/specs/README.md)
-- API task index: [`api/tasks/README.md`](api/tasks/README.md)
+## Specs
+
+Forward-looking designs for remaining work.
+
+- [`specs/data-sources.md`](specs/data-sources.md) — Constellation, Tangled XRPC, Tap, AT Protocol, Bluesky OAuth
+- [`specs/search.md`](specs/search.md) — Keyword, semantic, and hybrid search
+- [`specs/app-features.md`](specs/app-features.md) — Remaining mobile app features
+
+## Roadmap
+
+- [`roadmap.md`](roadmap.md) — All remaining milestones and tasks

docs/TODO.md (-13)

----
-title: To-Dos
-updated: 2026-03-23
----
-
-A catch-all for ideas, issues/bugs, and future work that doesn't fit into the current specs or tasks. This is a "parking lot."
-
-## App
-
-- Repo stars, forks, etc. are not properly parsed from JSON.
-- ATOM/RSS feed link for repos: (`tangled.org/{did}/{repo}/feed.atom`)
-
-## API

docs/api/deploy.md (-100)

----
-title: "Deployment Guide"
-updated: 2026-03-23
----
-
-# Railway Deployment Guide
-
-Deploy the Twister API and indexer as Railway services alongside the existing Tap instance.
-
-## Prerequisites
-
-- Railway project with Tap already deployed
-- Turso database created with auth token
-- GitHub repository connected to Railway
-
-## Service Layout
-
-| Service | Start Command | Health Check | Public | Port |
-| ------- | ----------------- | -------------- | ------ | ---- |
-| tap | (pre-existing) | `GET /health` | no | — |
-| api | `twister api` | `GET /healthz` | yes | 8080 |
-| indexer | `twister indexer` | `GET /health` | no | 9090 |
-
-All services use the same Docker image. Railway overrides `CMD` with the per-service start command.
-
-## Step 1 — Create Services
-
-In the Railway dashboard, create two new services from the same GitHub repo:
-
-1. **api** — set start command to `twister api`
-2. **indexer** — set start command to `twister indexer`
-
-Both services build from `packages/api/Dockerfile`.
-
-## Step 2 — Set Environment Variables
-
-### Shared (set on both services)
-
-```sh
-TURSO_DATABASE_URL=libsql://twister-prod-<org>.turso.io
-TURSO_AUTH_TOKEN=<turso-jwt>
-LOG_LEVEL=info
-LOG_FORMAT=json
-```
-
-### API only
-
-```sh
-HTTP_BIND_ADDR=:8080
-SEARCH_DEFAULT_LIMIT=20
-SEARCH_MAX_LIMIT=100
-```
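
The API-only variables above could be read into a typed struct at startup. The following is a minimal sketch, not the deleted guide's actual loader: `APIConfig`, `LoadAPIConfig`, and `intFromEnv` are hypothetical names, and falling back to `:8080`, `20`, and `100` when a variable is unset is an assumption based on the sample values.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// APIConfig mirrors the three API-only variables (hypothetical type).
type APIConfig struct {
	BindAddr     string
	DefaultLimit int
	MaxLimit     int
}

// intFromEnv returns the integer value of key, or fallback when it is unset.
func intFromEnv(getenv func(string) string, key string, fallback int) (int, error) {
	raw := getenv(key)
	if raw == "" {
		return fallback, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return n, nil
}

// LoadAPIConfig reads settings through getenv; pass os.Getenv in production.
// The fallback values are assumptions taken from the sample block.
func LoadAPIConfig(getenv func(string) string) (APIConfig, error) {
	cfg := APIConfig{BindAddr: getenv("HTTP_BIND_ADDR")}
	if cfg.BindAddr == "" {
		cfg.BindAddr = ":8080"
	}
	var err error
	if cfg.DefaultLimit, err = intFromEnv(getenv, "SEARCH_DEFAULT_LIMIT", 20); err != nil {
		return APIConfig{}, err
	}
	if cfg.MaxLimit, err = intFromEnv(getenv, "SEARCH_MAX_LIMIT", 100); err != nil {
		return APIConfig{}, err
	}
	// Reject a default page size that the maximum would immediately clamp.
	if cfg.DefaultLimit < 1 || cfg.DefaultLimit > cfg.MaxLimit {
		return APIConfig{}, fmt.Errorf("SEARCH_DEFAULT_LIMIT must be between 1 and SEARCH_MAX_LIMIT")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadAPIConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Taking `getenv` as a parameter keeps the loader testable without mutating the process environment.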
-
-### Indexer only
-
-```sh
-TAP_URL=wss://${{tap.RAILWAY_PRIVATE_DOMAIN}}/channel
-TAP_AUTH_PASSWORD=<tap-admin-password>
-INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull.status,sh.tangled.feed.star
-INDEXER_HEALTH_ADDR=:9090
-```
-
-Use `${{tap.RAILWAY_PRIVATE_DOMAIN}}` to reference Tap's internal hostname. This keeps traffic on Railway's private network.
-
-## Step 3 — Configure Health Checks
-
-In the Railway dashboard, configure per-service:
-
-- **api**: HTTP health check on path `/healthz`, port `8080`
-- **indexer**: HTTP health check on path `/health`, port `9090`
-
-Railway uses these to gate deployment rollouts and restart unhealthy containers.
-
-## Step 4 — Configure Autodeploy
-
-Connect the GitHub repository in the Railway dashboard. Railway will build and deploy on every push to the configured branch.
-
-The Dockerfile uses multi-stage builds with `CGO_ENABLED=0` for a static binary on Alpine.
-
-## Step 5 — Deploy and Verify
-
-After the first deploy:
-
-1. Confirm API is healthy: `curl https://<api-domain>/healthz`
-2. Confirm API readiness: `curl https://<api-domain>/readyz`
-3. Check indexer health in Railway logs (health check on `:9090/health`)
-
-## Step 6 — Bootstrap Content
-
-Run graph backfill to populate initial content from seed users:
-
-```bash
-twister backfill --seeds=docs/api/seeds.txt --max-hops=2
-```
-
-Wait for Tap to finish historical sync, then verify search returns results:
-
-```bash
-curl "https://<api-domain>/search?q=tangled"
-```

docs/api/seeds.txt (-9)

-# Example seed handles for Twister graph backfill
-# One DID or handle per line. Comments and blank lines are ignored.
-
-anirudh.fi
-atprotocol.dev
-zzstoatzz.io
-oppi.li
-desertthunder.dev
-tangled.org

docs/api/specs/01-architecture.md (-313)

----
-title: "Spec 01 — Architecture"
-updated: 2026-03-22
----
-
-## 1. Purpose
-
-Build a Go-based search service for Tangled content on AT Protocol that:
-
-- ingests Tangled records through **Tap** (already deployed on Railway)
-- denormalizes them into internal search documents
-- indexes them in **Turso/libSQL**
-- exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes
-- exposes index-backed summary APIs for data the public Tangled APIs do not answer efficiently, such as followers
-
-## 2. Functional Goals
-
-The system shall:
-
-- index Tangled-specific ATProto collections under the `sh.tangled.*` namespace
-- support initial backfill and continuous incremental sync via Tap
-- support lexical retrieval using Turso's Tantivy-backed FTS
-- support semantic retrieval using vector embeddings
-- support hybrid ranking combining lexical and semantic signals
-- expose stable HTTP APIs for search, document lookup, and graph/profile summaries
-- support deployment on **Railway**
-
-## 3. Non-Functional Goals
-
-The system shall prioritize:
-
-- **correctness of sync** — cursors never advance ahead of committed data
-- **operational simplicity** — single binary, subcommand-driven
-- **incremental delivery** — keyword search ships before embeddings
-- **small deployable services** — process groups, not microservices
-- **reindexability** — any document or collection can be re-normalized and re-indexed
-- **low coupling** — sync, indexing, and serving are independent concerns
-
-## 4. Out of Scope (v1)
-
-- code-aware symbol search
-- sourcegraph-style structural search
-- personalized ranking
-- access control beyond public/private visibility flags in indexed records
-- full analytics pipeline
-- custom ANN infrastructure outside Turso/libSQL
-
-## 5. Design Principles
-
-1. **Tap owns synchronization correctness.** The application does not consume the raw firehose. Tap handles connection, cryptographic verification, backfill, and filtering.
-
-2. **The indexer owns denormalization.** Raw ATProto records are never queried directly by the public API.
-
-3. **The public API serves denormalized projections.** Search ranking and graph summaries depend on the indexed document model, not transport.
-
-4. **Keyword search is the baseline.** Semantic and hybrid search are layered on top.
-
-5. **Embeddings are asynchronous.** Ingestion is never blocked on vector generation unless explicitly configured.
-
-6. **Twister complements public Tangled APIs.** Repo detail stays on knots/PDSes; the index adds discovery and cross-network summaries.
-
-## 6. External Systems
-
-- **AT Protocol network** — source of all Tangled content
-- **Tap** — filtered event delivery from the AT Protocol firehose (deployed on Railway)
-- **Turso/libSQL** — relational storage, Tantivy-backed FTS, and native vector search
-- **Ollama** — local embedding model server (nomic-embed-text or EmbeddingGemma); deployed as a Railway sidecar service
-- **Railway** — deployment platform for Twister services, Tap, and Ollama
-
-## 7. Architecture Summary
-
-```text
-ATProto Firehose / PDS
-        │
-        ▼
-   Tap (Railway)
-        │  WebSocket / webhook JSON events
-        ▼
-   Go Indexer Service
-    ├─ decode Tap events
-    ├─ normalize records → documents
-    ├─ upsert documents
-    ├─ schedule embeddings
-    └─ persist sync cursor
-        │
-        ▼
-   Turso/libSQL
-    ├─ documents table
-    ├─ document_embeddings table
-    ├─ FTS index (Tantivy-backed)
-    ├─ vector index (DiskANN)
-    └─ sync_state table
-        │
-        ▼
-   Go Search API
-    ├─ keyword search (fts_match / fts_score)
-    ├─ semantic search (vector_top_k)
-    ├─ hybrid search (weighted merge)
-    ├─ profile and graph summaries
-    └─ document fetch
-```
-
-## 8. Runtime Units
-
-| Unit | Role | Deployment |
-| -------------- | -------------------------------------------- | -------------------------- |
-| `api` | HTTP search, graph summary, and document API | Railway service (public) |
-| `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) |
-| `embed-worker` | Async embedding generation via Ollama | Optional Railway service |
-| `ollama` | Local embedding model server | Railway service (internal) |
-| `tap` | ATProto sync | Railway (already deployed) |
-
-## 9. Repository Structure
-
-```text
-main.go
-
-internal/
-  api/            # HTTP handlers, middleware, routes
-  config/         # Config struct, env parsing
-  embed/          # Embedding provider abstraction, worker
-  index/          # FTS and vector index management
-  ingest/         # Tap event consumer, ingestion loop
-  normalize/      # Per-collection record → document adapters
-  observability/  # Structured logging, metrics
-  ranking/        # Score normalization, hybrid merge
-  search/         # Search orchestration (keyword, semantic, hybrid)
-  store/          # DB access layer, migrations, domain types
-  tapclient/      # Tap WebSocket/webhook client
-```
-
-## 10. Binary Subcommands
-
-```bash
-twister api # Start HTTP search API
-twister indexer # Start Tap consumer / indexer
-twister embed-worker # Start async embedding worker
-twister reindex # Re-normalize and upsert documents
-twister reembed # Re-generate embeddings
-twister backfill # Bootstrap index from seed users
-twister healthcheck # One-shot health probe
-```
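
A single binary with subcommands like these can dispatch on its first argument. The sketch below uses only the standard library and placeholder handlers; `Dispatch` and `command` are hypothetical names, and the spec does not say whether the real binary uses a CLI framework. Defaulting to `healthcheck` when no argument is given is a demo convenience, not documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// command is the signature each subcommand implements (hypothetical).
type command func(args []string) error

// Dispatch runs the subcommand named by args[0] with the remaining arguments.
func Dispatch(cmds map[string]command, args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("usage: twister <subcommand> [flags]")
	}
	run, ok := cmds[args[0]]
	if !ok {
		return fmt.Errorf("unknown subcommand %q", args[0])
	}
	return run(args[1:])
}

func main() {
	// Placeholder handlers; a real binary would start the matching service.
	placeholder := func(name string) command {
		return func(args []string) error {
			fmt.Printf("would run %s with args %v\n", name, args)
			return nil
		}
	}
	cmds := map[string]command{}
	for _, name := range []string{"api", "indexer", "embed-worker", "reindex", "reembed", "backfill", "healthcheck"} {
		cmds[name] = placeholder(name)
	}

	args := os.Args[1:]
	if len(args) == 0 {
		args = []string{"healthcheck"} // demo default only
	}
	if err := Dispatch(cmds, args); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```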
-
-## 11. Technology Choices
-
-### Embedding: Ollama (self-hosted)
-
-Embeddings are generated locally via Ollama rather than an external API service. This eliminates per-token costs, external service dependencies, and data egress concerns.
-
-**Recommended models (in order of preference):**
-
-| Model | Parameters | Dimensions | Quantized Size | Notes |
-|-------|-----------|------------|----------------|-------|
-| nomic-embed-text-v1.5 | 137M | 768 (Matryoshka: 64–768) | ~262 MB (F16) | 8192 context, battle-tested, Railway template exists |
-| EmbeddingGemma | 308M | 768 | <200 MB (quantized) | Best-in-class MTEB for size, released Sept 2025 |
-| all-minilm | 23M | 384 | ~46 MB | Budget option, lower quality |
-
-**Go integration:** Use the official Ollama Go client (`github.com/ollama/ollama/api`) with the `Embed()` method. The embed-worker calls Ollama over Railway's internal network (`ollama.railway.internal:11434`).
-
-**Railway deployment:** Ollama runs as a separate Railway service (~1–2 GB RAM, 1–2 vCPU, ~$10–30/mo). The nomic-embed Railway template provides a proven starting point. No cold starts on always-on services; model loads in 2–10 seconds on first request after deploy.
-
-### Language: Go
-
-Go is the implementation language for the API server, indexer, embedding worker, and CLI commands. Rationale: straightforward long-running services, excellent HTTP support, good concurrency model, small container footprint.
-
-### Sync Layer: Tap
-
-Tap is the only supported sync source in v1. It handles firehose connection, cryptographic verification, backfill, and filtering, then delivers simple JSON events via WebSocket or webhook.
-
-**Tap is already deployed on Railway.** Twister connects to it as a WebSocket client.
-
-#### Tap Capabilities
-
-- Validates repo structure, MST integrity, and identity signatures
-- Automatic backfill fetches full repo history from PDS when repos are added
-- Filtered output by DID list, collection, or full network mode
-- Ordering guarantees: historical events (`live: false`) delivered before live events (`live: true`)
-
-#### Tap Delivery Modes
-
-| Mode | Config | Behavior |
-| -------------------------- | ----------------------- | ------------------------------------------------- |
-| WebSocket + acks (default) | — | Client acks each event; no data loss |
-| Fire-and-forget | `TAP_DISABLE_ACKS=true` | Events marked acked on receipt; simpler but lossy |
-| Webhook | `TAP_WEBHOOK_URL=...` | Events POSTed as JSON; acked on HTTP 200 |
-
-#### Tap API Endpoints (reference)
-
-| Endpoint | Method | Purpose |
-| --------------------- | ------ | ------------------------------------- |
-| `/health` | GET | Health check |
-| `/channel` | WS | WebSocket event stream |
-| `/repos/add` | POST | Add DIDs to track |
-| `/repos/remove` | POST | Stop tracking a repo |
-| `/info/:did` | GET | Repo state, rev, record count, errors |
-| `/stats/repo-count` | GET | Total tracked repos |
-| `/stats/record-count` | GET | Total tracked records |
-| `/stats/cursors` | GET | Firehose and list repos cursors |
-
-#### Key Tap Configuration
-
-| Variable | Default | Purpose |
-| ------------------------ | ------- | ---------------------------------------------------------------------------------- |
-| `TAP_SIGNAL_COLLECTION` | — | Auto-track repos with records in this collection |
-| `TAP_COLLECTION_FILTERS` | — | Comma-separated collection filters (e.g., `sh.tangled.repo,sh.tangled.repo.issue`) |
-| `TAP_ADMIN_PASSWORD` | — | Basic auth for API access |
-| `TAP_DISABLE_ACKS` | `false` | Fire-and-forget mode |
-| `TAP_WEBHOOK_URL` | — | Webhook delivery URL |
-
-### Storage and Search: Turso/libSQL
-
-Turso/libSQL is used for relational metadata storage, Tantivy-backed full-text search, and native vector search.
-
-#### Go SDK Options
-
-| Package | CGo | Embedded Replicas | Remote |
-| -------------------------------------------------- | --- | ----------------- | ------ |
-| `github.com/tursodatabase/go-libsql` | Yes | Yes | Yes |
-| `github.com/tursodatabase/libsql-client-go/libsql` | No | No | Yes |
-
-Both register as `database/sql` drivers under `"libsql"`. They cannot be imported in the same binary.
-
-**Recommendation:** Use `libsql-client-go` (pure Go, remote-only) unless embedded replicas are needed for local read performance.
-
-#### Connection Patterns
-
-```go
-// Remote only (pure Go, no CGo)
-import _ "github.com/tursodatabase/libsql-client-go/libsql"
-db, err := sql.Open("libsql", "libsql://your-db.turso.io?authToken=TOKEN")
-
-// Embedded replica (CGo required)
-import "github.com/tursodatabase/go-libsql"
-connector, err := libsql.NewEmbeddedReplicaConnector(
-	"local.db", "libsql://your-db.turso.io",
-	libsql.WithAuthToken("TOKEN"),
-	libsql.WithSyncInterval(time.Minute),
-)
-db := sql.OpenDB(connector)
-```
-
-#### Full-Text Search (Tantivy-backed)
-
-Turso FTS is **not** standard SQLite FTS5. It uses Tantivy under the hood.
-
-```sql
--- Create FTS index with per-column tokenizers and weights
-CREATE INDEX idx_docs_fts ON documents USING fts (
-  title WITH tokenizer=default,
-  body WITH tokenizer=default,
-  summary WITH tokenizer=default,
-  repo_name WITH tokenizer=simple,
-  author_handle WITH tokenizer=raw
-) WITH (weights='title=3.0,repo_name=2.5,author_handle=2.0,summary=1.5,body=1.0');
-
--- Filter by match
-SELECT id, title FROM documents
-WHERE fts_match(title, body, summary, repo_name, author_handle, 'search query');
-
--- BM25 scoring
-SELECT id, title, fts_score(title, body, summary, repo_name, author_handle, 'search query') AS score
-FROM documents
-ORDER BY score DESC;
-
--- Highlighting
-SELECT fts_highlight(title, '<b>', '</b>', 'search query') AS highlighted
-FROM documents;
-```
-
-**Available tokenizers:** `default` (Unicode-aware), `raw` (exact match), `simple` (whitespace+punctuation), `whitespace`, `ngram` (2-3 char n-grams).
-
-**Query syntax (Tantivy):** `database AND search`, `database NOT nosql`, `"exact phrase"`, `data*` (prefix), `title:database` (field-specific), `title:database^2` (boosting).
-
-**Limitations:** No snippet function (use highlighting). No automatic segment merging (manual `OPTIMIZE INDEX` required).
-No read-your-writes within a transaction. No MATCH operator (use `fts_match()` function).
-
-#### Vector Search
-
-```sql
--- Vector column type
-embedding F32_BLOB(768)
-
--- Insert
-INSERT INTO document_embeddings (document_id, embedding, ...)
-VALUES (?, vector32(?), ...); -- ? is JSON array '[0.1, 0.2, ...]'
-
--- Brute-force similarity search
-SELECT d.id, vector_distance_cos(e.embedding, vector32(?)) AS distance
-FROM documents d
-JOIN document_embeddings e ON d.id = e.document_id
-ORDER BY distance ASC LIMIT 20;
-
--- Create ANN index (DiskANN)
-CREATE INDEX idx_embeddings ON document_embeddings(
-  libsql_vector_idx(embedding, 'metric=cosine')
-);
-
--- ANN search via index
-SELECT d.id, d.title
-FROM vector_top_k('idx_embeddings', vector32(?), 20) AS v
-JOIN document_embeddings e ON e.rowid = v.id
-JOIN documents d ON d.id = e.document_id;
-```
-
-**Vector types:** `F32_BLOB` (recommended), `F16_BLOB`, `F64_BLOB`, `F8_BLOB`, `F1BIT_BLOB`.
-
-**Distance functions:** `vector_distance_cos` (cosine), `vector_distance_l2` (Euclidean).
-
-**Max dimensions:** 65,536. Dimension is fixed at table creation.
-
-### Deployment: Railway
-
-Railway is the deployment platform. It supports health checks, autodeploy, per-service scaling, and internal networking. Tap is already deployed here. Twister deploys as separate Railway services (api, indexer, embed-worker) within the same project.

docs/api/specs/02-tangled-lexicons.md (-192)

----
-title: "Spec 02 — Tangled Lexicons"
-updated: 2026-03-22
-source: https://github.com/mary-ext/atcute/tree/trunk/packages/definitions/tangled/lexicons/sh/tangled
----
-
-All Tangled records use the `sh.tangled.*` namespace. Records use TID keys unless noted otherwise.
-
-## 1. Searchable Record Types
-
-These are the primary records Twister indexes for search.
-
-### sh.tangled.repo
-
-Repository metadata. Key: `tid`.
-
-| Field | Type | Required | Description |
-| ------------- | -------- | -------- | ---------------------------------------------- |
-| `name` | string | yes | Repository name |
-| `knot` | string | yes | Knot (hosting node) where the repo was created |
-| `spindle` | string | no | CI runner for jobs |
-| `description` | string | no | 1–140 graphemes |
-| `website` | uri | no | Related URI |
-| `topics` | string[] | no | Up to 50 topic tags, each 1–50 chars |
-| `source` | uri | no | Upstream source |
-| `labels` | at-uri[] | no | Label definitions this repo subscribes to |
-| `createdAt` | datetime | yes | |
-
-### sh.tangled.repo.issue
-
-Issue on a repository. Key: `tid`.
-
-| Field | Type | Required | Description |
-| ------------ | -------- | -------- | -------------------------------- |
-| `repo` | at-uri | yes | AT-URI of the parent repo record |
-| `title` | string | yes | Issue title |
-| `body` | string | no | Issue body (markdown) |
-| `createdAt` | datetime | yes | |
-| `mentions` | did[] | no | Mentioned users |
-| `references` | at-uri[] | no | Referenced records |
-
-### sh.tangled.repo.pull
-
-Pull request. Key: `tid`.
-
-| Field | Type | Required | Description |
-| ------------ | -------- | -------- | -------------------------------------------------- |
-| `target` | object | yes | `{repo: at-uri, branch: string}` |
-| `title` | string | yes | PR title |
-| `body` | string | no | PR description (markdown) |
-| `patchBlob` | blob | yes | Patch content (`text/x-patch`) |
-| `source` | object | no | `{branch: string, sha: string(40), repo?: at-uri}` |
-| `createdAt` | datetime | yes | |
-| `mentions` | did[] | no | Mentioned users |
-| `references` | at-uri[] | no | Referenced records |
-
-### sh.tangled.string
-
-Code snippet / gist. Key: `tid`.
-
-| Field | Type | Required | Description |
-| ------------- | -------- | -------- | ------------------- |
-| `filename` | string | yes | 1–140 graphemes |
-| `description` | string | yes | Up to 280 graphemes |
-| `createdAt` | datetime | yes | |
-| `contents` | string | yes | Snippet content |
-
-### sh.tangled.actor.profile
-
-User profile. Key: `literal:self` (singleton per account).
-
-| Field | Type | Required | Description |
-| -------------------- | -------- | -------- | ---------------------------- |
-| `avatar` | blob | no | PNG/JPEG, max 1MB |
-| `description` | string | no | Bio, up to 256 graphemes |
-| `links` | uri[] | no | Up to 5 social/website links |
-| `stats` | string[] | no | Up to 2 vanity stat types |
-| `bluesky` | boolean | yes | Show Bluesky link |
-| `location` | string | no | Up to 40 graphemes |
-| `pinnedRepositories` | at-uri[] | no | Up to 6 pinned repos |
-| `pronouns` | string | no | Up to 40 chars |
-
-## 2. Interaction Record Types
-
-These records represent social interactions. They may be indexed for counts/signals but are lower priority for text search.
-
-### sh.tangled.feed.star
-
-Star/favorite on a record. Key: `tid`.
-
-| Field | Type | Required |
-| ----------- | -------- | -------- |
-| `subject` | at-uri | yes |
-| `createdAt` | datetime | yes |
-
-### sh.tangled.feed.reaction
-
-Emoji reaction on a record. Key: `tid`.
-
-| Field | Type | Required | Description |
-| ----------- | -------- | -------- | ------------------------------- |
-| `subject` | at-uri | yes | |
-| `reaction` | string | yes | One of: 👍 👎 😆 🎉 🫤 ❤️ 🚀 👀 |
-| `createdAt` | datetime | yes | |
-
-### sh.tangled.graph.follow
-
-Follow a user. Key: `tid`.
-
-| Field | Type | Required |
-| ----------- | -------- | -------- |
-| `subject` | did | yes |
-| `createdAt` | datetime | yes |
-
-## 3. State Record Types
-
-These records track mutable state of issues and PRs.
-
-### sh.tangled.repo.issue.state
-
-| Field | Type | Required | Description |
-| ------- | ------ | -------- | -------------------------------------------------------------------------- |
-| `issue` | at-uri | yes | |
-| `state` | string | yes | `sh.tangled.repo.issue.state.open` or `sh.tangled.repo.issue.state.closed` |
-
-### sh.tangled.repo.pull.status
-
-| Field | Type | Required | Description |
-| -------- | ------ | -------- | ----------------------------------------------------------- |
-| `pull` | at-uri | yes | |
-| `status` | string | yes | `sh.tangled.repo.pull.status.open`, `.closed`, or `.merged` |
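
Both state fields carry fully qualified values such as `sh.tangled.repo.issue.state.closed`, so an indexer that wants a short filterable label has to strip the collection prefix. A minimal sketch, assuming the allowed values are exactly those listed in the two tables; `ShortState` is a hypothetical helper, and `strings.CutPrefix` requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedStates lists the values documented for each state collection.
var allowedStates = map[string][]string{
	"sh.tangled.repo.issue.state": {"open", "closed"},
	"sh.tangled.repo.pull.status": {"open", "closed", "merged"},
}

// ShortState strips the collection prefix from a state value and checks
// the remainder against the documented set for that collection.
func ShortState(collection, value string) (string, error) {
	short, ok := strings.CutPrefix(value, collection+".")
	if !ok {
		return "", fmt.Errorf("value %q does not belong to %s", value, collection)
	}
	for _, s := range allowedStates[collection] {
		if s == short {
			return short, nil
		}
	}
	return "", fmt.Errorf("unknown state %q for %s", short, collection)
}

func main() {
	s, err := ShortState("sh.tangled.repo.pull.status", "sh.tangled.repo.pull.status.merged")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```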
-
-## 4. Comment Record Types
-
-### sh.tangled.repo.issue.comment
-
-| Field | Type | Required | Description |
-| ------------ | -------- | -------- | ------------------------------ |
-| `issue` | at-uri | yes | Parent issue |
-| `body` | string | yes | Comment body |
-| `createdAt` | datetime | yes | |
-| `replyTo` | at-uri | no | Parent comment (for threading) |
-| `mentions` | did[] | no | |
-| `references` | at-uri[] | no | |
-
-### sh.tangled.repo.pull.comment
-
-| Field | Type | Required | Description |
-| ------------ | -------- | -------- | ------------ |
-| `pull` | at-uri | yes | Parent PR |
-| `body` | string | yes | Comment body |
-| `createdAt` | datetime | yes | |
-| `mentions` | did[] | no | |
-| `references` | at-uri[] | no | |
-
-## 5. Infrastructure Record Types
-
-These are not indexed for search but may be consumed for operational context.
-
-| Collection | Description |
-| ----------------------------- | ---------------------------------------------------- |
-| `sh.tangled.label.definition` | Label definitions with name, valueType, scope, color |
-| `sh.tangled.label.op` | Label application operations |
-| `sh.tangled.git.refUpdate` | Git reference update events |
-| `sh.tangled.knot.member` | Knot membership |
-| `sh.tangled.spindle.member` | Spindle (CI runner) membership |
-| `sh.tangled.pipeline.status` | CI pipeline status |
-
-## 6. Collection Priority for v1 Indexing
-
-| Priority | Collection | Rationale |
-| -------- | ------------------------------- | ------------------------------------ |
-| P0 | `sh.tangled.repo` | Core searchable content |
-| P0 | `sh.tangled.repo.issue` | High-signal text content |
-| P0 | `sh.tangled.repo.pull` | High-signal text content |
-| P1 | `sh.tangled.string` | Searchable code snippets |
-| P1 | `sh.tangled.actor.profile` | User/org discovery |
-| P2 | `sh.tangled.repo.issue.comment` | Body text, high volume |
-| P2 | `sh.tangled.repo.pull.comment` | Body text, high volume |
-| P2 | `sh.tangled.repo.issue.state` | State for filtering, not text search |
-| P2 | `sh.tangled.repo.pull.status` | State for filtering, not text search |
-| P3 | `sh.tangled.feed.star` | Ranking signal (star count) |
-| P3 | `sh.tangled.feed.reaction` | Ranking signal |
-| P3 | `sh.tangled.graph.follow` | Ranking signal |
-
-### Tap Collection Filter for v1
-
-```sh
-TAP_COLLECTION_FILTERS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.issue.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull,sh.tangled.repo.pull.comment,sh.tangled.repo.pull.status,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.feed.star
-
-# or sh.tangled.*
-```

docs/api/specs/03-data-model.md (-184)

----
-title: "Spec 03 — Data Model"
-updated: 2026-03-22
----
-
-## 1. Search Document
-
-A **search document** is the internal denormalized representation used for retrieval. It is derived from one or more ATProto records via normalization.
-
-### Stable Identifier
-
-```sh
-id = did + "|" + collection + "|" + rkey
-```
-
-Example: `did:plc:abc123|sh.tangled.repo|3kb3fge5lm32x`
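
The identifier rule translates directly into Go. In this sketch, `DocumentID`, `ParseDocumentID`, and `ATURI` are hypothetical helper names; the `at://<did>/<collection>/<rkey>` layout is the standard AT-URI form. Parsing relies on the three parts never containing `|`.

```go
package main

import (
	"fmt"
	"strings"
)

// DocumentID joins the three parts with "|" as described in the spec.
func DocumentID(did, collection, rkey string) string {
	return did + "|" + collection + "|" + rkey
}

// ParseDocumentID splits an identifier back into its parts.
func ParseDocumentID(id string) (did, collection, rkey string, err error) {
	parts := strings.Split(id, "|")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed document id %q", id)
	}
	return parts[0], parts[1], parts[2], nil
}

// ATURI builds the AT-URI for the same record.
func ATURI(did, collection, rkey string) string {
	return "at://" + did + "/" + collection + "/" + rkey
}

func main() {
	id := DocumentID("did:plc:abc123", "sh.tangled.repo", "3kb3fge5lm32x")
	fmt.Println(id)

	did, collection, rkey, err := ParseDocumentID(id)
	if err != nil {
		panic(err)
	}
	fmt.Println(ATURI(did, collection, rkey))
}
```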
-
-### Required Fields
-
-| Field | Type | Description |
-| --------------- | ------- | -------------------------------------------------------------------------- |
-| `id` | TEXT PK | Stable composite identifier |
-| `did` | TEXT | Author DID |
-| `collection` | TEXT | ATProto collection NSID |
-| `rkey` | TEXT | Record key (TID) |
-| `at_uri` | TEXT | Full AT-URI |
-| `cid` | TEXT | Content identifier (hash) |
-| `record_type` | TEXT | Normalized type label (e.g., `repo`, `issue`, `pull`, `string`, `profile`) |
-| `title` | TEXT | Normalized title |
-| `body` | TEXT | Normalized body text |
-| `summary` | TEXT | Short summary / description |
-| `repo_did` | TEXT | DID of the repo owner (resolved from at-uri for issues/PRs) |
-| `repo_name` | TEXT | Repository name (resolved) |
-| `author_handle` | TEXT | Author handle (resolved via identity) |
-| `tags_json` | TEXT | JSON array of tags/topics |
-| `language` | TEXT | Detected or declared language |
-| `created_at` | TEXT | Record creation timestamp (ISO 8601) |
-| `updated_at` | TEXT | Last record update timestamp |
-| `indexed_at` | TEXT | When this document was last indexed |
-| `deleted_at` | TEXT | Soft-delete timestamp (tombstone) |
-
-### Derived Fields (not stored in documents table)
-
-| Field | Location | Description |
-| ---------------- | -------------------------------------- | ------------------------------ |
-| Embedding vector | `document_embeddings` table | F32_BLOB(N) |
-| FTS index | Turso FTS index | Tantivy-backed full-text index |
-| Star count | Aggregated from `sh.tangled.feed.star` | Ranking signal |
-
-## 2. Core Documents Table
-
-```sql
-CREATE TABLE documents (
-  id TEXT PRIMARY KEY,
-  did TEXT NOT NULL,
-  collection TEXT NOT NULL,
-  rkey TEXT NOT NULL,
-  at_uri TEXT NOT NULL,
-  cid TEXT NOT NULL,
-  record_type TEXT NOT NULL,
-  title TEXT,
-  body TEXT,
-  summary TEXT,
-  repo_did TEXT,
-  repo_name TEXT,
-  author_handle TEXT,
-  tags_json TEXT,
-  language TEXT,
-  created_at TEXT,
-  updated_at TEXT,
-  indexed_at TEXT NOT NULL,
-  deleted_at TEXT
-);
-
-CREATE INDEX idx_documents_did ON documents(did);
-CREATE INDEX idx_documents_collection ON documents(collection);
-CREATE INDEX idx_documents_record_type ON documents(record_type);
-CREATE INDEX idx_documents_repo_did ON documents(repo_did);
-CREATE INDEX idx_documents_created_at ON documents(created_at);
-CREATE INDEX idx_documents_deleted_at ON documents(deleted_at);
-```
-
-## 3. FTS Index
-
-```sql
-CREATE INDEX idx_documents_fts ON documents USING fts (
-  title WITH tokenizer=default,
-  body WITH tokenizer=default,
-  summary WITH tokenizer=default,
-  repo_name WITH tokenizer=simple,
-  author_handle WITH tokenizer=raw,
-  tags_json WITH tokenizer=simple
-) WITH (weights='title=3.0,repo_name=2.5,author_handle=2.0,summary=1.5,tags_json=1.2,body=1.0');
-```
-
-### FTS Maintenance
-
-Turso's Tantivy-backed FTS uses `NoMergePolicy` — segment count grows with writes and is never automatically compacted. This increases query fan-out over time.
-
-**Required maintenance:** Run `OPTIMIZE INDEX idx_documents_fts;` periodically (e.g., daily cron or after bulk backfill). This merges segments and reclaims space.
-
-**Known limitations:**
-- No read-your-writes within a transaction — FTS queries see a pre-commit snapshot
-- No snippet function (use `fts_highlight()` for highlighting)
-- FTS is experimental in Turso; requires the `fts` feature flag
-
-## 4. Embeddings Table
-
-```sql
-CREATE TABLE document_embeddings (
-  document_id TEXT PRIMARY KEY REFERENCES documents(id),
-  embedding F32_BLOB(768),
-  embedding_model TEXT NOT NULL,
-  embedded_at TEXT NOT NULL
-);
-
-CREATE INDEX idx_embeddings_vec ON document_embeddings(
-  libsql_vector_idx(embedding, 'metric=cosine')
-);
-```
-
-The vector dimension (768) matches nomic-embed-text-v1.5 and EmbeddingGemma defaults. Changing models may require a new column or table migration if the dimension changes.
-
-### Vector Index Tuning
-
-The DiskANN index accepts tuning parameters at creation time:
-
-```sql
-CREATE INDEX idx_embeddings_vec ON document_embeddings(
-  libsql_vector_idx(embedding, 'metric=cosine', 'max_neighbors=50', 'search_l=200')
-);
-```
-
-| Parameter | Default | Description |
-|-----------|---------|-------------|
-| `max_neighbors` | 3*sqrt(D) | Graph connectivity; higher = better recall, more storage |
-| `search_l` | 200 | Neighbors visited during search; higher = better recall, slower |
-| `insert_l` | 70 | Neighbors visited during insert |
-| `alpha` | 1.2 | Graph sparsity factor |
-| `compress_neighbors` | — | Quantize neighbor vectors for storage savings |
-
-Start with defaults and tune after measuring recall on representative queries.
143143-144144-## 5. Sync State Table
145145-146146-```sql
147147-CREATE TABLE sync_state (
148148- consumer_name TEXT PRIMARY KEY,
149149- cursor TEXT NOT NULL,
150150- high_water_mark TEXT,
151151- updated_at TEXT NOT NULL
152152-);
153153-```
154154-155155-Stores the Tap event ID that has been successfully committed. On restart, the indexer resumes from this cursor.
156156-157157-## 6. Embedding Jobs Table
158158-159159-```sql
160160-CREATE TABLE embedding_jobs (
161161- document_id TEXT PRIMARY KEY REFERENCES documents(id),
162162- status TEXT NOT NULL, -- 'pending', 'processing', 'completed', 'failed', 'dead'
163163- attempts INTEGER NOT NULL DEFAULT 0,
164164- last_error TEXT,
165165- scheduled_at TEXT NOT NULL,
166166- updated_at TEXT NOT NULL
167167-);
168168-169169-CREATE INDEX idx_embedding_jobs_status ON embedding_jobs(status);
170170-```
171171-172172-## 7. Issue/PR State Cache (optional)
173173-174174-To support filtering search results by issue state or PR status without joining back to the raw records:
175175-176176-```sql
177177-CREATE TABLE record_state (
178178- subject_uri TEXT PRIMARY KEY, -- at-uri of the issue or PR
179179- state TEXT NOT NULL, -- 'open', 'closed', 'merged'
180180- updated_at TEXT NOT NULL
181181-);
182182-```
183183-184184-Updated when `sh.tangled.repo.issue.state` or `sh.tangled.repo.pull.status` events are ingested.
-364
docs/api/specs/04-data-pipeline.md
···11----
22-title: "Spec 04 — Data Pipeline"
33-updated: 2026-03-22
44----
55-66-Covers the full data path: Tap event ingestion, record normalization, and failure handling.
77-88-## 1. Tap Event Format
99-1010-### Record Events
1111-1212-```json
1313-{
1414- "id": 12345,
1515- "type": "record",
1616- "record": {
1717- "live": true,
1818- "rev": "3kb3fge5lm32x",
1919- "did": "did:plc:abc123",
2020- "collection": "sh.tangled.repo",
2121- "rkey": "3kb3fge5lm32x",
2222- "action": "create",
2323- "cid": "bafyreig...",
2424- "record": {
2525- "$type": "sh.tangled.repo",
2626- "name": "my-project",
2727- "knot": "knot.tangled.org",
2828- "description": "A cool project",
2929- "topics": ["go", "search"],
3030- "createdAt": "2026-03-22T12:00:00.000Z"
3131- }
3232- }
3333-}
3434-```
3535-3636-Key fields:
3737-3838-- `id` — monotonic event ID, used as cursor
3939-- `type` — `"record"` or `"identity"`
4040-- `record.live` — `true` for real-time events, `false` for backfill
4141-- `record.action` — `"create"`, `"update"`, or `"delete"`
4242-- `record.did` — author DID
4343-- `record.collection` — ATProto collection NSID
4444-- `record.rkey` — record key
4545-- `record.cid` — content identifier
4646-- `record.record` — the full ATProto record payload (absent on delete)
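
The record-event envelope can be decoded with `encoding/json`. The following is a minimal sketch, not the indexer's actual types: `TapEvent`, `TapRecord`, and `DecodeTapEvent` are hypothetical names, and only the JSON keys come from the example above. The inner payload is kept as raw JSON so a collection-specific adapter can decode it later.

```go
package main

import "encoding/json"

// TapEvent mirrors the top-level envelope of a Tap message.
// Struct and field names are illustrative assumptions.
type TapEvent struct {
	ID     int64      `json:"id"`   // monotonic event ID, used as cursor
	Type   string     `json:"type"` // "record" or "identity"
	Record *TapRecord `json:"record,omitempty"`
}

// TapRecord mirrors the "record" object of a record event.
type TapRecord struct {
	Live       bool            `json:"live"`
	Rev        string          `json:"rev"`
	DID        string          `json:"did"`
	Collection string          `json:"collection"`
	RKey       string          `json:"rkey"`
	Action     string          `json:"action"`
	CID        string          `json:"cid"`
	Record     json.RawMessage `json:"record,omitempty"` // absent on delete
}

// DecodeTapEvent parses one WebSocket message into a TapEvent.
func DecodeTapEvent(raw []byte) (TapEvent, error) {
	var ev TapEvent
	err := json.Unmarshal(raw, &ev)
	return ev, err
}
```

For identity events `Record` stays `nil`, which gives the caller a cheap way to route the two event types.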
4747-4848-### Identity Events
4949-5050-```json
5151-{
5252- "id": 12346,
5353- "type": "identity",
5454- "identity": {
5555- "did": "did:plc:abc123",
5656- "handle": "alice.tangled.org",
5757- "isActive": true,
5858- "status": "active"
5959- }
6060-}
6161-```
6262-6363-Identity events are always delivered for tracked repos, regardless of collection filters.
6464-6565-## 2. WebSocket Protocol
6666-6767-### Connection
6868-6969-Connect to `wss://<tap-host>/channel` (or `ws://` for local dev).
7070-7171-If `TAP_ADMIN_PASSWORD` is set, authenticate with HTTP Basic auth (`admin:<password>`).
7272-7373-### Acknowledgment Protocol
7474-7575-Default mode requires the client to ack each event by sending the event `id` back over the WebSocket. Events are retried after `TAP_RETRY_TIMEOUT` (default 60s) if unacked.
7676-7777-For simpler development, set `TAP_DISABLE_ACKS=true` on Tap for fire-and-forget delivery.
7878-7979-### Ordering Guarantees
8080-8181-Events are ordered **per-repo** (per-DID), not globally:
8282-8383-- **Historical events** (`live: false`) may be sent concurrently within a repo
8484-- **Live events** (`live: true`) are synchronization barriers — all prior events for that repo must complete before a live event is sent
8585-- No ordering guarantee across different repos
8686-8787-Example sequence for one repo: `H1, H2, L1, H3, H4, L2`
8888-8989-- H1 and H2 sent concurrently
9090-- Wait for completion, send L1 alone
9191-- Wait for L1, send H3 and H4 concurrently
9292-- Wait for completion, send L2 alone
9393-9494-### Delivery Guarantee
9595-9696-Events are delivered **at least once**. Duplicates may occur on crashes or ack timeouts. The indexer must handle idempotent upserts.
9797-9898-## 3. Ingestion Contract
9999-100100-For each event, the indexer:
101101-102102-1. Validates `type` is `"record"` (identity events are handled separately)
103103-2. Checks `record.collection` against the allowlist
104104-3. Maps `record.action` to an operation:
105105- - `create` → upsert document
106106- - `update` → upsert document
107107- - `delete` → tombstone document (`deleted_at = now`)
108108-4. Decodes `record.record` into the collection-specific struct
109109-5. Normalizes to internal `Document`
110110-6. Upserts into the documents table
111111-7. Schedules embedding job if eligible
112112-8. Persists cursor (`event.id`) **only after successful DB commit**
113113-114114-### Cursor Persistence Rules
115115-116116-- If DB commit fails → cursor does not advance → event will be retried
117117-- After successful DB writes, ack Tap first, then persist cursor for operator-visible resume
118118-- If ack fails → cursor does not advance
119119-- If ack succeeds but cursor persistence fails → retry cursor persistence until successful or process exit
120120-- If normalization fails → log error, optionally dead-letter, skip → cursor advances
121121-- If embedding scheduling fails → document remains keyword-searchable → cursor advances
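
The rules above reduce to a small decision function. This is a sketch under assumptions: `Outcome` and `CursorAdvances` are hypothetical names, and embedding scheduling is deliberately not modeled because it never blocks the cursor.

```go
package main

// Outcome records which steps succeeded for one event.
// The type and its fields are hypothetical, for illustration only.
type Outcome struct {
	NormalizeOK bool // record decoded and normalized
	CommitOK    bool // document write committed to the database
	AckOK       bool // Tap acknowledged the event
}

// CursorAdvances reports whether the cursor may move past the event.
func CursorAdvances(o Outcome) bool {
	if !o.NormalizeOK {
		// Normalization failures are logged and skipped, so the cursor advances.
		return true
	}
	if !o.CommitOK {
		// A failed commit leaves the cursor in place so the event is retried.
		return false
	}
	// After a successful commit the cursor only advances once the ack succeeds.
	return o.AckOK
}
```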
122122-123123-## 4. Backfill Behavior
124124-125125-When a repo is added to Tap (via `/repos/add`, signal collection, or full network mode):
126126-127127-1. Tap fetches full repo history from PDS via `com.atproto.sync.getRepo`
128128-2. Firehose events for that repo are buffered during backfill
129129-3. Historical events (`live: false`) are delivered first
130130-4. After backfill completes, buffered live events drain
131131-5. New firehose events stream normally (`live: true`)
132132-133133-### Application-Level Backfill Support
134134-135135-The indexer also supports:
136136-137137-- Full reindex from existing corpus (re-normalize all stored documents)
138138-- Targeted reindex by collection
139139-- Targeted reindex by DID
140140-141141-These do not involve Tap — they re-process documents already in the database.
142142-143143-## 5. Normalization
144144-145145-Normalization converts heterogeneous `sh.tangled.*` records into the common `Document` shape defined in [03-data-model.md](03-data-model.md).
146146-147147-### Adapter Interface
148148-149149-Each indexed collection provides an adapter:
150150-151151-```go
152152-type RecordAdapter interface {
153153- Collection() string
154154- RecordType() string
155155- Normalize(event TapRecordEvent) (*Document, error)
156156- Searchable(record map[string]any) bool
157157-}
158158-```
159159-160160-### Per-Collection Normalization
161161-162162-#### sh.tangled.repo → `repo`
163163-164164-| Document Field | Source |
165165-| -------------- | -------------------------------- |
166166-| `title` | `record.name` |
167167-| `body` | `record.description` |
168168-| `summary` | `record.description` (truncated) |
169169-| `repo_name` | `record.name` |
170170-| `repo_did` | `event.did` |
171171-| `tags_json` | `json(record.topics)` |
172172-| `created_at` | `record.createdAt` |
173173-174174-**Searchable:** Always (unless empty name).
175175-176176-#### sh.tangled.repo.issue → `issue`
177177-178178-| Document Field | Source |
179179-| -------------- | ------------------------------------------- |
180180-| `title` | `record.title` |
181181-| `body` | `record.body` |
182182-| `summary` | First ~200 chars of `record.body` |
183183-| `repo_did` | Extracted from `record.repo` AT-URI |
184184-| `repo_name` | Resolved from repo AT-URI |
185185-| `tags_json` | `[]` (labels resolved separately if needed) |
186186-| `created_at` | `record.createdAt` |
187187-188188-**Searchable:** Always.
189189-190190-#### sh.tangled.repo.pull → `pull`
191191-192192-| Document Field | Source |
193193-| -------------- | ------------------------------------------ |
194194-| `title` | `record.title` |
195195-| `body` | `record.body` |
196196-| `summary` | First ~200 chars of `record.body` |
197197-| `repo_did` | Extracted from `record.target.repo` AT-URI |
198198-| `repo_name` | Resolved from target repo AT-URI |
199199-| `tags_json` | `[]` |
200200-| `created_at` | `record.createdAt` |
201201-202202-**Searchable:** Always.
203203-204204-#### sh.tangled.string → `string`
205205-206206-| Document Field | Source |
207207-| -------------- | -------------------- |
208208-| `title` | `record.filename` |
209209-| `body` | `record.contents` |
210210-| `summary` | `record.description` |
211211-| `repo_name` | — |
212212-| `repo_did` | — |
213213-| `tags_json` | `[]` |
214214-| `created_at` | `record.createdAt` |
215215-216216-**Searchable:** Always (`contents` is required).
217217-218218-#### sh.tangled.actor.profile → `profile`
219219-220220-| Document Field | Source |
221221-| -------------- | ---------------------------------------------------- |
222222-| `title` | Author handle (resolved from DID) |
223223-| `body` | `record.description` |
224224-| `summary` | `record.description` (truncated) + `record.location` |
225225-| `repo_name` | — |
226226-| `repo_did` | — |
227227-| `tags_json` | `[]` |
228228-| `created_at` | — (profiles don't have createdAt) |
229229-230230-**Searchable:** If `description` is non-empty.
231231-232232-#### sh.tangled.repo.issue.comment → `issue_comment`
233233-234234-| Document Field | Source |
235235-| -------------- | ----------------------------------------- |
236236-| `title` | — (derived: "Comment on {issue title}") |
237237-| `body` | `record.body` |
238238-| `summary` | First ~200 chars of `record.body` |
239239-| `repo_did` | Resolved from `record.issue` AT-URI chain |
240240-| `repo_name` | Resolved |
241241-| `created_at` | `record.createdAt` |
242242-243243-**Searchable:** If body is non-empty.
244244-245245-#### sh.tangled.repo.pull.comment → `pull_comment`
246246-247247-Same pattern as issue comments, using `record.pull` instead of `record.issue`.
248248-249249-### State Event Handling
250250-251251-State and status records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) do **not** produce new search documents. Instead, they update the `record_state` cache table (see [03-data-model.md](03-data-model.md)).
252252-253253-### Interaction Event Handling
254254-255255-Stars (`sh.tangled.feed.star`) and reactions (`sh.tangled.feed.reaction`) do not produce search documents. They may be aggregated for ranking signals in later phases.
256256-257257-### Embedding Input Text
258258-259259-For documents eligible for embedding, compose the input as:
259259-260260-```text
262262-{title}\n{repo_name}\n{author_handle}\n{tags}\n{summary}\n{body}
263263-```
264264-265265-Fields are joined with newlines. Empty fields are omitted.
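
A minimal Go sketch of that composition follows. `EmbeddingInput` is a hypothetical helper, and treating whitespace-only fields as empty is an assumption that the spec does not state.

```go
package main

import "strings"

// EmbeddingInput joins the non-empty fields with newlines, in the
// documented order: title, repo_name, author_handle, tags, summary, body.
// Whitespace-only fields are treated as empty (an assumption).
func EmbeddingInput(title, repoName, authorHandle, tags, summary, body string) string {
	fields := []string{title, repoName, authorHandle, tags, summary, body}
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		if strings.TrimSpace(f) != "" {
			parts = append(parts, f)
		}
	}
	return strings.Join(parts, "\n")
}
```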
266266-267267-### Repo Name Resolution
268268-269269-Issues, PRs, and comments reference their parent repo via AT-URI (e.g., `at://did:plc:abc/sh.tangled.repo/tid`). Resolving the repo name requires either:
270270-271271-1. Looking up the repo document in the local `documents` table
272272-2. Caching repo metadata in a lightweight lookup table
273273-274274-Option 1 is preferred for v1. If the repo document hasn't been indexed yet, `repo_name` is left empty and backfilled on the next reindex pass.
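
For option 1, the parent AT-URI has to be turned into a `documents.id` key before the lookup. The sketch below assumes the `did|collection|rkey` ID format used elsewhere in these specs. `ParseATURI` and `RepoDocumentID` are hypothetical helper names, and the parser only handles the three-segment form shown in the example.

```go
package main

import (
	"errors"
	"strings"
)

// ParseATURI splits "at://<did>/<collection>/<rkey>" into its three parts.
func ParseATURI(uri string) (did, collection, rkey string, err error) {
	const scheme = "at://"
	if !strings.HasPrefix(uri, scheme) {
		return "", "", "", errors.New("not an at:// URI")
	}
	parts := strings.Split(strings.TrimPrefix(uri, scheme), "/")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", errors.New("expected at://<did>/<collection>/<rkey>")
	}
	return parts[0], parts[1], parts[2], nil
}

// RepoDocumentID derives the documents.id key for a repo AT-URI.
func RepoDocumentID(uri string) (string, error) {
	did, collection, rkey, err := ParseATURI(uri)
	if err != nil {
		return "", err
	}
	return did + "|" + collection + "|" + rkey, nil
}
```

The resulting ID can be used in a primary-key lookup on `documents`. A miss means the repo has not been indexed yet, and `repo_name` stays empty.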
275275-276276-## 6. Identity Event Handling
277277-278278-Identity events should be used to maintain an author handle cache:
279279-280280-```text
281281-did → handle mapping
282282-```
283283-284284-When an identity event arrives with a new handle, update `author_handle` on all documents with that DID. This ensures search by handle returns current results.
285285-286286-## 7. Repo Management
287287-288288-To add repos for tracking, POST to Tap's `/repos/add` endpoint:
289289-290290-```bash
291291-curl -u admin:PASSWORD -X POST https://tap-host/repos/add \
292292- -H "Content-Type: application/json" \
293293- -d '{"dids": ["did:plc:abc123", "did:plc:def456"]}'
294294-```
295295-296296-Alternatively, use `TAP_SIGNAL_COLLECTION=sh.tangled.repo` to auto-track any repo that has Tangled repo records.
297297-298298-## 8. Failure Handling
299299-300300-### Ingestion Failures
301301-302302-If Tap event processing fails before DB commit:
303303-304304-- Log the failure with event ID, DID, collection, rkey, and error class
305305-- Retry with exponential backoff (for transient errors like DB timeouts)
306306-- Do **not** advance cursor — the event will be re-delivered by Tap
307307-- After max retries for a persistent error, log and skip (cursor advances)
308308-309309-### Normalization Failures
310310-311311-If a record cannot be normalized:
312312-313313-- Log collection, DID, rkey, CID, and error class
314314-- Do not crash the process
315315-- Skip the event and advance cursor
316316-- Optionally insert into a `dead_letter` table for manual inspection
317317-318318-### Embedding Failures
319319-320320-If embedding generation fails:
321321-322322-- The document remains keyword-searchable
323323-- The embedding job is marked `failed` with `last_error` and incremented `attempts`
324324-- Jobs are retried with exponential backoff up to a max attempt count
325325-- After max attempts, the job enters `dead` state
326326-- The embed-worker exposes failed job count as a metric
327327-- If Ollama is unreachable (sidecar down), all pending jobs pause until connectivity is restored
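
The retry schedule is not specified beyond "exponential backoff up to a max attempt count". One possible shape is sketched below. `RetryDelay` is a hypothetical helper, and the doubling factor, the cap, and all parameter values are assumptions.

```go
package main

import "time"

// RetryDelay returns the wait before the next embedding attempt.
// attempts is the number of failed attempts so far. ok is false once
// attempts reaches maxAttempts, meaning the job should be marked dead.
func RetryDelay(attempts, maxAttempts int, base, ceiling time.Duration) (delay time.Duration, ok bool) {
	if attempts >= maxAttempts {
		return 0, false
	}
	delay = base
	// Double the delay once per prior failure, stopping at the ceiling.
	for i := 1; i < attempts && delay < ceiling; i++ {
		delay *= 2
	}
	if delay > ceiling {
		delay = ceiling
	}
	return delay, true
}
```

With a one-second base and a five-attempt limit, the delays after the first four failures are 1s, 2s, 4s, and 8s. The fifth failure moves the job to `dead`.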
328328-329329-### DB Failures
330330-331331-If Turso/libSQL is unreachable:
332332-333333-- **API** returns `503` for search endpoints; `/healthz` still returns 200 (liveness), `/readyz` returns 503
334334-- **Indexer** pauses event processing and retries DB connection with backoff; cursor does not advance
335335-- **Embed-worker** pauses job processing and retries
336336-337337-### Tap Connection Failures
338338-339339-If the WebSocket connection to Tap drops:
340340-341341-- Reconnect with exponential backoff
342342-- Resume from the last persisted cursor
343343-- Log reconnection attempts and success
344344-345345-Tap itself handles firehose reconnection independently — a Tap restart does not require indexer intervention beyond reconnecting the WebSocket.
346346-347347-### Duplicate Event Handling
348348-349349-Tap delivers events **at least once**. Duplicates are handled by:
350350-351351-- Using `id = did|collection|rkey` as the primary key
352352-- All writes are upserts (`INSERT OR REPLACE` / `ON CONFLICT ... DO UPDATE`)
353353-- CID comparison can detect true no-ops (same content) vs. actual updates
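
A minimal sketch of the key construction and the CID check follows. `WriteKind`, `DocumentID`, and `Classify` are hypothetical names. Representing "no stored row" as an empty CID is an assumption that works only because `cid` is `NOT NULL` in the schema.

```go
package main

// WriteKind classifies an incoming event against the stored row.
type WriteKind int

const (
	Insert WriteKind = iota // no stored row yet
	NoOp                    // same CID: duplicate delivery, nothing changed
	Update                  // different CID: content changed
)

// DocumentID builds the stable primary key "did|collection|rkey".
func DocumentID(did, collection, rkey string) string {
	return did + "|" + collection + "|" + rkey
}

// Classify compares CIDs. storedCID is "" when no row exists.
func Classify(storedCID, incomingCID string) WriteKind {
	switch {
	case storedCID == "":
		return Insert
	case storedCID == incomingCID:
		return NoOp
	default:
		return Update
	}
}
```

Skipping the write on `NoOp` also avoids re-scheduling an embedding job for content that has not changed.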
354354-355355-### Startup Recovery
356356-357357-On indexer startup:
358358-359359-1. Read `cursor` from `sync_state` table
360360-2. Connect to Tap WebSocket
361361-3. Tap replays events from the stored cursor position
362362-4. Processing resumes normally
363363-364364-If no cursor exists (first run), Tap delivers all historical events from backfill.
-306
docs/api/specs/05-search.md
···11----
22-title: "Spec 05 — Search"
33-updated: 2026-03-22
44----
55-66-Covers all search modes, the public search API contract, scoring, and filtering.
77-88-## 1. Search Modes
99-1010-| Mode | Backing | Available |
1111-| ---------- | ------------------------------------ | --------- |
1212-| `keyword` | Turso Tantivy-backed FTS | MVP |
1313-| `semantic` | Vector similarity (DiskANN index) | Phase 2 |
1414-| `hybrid` | Weighted merge of keyword + semantic | Phase 3 |
1515-1616-## 2. Keyword Search
1717-1818-### Implementation
1919-2020-Uses Turso's `fts_score()` function for BM25 ranking:
2121-2222-```sql
2323-SELECT
2424- d.id, d.title, d.summary, d.repo_name, d.author_handle,
2525- d.collection, d.record_type, d.updated_at,
2626- fts_score(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) AS score
2727-FROM documents d
2828-WHERE fts_match(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?)
2929- AND d.deleted_at IS NULL
3030-ORDER BY score DESC
3131-LIMIT ? OFFSET ?;
3232-```
3333-3434-### Field Weights
3535-3636-Configured in the FTS index definition:
3737-3838-| Field | Weight | Rationale |
3939-| --------------- | ------ | ------------------------------------ |
4040-| `title` | 3.0 | Highest signal for relevance |
4141-| `repo_name` | 2.5 | Exact repo lookups should rank first |
4242-| `author_handle` | 2.0 | Author search is common |
4343-| `summary` | 1.5 | More focused than body |
4444-| `tags_json` | 1.2 | Topic matching |
4545-| `body` | 1.0 | Baseline |
4646-4747-### Query Features
4848-4949-Tantivy query syntax is exposed to users:
5050-5151-- Boolean: `go AND search`, `rust NOT unsafe`
5252-- Phrase: `"pull request"`
5353-- Prefix: `tang*`
5454-- Field-specific: `title:parser`
5555-5656-### Snippets
5757-5858-Use `fts_highlight()` to generate highlighted snippets:
5959-6060-```sql
6161-fts_highlight(d.body, '<mark>', '</mark>', ?) AS body_snippet
6262-```
6363-6464-### FTS Operational Notes
6565-6666-- **Segment merging:** Turso FTS uses Tantivy's `NoMergePolicy`. Run `OPTIMIZE INDEX idx_documents_fts;` after bulk writes (backfill) and periodically in production to keep query performance stable.
6767-- **Read-your-writes:** FTS queries within the same transaction see a pre-commit snapshot. If a document is written and immediately searched in the same transaction, FTS will not find it. The indexer and API are separate processes, so this is not a concern in normal operation.
6868-- **Feature flag:** Turso FTS requires the `fts` feature flag to be enabled on the database.
6969-7070-## 3. Semantic Search
7171-7272-### Query Flow
7373-7474-1. Convert user query text to embedding via Ollama (self-hosted)
7575-2. Query `vector_top_k` for nearest neighbors
7676-3. Join back to `documents` to get metadata
7777-4. Filter out deleted/hidden documents
7777-7878-5. Return results scored by cosine distance, converted to a 0–1 relevance score as described under Score Normalization
7979-8080-```sql
8181-SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle,
8282- d.collection, d.record_type, d.updated_at
8383-FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v
8484-JOIN document_embeddings e ON e.rowid = v.id
8585-JOIN documents d ON d.id = e.document_id
8686-WHERE d.deleted_at IS NULL;
8787-```
8888-8989-### Score Normalization
9090-9191-Cosine distance ranges from 0 (identical) to 2 (opposite). Normalize to a 0–1 relevance score:
9292-9393-```text
9494-semantic_score = 1.0 - (distance / 2.0)
9595-```
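
As a worked instance of the formula, a distance of 0.5 maps to a score of 0.75. The Go sketch below uses a hypothetical function name, and the clamping is an added safeguard against floating-point drift that the formula itself does not include.

```go
package main

// SemanticScore maps a cosine distance in [0, 2] to a relevance score in [0, 1].
func SemanticScore(distance float64) float64 {
	score := 1.0 - distance/2.0
	// Clamp in case the reported distance falls slightly outside [0, 2].
	if score < 0 {
		return 0
	}
	if score > 1 {
		return 1
	}
	return score
}
```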
9696-9797-## 4. Hybrid Search
9898-9999-### v1: Weighted Score Blending
100100-101101-```text
102102-hybrid_score = 0.65 * keyword_score_normalized + 0.35 * semantic_score_normalized
103103-```
104104-105105-### Score Normalization for Blending
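106106-107107-Keyword (BM25) scores are unbounded. Normalize using min-max within the result set (when every score in the set is equal the denominator is zero, so that case needs an explicit fallback value):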
108108-109109-```text
110110-keyword_normalized = (score - min_score) / (max_score - min_score)
111111-```
112112-113113-Semantic scores are already bounded after the distance-to-relevance conversion.
114114-115115-### Merge Strategy
116116-117117-1. Fetch top N keyword results (e.g., N=50)
118118-2. Fetch top N semantic results
119119-3. Merge on `document_id`
120120-4. For documents appearing in both sets, combine scores
121121-5. For documents in only one set, use that score (with 0 for the missing signal)
122122-6. Sort by `hybrid_score` descending
123123-7. Deduplicate
124124-8. Apply limit/offset
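
The merge steps above can be sketched in Go as follows. `Hit`, `minMax`, and `MergeHybrid` are hypothetical names. The sketch assumes each input list has unique IDs, that semantic scores are already in the 0 to 1 range, and that a result set with identical keyword scores normalizes to 1.0. Ties are broken by ID so the output is deterministic.

```go
package main

import "sort"

// Hit is one scored search result.
type Hit struct {
	ID    string
	Score float64
}

// minMax rescales scores to [0, 1] within the result set.
// If all scores are equal, every hit gets 1 (an assumed fallback).
func minMax(hits []Hit) map[string]float64 {
	out := make(map[string]float64, len(hits))
	if len(hits) == 0 {
		return out
	}
	lo, hi := hits[0].Score, hits[0].Score
	for _, h := range hits {
		if h.Score < lo {
			lo = h.Score
		}
		if h.Score > hi {
			hi = h.Score
		}
	}
	for _, h := range hits {
		if hi == lo {
			out[h.ID] = 1
		} else {
			out[h.ID] = (h.Score - lo) / (hi - lo)
		}
	}
	return out
}

// MergeHybrid blends raw keyword hits and normalized semantic hits.
// A document missing from one list contributes 0 for that signal.
func MergeHybrid(keyword, semantic []Hit, kw, sw float64, limit, offset int) []Hit {
	merged := make(map[string]float64)
	for id, s := range minMax(keyword) {
		merged[id] += kw * s
	}
	for _, h := range semantic {
		merged[h.ID] += sw * h.Score
	}
	out := make([]Hit, 0, len(merged))
	for id, s := range merged {
		out = append(out, Hit{ID: id, Score: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = 0
	}
	if offset > len(out) {
		offset = len(out)
	}
	end := offset + limit
	if end > len(out) {
		end = len(out)
	}
	return out[offset:end]
}
```

Keying the merge on document ID makes the separate deduplication step implicit, since each ID can appear only once in the map.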
125125-126126-### v2: Reciprocal Rank Fusion (future)
127127-128128-If keyword and semantic score scales prove unstable under weighted blending, replace with RRF:
129129-130130-```text
131131-rrf_score = Σ 1 / (k + rank_i)
132132-```
133133-134134-where `k` is a constant (typically 60) and `rank_i` is the document's rank in each result list.
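
A compact sketch of the RRF formula follows, assuming 1-based ranks and unique IDs within each list. `RRFScores` is a hypothetical name.

```go
package main

// RRFScores computes the sum of 1/(k + rank) for each document ID
// across the given ranked lists. Ranks start at 1.
func RRFScores(k float64, lists ...[]string) map[string]float64 {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	return scores
}
```

Because only ranks are used, the keyword and semantic score scales never need to be reconciled. A document that appears in both lists outranks one that appears at the same position in only one of them.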
135135-136136-## 5. Filtering
137137-138138-All search modes support these filters, applied as SQL WHERE clauses:
139139-140140-| Filter | Parameter | SQL |
141141-| ----------- | ------------ | ------------------------------------------- |
142142-| Collection | `collection` | `d.collection = ?` |
143143-| Author | `author` | `d.author_handle = ?` or `d.did = ?` |
144144-| Repo | `repo` | `d.repo_name = ?` or `d.repo_did = ?` |
145145-| Record type | `type` | `d.record_type = ?` |
146146-| Language | `language` | `d.language = ?` |
147147-| Date range | `from`, `to` | `d.created_at >= ?` and `d.created_at <= ?` |
148148-| State | `state` | Join to `record_state` table |
149149-150150-## 6. Embedding Eligibility
151151-152152-A document is eligible for embedding if:
153153-154154-- `deleted_at IS NULL`
155155-- `record_type` is one of: `repo`, `issue`, `pull`, `string`, `profile`
156156-- At least one of `title`, `body`, or `summary` is non-empty
157157-- Total text length exceeds a minimum threshold (e.g., 20 characters)
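
These conditions can be expressed as a single predicate. The sketch below uses hypothetical names (`Doc`, `EligibleForEmbedding`), takes 20 characters as the threshold from the example above, and assumes "total text length" means the combined length of `title`, `body`, and `summary` counted in Unicode code points.

```go
package main

import "unicode/utf8"

// Doc holds the fields that matter for embedding eligibility.
type Doc struct {
	RecordType string
	Title      string
	Body       string
	Summary    string
	Deleted    bool // true when deleted_at is set
}

// minEmbedChars is the example threshold from the text.
const minEmbedChars = 20

var embeddableTypes = map[string]bool{
	"repo": true, "issue": true, "pull": true, "string": true, "profile": true,
}

// EligibleForEmbedding applies the eligibility rules listed above.
func EligibleForEmbedding(d Doc) bool {
	if d.Deleted || !embeddableTypes[d.RecordType] {
		return false
	}
	total := utf8.RuneCountInString(d.Title) +
		utf8.RuneCountInString(d.Body) +
		utf8.RuneCountInString(d.Summary)
	// Exceeding the threshold also implies at least one field is non-empty.
	return total > minEmbedChars
}
```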
158158-159159-## 7. API Endpoints
160160-161161-### Health
162162-163163-| Method | Path | Description |
164164-| ------ | ---------- | -------------------------------- |
165165-| GET | `/healthz` | Liveness — process is responsive |
166166-| GET | `/readyz` | Readiness — DB is reachable |
167167-168168-### Search
169169-170170-| Method | Path | Description |
171171-| ------ | ------------------ | ------------------------------------------------ |
172172-| GET | `/search` | Search with configurable mode (default: keyword) |
173173-| GET | `/search/keyword` | Keyword-only search |
174174-| GET | `/search/semantic` | Semantic-only search |
175175-| GET | `/search/hybrid` | Hybrid search |
176176-177177-### Documents
178178-179179-| Method | Path | Description |
180180-| ------ | ----------------- | ----------------------------- |
181181-| GET | `/documents/{id}` | Fetch a single document by ID |
182182-183183-### Admin
184184-185185-| Method | Path | Description |
186186-| ------ | ---------------- | -------------------- |
187187-| POST | `/admin/reindex` | Trigger reindex |
188188-| POST | `/admin/reembed` | Trigger re-embedding |
189189-190190-Admin endpoints are disabled by default. Enable with `ENABLE_ADMIN_ENDPOINTS=true`.
191191-192192-## 8. Query Parameters
193193-194194-| Parameter | Type | Default | Description |
195195-| ------------ | ------ | --------- | -------------------------------------------------------------------- |
196196-| `q` | string | required | Search query |
197197-| `mode` | string | `keyword` | `keyword`, `semantic`, or `hybrid` |
198198-| `limit` | int | 20 | Results per page (max: `SEARCH_MAX_LIMIT`) |
199199-| `offset` | int | 0 | Pagination offset |
200200-| `collection` | string | — | Filter by `sh.tangled.*` collection |
201201-| `type` | string | — | Filter by record type (`repo`, `issue`, `pull`, `string`, `profile`) |
202202-| `author` | string | — | Filter by author handle or DID |
203203-| `repo` | string | — | Filter by repo name or repo DID |
204204-| `language` | string | — | Filter by language |
205205-| `from` | string | — | Created after (ISO 8601) |
206206-| `to` | string | — | Created before (ISO 8601) |
207207-| `state` | string | — | Filter by state (`open`, `closed`, `merged`) |
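
A client can assemble these parameters with `net/url`. The following is a sketch only: `SearchParams` and `SearchURL` are hypothetical names, the host in the usage note is a placeholder, and only a subset of the filters is shown.

```go
package main

import (
	"net/url"
	"strconv"
	"strings"
)

// SearchParams covers a subset of the query parameters listed above.
type SearchParams struct {
	Query  string
	Mode   string
	Type   string
	Limit  int
	Offset int
}

// SearchURL builds a GET /search URL; zero-valued optional fields are omitted.
func SearchURL(base string, p SearchParams) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/search"
	q := url.Values{}
	q.Set("q", p.Query)
	if p.Mode != "" {
		q.Set("mode", p.Mode)
	}
	if p.Type != "" {
		q.Set("type", p.Type)
	}
	if p.Limit > 0 {
		q.Set("limit", strconv.Itoa(p.Limit))
	}
	if p.Offset > 0 {
		q.Set("offset", strconv.Itoa(p.Offset))
	}
	// Encode sorts keys alphabetically and escapes values.
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Calling `SearchURL("https://api.example.com", SearchParams{Query: "rust markdown", Mode: "hybrid", Type: "repo", Limit: 20})` yields `https://api.example.com/search?limit=20&mode=hybrid&q=rust+markdown&type=repo`.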
208208-209209-## 9. Search Response
210210-211211-```json
212212-{
213213- "query": "rust markdown tui",
214214- "mode": "hybrid",
215215- "total": 142,
216216- "limit": 20,
217217- "offset": 0,
218218- "results": [
219219- {
220220- "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
221221- "collection": "sh.tangled.repo",
222222- "record_type": "repo",
223223- "title": "glow-rs",
224224- "body_snippet": "A TUI markdown viewer inspired by <mark>Glow</mark>...",
225225- "summary": "Rust TUI markdown viewer",
226226- "repo_name": "glow-rs",
227227- "author_handle": "desertthunder.dev",
228228- "score": 0.842,
229229- "matched_by": ["keyword", "semantic"],
230230- "created_at": "2026-03-20T10:00:00Z",
231231- "updated_at": "2026-03-22T15:03:11Z"
232232- }
233233- ]
234234-}
235235-```
236236-237237-### Result Fields
238238-239239-| Field | Type | Description |
240240-| ------------------ | -------- | ------------------------------------------- |
241241-| `id` | string | Document stable ID |
242242-| `collection` | string | ATProto collection NSID |
243243-| `record_type` | string | Normalized type label |
244244-| `title` | string | Document title |
245245-| `body_snippet` | string | Highlighted body excerpt |
246246-| `summary` | string | Short description |
247247-| `repo_name` | string | Repository name (if applicable) |
248248-| `author_handle` | string | Author handle |
249249-| `did` | string | Author DID when available |
250250-| `at_uri` | string | Canonical AT URI when available |
251251-| `primary_language` | string | Primary language for repo results |
252252-| `stars` | number | Indexed star count for repo results |
253253-| `follower_count` | number | Indexed follower count for profile results |
254254-| `following_count` | number | Indexed following count for profile results |
255255-| `score` | float | Relevance score (0–1) |
256256-| `matched_by` | string[] | Which search modes produced this result |
257257-| `created_at` | string | ISO 8601 creation timestamp |
258258-| `updated_at` | string | ISO 8601 last update timestamp |
259259-260260-## 10. Document Response
261261-262262-`GET /documents/{id}` returns the full document:
263263-264264-```json
265265-{
266266- "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
267267- "did": "did:plc:abc",
268268- "collection": "sh.tangled.repo",
269269- "rkey": "3kb3fge5lm32x",
270270- "at_uri": "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x",
271271- "cid": "bafyreig...",
272272- "record_type": "repo",
273273- "title": "glow-rs",
274274- "body": "A TUI markdown viewer inspired by Glow, written in Rust.",
275275- "summary": "Rust TUI markdown viewer",
276276- "repo_name": "glow-rs",
277277- "author_handle": "desertthunder.dev",
278278- "tags_json": "[\"rust\", \"tui\", \"markdown\"]",
279279- "language": "en",
280280- "created_at": "2026-03-20T10:00:00Z",
281281- "updated_at": "2026-03-22T15:03:11Z",
282282- "indexed_at": "2026-03-22T15:05:00Z",
283283- "has_embedding": true
284284-}
285285-```
286286-287287-## 11. Error Responses
288288-289289-| Status | Condition |
290290-| ------ | ------------------------------------------------------------------ |
291291-| 400 | Missing `q` parameter, invalid `limit`/`offset`, malformed filters |
292292-| 404 | Document not found |
293293-| 503 | DB unreachable (readiness failure) |
294294-295295-```json
296296-{ "error": "invalid_parameter", "message": "limit must be between 1 and 100" }
297297-```
298298-299299-## 12. API Behavior
300300-301301-- `keyword` returns only lexical matches via `fts_match`/`fts_score`
302302-- `semantic` returns only embedding-backed matches via `vector_top_k`
303303-- `hybrid` merges both result sets and reranks
304304-- All modes exclude documents with `deleted_at IS NOT NULL` by default
305305-- Pagination uses `limit`/`offset` (cursor-based pagination deferred)
306306-- Mobile clients may use `type=repo` and `type=profile` to render repo/profile search directly
-434
docs/api/specs/06-operations.md
···11----
22-title: "Spec 06 — Operations"
33-updated: 2026-03-23
44----
55-66-Covers configuration, observability, security, and deployment.
77-88-## 0. Quick Setup
99-1010-Tap is already deployed. For a new environment, the minimum operator work is:
1111-1212-1. Create or choose a Turso database for that environment
1313-2. Generate a Turso auth token for that database
1414-3. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at that database
1515-4. Create Railway services for `api` and `indexer`
1616-5. Point `TAP_URL` at the existing Tap deployment
1717-6. Run migrations/start the services
1818-7. Run `twister backfill` before treating the environment as search-ready
1919-2020-No separate `*_DEV` or `*_PROD` variables are required. Each environment keeps using the same variable names and simply points them at the appropriate Turso database.
2121-2222-## 1. Configuration
2323-2424-All configuration is via environment variables.
2525-2626-### Required
2727-2828-| Variable | Description |
2929-| --------------------- | ----------------------------------------------------------- |
3030-| `TAP_URL` | Tap WebSocket URL (e.g., `wss://tap.example.com/channel`) |
3131-| `TAP_AUTH_PASSWORD` | Tap admin password for Basic auth (if set on Tap) |
3232-| `TURSO_DATABASE_URL` | Turso connection URL (e.g., `libsql://db-name.turso.io`) |
3333-| `TURSO_AUTH_TOKEN` | Turso JWT auth token |
3434-| `INDEXED_COLLECTIONS` | Comma-separated list of `sh.tangled.*` collections to index |
3535-3636-### Search
3737-3838-| Variable | Default | Description |
3939-| ---------------------- | --------- | ------------------------ |
4040-| `SEARCH_DEFAULT_LIMIT` | `20` | Default results per page |
4141-| `SEARCH_MAX_LIMIT` | `100` | Maximum results per page |
4242-| `SEARCH_DEFAULT_MODE` | `keyword` | Default search mode |
4343-4444-### Embedding (Ollama — self-hosted)
4545-4646-| Variable | Default | Description |
4747-| ---------------------- | ------------------------------------------ | ---------------------------------------------- |
4848-| `OLLAMA_URL` | `http://ollama.railway.internal:11434` | Ollama server URL |
4949-| `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model name |
5050-| `EMBEDDING_DIM` | `768` | Vector dimensionality (must match model) |
5151-| `EMBEDDING_BATCH_SIZE` | `32` | Documents per embedding batch |
5252-5353-### Hybrid Search
5454-5555-| Variable | Default | Description |
5656-| ------------------------ | ------- | --------------------------------------- |
5757-| `HYBRID_KEYWORD_WEIGHT` | `0.65` | Keyword score weight in hybrid ranking |
5858-| `HYBRID_SEMANTIC_WEIGHT` | `0.35` | Semantic score weight in hybrid ranking |
5959-6060-### Server
6161-6262-| Variable | Default | Description |
6363-| ------------------------ | ------- | ------------------------------------------- |
6464-| `HTTP_BIND_ADDR` | `:8080` | API server bind address |
6565-| `LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` |
6666-| `LOG_FORMAT` | `json` | Log format: `json` or `text` |
6767-| `ENABLE_ADMIN_ENDPOINTS` | `false` | Enable `/admin/*` endpoints |
6868-| `ADMIN_AUTH_TOKEN` | — | Bearer token for admin endpoints |
6969-7070-### Example `.env`
7171-7272-```bash
7373-# Tap (deployed on Railway)
7474-TAP_URL=wss://tap-instance.up.railway.app/channel
7575-TAP_AUTH_PASSWORD=your-tap-admin-password
7676-7777-# Turso
7878-TURSO_DATABASE_URL=libsql://twister-db.turso.io
7979-TURSO_AUTH_TOKEN=eyJhbGci...
8080-8181-# Collections
8282-INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull.status,sh.tangled.feed.star
8383-8484-# Search
8585-SEARCH_DEFAULT_LIMIT=20
8686-SEARCH_MAX_LIMIT=100
8787-8888-# Embedding — Ollama (Phase 2)
8989-# OLLAMA_URL=http://ollama.railway.internal:11434
9090-# EMBEDDING_MODEL=nomic-embed-text
9191-# EMBEDDING_DIM=768
9292-9393-# Server
9494-HTTP_BIND_ADDR=:8080
9595-LOG_LEVEL=info
9696-ENABLE_ADMIN_ENDPOINTS=false
9797-```
9898-9999-### Environment Selection
100100-101101-Use the same variable names in every environment:
102102-103103-- local development can point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at `twister-dev`
104104-- production can point those same variables at `twister-prod`
105105-106106-The application should not care which database it is talking to; only the environment wiring changes.
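
As a sketch of what "same names, different values" could look like in code, the loader below reads only `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` and has no notion of dev or prod. `TursoConfig` and `loadTursoConfig` are hypothetical names; the lookup parameter has the same signature as `os.LookupEnv` so a fake environment can be injected.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// TursoConfig holds the two values that differ between environments.
type TursoConfig struct {
	DatabaseURL string
	AuthToken   string
}

// loadTursoConfig reads the same variable names in every environment.
func loadTursoConfig(lookup func(string) (string, bool)) (TursoConfig, error) {
	url, ok := lookup("TURSO_DATABASE_URL")
	if !ok || url == "" {
		return TursoConfig{}, errors.New("TURSO_DATABASE_URL is not set")
	}
	token, ok := lookup("TURSO_AUTH_TOKEN")
	if !ok || token == "" {
		return TursoConfig{}, errors.New("TURSO_AUTH_TOKEN is not set")
	}
	return TursoConfig{DatabaseURL: url, AuthToken: token}, nil
}

func main() {
	cfg, err := loadTursoConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using database", cfg.DatabaseURL)
}
```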
107107-108108-## 1.5. Turso Setup
109109-110110-### Recommended Databases
111111-112112-Use one Turso database per environment, for example:
113113-114114-- `twister-dev`
115115-- `twister-prod`
116116-117117-Keep the app config identical across environments and swap only these values:
118118-119119-- `TURSO_DATABASE_URL`
120120-- `TURSO_AUTH_TOKEN`
121121-122122-### Basic Flow
123123-124124-Using the Turso dashboard or CLI:
125125-126126-1. Create the database for the target environment
127127-2. Capture its libSQL URL
128128-3. Create an auth token for the service
129129-4. Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` in that environment
130130-131131-Example values:
132132-133133-```bash
134134-# Development environment
135135-TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io
136136-TURSO_AUTH_TOKEN=...
137137-138138-# Production environment
139139-TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io
140140-TURSO_AUTH_TOKEN=...
141141-```
142142-143143-### Practical Rule
144144-145145-Do not introduce `TURSO_DATABASE_URL_DEV`, `TURSO_DATABASE_URL_PROD`, or similar split variables. Railway environments, local shells, and CI should all set the same names with environment-specific values.
146146-147147-## 1.6. Railway Setup
148148-149149-### Project Layout
150150-151151-Create or reuse one Railway project containing:
152152-153153-- existing `tap` service
154154-- `api` service running `twister api`
155155-- `indexer` service running `twister indexer`
156156-157157-### Basic Steps
158158-159159-1. Connect the monorepo to Railway
160160-2. Create the `api` and `indexer` services from the same source repo/Docker image
161161-3. Set shared variables on both services:
162162- - `TURSO_DATABASE_URL`
163163- - `TURSO_AUTH_TOKEN`
164164- - `LOG_LEVEL`
165165- - `LOG_FORMAT`
166166-4. Set API-specific variables:
167167- - `HTTP_BIND_ADDR`
168168- - `SEARCH_DEFAULT_LIMIT`
169169- - `SEARCH_MAX_LIMIT`
170170-5. Set indexer-specific variables:
171171- - `TAP_URL`
172172- - `TAP_AUTH_PASSWORD`
173173- - `INDEXED_COLLECTIONS`
174174-6. Configure health checks
175175-7. Deploy
176176-8. Run backfill against the environment before public validation
177177-178178-### Dev vs Production on Railway
179179-180180-If you use multiple Railway environments, keep the same service definitions and variable names in each one. Only the values change:
181181-182182-- dev Railway environment -> `TURSO_DATABASE_URL=...twister-dev...`
183183-- prod Railway environment -> `TURSO_DATABASE_URL=...twister-prod...`
184184-185185-This keeps deployment logic simple and avoids conditional application config.
186186-187187-## 2. Observability
188188-189189-### Structured Logging
190190-191191-Use Go's `slog` with JSON output. Every log entry includes:
192192-193193-| Field | Description |
194194-| --------- | ----------------------------------- |
195195-| `ts` | Timestamp (RFC 3339) |
196196-| `level` | Log level |
197197-| `service` | `api`, `indexer`, or `embed-worker` |
198198-| `msg` | Human-readable message |
199199-200200-#### Context Fields (where applicable)
201201-202202-| Field | When |
203203-| ------------- | ------------------------ |
204204-| `event_name` | Tap event processing |
205205-| `event_id` | Tap event ID |
206206-| `document_id` | Document operations |
207207-| `did` | Any DID-scoped operation |
208208-| `collection` | Record processing |
209209-| `rkey` | Record processing |
210210-| `cursor` | Cursor persistence |
211211-| `error_class` | Error handling |
212212-| `duration_ms` | Timed operations |
213213-214214-### Metrics
215215-216216-Recommended counters and gauges (via logs, Prometheus, or platform metrics):
217217-218218-#### Ingestion
219219-220220-| Metric | Type | Description |
221221-| ------------------------------ | --------- | ---------------------------------- |
222222-| `events_processed_total` | counter | Total Tap events processed |
223223-| `events_failed_total` | counter | Events that failed processing |
224224-| `normalization_failures_total` | counter | Normalization errors by collection |
225225-| `upsert_duration_ms` | histogram | DB upsert latency |
226226-| `cursor_position` | gauge | Current Tap cursor position |
227227-228228-#### Embedding
229229-230230-| Metric | Type | Description |
231231-| -------------------------- | --------- | ------------------------------ |
232232-| `embedding_queue_depth` | gauge | Pending embedding jobs |
233233-| `embedding_failures_total` | counter | Failed embedding attempts |
234234-| `embedding_duration_ms` | histogram | Per-document embedding latency |
235235-236236-#### Search
237237-238238-| Metric | Type | Description |
239239-| ----------------------- | --------- | -------------------------- |
240240-| `search_requests_total` | counter | Requests by mode |
241241-| `search_duration_ms` | histogram | Query latency by mode |
242242-| `search_results_count` | histogram | Results returned per query |
243243-244244-### Health Checks
245245-246246-#### API Process
247247-248248-| Endpoint | Check | Healthy |
249249-| -------------- | --------------------- | ------------------- |
250250-| `GET /healthz` | Process is responsive | Always (liveness) |
251251-| `GET /readyz` | DB connection works | `SELECT 1` succeeds |
252252-253253-#### Indexer Process
254254-255255-The indexer exposes a top-level health probe (not served by the API router) that checks:
256256-257257-- Tap WebSocket connected or reconnecting
258258-- Cursor advancing or intentionally idle
259259-- DB reachable
260260-261261-On Railway, this is a health check endpoint on a separate port (9090).
262262-263263-#### Embed Worker
264264-265265-- DB reachable
266266-- Embedding provider reachable (periodic test call)
267267-- Job queue not stalled (jobs processing within expected timeframe)
268268-269269-## 3. Security
270270-271271-### Secrets Management
272272-273273-Secrets are injected through platform secret management:
274274-275275-- **Railway:** Environment variables in the dashboard or `railway variables`
276276-277277-Secrets are never stored in code, config files, or Docker images.
278278-279279-Required secrets:
280280-281281-| Secret | Purpose |
282282-| ------------------- | --------------------------------- |
283283-| `TURSO_AUTH_TOKEN` | Turso database authentication |
284284-| `TAP_AUTH_PASSWORD` | Tap admin API authentication |
285285-| `OLLAMA_URL` | Ollama connection URL (not a secret when using internal networking) |
286286-| `ADMIN_AUTH_TOKEN` | Admin endpoint authentication |
287287-288288-### Admin Endpoints
289289-290290-Admin endpoints (`/admin/reindex`, `/admin/reembed`) are:
291291-292292-- Disabled by default (`ENABLE_ADMIN_ENDPOINTS=false`)
293293-- When enabled, protected by bearer token (`ADMIN_AUTH_TOKEN`)
294294-- Alternatively, exposed only on internal networking (Railway private networking)
295295-296296-### Input Validation
297297-298298-The search API shall:
299299-300300-- Validate `limit` is between 1 and `SEARCH_MAX_LIMIT`
301301-- Validate `offset` is non-negative
302302-- Reject unknown or malformed filter parameters with 400
303303-- Sanitize query strings before passing to FTS (Tantivy query parser handles this, but validate basic structure)
304304-- Bound hybrid requests (limit concurrent vector searches)
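
The `limit` and `offset` rules above can be sketched as a small parser. `parsePage` is a hypothetical helper; the defaults passed in correspond to `SEARCH_DEFAULT_LIMIT` and `SEARCH_MAX_LIMIT`, and a returned error would map to a 400 response.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

// page is the validated pagination window for one search request.
type page struct {
	Limit  int
	Offset int
}

// parsePage validates the raw query values. Empty strings mean "not given".
func parsePage(limitRaw, offsetRaw string, defaultLimit, maxLimit int) (page, error) {
	p := page{Limit: defaultLimit}
	if limitRaw != "" {
		n, err := strconv.Atoi(limitRaw)
		if err != nil || n < 1 || n > maxLimit {
			return page{}, fmt.Errorf("limit must be between 1 and %d", maxLimit)
		}
		p.Limit = n
	}
	if offsetRaw != "" {
		n, err := strconv.Atoi(offsetRaw)
		if err != nil || n < 0 {
			return page{}, errors.New("offset must be non-negative")
		}
		p.Offset = n
	}
	return p, nil
}

func main() {
	q, _ := url.ParseQuery("q=tangled&limit=50&offset=100")
	p, err := parsePage(q.Get("limit"), q.Get("offset"), 20, 100)
	fmt.Println(p, err)
}
```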
305305-306306-### Tap Authentication
307307-308308-The indexer authenticates to Tap using HTTP Basic auth (`admin:<TAP_AUTH_PASSWORD>`). The WebSocket upgrade request includes the auth header.
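
A sketch of building that header with the standard library. Which WebSocket client the indexer uses is not stated here, so the example stops at producing an `http.Header`, which most Go WebSocket clients accept for the handshake. `tapUpgradeHeader` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
)

// tapUpgradeHeader returns the headers to send with the WebSocket upgrade
// request, using the fixed "admin" username described above.
func tapUpgradeHeader(password string) http.Header {
	req := &http.Request{Header: http.Header{}}
	req.SetBasicAuth("admin", password)
	return req.Header
}

func main() {
	h := tapUpgradeHeader("your-tap-admin-password")
	// Avoid logging the real header value in production.
	fmt.Println(len(h.Get("Authorization")) > 0)
}
```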
309309-310310-### Data Privacy
311311-312312-- All indexed content is public ATProto data
313313-- No private or authenticated content is ingested
314314-- Deleted records are tombstoned (`deleted_at` set) and excluded from search results
315315-- Tombstoned documents are periodically purged (configurable retention)
316316-317317-## 4. Deployment
318318-319319-### Railway (Primary)
320320-321321-All Twister services deploy as separate Railway services within the same project. Tap is already deployed here.
322322-323323-#### Service Layout
324324-325325-| Service | Start Command | Health Check | Public |
326326-| ------------ | ---------------------- | ------------------ | ------ |
327327-| tap | (already deployed) | `GET /health` | no |
328328-| api | `twister api` | `GET /healthz` | yes |
329329-| indexer | `twister indexer` | `GET :9090/health` | no |
330330-| embed-worker | `twister embed-worker` | `GET :9091/health` | no |
331331-| ollama | (Railway template) | `GET /api/tags` | no |
332332-333333-All services share the same Docker image. Railway uses the start command to select the subcommand.
334334-335335-#### Environment Variables
336336-337337-Set per-service in the Railway dashboard or via `railway variables`:
338338-339339-```bash
340340-# Shared across services
341341-TURSO_DATABASE_URL=libsql://twister-db.turso.io
342342-TURSO_AUTH_TOKEN=eyJ...
343343-LOG_LEVEL=info
344344-LOG_FORMAT=json
345345-346346-# API service
347347-HTTP_BIND_ADDR=:8080
348348-SEARCH_DEFAULT_LIMIT=20
349349-SEARCH_MAX_LIMIT=100
350350-ENABLE_ADMIN_ENDPOINTS=false
351351-352352-# Indexer service
353353-TAP_URL=wss://${{tap.RAILWAY_PUBLIC_DOMAIN}}/channel # Railway service reference
354354-TAP_AUTH_PASSWORD=...
355355-INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile
356356-357357-# Embed-worker + Ollama (Phase 2)
358358-# OLLAMA_URL=http://ollama.railway.internal:11434
359359-# EMBEDDING_MODEL=nomic-embed-text
360360-```
361361-362362-Railway supports referencing other services' variables with `${{service.VAR}}` syntax, which is useful for linking the indexer to Tap's domain.
363363-364364-#### First-Time Bootstrap Checklist
365365-366366-After the first successful deploy of a new environment:
367367-368368-1. Confirm API readiness on `/readyz`
369369-2. Confirm indexer health and Tap connectivity
370370-3. Run graph backfill with the environment's seed file
371371-4. Wait for Tap historical sync to settle
372372-5. Verify that search returns known historical repos/profiles
373373-374374-#### Health Checks
375375-376376-Railway activates deployments based on health check responses. Configure per-service:
377377-378378-- **api:** HTTP health check on `/healthz` port 8080
379379-- **indexer:** HTTP health check on `/health` port 9090
380380-- **embed-worker:** HTTP health check on `/health` port 9091
381381-382382-#### Autodeploy
383383-384384-Connect the GitHub repository for automatic deployments on push. Railway builds from the Dockerfile and uses the start command configured per service.
385385-386386-#### Internal Networking
387387-388388-Railway services within the same project can communicate over private networking using `service.railway.internal` hostnames. The indexer connects to Tap via this internal network when both are in the same project.
389389-390390-### Dockerfile
391391-392392-```dockerfile
393393-FROM golang:1.24-alpine AS builder
394394-395395-WORKDIR /app
396396-397397-COPY go.mod go.sum ./
398398-RUN go mod download
399399-400400-COPY . .
401401-402402-RUN CGO_ENABLED=0 GOOS=linux go build \
403403- -ldflags="-s -w" \
404404- -o /app/twister \
405405- ./main.go
406406-407407-FROM alpine:3.21
408408-409409-RUN apk add --no-cache ca-certificates tzdata
410410-411411-COPY --from=builder /app/twister /usr/local/bin/twister
412412-413413-EXPOSE 8080 9090 9091
414414-415415-CMD ["twister", "api"]
416416-```
417417-418418-Notes:
419419-420420-- `CGO_ENABLED=0` produces a static binary (compatible with `libsql-client-go`; not compatible with `go-libsql`, which needs CGo)
421421-- Railway overrides `CMD` with the start command configured per service
422422-- Multiple ports exposed: 8080 (API), 9090 (indexer health), 9091 (embed-worker health)
423423-424424-### Graceful Shutdown
425425-426426-All processes handle `SIGTERM` and `SIGINT`:
427427-428428-1. Stop accepting new requests/events
429429-2. Drain in-flight work (with timeout)
430430-3. Persist current cursor (indexer)
431431-4. Close DB connections
432432-5. Exit 0
433433-434434-Railway sends `SIGTERM` during deployments and restarts.
-142
docs/api/specs/07-graph-backfill.md
···11----
22-title: "Spec 07 — Graph Backfill"
33-updated: 2026-03-22
44----
55-66-## 1. Purpose
77-88-Bootstrap the search index with existing Tangled content by discovering users from a seed set and triggering Tap backfill for their repositories. Without this, the index only captures new events after deployment.
99-1010-## 2. Seed Set
1111-1212-A manually curated list of known Tangled users (DIDs or handles), stored in a plain text file:
1313-1414-```text
1515-# Known active Tangled users
1616-did:plc:abc123
1717-did:plc:def456
1818-alice.tangled.sh
1919-bob.tangled.sh
2020-# Add more as discovered
2121-```
2222-2323-Format:
2424-2525-- One entry per line
2626-- Lines starting with `#` are comments
2727-- Blank lines are ignored
2828-- Entries can be DIDs (`did:plc:...`) or handles (`alice.tangled.sh`)
2929-- Handles are resolved to DIDs before processing
3030-3131-## 3. Fan-Out Strategy
3232-3333-From each seed user, discover connected users to expand the crawl set:
3434-3535-### Discovery Sources
3636-3737-1. **Follows**: Fetch `sh.tangled.graph.follow` records for the user → extract `subject` DIDs
3838-2. **Collaborators**: For repos owned by the user, identify other users who have created issues, PRs, or comments → extract their DIDs
3939-4040-### Depth Limit
4141-4242-Fan-out is configurable with a max hops parameter (default: 2):
4343-4444-- **Hop 0**: Seed users themselves
4545-- **Hop 1**: Direct follows and collaborators of seed users
4646-- **Hop 2**: Follows and collaborators of hop-1 users
4747-4848-Higher hop counts discover more users but increase time and may pull in loosely related accounts. Start with 2 hops and adjust based on the size of the Tangled network.
4949-5050-### Crawl Queue
5151-5252-Discovered DIDs are added to a queue, deduplicated by DID. Each entry tracks:
5353-5454-- DID
5555-- Discovery hop (distance from seed)
5656-- Source (which seed/user led to discovery)
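
The hop limit and the deduplicated queue amount to a bounded breadth-first traversal. A minimal sketch, assuming a hypothetical `neighbors` lookup that returns the follows and collaborators for a DID (in practice this would involve network calls):

```go
package main

import "fmt"

// queued is one crawl queue entry with the three tracked fields.
type queued struct {
	DID    string
	Hop    int
	Source string
}

// discover walks outward from the seeds breadth-first, visiting each DID
// once and never expanding users that are already maxHops away.
func discover(seeds []string, maxHops int, neighbors func(did string) []string) []queued {
	visited := make(map[string]bool)
	var queue []queued
	for _, s := range seeds {
		if !visited[s] {
			visited[s] = true
			queue = append(queue, queued{DID: s, Hop: 0, Source: "seed"})
		}
	}
	// The queue grows while it is scanned; len is re-read every iteration.
	for i := 0; i < len(queue); i++ {
		cur := queue[i]
		if cur.Hop >= maxHops {
			continue
		}
		for _, n := range neighbors(cur.DID) {
			if visited[n] {
				continue
			}
			visited[n] = true
			queue = append(queue, queued{DID: n, Hop: cur.Hop + 1, Source: cur.DID})
		}
	}
	return queue
}

func main() {
	graph := map[string][]string{
		"did:plc:abc123": {"did:plc:def456"},
		"did:plc:def456": {"did:plc:ghi789", "did:plc:abc123"},
		"did:plc:ghi789": {"did:plc:jkl012"},
	}
	found := discover([]string{"did:plc:abc123"}, 2, func(d string) []string { return graph[d] })
	for _, q := range found {
		fmt.Printf("[hop %d] %s (via %s)\n", q.Hop, q.DID, q.Source)
	}
}
```

Because breadth-first order reaches each DID by its shortest path first, the recorded hop is the minimum distance from any seed.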
5757-5858-## 4. Backfill Mechanism
5959-6060-For each discovered user:
6161-6262-1. **Check Tap status**: Query Tap's `/info/:did` endpoint and classify by status:
6363- - tracked + backfilled: skip
6464- - tracked + backfilling/in-progress: skip and let current backfill finish
6565- - untracked or tracked-without-backfill-state: submit to `/repos/add`
6666-2. **Register with Tap**: POST to `/repos/add` with the DID — Tap handles the actual repo export and event delivery
6767-3. **Tap backfill flow**: Tap fetches full repo history from PDS via `com.atproto.sync.getRepo`, then delivers historical events (`live: false`) through the normal WebSocket channel
6868-4. **Indexer processes normally**: The indexer's existing ingestion loop handles backfill events the same as live events — no special backfill code path needed
6969-7070-### Rate Limiting
7171-7272-- Batch `/repos/add` calls (e.g., 10 DIDs per request)
7373-- Add configurable delay between batches to avoid overwhelming Tap
7474-- Respect Tap's processing capacity — monitor `/stats/repo-count` to track progress
7575-7676-## 5. Deduplication
7777-7878-- **User-level**: Maintain a visited set of DIDs during fan-out; skip already-seen DIDs
7979-- **Tap-level**: Tap's `/repos/add` is idempotent — adding an already-tracked DID is a no-op
8080-- **Record-level**: The indexer's upsert logic (keyed on `did|collection|rkey`) handles duplicate events naturally
8181-8282-## 6. CLI Interface
8383-8484-```bash
8585-# Basic backfill from seed file
8686-twister backfill --seeds seeds.txt
8787-8888-# Limit fan-out depth
8989-twister backfill --seeds seeds.txt --max-hops 1
9090-9191-# Preview discovered users without triggering backfill
9292-twister backfill --seeds seeds.txt --dry-run
9393-9494-# Control parallelism
9595-twister backfill --seeds seeds.txt --concurrency 5
9696-```
9797-9898-### Flags
9999-100100-| Flag | Default | Description |
101101-| --------------- | -------- | ----------------------------------------------- |
102102-| `--seeds` | required | Seed source: file path or comma-separated list |
103103-| `--max-hops` | `2` | Max fan-out depth from seed users |
104104-| `--dry-run` | `false` | List discovered users without submitting to Tap |
105105-| `--concurrency` | `5` | Parallel discovery workers |
106106-| `--batch-size` | `10` | DIDs per `/repos/add` call |
107107-| `--batch-delay` | `1s` | Delay between batches |
108108-109109-### Output
110110-111111-Progress is logged to stdout:
112112-113113-```text
114114-[hop 0] Processing 5 seed users...
115115-[hop 0] did:plc:abc123 → 12 follows, 3 collaborators
116116-[hop 0] did:plc:def456 → 8 follows, 1 collaborator
117117-[hop 1] Processing 24 discovered users (18 new)...
118118-...
119119-[done] Discovered 142 unique users across 2 hops
120120-[done] Submitted 98 new DIDs to Tap (44 already tracked)
121121-```
122122-123123-## 7. Idempotency
124124-125125-The entire backfill process is safe to re-run:
126126-127127-- Seed file parsing is stateless
128128-- Fan-out discovery is deterministic for a given network state
129129-- Tap's `/repos/add` is idempotent
130130-- The indexer's upsert logic handles re-delivered events
131131-- No local state is persisted between runs (the crawl queue is in-memory)
132132-133133-## 8. Configuration
134134-135135-| Variable | Default | Description |
136136-| -------------------- | ---------- | ----------------------------- |
137137-| `TAP_URL` | (existing) | Tap base URL for API calls |
138138-| `TAP_AUTH_PASSWORD` | (existing) | Tap admin auth |
139139-| `TURSO_DATABASE_URL` | (existing) | For checking existing records |
140140-| `TURSO_AUTH_TOKEN` | (existing) | DB auth |
141141-142142-No new environment variables are needed — backfill reuses existing Tap and DB configuration.
-89
docs/api/specs/08-app-integration.md
···11----
22-title: "Spec 08 — App Integration"
33-updated: 2026-03-23
44----
55-66-## 1. Purpose
77-88-Define the mobile-facing Twister API surface.
99-1010-The Twisted app should keep using Tangled's public knot and PDS APIs for canonical repo/profile detail. Twister is responsible for:
1111-1212-- cross-network discovery via search
1313-- index-backed summaries for data gaps such as followers
1414-1515-## 2. Client Boundary
1616-1717-The mobile client uses Twister only for:
1818-1919-- Explore search
2020-- index-backed profile summaries
2121-- future feed and notification features
2222-2323-The mobile client does not use Twister for:
2424-2525-- repo tree/blob/detail reads
2626-- direct profile record reads
2727-- issue/PR detail reads
2828-2929-Those remain on Tangled's public APIs.
3030-3131-## 3. Search Contract
3232-3333-`GET /search`
3434-3535-Required query parameters:
3636-3737-- `q`
3838-3939-Optional query parameters:
4040-4141-- `mode=keyword|semantic|hybrid`
4242-- `type=repo|profile`
4343-- `limit`
4444-- `offset`
4545-4646-For mobile clients, repo and profile results should include:
4747-4848-- `did`
4949-- `at_uri`
5050-- `record_type`
5151-- `title`
5252-- `summary`
5353-- `repo_name`
5454-- `author_handle`
5555-- `updated_at`
5656-- `primary_language` for repos when known
5757-- `stars` for repos when known
5858-- `follower_count` and `following_count` for profiles when known
5959-6060-## 4. Profile Summary Contract
6161-6262-`GET /profiles/{did}/summary`
6363-6464-Response:
6565-6666-```json
6767-{
6868- "did": "did:plc:abc123",
6969- "handle": "desertthunder.dev",
7070- "follower_count": 128,
7171- "following_count": 84,
7272- "indexed_at": "2026-03-23T10:15:00Z"
7373-}
7474-```
7575-7676-This endpoint exists because follower counts and follower lists are derived from indexed graph state, not from a single direct public Tangled API call.
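
For reference, the response above maps onto a Go struct as in the sketch below. `ProfileSummary` and `decodeSummary` are hypothetical names; only the JSON field names come from the contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ProfileSummary mirrors the JSON body of GET /profiles/{did}/summary.
type ProfileSummary struct {
	DID            string    `json:"did"`
	Handle         string    `json:"handle"`
	FollowerCount  int       `json:"follower_count"`
	FollowingCount int       `json:"following_count"`
	IndexedAt      time.Time `json:"indexed_at"`
}

// decodeSummary parses a response body into a ProfileSummary.
func decodeSummary(body []byte) (ProfileSummary, error) {
	var s ProfileSummary
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	body := []byte(`{
	  "did": "did:plc:abc123",
	  "handle": "desertthunder.dev",
	  "follower_count": 128,
	  "following_count": 84,
	  "indexed_at": "2026-03-23T10:15:00Z"
	}`)
	s, err := decodeSummary(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s.Handle, s.FollowerCount, s.FollowingCount)
}
```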
7777-7878-## 5. Failure Handling
7979-8080-If Twister is unavailable:
8181-8282-- the app should keep direct known-handle browsing working
8383-- Explore should show a clear "index unavailable" state
8484-- profile pages should omit index-backed follower counts rather than fail entirely
8585-8686-## 6. Ownership
8787-8888-- Twister owns search ranking, document normalization, and graph summary derivation
8989-- The app owns result presentation, route transitions, and fallback behavior
-166
docs/api/specs/09-search-site.md
···11----
22-title: "Spec 09 — Search Site"
33-updated: 2026-03-23
44----
55-66-A minimal static site that serves as both the public Twister API documentation and a live search showcase. Dark mode only, no framework or build step.
77-88-## 1. Purpose
99-1010-- Give developers a browsable reference for the Twister search API
1111-- Give anyone a way to try search against live indexed Tangled content
1212-- Provide a shareable public URL before the mobile app ships
1313-1414-## 2. Scope
1515-1616-In scope:
1717-1818-- Static HTML/CSS/JS (Alpine.js, no bundler)
1919-- API reference pages generated from the spec docs
2020-- Live search input wired to `GET /search`
2121-- Result rendering with type-aware cards (repo, issue, PR, profile, string)
2222-- Filter controls for collection, type, author, language, state
2323-- Pagination
2424-- Responsive layout (mobile-friendly, single breakpoint)
2525-2626-Out of scope:
2727-2828-- Auth, OAuth, or any write operations
2929-- Semantic or hybrid mode toggle (keyword only for MVP)
3030-- Server-side rendering or static-site generator
3131-- Analytics or telemetry
3232-3333-## 3. Pages
3434-3535-| Route | Content |
3636-| ----------------- | --------------------------------------------------------------------------- |
3737-| `/` | Search input + results (the homepage is the search page) |
3838-| `/docs` | API overview: base URL, auth (none for public), rate limits, response shape |
3939-| `/docs/search` | `GET /search` — parameters, filters, response contract, examples |
4040-| `/docs/documents` | `GET /documents/{id}` — request/response, examples |
4141-| `/docs/health` | `GET /healthz`, `GET /readyz` — purpose and expected responses |
4242-4343-## 4. Search Page Behavior
4444-4545-1. Text input with a submit button. No debounced search-as-you-type for MVP.
4646-2. On submit, fetch `GET {API_BASE}/search?q={query}&limit=20` (plus any active filters).
4747-3. Render results as a vertical list of cards.
4848-4. Each card shows: `record_type` badge, `title`, `body_snippet` (with `<mark>` highlights preserved), `author_handle`, `repo_name` (when present), `updated_at` relative time.
4949-5. Clicking a result opens the canonical Tangled URL (`https://tangled.org/{handle}/{repo}` for repos, etc.) in a new tab.
5050-6. "Load more" button appends the next page (`offset += limit`).
5151-7. Empty state: "No results" message.
5252-8. Error state: inline message if the API is unreachable.
5353-9. Filter bar above results: dropdowns/inputs for `type`, `language`, `author`. Filters are query params so URLs are shareable.
5454-5555-## 5. API Docs Pages
5656-5757-Hand-written HTML mirroring the contracts in spec 05 (search) and spec 08 (app integration). Each page includes:
5858-5959-- Endpoint signature (method, path)
6060-- Parameter table (name, type, default, description)
6161-- Example request (curl)
6262-- Example response (JSON block with syntax highlighting via `<pre><code>`)
6363-6464-No generated docs tooling. The pages are static and updated manually when the API changes.
6565-6666-## 6. Styling
6767-6868-Minimal CSS, no utility framework.
6969-7070-### Tokens
7171-7272-```css
7373-:root {
7474- --bg: #0e0e0e;
7575- --surface: #1a1a1a;
7676- --border: #2a2a2a;
7777- --text: #e0e0e0;
7878- --text-dim: #888;
7979- --accent: #7aa2f7;
8080- --mark-bg: #7aa2f733;
8181- --mono: "Google Sans Mono", monospace;
8282- --sans: "Google Sans", sans-serif;
8383- --radius: 6px;
8484-}
8585-```
8686-8787-### Rules
8888-8989-- Dark theme only.
9090-- `Google Sans` for body text. `Google Sans Mono` for code, JSON, and badges.
9191-- Fonts loaded via Google Fonts `<link>`. System fallbacks: `sans-serif`, `monospace`.
9292-- Max content width: `720px`, centered.
9393-- Cards: `var(--surface)` background, `var(--border)` border, `var(--radius)` corners.
9494-- `<mark>` tags in snippets styled with `var(--mark-bg)` background and `var(--accent)` text.
9595-- Code blocks: `var(--surface)` background, horizontal scroll, no wrapping.
9696-- Links: `var(--accent)`, no underline, underline on hover.
9797-- Inputs and buttons: `var(--surface)` background, `var(--border)` border, `var(--text)` text.
9898-- One breakpoint at `640px` for mobile: full-width cards, stacked filter bar.
9999-100100-## 7. Package Design
101101-102102-The site lives in `internal/view/` as a self-contained Go package. It owns the templates, static assets, and HTTP handlers. The `api` package mounts `view.Handler()` into its router — nothing else leaks out.
103103-104104-### Exports
105105-106106-The package exposes a single constructor:
107107-108108-```go
109109-// Handler returns an http.Handler that serves the site pages and static assets.
110110-func Handler() http.Handler
111111-```
112112-113113-The `api` package calls `view.Handler()` and mounts it as a fallback after API routes.
114114-115115-### Package Structure
116116-117117-```text
118118-internal/view/
119119- view.go # Handler(), route setup, embed directives
120120- templates/
121121- layout.html # Shared shell (head, nav, footer)
122122- index.html # Search page
123123- docs/
124124- index.html # API overview
125125- search.html # GET /search docs
126126- documents.html # GET /documents/{id} docs
127127- health.html # Health endpoints docs
128128- static/
129129- style.css # All styles, single file
130130- search.js # Search fetch, render, pagination, filters
131131-```
132132-133133-### Embedding
134134-135135-`view.go` uses `//go:embed` to bundle `templates/` and `static/`. Templates are parsed once at init. Static assets are served under `/static/` via `http.FileServer`.
136136-137137-### Routing
138138-139139-`view.Handler()` returns a mux that handles:
140140-141141-| Pattern | Handler |
142142-| --- | --- |
143143-| `GET /` | Render `index.html` |
144144-| `GET /docs` | Render `docs/index.html` |
145145-| `GET /docs/search` | Render `docs/search.html` |
146146-| `GET /docs/documents` | Render `docs/documents.html` |
147147-| `GET /docs/health` | Render `docs/health.html` |
148148-| `GET /static/*` | Serve embedded CSS/JS files |
149149-150150-## 9. Configuration
151151-152152-Since the site is served by the same origin as the API, search requests use relative paths (`/search?q=...`). No `API_BASE` config needed — the browser's origin is the API.
153153-154154-## 10. Local Development
155155-156156-Run `twister api` locally. The site is served at `http://localhost:8080/` alongside the API endpoints. No separate dev server or file server required.
157157-158158-The API docs pages render without any indexed data. The search page needs a running indexer and populated database to return results.
159159-160160-## 11. Constraints
161161-162162-- No dependencies besides Alpine via CDN.
163163-- Total site weight target: under 50 KB excluding fonts.
164164-- Works in modern browsers (last 2 versions of Chrome, Firefox, Safari).
165165-- All fetch calls include error handling for network failures and non-200 responses.
166166-- No CORS concerns — the site and API share an origin.
-23
docs/api/specs/README.md
···11----
22-title: "Twister — Technical Specification Index"
33-updated: 2026-03-22
44----
55-66-# Twister Technical Specifications
77-88-Twister is a Go-based index and search service for [Tangled](https://tangled.org) content on AT Protocol.
99-It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents and graph summaries, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes public APIs for search and index-backed data gaps.
1010-1111-## Specifications
1212-1313-| # | Document | Description |
1414-| --- | ------------------------------------------ | --------------------------------------------------------------- |
1515-| 1 | [Architecture](01-architecture.md) | Purpose, goals, design principles, system context, tech choices |
1616-| 2 | [Tangled Lexicons](02-tangled-lexicons.md) | `sh.tangled.*` record schemas and fields |
1717-| 3 | [Data Model](03-data-model.md) | Database schema, search documents, sync state |
1818-| 4 | [Data Pipeline](04-data-pipeline.md) | Tap integration, normalization, failure handling |
1919-| 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering |
2020-| 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment |
2121-| 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill |
2222-| 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries |
2323-| 9 | [Search Site](09-search-site.md) | Static site for API docs and live search |
-41
docs/api/tasks/README.md
···11----
22-title: "Twister — Task Index"
33-updated: 2026-03-22
44----
55-66-# Twister Tasks
77-88-Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment.
99-1010-## Delivery Strategy
1111-1212-Build in four phases:
1313-1414-1. **MVP** — ingestion, graph backfill, keyword search, deployment, operational tooling
1515-2. **Semantic Search** — embeddings, vector retrieval
1616-3. **Hybrid Search** — weighted merge of keyword + semantic
1717-4. **Quality Polish** — ranking refinement, advanced filters, analytics
1818-1919-Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior.
2020-Within MVP, run graph backfill before calling the environment search-ready for users.
2121-2222-## Phases
2323-2424-| Phase | Title | Document | Status |
2525-| ----- | --------------- | ------------------------------------------ | --------------------------------------------------------------------- |
2626-| 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete; backfill scheduled before public launch) |
2727-| 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started |
2828-| 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started |
2929-| 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started |
3030-3131-## MVP Complete When
3232-3333-- Tap ingests tracked `sh.tangled.*` records
3434-- Documents normalize into a stable store
3535-- Keyword search works publicly
3636-- Index-backed profile summaries can fill public API gaps such as followers
3737-- API and indexer are deployed on Railway
3838-- Restart does not lose sync position
3939-- Reindex exists for repair
4040-- Graph backfill populates initial content from seed users
4141-- A static search site with API docs is publicly accessible
-388
docs/api/tasks/phase-1-mvp.md
···11----
22-title: "Phase 1 — MVP"
33-updated: 2026-03-22
44----
55-66-# Phase 1 — MVP
77-88-Get a searchable product online: ingestion, keyword search, deployment, and operational tooling.
99-1010-## MVP Complete When
1111-1212-- Tap ingests tracked `sh.tangled.*` records
1313-- Documents normalize into a stable store
1414-- Keyword search works publicly
1515-- API and indexer are deployed on Railway
1616-- Restart does not lose sync position
1717-- Reindex exists for repair
1818-- Graph backfill populates initial content from seed users
1919-- A static search site with API docs is publicly accessible
2020-2121-## M0 — Repository Bootstrap ✅
2222-2323-Executable layout, local tooling, and development conventions (completed 2026-03-22).
2424-2525-## M1 — Database Schema and Store Layer ✅
2626-2727-refs: [specs/03-data-model.md](../specs/03-data-model.md)
2828-2929-Implemented the Turso/libSQL schema and Go store package for document persistence.
3030-3131-## M2 — Normalization Layer ✅
3232-3333-refs: [specs/02-tangled-lexicons.md](../specs/02-tangled-lexicons.md), [specs/04-data-pipeline.md](../specs/04-data-pipeline.md)
3434-3535-Translate `sh.tangled.*` records into internal search documents.
3636-3737-## M3 — Tap Client and Ingestion Loop
3838-3939-refs: [specs/04-data-pipeline.md](../specs/04-data-pipeline.md), [specs/01-architecture.md](../specs/01-architecture.md)
4040-4141-### Goal
4242-4343-Connect the indexer to Tap (on Railway) and process live events into the store.
4444-4545-### Deliverables
4646-4747-- Tap WebSocket client package (`internal/tapclient/`)
4848-- Event decode layer (record events + identity events)
4949-- Ingestion loop with retry/backoff
5050-- Cursor persistence coupled to successful DB commits
5151-- Identity event handler (DID → handle cache)
5252-5353-### Tasks
5454-5555-- [x] Define Tap event DTOs matching the documented event shape:
5656-5757- ```go
5858- type TapEvent struct {
5959- ID int64 `json:"id"`
6060- Type string `json:"type"` // "record" or "identity"
6161- Record *TapRecord `json:"record"`
6262- Identity *TapIdentity `json:"identity"`
6363- }
6464- type TapRecord struct {
6565- Live bool `json:"live"`
6666- Rev string `json:"rev"`
6767- DID string `json:"did"`
6868- Collection string `json:"collection"`
6969- RKey string `json:"rkey"`
7070- Action string `json:"action"` // "create", "update", "delete"
7171- CID string `json:"cid"`
7272- Record json.RawMessage `json:"record"`
7373- }
7474- type TapIdentity struct {
7575- DID string `json:"did"`
7676- Handle string `json:"handle"`
7777- IsActive bool `json:"isActive"`
7878- Status string `json:"status"`
7979- }
8080- ```
8181-8282-- [x] Implement WebSocket client:
8383- - Connect to `TAP_URL` (e.g., `wss://tap.railway.internal/channel`)
8484- - HTTP Basic auth with `admin:TAP_AUTH_PASSWORD`
8585- - Auto-reconnect with exponential backoff
8686- - Ack protocol: send event `id` back after successful processing
8787-- [x] Implement ingestion loop:
8888- 1. Receive event from WebSocket
8989- 2. If `type == "identity"` → update handle cache, ack, continue
9090- 3. If `type == "record"` → check collection allowlist
9191- 4. Map `action` to operation (create/update → upsert, delete → tombstone)
9292- 5. Decode `record.record` via adapter registry
9393- 6. Normalize to `Document`
9494- 7. Upsert to store
9595- 8. Schedule embedding job if eligible ([Phase 2](phase-2-semantic.md))
9696- 9. Persist cursor (event ID) after successful DB commit
9797- 10. Ack the event
9898-- [x] Implement collection allowlist from `INDEXED_COLLECTIONS` config
9999-- [x] Handle state events (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) → update `record_state`
100100-- [x] Handle normalization failures: log, skip, advance cursor
101101-- [x] Handle DB failures: retry with backoff, do not advance cursor
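
The decision points in the ingestion loop above can be sketched in Go as follows. This is a minimal illustration, not the indexer's actual code: the names `Op`, `Outcome`, `planRecord`, and `advanceCursor` are hypothetical, and the only behavior taken from the task list is the allowlist check, the action mapping, and the rule that a DB failure must not advance the cursor.

```go
package main

import "errors"

// Op is the store operation chosen for a Tap record event.
type Op int

const (
	OpSkip      Op = iota // collection is not in the allowlist
	OpUpsert              // "create" and "update" actions
	OpTombstone           // "delete" action
)

// ErrUnknownAction is returned for an action outside create/update/delete.
var ErrUnknownAction = errors.New("unknown tap action")

// planRecord applies the collection allowlist, then maps the action to an Op.
func planRecord(allow map[string]bool, collection, action string) (Op, error) {
	if !allow[collection] {
		return OpSkip, nil
	}
	switch action {
	case "create", "update":
		return OpUpsert, nil
	case "delete":
		return OpTombstone, nil
	}
	return OpSkip, ErrUnknownAction
}

// Outcome classifies how processing of one event ended.
type Outcome int

const (
	Committed       Outcome = iota // document written to the store
	NormalizeFailed                // bad record: log and skip
	DBFailed                       // store error: retry with backoff
)

// advanceCursor reports whether the cursor may move past the event.
// Only a store failure holds the cursor back, so the event is retried.
func advanceCursor(o Outcome) bool {
	return o != DBFailed
}
```

Keeping these two decisions as pure functions would make the retry and skip rules testable without a live Tap connection.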
102102-103103-### Exit Criteria
104104-105105-The system continuously ingests and persists `sh.tangled.*` records from Tap.
106106-107107-## M4 — Graph Backfill from Seed Users
108108-109109-refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md)
110110-111111-### Goal
112112-113113-Bootstrap the index with historical Tangled content by discovering and backfilling users from a curated seed set.
114114-115115-### Deliverables
116116-117117-- `twister backfill` CLI command
118118-- Seed file parser and documented seed-file format
119119-- Graph fan-out discovery (follows and collaborators)
120120-- Tap `/repos/add` integration for discovered users
121121-- Deduplication against already-tracked repos
122122-- Dry-run mode and progress logging
123123-- Basic operator runbook for first bootstrap and repeat runs
124124-125125-### Tasks
126126-127127-- [x] Implement `backfill` subcommand with flags:
128128- - `--seeds <file>` — required seed file path
129129- - `--max-hops <n>` — depth limit for fan-out (default: 2)
130130- - `--dry-run` — print the discovery plan without mutating Tap
131131- - `--concurrency <n>` — parallel discovery workers (default: 5)
132132- - `--batch-size <n>` — DIDs per `/repos/add` request
133133- - `--batch-delay <duration>` — delay between Tap registration batches
134134-- [x] Implement seed file parsing:
135135- - One DID or handle per line
136136- - `#` comments allowed
137137- - Blank lines ignored
138138- - Handles resolved to DIDs before graph expansion
139139-- [x] Decide and document the initial seed file location for operators:
140140- - Repository-managed example file for format/reference
141141- - Deployment-specific runtime file or mounted secret for real runs
142142- - Implemented: `docs/api/seeds.txt` and `packages/api/internal/backfill/doc.go`
143143-- [x] Implement graph discovery:
144144- 1. Start from hop-0 seed users
145145- 2. Fetch `sh.tangled.graph.follow` records and collect subject DIDs
146146- 3. Fetch repo collaborators by inspecting repos, issues, PRs, and comments
147147- 4. Enqueue newly discovered DIDs with hop metadata
148148- 5. Stop expanding beyond `max-hops`
149149-- [x] Track discovery metadata for logs:
150150- - source DID
151151- - hop depth
152152- - discovery reason (`seed`, `follow`, `collaborator`)
153153-- [x] Integrate with Tap admin endpoints:
154154- - `GET /info/:did` to skip already-tracked repos when practical
155155- - `POST /repos/add` to register new DIDs for backfill
156156-- [x] Make the command safe to re-run:
157157- - in-memory visited DID set during crawl
158158- - tolerate duplicate `/repos/add`
159159- - rely on index upsert idempotency for re-delivered records
160160-- [x] Add operator-friendly logging:
161161- - seed count
162162- - users discovered per hop
163163- - already-tracked vs newly-submitted DIDs
164164- - batch progress
165165- - final totals
166166-- [x] Add a short runbook covering:
167167- - first bootstrap against an empty database
168168- - repeat run after expanding the seed list
169169- - dry-run before production mutation
170170- - Implemented: `packages/api/internal/backfill/doc.go`
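
As a rough sketch of the seed parsing and hop-limited discovery described in the tasks above, the Go below shows one way the two pieces could fit together. The function names, the `Discovered` struct, and the `neighbors` callback are hypothetical stand-ins (the callback replaces the real follow and collaborator lookups). Treating `#` as a whole-line comment only and dropping duplicate seeds are assumptions, not documented behavior.

```go
package main

import "strings"

// parseSeeds reads one DID or handle per line, ignoring blank lines
// and lines that start with "#". Duplicate entries are dropped.
func parseSeeds(text string) []string {
	var seeds []string
	seen := make(map[string]bool)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") || seen[line] {
			continue
		}
		seen[line] = true
		seeds = append(seeds, line)
	}
	return seeds
}

// Discovered records where a DID was found during the crawl.
type Discovered struct {
	DID    string
	Hop    int    // 0 for seeds
	Source string // DID that led here; empty for seeds
}

// discover walks breadth-first from the seeds and stops expanding
// once a node sits at maxHops. The visited set keeps the crawl
// from revisiting a DID.
func discover(seeds []string, maxHops int, neighbors func(did string) []string) []Discovered {
	visited := make(map[string]bool)
	var queue, out []Discovered
	for _, s := range seeds {
		if !visited[s] {
			visited[s] = true
			queue = append(queue, Discovered{DID: s})
		}
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur)
		if cur.Hop >= maxHops {
			continue
		}
		for _, n := range neighbors(cur.DID) {
			if !visited[n] {
				visited[n] = true
				queue = append(queue, Discovered{DID: n, Hop: cur.Hop + 1, Source: cur.DID})
			}
		}
	}
	return out
}
```

With `maxHops` set to 2, a chain a → b → c → d starting from seed a yields a, b, and c; d is never enqueued because c already sits at the hop limit.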
171171-172172-### Exit Criteria
173173-174174-Operators can bootstrap an empty environment to a usable historical baseline before public rollout.
175175-176176-## M5 — Keyword Search API
177177-178178-refs: [specs/05-search.md](../specs/05-search.md)
179179-180180-### Goal
181181-182182-Expose a usable public search API backed by Turso's Tantivy-backed FTS.
183183-184184-### Deliverables
185185-186186-- HTTP server (net/http)
187187-- `GET /healthz` — liveness
188188-- `GET /readyz` — readiness (DB connectivity)
189189-- `GET /search` — keyword search with configurable mode
190190-- `GET /search/keyword` — keyword-only search
191191-- `GET /documents/{id}` — document lookup
192192-- Search repository layer (FTS queries isolated from handlers)
193193-- Pagination, filtering, snippets
194194-195195-### Tasks
196196-197197-- [x] Set up HTTP server with net/http router
198198-- [x] Implement `/healthz` (always 200) and `/readyz` (SELECT 1 against DB)
199199-- [x] Implement search repository with FTS queries:
200200-201201- ```sql
202202- SELECT id, title, summary, repo_name, author_handle, collection, record_type,
203203- created_at, updated_at,
204204- fts_score(title, body, summary, repo_name, author_handle, tags_json, ?) AS score,
205205- fts_highlight(body, '<mark>', '</mark>', ?) AS body_snippet
206206- FROM documents
207207- WHERE fts_match(title, body, summary, repo_name, author_handle, tags_json, ?)
208208- AND deleted_at IS NULL
209209- ORDER BY score DESC
210210- LIMIT ? OFFSET ?;
211211- ```
212212-213213-- [x] Implement request validation:
214214- - `q` required, non-empty
215215- - `limit` 1–100, default 20
216216- - `offset` >= 0, default 0
217217- - Reject unknown parameters with 400
218218-- [x] Implement filters (as WHERE clauses):
219219- - `collection` → `collection = ?`
220220- - `type` → `record_type = ?`
221221- - `author` → `author_handle = ?` or `did = ?`
222222- - `repo` → `repo_name = ?`
223223-- [x] Implement `/documents/{id}` — full document response
224224-- [x] Implement stable JSON response contract (see spec 05-search.md)
225225-- [x] Exclude tombstoned documents (rows where `deleted_at IS NOT NULL`) by default
226226-- [x] Add request logging middleware (method, path, status, duration)
227227-- [x] Add CORS headers if needed
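
The validation rules listed above could look roughly like this in Go. It is a sketch under assumptions: `SearchParams`, `parseSearchParams`, and the error messages are hypothetical, and the accepted parameter set is limited to the names in this task list (a real handler would likely also accept others, such as a search mode).

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// SearchParams holds validated query parameters for GET /search.
type SearchParams struct {
	Q       string
	Limit   int
	Offset  int
	Filters map[string]string // collection, type, author, repo
}

var filterKeys = []string{"collection", "type", "author", "repo"}

// first returns the first value for key, or "" when absent.
func first(v map[string][]string, key string) string {
	if vals := v[key]; len(vals) > 0 {
		return vals[0]
	}
	return ""
}

// parseSearchParams enforces: q required, limit 1-100 (default 20),
// offset >= 0 (default 0), and no unknown parameter names.
func parseSearchParams(v map[string][]string) (SearchParams, error) {
	p := SearchParams{Limit: 20, Filters: map[string]string{}}
	known := map[string]bool{"q": true, "limit": true, "offset": true}
	for _, k := range filterKeys {
		known[k] = true
	}
	for k := range v {
		if !known[k] {
			return p, errors.New("unknown parameter: " + k)
		}
	}
	if p.Q = strings.TrimSpace(first(v, "q")); p.Q == "" {
		return p, errors.New("q is required")
	}
	if s := first(v, "limit"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 1 || n > 100 {
			return p, errors.New("limit must be an integer from 1 to 100")
		}
		p.Limit = n
	}
	if s := first(v, "offset"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 0 {
			return p, errors.New("offset must be a non-negative integer")
		}
		p.Offset = n
	}
	for _, k := range filterKeys {
		if s := first(v, k); s != "" {
			p.Filters[k] = s
		}
	}
	return p, nil
}
```

A `net/http` handler could pass `r.URL.Query()` straight in, since `url.Values` has the underlying type `map[string][]string`, and answer with status 400 whenever an error comes back.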
228228-229229-### Exit Criteria
230230-231231-A user can search Tangled content reliably with keyword search.
232232-233233-## M5a — Search Site ✅
234234-235235-refs: [specs/09-search-site.md](../specs/09-search-site.md)
236236-237237-### Goal
238238-239239-Ship a static site that doubles as public API documentation and a live search demo. Alpine.js via CDN for reactivity, no build step.
240240-241241-### Deliverables
242242-243243-- `internal/view/` package exporting `Handler() http.Handler`
244244-- Embedded templates (`templates/`) and static assets (`static/`) via `//go:embed`
245245-- Search page (`/`) wired to `GET /search` with result cards, filters, and pagination
246246-- API docs pages (`/docs/*`) covering search, documents, and health endpoints
247247-- Dark-mode-only styling with Google Sans fonts and minimal CSS tokens
248248-249249-### Tasks
250250-251251-- [x] Create `internal/view/` package with `view.go`, `templates/`, and `static/` directories
252252-- [x] Implement `Handler()` that returns an `http.Handler` with routes for all pages and `/static/*`
253253-- [x] Embed templates and static assets via `//go:embed`; parse templates once at init
254254-- [x] Use a shared `layout.html` template for the shell (head, nav, footer)
255255-- [x] Mount `view.Handler()` in the `api` package router as a fallback after API routes
256256-- [x] Build search page:
257257- - Text input + submit
258258- - Fetch `GET /search` with relative path (same origin)
259259- - Render result cards with type badge, title, snippet (preserve `<mark>`), author, repo, relative time
260260- - "Load more" pagination via offset
261261- - Filter bar: type, language, author (reflected in URL query params)
262262- - Empty and error states
263263-- [x] Build API docs pages:
264264- - `/docs` — overview (base URL, response shape, no auth)
265265- - `/docs/search` — `GET /search` params, filters, example curl, example response
266266- - `/docs/documents` — `GET /documents/{id}` request/response
267267- - `/docs/health` — `GET /healthz`, `GET /readyz`
268268-- [x] Implement `style.css` with design tokens (`--bg`, `--surface`, `--border`, `--accent`, etc.)
269269-- [x] Load Google Sans and Google Sans Mono via Google Fonts `<link>`
270270-- [x] Result card links open canonical Tangled URLs in new tab
271271-- [x] Verify total site weight under 50 KB (excluding fonts and Alpine CDN) — 21 KB total
272272-273273-### Exit Criteria
274274-275275-A user can search Tangled content and read API docs from a public URL without installing anything.
276276-277277-## M6 — Railway Deployment ✅
278278-279279-refs: [specs/06-operations.md](../specs/06-operations.md), [deploy.md](../deploy.md)
280280-281281-### Goal
282282-283283-Deploy the API and indexer as Railway services alongside Tap.
284284-285285-### Deliverables
286286-287287-- Finalized Dockerfile
288288-- Railway project with services: `api`, `indexer`
289289-- Health checks configured per service
290290-- Secrets/env vars set
291291-- Production startup commands documented
292292-293293-### Tasks
294294-295295-- [x] Finalize Dockerfile (multi-stage, CGO_ENABLED=0, Alpine runtime)
296296-- [x] Create Railway services:
297297- - `api` — start command: `twister api`
298298- - `indexer` — start command: `twister indexer`
299299-- [x] Configure environment variables per service:
300300- - Shared: `TURSO_DATABASE_URL`, `TURSO_AUTH_TOKEN`, `LOG_LEVEL`, `LOG_FORMAT`
301301- - API: `HTTP_BIND_ADDR`, `SEARCH_DEFAULT_LIMIT`, `SEARCH_MAX_LIMIT`
302302- - Indexer: `TAP_URL` (reference Tap service domain), `TAP_AUTH_PASSWORD`, `INDEXED_COLLECTIONS`
303303-- [x] Configure health checks:
304304- - API: HTTP check on `/healthz` port 8080
305305- - Indexer: HTTP check on `/health` port 9090
306306-- [x] Use Railway internal networking for indexer → Tap connection
307307-- [x] Connect GitHub repo for autodeploy
308308-- [x] Test graceful shutdown on redeploy (SIGTERM handling)
309309-- [x] Document deploy steps
310310-311311-### Exit Criteria
312312-313313-The system runs as a deployed service with health-checked processes on Railway.
314314-315315-## M7 — Reindex and Repair ✅
316316-317317-refs: [specs/05-search.md](../specs/05-search.md)
318318-319319-### Goal
320320-321321-Make the system recoverable and operable with repair tools.
322322-323323-### Deliverables
324324-325325-- `twister reindex` command with scoping options
326326-- Dry-run mode
327327-- Admin reindex endpoint
328328-- Progress logging and error summary
329329-330330-### Tasks
331331-332332-- [x] Implement `reindex` subcommand with flags:
333333- - `--collection` — reindex one collection
334334- - `--did` — reindex one DID's documents
335335- - `--document` — reindex one document by ID
336336- - `--dry-run` — show intended work without writes
337337- - No flags → reindex all
338338-- [x] Implement reindex logic:
339339- 1. Select documents matching scope
340340- 2. For each document, re-run normalization from stored fields (or re-fetch if source available)
341341- 3. Update FTS-relevant fields
342342- 4. Upsert back to store
343343- 5. Run `OPTIMIZE INDEX idx_documents_fts` after bulk reindex to merge Tantivy segments
344344- 6. Log progress (N/total, errors)
345345-- [x] Implement `POST /admin/reindex` endpoint (behind `ENABLE_ADMIN_ENDPOINTS` + `ADMIN_AUTH_TOKEN`)
346346-- [x] Add error summary output on completion
347347-- [x] Exit non-zero on unrecoverable failures
348348-349349-### Exit Criteria
350350-351351-Operators can repair bad indexes without rebuilding everything manually.
352352-353353-## M8 — Observability
354354-355355-refs: [specs/06-operations.md](../specs/06-operations.md)
356356-357357-### Goal
358358-359359-Make the system diagnosable in production.
360360-361361-### Deliverables
362362-363363-- Structured slog fields across all services
364364-- Error classification
365365-- Ingestion lag visibility
366366-- Periodic state logs
367367-- Operator documentation
368368-369369-### Tasks
370370-371371-- [ ] Standardize slog fields across all packages:
372372- - `service`, `event_name`, `event_id`, `did`, `collection`, `rkey`, `document_id`, `cursor`, `error_class`, `duration_ms`
373373-- [ ] Add error classification (normalize_error, db_error, tap_error, embed_error)
374374-- [ ] Add periodic state logs in indexer:
375375- - Current cursor position
376376- - Events processed since last log
377377- - Documents in store (count)
378378-- [ ] Add request logging in API (method, path, status, duration, query)
379379-- [ ] Add search latency logging per query mode
380380-- [ ] Write operator documentation:
381381- - Restart procedure
382382- - Reindex procedure
383383- - Backfill notes
384384- - Failure triage guide
385385-386386-### Exit Criteria
387387-388388-The system is maintainable without guesswork.
-167
docs/api/tasks/phase-2-semantic.md
···11----
22-title: "Phase 2 — Semantic Search"
33-updated: 2026-03-23
44----
55-66-# Phase 2 — Semantic Search
77-88-Add embedding generation and vector-based retrieval on top of the keyword baseline, using self-hosted Ollama for embeddings instead of external API services.
99-1010-## M8 — Ollama Sidecar and Embedding Pipeline
1111-1212-refs: [specs/01-architecture.md](../specs/01-architecture.md), [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md)
1313-1414-### Goal
1515-1616-Deploy Ollama as a Railway sidecar and add asynchronous embedding generation without blocking ingestion.
1717-1818-### Deliverables
1919-2020-- Ollama Railway service running nomic-embed-text-v1.5 (or EmbeddingGemma)
2121-- `embedding_jobs` table operational (schema from M1)
2222-- `embed-worker` subcommand
2323-- Ollama-backed embedding provider (with interface for future alternatives)
2424-- Retry and dead-letter behavior
2525-- `twister reembed` command
2626-2727-### Tasks
2828-2929-- [ ] Deploy Ollama on Railway:
3030- - Use the nomic-embed Railway template as a starting point
3131- - Configure as internal service (no public URL)
3232- - Pre-pull `nomic-embed-text` model on startup
3333- - Health check: `GET /api/tags` on port 11434
3434- - Resource budget: 1–2 GB RAM, 1–2 vCPU
3535-- [ ] Define embedding provider interface:
3636-3737- ```go
3838- type EmbeddingProvider interface {
3939- Embed(ctx context.Context, texts []string) ([][]float32, error)
4040- Model() string
4141- Dimension() int
4242- }
4343- ```
4444-4545-- [ ] Implement Ollama provider using the official Go client:
4646-4747- ```go
4848- import "github.com/ollama/ollama/api"
4949-5050- // OllamaProvider calls Ollama's /api/embed endpoint
5151- // over Railway internal networking (ollama.railway.internal:11434)
5252- type OllamaProvider struct {
5353- client *api.Client
5454- model string // "nomic-embed-text"
5555- dim int // 768
5656- }
5757- ```
5858-5959- - Configure via `OLLAMA_URL` env var (default: `http://ollama.railway.internal:11434`)
6060- - Support batch embedding (Ollama accepts multiple inputs per request)
6161- - Timeout per request (default: 30s)
6262- - Connection health check on startup
6363-- [ ] Implement embedding input text composition (see spec 04-data-pipeline.md, section 5):
6464- `title\nrepo_name\nauthor_handle\ntags\nsummary\nbody`
6565-- [ ] Add job enqueueing: on document upsert, insert `embedding_jobs` row with `status=pending`
6666-- [ ] Implement `embed-worker` loop:
6767- 1. Poll for `pending` jobs (batch by `EMBEDDING_BATCH_SIZE`, default: 32)
6868- 2. Compose input text per document
6969- 3. Call Ollama provider
7070- 4. Store vectors in `document_embeddings` with `vector32(?)`
7171- 5. Mark job `completed`
7272- 6. On failure: increment `attempts`, set `last_error`, backoff
7373- 7. After max attempts: mark `dead`
7474-- [ ] Create DiskANN vector index (see spec 03 for tuning params):
7575- ```sql
7676- CREATE INDEX idx_embeddings_vec ON document_embeddings(
7777- libsql_vector_idx(embedding, 'metric=cosine')
7878- );
7979- ```
8080-- [ ] Implement `reembed` command (re-generate all embeddings, useful for model migration)
8181-- [ ] Skip deleted documents in embedding pipeline
8282-- [ ] Add health check endpoint for embed-worker (port 9091)
8383-- [ ] Add Ollama connectivity check to embed-worker readiness probe
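
The failure path of the `embed-worker` loop above (increment `attempts`, set `last_error`, back off, mark `dead` after the maximum) can be sketched as two small functions. The `Job` struct, the function names, and the doubling backoff with a cap are assumptions for illustration; only the status values and field names come from the task list.

```go
package main

import "time"

// Job mirrors the embedding_jobs columns this sketch needs.
type Job struct {
	Status    string // "pending", "completed", or "dead"
	Attempts  int
	LastError string
}

// recordFailure applies one failed attempt to a job. The job stays
// "pending" until it has used up maxAttempts, then becomes "dead".
func recordFailure(j Job, msg string, maxAttempts int) Job {
	j.Attempts++
	j.LastError = msg
	if j.Attempts >= maxAttempts {
		j.Status = "dead"
	} else {
		j.Status = "pending"
	}
	return j
}

// backoff returns the wait before the next try: base after the first
// failure, doubling with each further failure, capped at max.
func backoff(attempts int, base, max time.Duration) time.Duration {
	d := base
	for i := 1; i < attempts; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}
```

For instance, with a one-second base and a one-minute cap, the third failure would wait four seconds and the tenth would wait the full minute.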
8484-8585-### Model Selection Notes
8686-8787-**nomic-embed-text-v1.5** is the default recommendation:
8888-- 137M parameters, 768-dimension vectors
8989-- Matryoshka support (can truncate to 64/128/256/512 dims for storage tradeoff)
9090-- 8192 token context window
9191-- ~262 MB at F16 quantization, ~500 MB RAM at runtime
9292-- Battle-tested with llama.cpp/Ollama, Railway template exists
9393-9494-**EmbeddingGemma** is the quality alternative:
9595-- 308M parameters, 768-dimension vectors
9696-- Best MTEB scores for models under 500M parameters
9797-- <200 MB quantized, similar RAM footprint
9898-- Released Sept 2025, less deployment track record
9999-100100-**all-minilm** is the budget fallback:
101101-- 23M parameters, 384-dimension vectors (requires schema change)
102102-- ~46 MB model, minimal resources
103103-- Suitable for testing or cost-constrained environments
104104-105105-### Verification
106106-107107-- [ ] Ollama service starts on Railway and responds to health checks
108108-- [ ] Creating a new searchable document enqueues an embedding job
109109-- [ ] Worker processes the job and stores a vector in `document_embeddings`
110110-- [ ] Failed embedding calls retry with bounded attempts
111111-- [ ] Keyword search still works when embed-worker or Ollama is down
112112-- [ ] `reembed` regenerates embeddings for all eligible documents
113113-- [ ] Ollama connectivity failure is surfaced in embed-worker health check
114114-115115-### Exit Criteria
116116-117117-Embeddings are produced asynchronously via self-hosted Ollama and stored durably in Turso.
118118-119119-## M9 — Semantic Search
120120-121121-refs: [specs/05-search.md](../specs/05-search.md)
122122-123123-### Goal
124124-125125-Expose vector-based semantic retrieval.
126126-127127-### Deliverables
128128-129129-- `GET /search/semantic` endpoint
130130-- Query-time embedding (convert query text → vector via Ollama)
131131-- Vector similarity search via `vector_top_k`
132132-- Response parity with keyword search
133133-134134-### Tasks
135135-136136-- [ ] Implement query embedding: call Ollama provider with user's query text
137137-- [ ] Cache query embeddings for identical queries within a short TTL (optional, reduces Ollama load)
138138-- [ ] Implement semantic search repository:
139139-140140- ```sql
141141- SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle,
142142- d.collection, d.record_type, d.created_at, d.updated_at
143143- FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v
144144- JOIN document_embeddings e ON e.rowid = v.id
145145- JOIN documents d ON d.id = e.document_id
146146- WHERE d.deleted_at IS NULL;
147147- ```
148148-149149-- [ ] Normalize cosine distance (range 0–2) to a 0–1 relevance score: `score = 1.0 - (distance / 2.0)`
150150-- [ ] Apply same filters as keyword search (collection, author, repo, type)
151151-- [ ] Add timeout and cost controls (limit vector search to reasonable K)
152152-- [ ] Wire `/search/semantic` handler
153153-- [ ] Return `matched_by: ["semantic"]` in results
154154-- [ ] Graceful degradation: if Ollama is unreachable, return 503 for semantic search while keyword search remains available
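
A small Go sketch of the distance-to-score normalization from the tasks above. The function name and the clamping of out-of-range inputs are assumptions; the formula itself is the one given in the task list, which fits a cosine metric whose distance runs from 0 (same direction) to 2 (opposite direction).

```go
package main

// scoreFromDistance maps a cosine distance in [0, 2] to a relevance
// score in [0, 1]. Values outside that range are clamped.
func scoreFromDistance(distance float64) float64 {
	score := 1.0 - distance/2.0
	if score < 0 {
		return 0
	}
	if score > 1 {
		return 1
	}
	return score
}
```

Under this mapping an identical vector scores 1.0, an orthogonal one 0.5, and an opposite one 0.0.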
155155-156156-### Verification
157157-158158-- [ ] Semantically similar queries retrieve expected documents even with little lexical overlap
159159-- [ ] Documents without embeddings are omitted from semantic results
160160-- [ ] Semantic search returns the same JSON schema as keyword search
161161-- [ ] Latency is acceptable under small test load
162162-- [ ] Filters work correctly with semantic results
163163-- [ ] Semantic search degrades gracefully when Ollama is down
164164-165165-### Exit Criteria
166166-167167-The API supports true semantic search over Tangled documents, powered entirely by self-hosted infrastructure.
-51
docs/api/tasks/phase-3-hybrid.md
···11----
22-title: "Phase 3 — Hybrid Search"
33-updated: 2026-03-22
44----
55-66-# Phase 3 — Hybrid Search
77-88-Merge lexical and semantic search into the default high-quality retrieval mode.
99-1010-## M10 — Hybrid Search
1111-1212-refs: [specs/05-search.md](../specs/05-search.md)
1313-1414-### Deliverables
1515-1616-- `GET /search/hybrid` endpoint
1717-- Weighted score blending (keyword 0.65 + semantic 0.35)
1818-- Score normalization
1919-- Result deduplication
2020-- `matched_by` metadata showing which modes contributed
2121-2222-### Tasks
2323-2424-- [ ] Implement hybrid search orchestrator:
2525- 1. Fetch top N keyword results (N=50 or configurable)
2626- 2. Fetch top N semantic results
2727- 3. Normalize keyword scores (min-max within result set)
2828- 4. Semantic scores already normalized (0–1)
2929- 5. Merge on `document_id`
3030- 6. For documents in both sets: `hybrid_score = 0.65 * keyword + 0.35 * semantic`
3131- 7. For documents in only one set: apply the same formula with the missing score treated as 0
3232- 8. Sort by hybrid_score descending
3333- 9. Deduplicate
3434- 10. Apply limit/offset
3535-- [ ] Populate `matched_by` field: `["keyword"]`, `["semantic"]`, or `["keyword", "semantic"]`
3636-- [ ] Make weights configurable via `HYBRID_KEYWORD_WEIGHT` / `HYBRID_SEMANTIC_WEIGHT`
3737-- [ ] Wire `/search/hybrid` handler
3838-- [ ] Make `/search?mode=hybrid` work
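
The orchestrator steps above (min-max normalization, weighted blend, merge on document ID, sort) might be sketched like this. The types and function names are hypothetical, each input list is assumed to contain unique IDs, and the handling of a keyword set whose scores are all equal (normalized to 1.0) is an assumption. Limit/offset is left to the caller.

```go
package main

import "sort"

// Hit is one result from a single search mode.
type Hit struct {
	ID    string
	Score float64
}

// HybridHit is a merged result with its contributing modes.
type HybridHit struct {
	ID        string
	Score     float64
	MatchedBy []string
}

// minMax rescales scores to [0, 1] within the result set.
func minMax(hits []Hit) map[string]float64 {
	out := make(map[string]float64, len(hits))
	if len(hits) == 0 {
		return out
	}
	lo, hi := hits[0].Score, hits[0].Score
	for _, h := range hits[1:] {
		if h.Score < lo {
			lo = h.Score
		}
		if h.Score > hi {
			hi = h.Score
		}
	}
	for _, h := range hits {
		if hi == lo {
			out[h.ID] = 1.0 // all scores equal: treat as full matches
		} else {
			out[h.ID] = (h.Score - lo) / (hi - lo)
		}
	}
	return out
}

// mergeHybrid blends normalized keyword scores with semantic scores
// (assumed already in [0, 1]). A missing score counts as 0.
func mergeHybrid(keyword, semantic []Hit, wKeyword, wSemantic float64) []HybridHit {
	merged := make(map[string]*HybridHit)
	for id, s := range minMax(keyword) {
		merged[id] = &HybridHit{ID: id, Score: wKeyword * s, MatchedBy: []string{"keyword"}}
	}
	for _, h := range semantic {
		if m, ok := merged[h.ID]; ok {
			m.Score += wSemantic * h.Score
			m.MatchedBy = append(m.MatchedBy, "semantic")
		} else {
			merged[h.ID] = &HybridHit{ID: h.ID, Score: wSemantic * h.Score, MatchedBy: []string{"semantic"}}
		}
	}
	out := make([]HybridHit, 0, len(merged))
	for _, m := range merged {
		out = append(out, *m)
	}
	// Tie-break on ID so ordering does not depend on map iteration.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	return out
}
```

The ID tie-break is one way to keep score ordering stable across repeated runs on the same corpus, since Go randomizes map iteration order.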
3939-4040-### Verification
4141-4242-- [ ] Hybrid returns documents found by either source
4343-- [ ] Duplicates are merged correctly (no duplicate IDs in results)
4444-- [ ] Exact-match queries still favor lexical relevance
4545-- [ ] Exploratory natural-language queries improve over keyword-only results
4646-- [ ] Score ordering is stable across repeated runs on the same corpus
4747-- [ ] `matched_by` accurately reflects which modes produced each result
4848-4949-### Exit Criteria
5050-5151-Hybrid search becomes the preferred default search mode.
-47
docs/api/tasks/phase-4-quality.md
···11----
22-title: "Phase 4 — Ranking and Quality Polish"
33-updated: 2026-03-22
44----
55-66-# Phase 4 — Ranking and Quality Polish
77-88-Improve search quality without changing the core architecture.
99-1010-## M11 — Ranking and Quality Polish
1111-1212-refs: [specs/05-search.md](../specs/05-search.md)
1313-1414-### Deliverables
1515-1616-- Boosted field weighting refinement
1717-- Recency boost
1818-- Collection-aware ranking
1919-- Better snippets/highlights
2020-- Issue/PR state filtering
2121-- Star count as ranking signal
2222-- Optional query analytics
2323-2424-### Tasks
2525-2626-- [ ] Tune FTS index weights based on real query results
2727-- [ ] Add small recency boost to ranking (e.g., decay function on `created_at`)
2828-- [ ] Add collection-aware ranking adjustments (repos ranked differently from comments)
2929-- [ ] Index `sh.tangled.repo.issue.comment` and `sh.tangled.repo.pull.comment` (P2 collections)
3030-- [ ] Aggregate `sh.tangled.feed.star` counts per repo and use as ranking signal
3131-- [ ] Implement `state` filter (open/closed/merged) using `record_state` table
3232-- [ ] Improve snippets: better truncation, multi-field highlights
3333-- [ ] Add curated relevance test fixtures (expected queries → expected top results)
3434-- [ ] Run `OPTIMIZE INDEX idx_documents_fts` as maintenance task
3535-- [ ] Optional: log queries for analytics (anonymized)
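
One possible shape for the small recency boost mentioned in the tasks above is an exponential decay with a half-life. Everything here is an assumption for illustration: the function name, the half-life form, and the parameters are not taken from the spec, which only suggests "a decay function on `created_at`".

```go
package main

import "math"

// recencyBoost returns a multiplier between 1 and 1+maxBoost.
// The bonus halves every halfLifeDays, which must be positive.
// Negative ages (clock skew) are treated as age 0.
func recencyBoost(ageDays, halfLifeDays, maxBoost float64) float64 {
	if ageDays < 0 {
		ageDays = 0
	}
	return 1 + maxBoost*math.Pow(0.5, ageDays/halfLifeDays)
}
```

Multiplying a relevance score by this value keeps the boost bounded by `maxBoost`, which is one way to avoid recency overwhelming exact relevance.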
3636-3737-### Verification
3838-3939-- [ ] Exact repo lookups reliably rank the repo first
4040-- [ ] Recent active content gets a reasonable small boost without overwhelming exact relevance
4141-- [ ] Snippets show useful matched context
4242-- [ ] Ranking regression tests catch obvious degradations
4343-- [ ] State filter correctly excludes closed/merged items when requested
4444-4545-### Exit Criteria
4646-4747-Search quality is noticeably improved and more predictable.
-80
docs/app/specs/README.md
···11-# Twisted — Tangled Mobile Companion
22-33-A mobile-first Tangled client for iOS, Android, and web. Built with Ionic Vue, Capacitor, and the `@atcute` AT Protocol client stack.
44-55-## What is Tangled
66-77-[Tangled](https://tangled.org) is a Git hosting and collaboration platform built on the [AT Protocol](https://atproto.com). Identity, social graph (follows, stars, reactions), repos, issues, and PRs are all AT Protocol records stored on users' Personal Data Servers. Git hosting runs on **knots** — headless servers exposing XRPC APIs. The **appview** at `tangled.org` aggregates and renders the network view.
88-99-- Docs: <https://docs.tangled.org>
1010-- Lexicon namespace: `sh.tangled.*`
1111-- Source: <https://tangled.org/tangled.org/core>
1212-1313-## What Twisted Does
1414-1515-**Reader and social companion** for Tangled. Focused on direct browsing, indexed discovery, and lightweight interactions.
1616-1717-- Browse repos, files, READMEs, issues, PRs
1818-- Jump to profiles and repos from a known AT Protocol handle
1919-- Search indexed repos and profiles through the Twister API
2020-- Use index-backed graph summaries where the public API is incomplete
2121-- Sign in via AT Protocol OAuth
2222-- Star repos, follow users, react to content
2323-- Offline-capable with cached data
2424-2525-Out of scope: repo creation, git push/pull, CI/CD, full code review authoring.
2626-2727-## Technology
2828-2929-| Layer | Choice |
3030-| ----------- | --------------------------------------------------------------------------------------------- |
3131-| Framework | Vue 3 + TypeScript |
3232-| UI | Ionic Vue |
3333-| Native | Capacitor (iOS, Android, Web) |
3434-| State | Pinia |
3535-| Async data | TanStack Query (Vue) |
3636-| AT Protocol | `@atcute/client` (XRPC), `@atcute/oauth-browser-client` (OAuth), `@atcute/tangled` (lexicons) |
3737-3838-## Architecture
3939-4040-Three layers, strict dependency direction (presentation → domain → data):
4141-4242-**Presentation** — Ionic pages, Vue components, composables, Pinia stores.
4343-**Domain** — Normalized models (`UserSummary`, `RepoDetail`, `ActivityItem`, etc.), action policies, pagination.
4444-**Data** — `@atcute/client` XRPC calls, `@atcute/tangled` type definitions, local cache, and the Twister API for search/index-backed summaries.
4545-4646-Protocol isolation: no Vue component imports `@atcute/*` directly. All API access flows through `src/services/`.
4747-4848-## Tangled API Surface
4949-5050-Three distinct data hosts:
5151-5252-| Host | Protocol | Data |
5353-| ---------------------------------- | ---------------------------------- | ----------------------------------------------------------------- |
5454-| Knots (`us-west.tangled.sh`, etc.) | XRPC at `/xrpc/sh.tangled.*` | Git data: trees, blobs, commits, branches, diffs, tags |
5555-| User's PDS | XRPC at `/xrpc/com.atproto.repo.*` | AT Protocol records: repos, issues, PRs, stars, follows, profiles |
5656-| Twister API | HTTP JSON | Global search and index-backed graph/profile summaries |
5757-5858-The appview (`tangled.org`) serves HTML — it's the web UI, not a JSON API. The mobile client talks to knots and PDS servers directly for canonical detail and uses the Twister API for cross-network discovery.
5959-6060-Repo param format: `did:plc:xxx/repoName`.
6161-6262-## Phases
6363-6464-| Phase | Focus | Spec | Tasks |
6565-| ----- | ------------------------------------------------------------------------ | ------------------------------------ | ------------------------------------ |
6666-| 1 | Project shell, tabs, mock data, design system | [phase-1.md](phase-1.md) | [../tasks/phase-1.md](../tasks/phase-1.md) |
6767-| 2 | Public browsing — repos, files, profiles, issues, PRs | [phase-2.md](phase-2.md) | [../tasks/phase-2.md](../tasks/phase-2.md) |
6868-| 3 | Index-backed search and handle-first public browsing | [phase-3.md](phase-3.md) | [../tasks/phase-3.md](../tasks/phase-3.md) |
6969-| 4 | OAuth sign-in, star, follow, react, personalized feed | [phase-4.md](phase-4.md) | [../tasks/phase-4.md](../tasks/phase-4.md) |
7070-| 5 | Offline persistence, performance, bundle optimization | [phase-5.md](phase-5.md) | [../tasks/phase-5.md](../tasks/phase-5.md) |
7171-| 6 | Write features, project service integration, push notifications | [phase-6.md](phase-6.md) | [../tasks/phase-6.md](../tasks/phase-6.md) |
7272-| 7 | Real-time Jetstream feed, custom feeds, forking, labels, interdiff | [phase-7.md](phase-7.md) | [../tasks/phase-7.md](../tasks/phase-7.md) |
7373-7474-## Key Design Decisions
7575-7676-1. **`@atcute` end-to-end** for all AT Protocol interaction — no mixing client stacks.
7777-2. **Tangled lexicon handling in one module boundary** (`src/services/tangled/`) — don't scatter `sh.tangled.*` awareness across pages.
7878-3. **Read-first** — the primary product is a fast reader. Social mutations are a controlled second layer.
7979-4. **Use the project API sparingly and intentionally.** Search and index-backed graph gaps belong there; canonical repo detail stays on Tangled's public APIs.
8080-5. **Mobile-first, not desktop-forge-first** — prioritize readability, direct browsing, and small focused actions before broader discovery surfaces.
-180
docs/app/specs/phase-1.md
···11-# Phase 1 — Project Shell & Design System
22-33-## Goal
44-55-Scaffold the Ionic Vue project with tab navigation, placeholder pages, mock data, and reusable UI primitives. Nothing touches the network. The result is a clickable prototype that validates navigation, layout, and component design before any API integration.
66-77-## Technology Stack
88-99-| Layer | Choice |
1010-| -------------- | ----------------------- |
1111-| Framework | Vue 3 + TypeScript |
1212-| UI kit | Ionic Vue |
1313-| Native runtime | Capacitor |
1414-| State | Pinia |
1515-| Async data | TanStack Query (Vue) |
1616-| Routing | Vue Router (Ionic tabs) |
1717-1818-## Navigation Structure
1919-2020-Five destinations, four of which are visible tabs:
2121-2222-1. **Home** — trending repos, recent activity, personalized content (auth)
2323-2. **Explore** — search repos/users, filters
2424-3. **Repo** — deep-link target for repository detail (not a persistent tab icon — navigated to from Home/Explore/Activity)
2525-4. **Activity** — global feed (anon), social graph feed (auth)
2626-5. **Profile** — auth state, user card, follows, starred repos, settings
2727-2828-> Repo is a routed detail destination, not a standing tab. The tab bar shows Home, Explore, Activity, Profile. Repo pages are pushed onto the Home/Explore/Activity stacks.
2929-3030-## Directory Layout
3131-3232-```sh
3333-src/
3434- app/
3535- router/ # route definitions, tab guards
3636- boot/ # app-level setup (query client, plugins)
3737- providers/ # provide/inject wrappers
3838- core/
3939- config/ # env, feature flags
4040- errors/ # error types and normalization
4141- storage/ # storage abstraction (IndexedDB / Capacitor Secure Storage)
4242- query/ # TanStack Query client config, persister setup
4343- auth/ # auth state machine, session store
4444- services/
4545- atproto/ # @atcute/client wrapper, identity helpers
4646- tangled/ # Tangled API: endpoints, adapters, normalizers, queries, mutations
4747- domain/
4848- models/ # UserSummary, RepoSummary, RepoDetail, etc.
4949- feed/ # feed-specific types and helpers
5050- repo/ # repo-specific types and helpers
5151- profile/ # profile-specific types and helpers
5252- features/
5353- home/
5454- explore/
5555- repo/
5656- activity/
5757- profile/
5858- components/
5959- common/ # cards, buttons, loaders, empty states, error boundaries
6060- repo/ # repo card, file tree item, README viewer
6161- feed/ # activity card, feed list
6262- profile/ # user card, follow button
6363-```
6464-6565-## Domain Models
6666-6767-```ts
6868-export type UserSummary = {
6969- did: string;
7070- handle: string;
7171- displayName?: string;
7272- avatar?: string;
7373- bio?: string;
7474- followerCount?: number;
7575- followingCount?: number;
7676-};
7777-7878-export type RepoSummary = {
7979- atUri: string;
8080- ownerDid: string;
8181- ownerHandle: string;
8282- name: string;
8383- description?: string;
8484- primaryLanguage?: string;
8585- stars?: number;
8686- forks?: number;
8787- updatedAt?: string;
8888- knot: string;
8989-};
9090-9191-export type RepoDetail = RepoSummary & {
9292- readme?: string;
9393- defaultBranch?: string;
9494- languages?: Record<string, number>;
9595- collaborators?: UserSummary[];
9696- topics?: string[];
9797-};
9898-9999-export type RepoFile = {
100100- path: string;
101101- name: string;
102102- type: "file" | "dir" | "submodule";
103103- size?: number;
104104- lastCommitMessage?: string;
105105-};
106106-107107-export type PullRequestSummary = {
108108- atUri: string;
109109- title: string;
110110- authorDid: string;
111111- authorHandle: string;
112112- status: "open" | "merged" | "closed";
113113- createdAt: string;
114114- updatedAt?: string;
115115- sourceBranch: string;
116116- targetBranch: string;
117117- roundCount?: number;
118118-};
119119-120120-export type IssueSummary = {
121121- atUri: string;
122122- title: string;
123123- authorDid: string;
124124- authorHandle: string;
125125- state: "open" | "closed";
126126- createdAt: string;
127127- commentCount?: number;
128128-};
129129-130130-export type ActivityItem = {
131131- id: string;
132132- kind:
133133- | "repo_created"
134134- | "repo_starred"
135135- | "user_followed"
136136- | "pr_opened"
137137- | "pr_merged"
138138- | "issue_opened"
139139- | "issue_closed";
140140- actorDid: string;
141141- actorHandle: string;
142142- targetUri?: string;
143143- targetName?: string;
144144- createdAt: string;
145145-};
146146-```
147147-148148-## Repo Detail Page Structure
149149-150150-Segmented tab layout within the repo detail view:
151151-152152-| Segment | Content |
153153-| -------- | ------------------------------------------------------------------------------------ |
154154-| Overview | owner/repo header, description, topics, social action buttons, README preview, stats |
155155-| Files | directory tree, file viewer (syntax-highlighted) |
156156-| Issues | issue list with state filters |
157157-| PRs | pull request list with status filters |
158158-159159-## Design System Primitives
160160-161161-Build these reusable components during this phase:
162162-163163-- **RepoCard** — compact repo summary for lists
164164-- **UserCard** — avatar + handle + bio snippet
165165-- **ActivityCard** — icon + actor + verb + target + timestamp
166166-- **FileTreeItem** — icon (file/dir) + name + last commit message
167167-- **EmptyState** — icon + message + optional action button
168168-- **ErrorBoundary** — catch + retry UI
169169-- **SkeletonLoader** — content placeholder shimmer for each card type
170170-- **MarkdownRenderer** — render README content (Phase 2 will wire to real data)
171171-172172-## Mock Data
173173-174174-Create `src/mocks/` with factory functions returning typed domain models. All Phase 1 screens render from these factories. Mock data must be realistic — use real-looking handles (`alice.tngl.sh`), repo names, and timestamps.
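One possible shape for a factory in `src/mocks/` is sketched below. The `UserSummary` type is copied from the domain models above so the block stands alone; the function name `makeUser`, the counter, and the default values are illustrative assumptions.

```ts
type UserSummary = {
  did: string;
  handle: string;
  displayName?: string;
  avatar?: string;
  bio?: string;
  followerCount?: number;
  followingCount?: number;
};

// Module-level counter so every generated user gets a distinct DID and handle.
let userSeq = 0;

// Hypothetical factory: callers override only the fields a screen cares about.
export function makeUser(overrides: Partial<UserSummary> = {}): UserSummary {
  userSeq += 1;
  return {
    did: `did:plc:mock${userSeq}`,
    handle: `user${userSeq}.tngl.sh`,
    displayName: `Mock User ${userSeq}`,
    followerCount: 0,
    followingCount: 0,
    ...overrides,
  };
}
```

A screen could then call `makeUser({ handle: "alice.tngl.sh" })` to get a fully typed record with one realistic field pinned.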
175175-176176-## Performance Targets
177177-178178-- Shell first-paint under 2s on mid-range device
179179-- Tab switches feel instant (no layout shift)
180180-- Skeleton loaders shown within 100ms of navigation
-120
docs/app/specs/phase-2.md
···11-# Phase 2 — Public Tangled Browsing
22-33-## Goal
44-55-Replace mock data on the shippable public-browsing surface with live Tangled API calls. Users can browse repos, profiles, file trees, README content, issues, and pull requests without signing in. Public entry points are intentionally scoped down for now: Home is a known-handle jump surface, while Explore and Activity remain clearly labeled placeholders until their dedicated work lands.
66-77-## Protocol Stack
88-99-| Package | Version | Role |
1010-| ----------------- | ------- | ---------------------------------------------- |
1111-| `@atcute/client` | ^4.2.1 | XRPC HTTP client — `query()` and `procedure()` |
1212-| `@atcute/tangled` | ^1.0.17 | `sh.tangled.*` lexicon type definitions |
1313-1414-All protocol access goes through `src/services/tangled/`. No Vue component may import `@atcute/*` directly.
1515-1616-## Architecture: Protocol Isolation
1717-1818-```sh
1919-Vue component
2020- → composable (useRepoDetail, useFileTree, ...)
2121- → TanStack Query hook
2222- → service function (services/tangled/queries.ts)
2323- → @atcute/client XRPC call
2424- → normalizer (services/tangled/normalizers.ts)
2525- → domain model
2626-```
2727-2828-### Service Layer Responsibilities
2929-3030-**`services/atproto/client.ts`** — singleton `XRPC` client instance, base URL config, error interceptor.
3131-3232-**`services/tangled/endpoints.ts`** — typed wrappers around XRPC queries:
3333-3434-| Endpoint | Params | Returns |
3535-| ---------------------------------- | ------------------------------------------- | ------------------- |
3636-| `sh.tangled.repo.tree` | `repo: did:plc:xxx/name`, `ref`, `path?` | directory listing |
3737-| `sh.tangled.repo.blob` | `repo`, `ref`, `path` | file content |
3838-| `sh.tangled.repo.log` | `repo`, `ref`, `path?`, `limit?`, `cursor?` | commit history |
3939-| `sh.tangled.repo.branches` | `repo`, `limit?`, `cursor?` | branch list |
4040-| `sh.tangled.repo.tags` | `repo` | tag list |
4141-| `sh.tangled.repo.getDefaultBranch` | `repo` | default branch name |
4242-| `sh.tangled.repo.diff` | `repo`, `ref` | diff output |
4343-| `sh.tangled.repo.compare` | `repo`, `rev1`, `rev2` | comparison |
4444-| `sh.tangled.repo.languages` | `repo` | language breakdown |
4545-4646-The `repo` param format is `did:plc:xxx/repoName`. The XRPC calls go to the repo's **knot** hostname (e.g., `us-west.tangled.sh`), not to `tangled.org`.
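To make the knot routing concrete, here is a small sketch that builds the request URL for `sh.tangled.repo.tree`. It assumes knots are reachable over HTTPS at the `/xrpc/` path shown in the table; `buildTreeUrl` is a hypothetical name, and in the real service layer the `@atcute/client` wrapper would normally construct this URL.

```ts
type TreeParams = { repo: string; ref: string; path?: string };

// Hypothetical helper: shows which host and query string a tree request targets.
export function buildTreeUrl(knotHost: string, params: TreeParams): string {
  const parts = [
    `repo=${encodeURIComponent(params.repo)}`,
    `ref=${encodeURIComponent(params.ref)}`,
  ];
  // `path` is optional; omit it to list the repository root.
  if (params.path !== undefined) {
    parts.push(`path=${encodeURIComponent(params.path)}`);
  }
  return `https://${knotHost}/xrpc/sh.tangled.repo.tree?${parts.join("&")}`;
}
```

Note that the colons and slash in the `repo` param are percent-encoded, so `did:plc:abc/repo` travels as `did%3Aplc%3Aabc%2Frepo`.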
4747-4848-**`services/tangled/normalizers.ts`** — transform raw lexicon responses into domain models (`RepoSummary`, `RepoDetail`, `RepoFile`, etc.).
4949-5050-**`services/tangled/queries.ts`** — TanStack Query wrapper functions with cache keys, stale times, and error handling.
5151-5252-## Appview vs Knot Routing
5353-5454-Tangled has two API surfaces:
5555-5656-| Surface | Host | Protocol | Used for |
5757-| ------- | -------------------------- | --------------------------- | ------------------------------------------------- |
5858-| Appview | `tangled.org` | HTTP (HTML, HTMX) | Profile pages, repo listings, timeline, search |
5959-| Knots | `us-west.tangled.sh`, etc. | XRPC (`/xrpc/sh.tangled.*`) | Git data — trees, blobs, commits, branches, diffs |
6060-6161-For Phase 2, git data comes from knots via XRPC. Profile and repo metadata come from PDS records queried through `com.atproto.repo.getRecord` and `com.atproto.repo.listRecords`, not from the HTML appview. The service layer must route requests to the correct host based on the operation.
6262-6363-## Features
6464-6565-### Repository Browsing
6666-6767-- List repos for a user (from their PDS records)
6868-- Repo overview: metadata, description, topics, default branch, language stats
6969-- README rendering: fetch blob for `README.md` from default branch, render markdown
7070-- File tree: navigate directories, open files
7171-- File viewer: syntax-highlighted source display
7272-- Commit log: paginated history for a ref/path
7373-- Branch list with default branch indicator
7474-7575-### Profile Browsing
7676-7777-- View user profile: avatar, bio, links, pronouns, location, pinned repos
7878-- Profile data comes from `sh.tangled.actor.profile` record (key: `self`) on the user's PDS
7979-- List user's repos
8080-8181-### Public Discovery (scoped down)
8282-8383-- Home acts as the temporary public entry point: enter a known AT Protocol handle, then jump to profile or browse that handle's repos
8484-- Explore remains visible as a placeholder for future search work, but should not pretend global search already exists
8585-- Activity remains visible as a placeholder for future feed work, but should not pretend a public timeline already exists
8686-- Unsupported global search/trending behavior should be omitted or clearly labeled as future work, never filled with silent mock data
8787-8888-### Pull Requests (read-only)
8989-9090-- List PRs for a repo with status filter (open/closed/merged)
9191-- PR detail: title, body, author, source/target branches, round count
9292-- PR comments list
9393-9494-### Issues (read-only)
9595-9696-- List issues for a repo with state filter (open/closed)
9797-- Issue detail: title, body, author
9898-- Issue comments (threaded — `replyTo` field)
9999-100100-## Caching Strategy
101101-102102-| Data | Stale time | Cache time |
103103-| ------------- | ---------- | ---------- |
104104-| Repo metadata | 5 min | 30 min |
105105-| File tree | 2 min | 10 min |
106106-| File content | 5 min | 30 min |
107107-| Commit log | 2 min | 10 min |
108108-| Profile | 10 min | 60 min |
109109-| README | 5 min | 30 min |
110110-111111-Use TanStack Query's `staleTime` and `gcTime`. Add a query persister (IndexedDB-backed) for offline reads.
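The table above could be expressed as a single constants module, sketched here. The object name `cacheTimes` and its keys are assumptions; `staleTime` and `gcTime` are the TanStack Query option names and take milliseconds.

```ts
const MINUTE = 60 * 1000;

// Values mirror the caching table; each entry can be spread into query options.
export const cacheTimes = {
  repoMetadata: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
  fileTree: { staleTime: 2 * MINUTE, gcTime: 10 * MINUTE },
  fileContent: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
  commitLog: { staleTime: 2 * MINUTE, gcTime: 10 * MINUTE },
  profile: { staleTime: 10 * MINUTE, gcTime: 60 * MINUTE },
  readme: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
};
```

Keeping the numbers in one place means a later tuning pass touches one file instead of every query wrapper.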
112112-113113-## Error Handling
114114-115115-Normalize these failure modes at the service layer:
116116-117117-- Network unreachable → offline banner, serve from cache
118118-- 404 from knot → "Repository not found" or "File not found"
119119-- XRPC error responses → map to typed app errors
120120-- Malformed response → log + generic error state
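A rough sketch of the normalization step follows. The `AppError` and `RawFailure` shapes and the function name are assumptions for illustration; the only protocol detail relied on is that XRPC error bodies are JSON objects with a string `error` field.

```ts
export type AppError =
  | { kind: "offline" }
  | { kind: "not_found"; message: string }
  | { kind: "xrpc"; status: number; code: string }
  | { kind: "malformed" };

// Hypothetical description of what the transport layer observed.
type RawFailure =
  | { type: "network" }
  | { type: "http"; status: number; body?: unknown };

export function normalizeFailure(
  failure: RawFailure,
  resource: "Repository" | "File",
): AppError {
  if (failure.type === "network") return { kind: "offline" };
  if (failure.status === 404) {
    return { kind: "not_found", message: `${resource} not found` };
  }
  const body = failure.body;
  if (typeof body === "object" && body !== null) {
    const code = (body as { error?: unknown }).error;
    if (typeof code === "string") {
      return { kind: "xrpc", status: failure.status, code };
    }
  }
  // Anything else is treated as a malformed response; the caller logs it.
  return { kind: "malformed" };
}
```

UI code can then switch on `kind` to pick the offline banner, a not-found message, or the generic error state.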
-68
docs/app/specs/phase-3.md
···11-# Phase 3 — Indexed Search and Honest Discovery
22-33-## Goal
44-55-Introduce global discovery through the Twister project index while preserving honest product boundaries. Home continues to support direct known-handle browsing, Explore becomes index-backed search, and Activity remains a clearly labeled in-progress surface.
66-77-## Current Product Shape
88-99-### Home
1010-1111-Home is the temporary public entry point for unauthenticated browsing:
1212-1313-- Enter a known AT Protocol handle
1414-- Open that user's profile directly
1515-- Resolve the handle to DID + PDS via AT Protocol identity
1616-- List that user's public Tangled repos inline and open one directly
1717-1818-This keeps public browsing fully real while still giving the app a lightweight direct-entry path.
1919-2020-### Explore
2121-2222-Explore becomes the network-level discovery surface:
2323-2424-- Global repo search via the Twister index
2525-- Global profile search via the Twister index
2626-- Empty state should clearly distinguish "index unavailable" from "no results"
2727-- Search results route into the existing profile and repo detail screens
2828-2929-### Activity
3030-3131-Activity remains a tab-level placeholder:
3232-3333-- No public timeline yet
3434-- No curated public feed fallback
3535-- Empty state should explicitly say activity is in progress
3636-3737-## Identity and Routing
3838-3939-The app now uses two read paths:
4040-4141-1. **Direct handle browsing**
4242- Resolve `handle -> DID` via `com.atproto.identity.resolveHandle`
4343- Fetch the DID document and extract the PDS endpoint
4444- Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords`
4545-2. **Indexed discovery**
4646- Query the Twister API for global search results
4747- Open the selected profile or repo in the existing screens
4848- Continue detail fetching from Tangled's public APIs
4949-5050-The Twister API is additive, not authoritative for repo detail. It fills discovery and graph gaps; knots and PDSes remain the source of truth for detail screens.
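The "extract the PDS endpoint" step of direct handle browsing can be sketched as a pure function over an already-fetched DID document. The function name and the trimmed `DidDocument` type are assumptions; the `#atproto_pds` service id and `AtprotoPersonalDataServer` type are the values AT Protocol uses for the PDS entry.

```ts
type DidService = { id: string; type: string; serviceEndpoint: unknown };
type DidDocument = { id: string; service?: DidService[] };

// Returns the PDS base URL, or null when the document has no usable entry.
export function getPdsEndpoint(doc: DidDocument): string | null {
  const services = doc.service ?? [];
  for (const service of services) {
    // The id may be relative ("#atproto_pds") or prefixed with the DID.
    const idMatches =
      service.id === "#atproto_pds" || service.id === `${doc.id}#atproto_pds`;
    if (
      idMatches &&
      service.type === "AtprotoPersonalDataServer" &&
      typeof service.serviceEndpoint === "string"
    ) {
      return service.serviceEndpoint;
    }
  }
  return null;
}
```

A `null` result would map naturally onto the invalid-handle state that Home is expected to show.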
5151-5252-## UI Expectations
5353-5454-- Home shows one handle input plus explicit actions for profile jump and repo browsing
5555-- Home shows loading, invalid-handle, no-repos, and resolved-repo-list states
5656-- Explore shows a working search form, loading state, index-unavailable state, and no-results state
5757-- Activity shows a static in-progress empty state
5858-- Profile may show index-backed follower/following summaries when available
5959-6060-## Deferred Work
6161-6262-The following work is intentionally deferred out of this phase:
6363-6464-- Trending or suggested discovery sections
6565-- Public activity feed ingestion, pagination, and caching
6666-- Jetstream or appview timeline investigation
6767-6868-These capabilities will be revisited after the baseline search and graph-summary integration is stable.
-167
docs/app/specs/phase-4.md
···11-# Phase 4 — OAuth & Social Features
22-33-## Goal
44-55-Add AT Protocol OAuth sign-in and authenticated social actions: follow, star, react. Signed-in users get a personalized feed.
66-77-## Authentication
88-99-### Package
1010-1111-`@atcute/oauth-browser-client` ^3.0.0 — minimal browser OAuth client for AT Protocol.
1212-1313-### OAuth Flow
1414-1515-1. User enters handle or DID
1616-2. Resolve handle → DID → PDS → authorization server metadata
1717-3. Initiate OAuth with PKCE + DPoP (P-256)
1818-4. Redirect to authorization server
1919-5. Callback with auth code
2020-6. Exchange code for access + refresh tokens
2121-7. Store session, bind to XRPC client
2222-2323-### Key Functions
2424-2525-| Function | Purpose |
2626-| -------------------------- | ------------------------------------------------------------ |
2727-| `configureOAuth(opts)` | One-time setup: client metadata URL, redirect URI |
2828-| `getSession(did)` | Resume existing session (returns `Session` with `dpopFetch`) |
2929-| `listStoredSessions()` | List all stored accounts |
3030-| `deleteStoredSession(did)` | Remove stored session |
3131-3232-### Session Object
3333-3434-A `Session` provides:
3535-3636-- `did` — authenticated user's DID
3737-- `dpopFetch` — a `fetch` wrapper that auto-attaches DPoP + access token headers
3838-- Token refresh is handled internally
3939-4040-### Client Metadata
4141-4242-The mobile app needs its own OAuth client metadata hosted at a public URL:
4343-4444-```json
4545-{
4646- "client_id": "https://your-app-domain/oauth/client-metadata.json",
4747- "client_name": "Twisted",
4848- "client_uri": "https://your-app-domain",
4949- "redirect_uris": ["https://your-app-domain/oauth/callback"],
5050- "grant_types": ["authorization_code", "refresh_token"],
5151- "response_types": ["code"],
5252- "token_endpoint_auth_method": "none",
5353- "application_type": "web",
5454- "dpop_bound_access_tokens": true,
5555- "scope": "atproto repo:sh.tangled.graph.follow repo:sh.tangled.feed.star repo:sh.tangled.feed.reaction repo:sh.tangled.actor.profile"
5656-}
5757-```
5858-5959-Request only the scopes needed for Phase 4 social features. Expand scopes in later phases as write features are added.
6060-6161-### Capacitor Considerations
6262-6363-- Web: standard redirect flow works
6464-- iOS/Android via Capacitor: use `App.addListener('appUrlOpen')` to capture the OAuth callback via deep link or custom URL scheme
6565-- Session storage: abstract behind `core/storage/` — use `localStorage` on web, Capacitor Secure Storage plugin on native
6666-6767-### Auth State Machine
6868-6969-```sh
7070-idle → authenticating → authenticated
7171- → error
7272-authenticated → refreshing → authenticated
7373- → expired → idle
7474-authenticated → logging_out → idle
7575-```
7676-7777-Store in Pinia (`core/auth/`). Expose via `useAuth()` composable.
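The diagram above can be encoded as a transition table, sketched below. The names `AuthState`, `transitions`, and `canTransition` are illustrative, and a real Pinia store would wrap this rather than expose it directly. The diagram shows no exit from `error`, so none is assumed here.

```ts
export type AuthState =
  | "idle"
  | "authenticating"
  | "authenticated"
  | "error"
  | "refreshing"
  | "expired"
  | "logging_out";

// Allowed next states, taken edge by edge from the state diagram.
const transitions: Record<AuthState, AuthState[]> = {
  idle: ["authenticating"],
  authenticating: ["authenticated", "error"],
  authenticated: ["refreshing", "logging_out"],
  refreshing: ["authenticated", "expired"],
  expired: ["idle"],
  logging_out: ["idle"],
  error: [], // the diagram draws no outgoing edge from error
};

export function canTransition(from: AuthState, to: AuthState): boolean {
  return transitions[from].indexOf(to) !== -1;
}
```

Guarding every state change with `canTransition` makes illegal jumps, such as `idle` straight to `authenticated`, fail loudly during development.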
7878-7979-## Social Actions
8080-8181-All social actions create or delete AT Protocol records on the user's PDS via the XRPC `com.atproto.repo.createRecord` / `com.atproto.repo.deleteRecord` procedures. The `dpopFetch` from the session handles auth.
8282-8383-### Star a Repo
8484-8585-Create record:
8686-8787-```json
8888-{
8989- "repo": "did:plc:user",
9090- "collection": "sh.tangled.feed.star",
9191- "record": {
9292- "$type": "sh.tangled.feed.star",
9393- "subject": "at://did:plc:owner/sh.tangled.repo/tid",
9494- "createdAt": "2026-03-22T00:00:00Z"
9595- }
9696-}
9797-```
9898-9999-Unstar: delete the record by its `rkey`.
100100-101101-### Follow a User
102102-103103-Create record:
104104-105105-```json
106106-{
107107- "repo": "did:plc:user",
108108- "collection": "sh.tangled.graph.follow",
109109- "record": {
110110- "$type": "sh.tangled.graph.follow",
111111- "subject": "did:plc:target",
112112- "createdAt": "2026-03-22T00:00:00Z"
113113- }
114114-}
115115-```
116116-117117-Unfollow: delete the record by its `rkey`.
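The star and follow payloads differ only in collection and subject, so they could share one builder, as in this sketch. All function and type names are hypothetical. The delete helper assumes the record's AT-URI has the usual `at://<did>/<collection>/<rkey>` form and produces the `repo`, `collection`, `rkey` input that `com.atproto.repo.deleteRecord` takes.

```ts
type RecordBody = { $type: string; subject: string; createdAt: string };
type CreateRecordInput = { repo: string; collection: string; record: RecordBody };
type DeleteRecordInput = { repo: string; collection: string; rkey: string };

function buildCreateInput(
  userDid: string,
  collection: string,
  subject: string,
  now: Date,
): CreateRecordInput {
  return {
    repo: userDid,
    collection,
    record: { $type: collection, subject, createdAt: now.toISOString() },
  };
}

export function buildStar(userDid: string, repoAtUri: string, now: Date): CreateRecordInput {
  return buildCreateInput(userDid, "sh.tangled.feed.star", repoAtUri, now);
}

export function buildFollow(userDid: string, targetDid: string, now: Date): CreateRecordInput {
  return buildCreateInput(userDid, "sh.tangled.graph.follow", targetDid, now);
}

// Derive the deleteRecord input from the AT-URI of a star or follow record.
export function buildDeleteInput(recordUri: string): DeleteRecordInput | null {
  if (recordUri.slice(0, 5) !== "at://") return null;
  const parts = recordUri.slice(5).split("/");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) return null;
  return { repo: parts[0], collection: parts[1], rkey: parts[2] };
}
```

Passing `now` in as a parameter keeps the builders deterministic, which makes them easy to unit test.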
118118-119119-### React to Content
120120-121121-Create record:
122122-123123-```json
124124-{
125125- "repo": "did:plc:user",
126126- "collection": "sh.tangled.feed.reaction",
127127- "record": {
128128- "$type": "sh.tangled.feed.reaction",
129129- "subject": "at://did:plc:owner/sh.tangled.repo.pull/tid",
130130- "reaction": "thumbsup",
131131- "createdAt": "2026-03-22T00:00:00Z"
132132- }
133133-}
134134-```
135135-136136-Available reactions: `thumbsup`, `thumbsdown`, `laugh`, `tada`, `confused`, `heart`, `rocket`, `eyes`.
137137-138138-### Optimistic Updates
139139-140140-All mutations use TanStack Query's `useMutation` with optimistic updates:
141141-142142-1. Immediately update the cache (star count +1, follow state toggled)
143143-2. Fire the mutation
144144-3. On error, roll back the cache and show a toast
145145-146146-## Personalized Feed
147147-148148-When signed in, the Activity tab shows a filtered feed based on:
149149-150150-- Users the signed-in user follows
151151-- Repos the signed-in user has starred
152152-153153-Implementation depends on what the appview provides. If no personalized endpoint exists, filter the global feed client-side based on the user's follow/star records.
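If the client-side fallback is needed, the filter could look roughly like this. The trimmed `FeedItem` type reuses field names from `ActivityItem`; `filterPersonalized` and its parameters are hypothetical, and the sketch assumes `targetUri` holds the repo AT-URI for repo-related events.

```ts
type FeedItem = { id: string; actorDid: string; targetUri?: string };

// Keep items from followed users or about starred repos.
export function filterPersonalized(
  feed: FeedItem[],
  followedDids: string[],
  starredRepoUris: string[],
): FeedItem[] {
  return feed.filter((item) => {
    const byFollowedUser = followedDids.indexOf(item.actorDid) !== -1;
    const onStarredRepo =
      item.targetUri !== undefined &&
      starredRepoUris.indexOf(item.targetUri) !== -1;
    return byFollowedUser || onStarredRepo;
  });
}
```

For large follow lists a lookup table would scale better than `indexOf`, but the linear scan keeps the sketch short.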
154154-155155-## Profile Tab (Authenticated)
156156-157157-When signed in, the Profile tab shows:
158158-159159-- User's avatar, handle, bio, location, pronouns, links
160160-- Pinned repos
161161-- Stats (selected from: merged PRs, open PRs, open issues, repo count, star count)
162162-- Starred repos list
163163-- Following/followers lists
164164-- Edit profile (avatar, bio, links, pinned repos)
165165-- Settings
166166-- Logout
167167-- Account switcher (multiple account support via `listStoredSessions`)
-73
docs/app/specs/phase-5.md
···11-# Phase 5 — Offline & Performance Polish
22-33-## Goal
44-55-Make the app feel native. Cached data loads instantly, offline mode is graceful, and navigation is smooth on mid-range devices.
66-77-## Offline Strategy
88-99-### Query Persistence
1010-1111-Use TanStack Query's `persistQueryClient` with an IndexedDB adapter:
1212-1313-- Persist all query cache to IndexedDB on each update (debounced)
1414-- On app launch, hydrate TanStack Query cache from IndexedDB before rendering
1515-- Stale-while-revalidate: show persisted data immediately, refresh in background
1616-1717-### What to Persist
1818-1919-| Data | Max cached items | TTL |
2020-| ------------------------------ | ---------------- | ------ |
2121-| Repo metadata | 200 | 7 days |
2222-| File trees | 50 | 3 days |
2323-| File content (recently viewed) | 100 | 3 days |
2424-| README content | 100 | 7 days |
2525-| User profiles | 100 | 7 days |
2626-| Activity feed pages | 10 pages | 1 day |
2727-| Search results | 20 queries | 1 day |
2828-2929-### Offline Detection
3030-3131-- Listen to `navigator.onLine` + `online`/`offline` events
3232-- Show a persistent banner when offline: "You're offline — showing cached data"
3333-- Disable mutation buttons (star, follow) when offline
3434-- Queue mutations for retry when back online (optional, simple queue)
3535-3636-### Sensitive Data
3737-3838-- Auth tokens: Capacitor Secure Storage on native, encrypted `localStorage` wrapper on web
3939-- Never persist tokens in IndexedDB alongside query cache
4040-- Clear auth storage on logout
4141-4242-## Performance Optimizations
4343-4444-### Navigation
4545-4646-- Prefetch repo detail data on repo card hover/long-press
4747-- Keep previous tab's scroll position and data in memory (Ionic's `ion-router-outlet` + `keep-alive`)
4848-- Use a virtualized list for long lists (repos, activity feed); `<ion-virtual-scroll>` is only available in Ionic 6 and earlier
4949-5050-### Images
5151-5252-- Lazy-load avatars with `loading="lazy"` or Intersection Observer
5353-- Use `avatar.tangled.sh` CDN URLs with size params if available
5454-- Placeholder avatar component with initials fallback
5555-5656-### Bundle
5757-5858-- Route-level code splitting per feature folder
5959-- Tree-shake unused Ionic components
6060-- Measure and optimize with Lighthouse
6161-6262-### Rendering
6363-6464-- Skeleton screens for every data-driven view (already built in Phase 1)
6565-- Debounce search input (already in Phase 3)
6666-- Throttle scroll-based pagination triggers
6767-6868-## Testing Focus
6969-7070-- Offline → online transition: verify data refreshes without duplicates
7171-- Large repo file trees: ensure virtual scroll handles 1000+ items
7272-- Low-bandwidth simulation: verify skeleton → content transitions
7373-- Memory pressure: verify cache eviction works and app doesn't grow unbounded
-75
docs/app/specs/phase-6.md
···11-# Phase 6 — Write Features & Project Services
22-33-## Goal
44-55-Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and extend the Twister project services only where the client should not or cannot do the work directly.
66-77-## Why Project Services
88-99-Some operations are awkward or unsafe from a browser client:
1010-1111-- **Token hardening**: DPoP keys in browser storage are less secure than server-held credentials
1212-- **Unstable procedures**: Tangled's API may change — a backend adapter isolates the mobile client from churn
1313-- **Push notifications**: require server-side registration and delivery
1414-- **Personalized feeds**: server-side aggregation is more efficient than client-side filtering
1515-- **Graph gaps**: follower lists/counts and other cross-network summaries may require index-backed derivation
1616-- **Rate limiting**: backend can batch and deduplicate requests
1717-1818-### Service Scope
1919-2020-Thin service layer — not a replacement for Tangled's public APIs. Use it for cross-network aggregation, search, notifications, and operations the SPA should not own.
2121-2222-| Endpoint | Purpose |
2323-| ------------------------------------ | --------------------------------------------------- |
2424-| `POST /auth/session` | OAuth token exchange and session management |
2525-| `GET /feed/personalized` | Pre-filtered activity feed for the user |
2626-| `GET /search`, `GET /profiles/:did/summary` | Search and index-backed graph/profile summaries |
2727-| `POST /notifications/register` | Push notification device registration |
2828-| Passthrough for stable XRPC calls | Avoid duplicating what the client already does well |
2929-3030-## Write Features
3131-3232-### Create Issue
3333-3434-- Screen: issue creation form within repo detail
3535-- Fields: title (required), body (markdown), mentions
3636-- Creates `sh.tangled.repo.issue` record on user's PDS
3737-- Optimistic: add to local issue list, remove on failure
3838-3939-### Comment on Issue / PR
4040-4141-- Screen: comment input at bottom of issue/PR detail
4242-- Creates `sh.tangled.repo.issue.comment` or `sh.tangled.repo.pull.comment` record
4343-- Supports `replyTo` for threaded issue comments
4444-- Supports `mentions` (DID array) and `references` (AT-URI array)
4545-4646-### Edit Profile
4747-4848-- Screen: profile edit form
4949-- Updates `sh.tangled.actor.profile` record (key: `self`)
5050-- Fields: avatar (image upload, max 1MB, png/jpeg), bio (max 256 graphemes), links (max 5 URIs), location (max 40 graphemes), pronouns (max 40 chars), pinned repos (max 6 AT-URIs), display stats (max 2 from: merged-pr-count, closed-pr-count, open-pr-count, open-issue-count, closed-issue-count, repo-count, star-count), bluesky cross-posting toggle
5151-5252-### Issue State Management
5353-5454-- Close/reopen issues by creating `sh.tangled.repo.issue.state` records
5555-- State values: `sh.tangled.repo.issue.state.open`, `sh.tangled.repo.issue.state.closed`
5656-5757-## Push Notifications
5858-5959-- Register device token with project services
6060-- Project services subscribe to Jetstream or indexed events relevant to the user
6161-- Deliver via APNs (iOS) / FCM (Android)
6262-- Notification types: PR activity on your repos, issue comments, new followers, stars
6363-6464-## Expanded OAuth Scopes
6565-6666-Phase 6 requires additional scopes beyond Phase 4:
6767-6868-```sh
6969-repo:sh.tangled.repo.issue
7070-repo:sh.tangled.repo.issue.comment
7171-repo:sh.tangled.repo.issue.state
7272-repo:sh.tangled.repo.pull.comment
7373-```
7474-7575-Handle scope upgrades gracefully — re-authorize if the user's existing session lacks required scopes.
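Detecting whether re-authorization is needed comes down to comparing the session's granted scope string with the scopes a feature requires. The sketch below assumes the granted scopes arrive as a single space-separated string, as in the client metadata `scope` field; `missingScopes` is a hypothetical helper name.

```ts
// Returns the required scopes that the current session does not hold.
export function missingScopes(granted: string, required: string[]): string[] {
  const have = granted.split(/\s+/).filter((s) => s.length > 0);
  return required.filter((s) => have.indexOf(s) === -1);
}
```

An empty result means the write can proceed; a non-empty one would trigger the re-authorization flow with the full scope list.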
-85
docs/app/specs/phase-7.md
···11-# Phase 7 — Real-Time Feed & Advanced Features
22-33-## Goal
44-55-Add real-time event streaming, custom feed logic, and advanced social coding features. This phase makes the app feel alive.
66-77-## Jetstream Integration
88-99-### Package
1010-1111-`@atcute/jetstream` — subscribe to the AT Protocol event stream.
1212-1313-### Architecture
1414-1515-Connect to a Jetstream relay and filter for `sh.tangled.*` collections:
1616-1717-```sh
1818-Jetstream WebSocket
1919- → filter: sh.tangled.* events
2020- → normalize into ActivityItem
2121- → merge into TanStack Query feed cache
2222- → reactive UI update
2323-```
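The "merge into feed cache" step has to tolerate replayed events, so a merge that deduplicates by `id` is sketched here. The function name is hypothetical, and the sort assumes every `createdAt` is an ISO 8601 UTC string in the same format, so that string comparison matches chronological order.

```ts
type FeedEntry = { id: string; createdAt: string };

// Merge new events into the cached feed, newest first, dropping duplicates.
export function mergeFeed(existing: FeedEntry[], incoming: FeedEntry[]): FeedEntry[] {
  const seen: Record<string, boolean> = {};
  const merged: FeedEntry[] = [];
  // Incoming items go first so a replayed event replaces the cached copy.
  for (const item of incoming.concat(existing)) {
    if (seen[item.id]) continue;
    seen[item.id] = true;
    merged.push(item);
  }
  merged.sort((a, b) =>
    a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0,
  );
  return merged;
}
```

Because the function returns a new array, it can be used directly as the updater when writing back to the query cache.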
2424-2525-### Connection Management
2626-2727-- Connect on app foreground, disconnect on background
2828-- Reconnect with exponential backoff
2929-- Track cursor position for gap-fill on reconnect
3030-- Battery-aware: reduce polling frequency on low battery (Capacitor Battery API)
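Reconnecting with exponential backoff can be sketched as a delay calculation. The base of 1 second and cap of 30 seconds are assumed defaults, not values from the spec, and the function name is hypothetical.

```ts
// Delay before reconnect attempt N (0-based): base * 2^N, capped at maxMs.
export function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

A production version would likely add random jitter so that many clients do not reconnect in lockstep after an outage; it is left out here to keep the result deterministic.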
3131-3232-### Live Indicators
3333-3434-- Repo detail: show "new commits" banner when ref updates arrive
3535-- Activity feed: show "X new items" pill, tap to scroll to top and reveal
3636-- PR detail: live status updates (open → merged)
3737-3838-## Custom Feeds
3939-4040-Allow users to create saved feed configurations:
4141-4242-- "My repos" — activity on repos I own
4343-- "Watching" — activity on repos I starred
4444-- "Team" — activity from users I follow
4545-- Custom filters: by repo, by user, by event type
4646-4747-Feeds are stored locally in IndexedDB. If project services exist, they can optionally sync server-side for push notification filtering.
4848-4949-## Advanced Features
5050-5151-### Repo Forking
5252-5353-- Fork button on repo detail (requires `rpc:sh.tangled.repo.create` scope)
5454-- Fork status indicator via `sh.tangled.repo.forkStatus` (up-to-date, fast-forwardable, conflict, missing branch)
5555-- Sync fork via `sh.tangled.repo.forkSync`
5656-5757-### Label Support
5858-5959-- Display labels on issues and PRs
6060-- Apply/remove labels (requires label scopes)
6161-- Color-coded label chips
6262-6363-### Reaction Picker
6464-6565-- Expand reaction support beyond star/follow
6666-- Emoji picker for: thumbsup, thumbsdown, laugh, tada, confused, heart, rocket, eyes
6767-- Show reaction counts on PRs, issues, comments
6868-6969-### PR Interdiff
7070-7171-- View diff between PR rounds (round N vs round N+1)
7272-- Useful for code review on mobile — see what changed since last review
7373-7474-### Knot Information
7575-7676-- Show which knot hosts a repo
7777-- Knot version and status
7878-- Useful for debugging and transparency
7979-8080-## Testing
8181-8282-- WebSocket reliability under network transitions (WiFi → cellular)
8383-- Feed deduplication when Jetstream replays events
8484-- Memory usage with long-running WebSocket connections
8585-- Battery impact measurement on mobile
-64
docs/app/tasks/phase-1.md
···11-# Phase 1 Tasks — Project Shell & Design System
22-33-## Scaffold
44-55-- [x] Create Ionic Vue project with TypeScript (`ionic start twisted tabs --type vue`)
66-- [x] Configure Capacitor for iOS and Android
77-- [x] Set up path aliases (`@/` → `src/`)
88-- [x] Install and configure Pinia
99-- [x] Install and configure TanStack Query for Vue
1010-- [x] Create the directory structure per spec (`app/`, `core/`, `services/`, `domain/`, `features/`, `components/`)
1111-1212-## Routing & Navigation
1313-1414-- [x] Define five-tab layout: Home, Explore, Activity, Profile (visible tabs) + Repo (pushed route)
1515-- [x] Configure Vue Router with Ionic tab routing
1616-- [x] Add route definitions for all Phase 1 placeholder pages
1717-- [ ] Verify tab-to-tab navigation preserves scroll position and component state
1818-1919-## Domain Models
2020-2121-- [x] Create `domain/models/user.ts` — `UserSummary` type
2222-- [x] Create `domain/models/repo.ts` — `RepoSummary`, `RepoDetail`, `RepoFile` types
2323-- [x] Create `domain/models/pull-request.ts` — `PullRequestSummary` type
2424-- [x] Create `domain/models/issue.ts` — `IssueSummary` type
2525-- [x] Create `domain/models/activity.ts` — `ActivityItem` type
2626-2727-## Mock Data
2828-2929-- [x] Use realistic data: fetch `desertthunder.dev` to create mock data for repo names, timestamps within last 30 days
3030-- [x] Create `src/mocks/users.ts` — factory for `UserSummary` instances
3131-- [x] Create `src/mocks/repos.ts` — factory for `RepoSummary` and `RepoDetail` instances
3232-- [x] Create `src/mocks/pull-requests.ts` — factory for `PullRequestSummary` instances
3333-- [x] Create `src/mocks/issues.ts` — factory for `IssueSummary` instances
3434-- [x] Create `src/mocks/activity.ts` — factory for `ActivityItem` instances
3535-3636-## Design System Components
3737-3838-- [x] `components/common/RepoCard.vue` — compact repo summary (name, owner, description, language, stars)
3939-- [x] `components/common/UserCard.vue` — avatar + handle + bio snippet
4040-- [x] `components/common/ActivityCard.vue` — icon + actor + verb + target + relative timestamp
4141-- [x] `components/common/EmptyState.vue` — icon + message + optional action button
4242-- [x] `components/common/ErrorBoundary.vue` — catch errors, show retry UI
4343-- [x] `components/common/SkeletonLoader.vue` — shimmer placeholders (variants: card, list-item, profile)
4444-- [x] `components/repo/FileTreeItem.vue` — file/dir icon + name
4545-- [x] `components/repo/MarkdownRenderer.vue` — render markdown to HTML (stub with basic styling)
4646-4747-## Feature Pages (placeholder with mock data)
4848-4949-- [x] `features/home/HomePage.vue` — trending repos list, recent activity list
5050-- [x] `features/explore/ExplorePage.vue` — search bar (non-functional), repo/user tabs, repo list
5151-- [x] `features/repo/RepoDetailPage.vue` — segmented layout: Overview, Files, Issues, PRs
5252-- [x] `features/repo/RepoOverview.vue` — header, description, README placeholder, stats
5353-- [x] `features/repo/RepoFiles.vue` — file tree list from mock data
5454-- [x] `features/repo/RepoIssues.vue` — issue list from mock data
5555-- [x] `features/repo/RepoPRs.vue` — PR list from mock data
5656-- [x] `features/activity/ActivityPage.vue` — filter chips + activity card list
5757-- [x] `features/profile/ProfilePage.vue` — sign-in prompt (unauthenticated state)
5858-5959-## Quality
6060-6161-- [ ] Verify all pages render with skeleton loaders before mock data appears
6262-- [ ] Verify tab switches don't cause layout shift
6363-- [ ] Run Lighthouse on the web build — target first-paint under 2s
6464-- [ ] Verify iOS and Android builds compile and launch via Capacitor
-67
docs/app/tasks/phase-2.md
···11-# Phase 2 Tasks — Public Tangled Browsing
22-33-## Protocol Setup
44-55-- [x] Install `@atcute/client` and `@atcute/tangled`
66-- [x] Create `services/atproto/client.ts` — singleton XRPC client with configurable base URL
77-- [x] Add error interceptor that normalizes XRPC errors into typed app errors
88-- [x] Create `core/errors/tangled.ts` — error types: NotFound, NetworkError, MalformedResponse, RateLimited
99-1010-## API Validation
1111-1212-- [x] Probe `tangled.org` for JSON API endpoints — returns HTML only (no JSON API); all metadata via PDS
1313-- [x] Confirm knot XRPC endpoints work from browser (CORS check against knot) — `Access-Control-Allow-Origin: *` confirmed on `knot1.tangled.sh`; knot hostname comes from `sh.tangled.repo` PDS record, not a fixed host
1414-- [x] Document which data comes from knots vs appview vs PDS (see endpoints.ts header comment)
1515-- [x] Test `com.atproto.repo.getRecord` for fetching user profiles and repo records from PDS — confirmed working on `bsky.social`
1616-1717-## Service Layer
1818-1919-- [x] Create `services/tangled/endpoints.ts` — typed wrappers for each XRPC query
2020-- [x] Create `services/tangled/normalizers.ts` — transform raw responses → domain models
2121-- [x] Create `services/tangled/queries.ts` — TanStack Query hooks with cache keys and stale times
2222-- [x] Implement knot routing: determine correct knot hostname for a given repo
2323-2424-## Repository Browsing
2525-2626-- [x] Wire `RepoDetailPage` to live repo data (metadata from PDS record + git data from knot)
2727-- [x] Implement repo overview: description, topics, default branch, language breakdown
2828-- [x] Implement README fetch: `sh.tangled.repo.blob` for `README.md` on default branch
2929-- [x] Wire `MarkdownRenderer` to render real README content
3030-- [x] Implement file tree: `sh.tangled.repo.tree` → navigate directories
3131-- [x] Implement file viewer: `sh.tangled.repo.blob` → syntax-highlighted display
3232-- [x] Implement commit log: `sh.tangled.repo.log` with cursor pagination
3333-- [x] Implement branch list: `sh.tangled.repo.branches`
3434-3535-## Profile Browsing
3636-3737-- [x] Fetch user profile from PDS: `com.atproto.repo.getRecord` for `sh.tangled.actor.profile`
3838-- [x] Display profile: avatar (via `avatar.tangled.sh`), bio, links, location, pronouns, pinned repos
3939-- [x] List user's repos: fetch `sh.tangled.repo` records from user's PDS
4040-- [x] Wire `UserCard` component to real data
4141-4242-## Issues (read-only)
4343-4444-- [x] Fetch issues for a repo from PDS records (`listIssueRecords` + `listIssueStateRecords` from owner's PDS)
4545-- [x] Display issue list with state filter (open/closed)
4646-- [x] Issue detail view: title, body, author, state
4747-- [x] Issue comments: fetch `sh.tangled.repo.issue.comment` records, render threaded
4848-4949-## Pull Requests (read-only)
5050-5151-- [x] Fetch PRs for a repo from PDS records (`listPullRecords` + `listPullStatusRecords` from owner's PDS)
5252-- [x] Display PR list with status filter (open/closed/merged)
5353-- [x] PR detail view: title, body, author, source/target branches
5454-- [x] PR comments: fetch `sh.tangled.repo.pull.comment` records
5555-5656-## Caching
5757-5858-- [x] Configure TanStack Query stale/gc times per data type (see spec)
5959-- [x] Set up IndexedDB query persister for offline reads
6060-6161-## Quality
6262-6363-- [x] Replace default mock-backed Home/Explore/Activity surfaces with scoped-down curated live discovery/activity
6464-- [ ] Verify stale-while-revalidate behavior: cached data shows immediately, refreshes in background
6565-- [ ] Test with real Tangled repos (e.g., `tangled.org/core`)
6666-- [ ] Verify error states render correctly: 404, network failure, empty repos
6767-- [ ] Test on slow network (throttled devtools) — verify skeleton → content transition
-60
docs/app/tasks/phase-3.md
···11-# Phase 3 Tasks — Search & Activity Feed
22-33-## Search API Discovery
44-55-- [ ] Probe `tangled.org` for search endpoints (try `/search?q=`, `/api/search`, check network tab on live site)
66-- [ ] If JSON search exists, document the request/response format
77-- [ ] If no JSON search, decide: curated discovery now + backend search later, or HTML scraping
88-99-## Search Implementation
1010-1111-- [ ] Create `services/tangled/search.ts` — search service (real endpoint or fallback)
1212-- [ ] Implement debounced search input (300ms) in Explore tab
1313-- [ ] Implement segmented search results: Repos tab, Users tab
1414-- [ ] Implement search result rendering with `RepoCard` and `UserCard`
1515-- [ ] Implement empty search state with suggestions
1616-- [ ] Persist recent searches in local storage (max 20)
1717-- [ ] Clear search history action
1818-1919-## Discovery Sections (if search API unavailable)
2020-2121-- [ ] Implement "Trending repos" section (source TBD — may require appview scraping or curated list)
2222-- [ ] Implement "Recently created repos" section
2323-- [ ] Implement "Suggested users" section
2424-- [ ] Wire discovery sections into Home and Explore tabs
2525-2626-## Activity Feed — Data Source
2727-2828-- [ ] Investigate `tangled.org/timeline` for JSON variant (check with Accept headers)
2929-- [ ] If no JSON timeline, evaluate `@atcute/jetstream` for real-time feed
3030-- [ ] If neither works, implement polling-based feed from known users' PDS records
3131-- [ ] Document chosen approach and any limitations
3232-3333-## Activity Feed — Implementation
3434-3535-- [ ] Create `services/tangled/feed.ts` — feed data source
3636-- [ ] Create normalizer: raw AT Protocol events → `ActivityItem` domain model
3737-- [ ] Implement `ActivityPage` with real feed data
3838-- [ ] Implement filter chips: All, Repos, PRs, Issues, Social
3939-- [ ] Implement infinite scroll with cursor-based pagination
4040-- [ ] Implement pull-to-refresh
4141-- [ ] Implement tap-to-navigate: activity card → repo/profile/PR/issue detail
4242-4343-## Feed Caching
4444-4545-- [ ] Cache last 100 feed items in IndexedDB via query persister
4646-- [ ] Show cached feed immediately on tab switch
4747-- [ ] Stale time: 1 minute
4848-- [ ] Verify feed persists across app restarts
4949-5050-## Home Tab
5151-5252-- [ ] Wire Home tab to real data: trending repos + recent activity
5353-- [ ] Add "personalized" section placeholder (shows sign-in prompt when unauthenticated)
5454-5555-## Quality
5656-5757-- [ ] Test search with various queries — verify results are relevant
5858-- [ ] Test activity feed with pull-to-refresh and pagination
5959-- [ ] Test offline: cached feed shows, search degrades gracefully
6060-- [ ] Verify no duplicate items in feed after refresh
-84
docs/app/tasks/phase-4.md
···11-# Phase 4 Tasks — OAuth & Social Features
22-33-## OAuth Setup
44-55-- [ ] Install `@atcute/oauth-browser-client`
66-- [ ] Host OAuth client metadata JSON at a public URL & configure for local dev
77-- [ ] Create `core/auth/oauth.ts` — call `configureOAuth()` with client metadata URL and redirect URI
88-- [ ] Create `core/auth/session.ts` — session management: get, list, delete stored sessions
99-- [ ] Create `core/auth/store.ts` — Pinia auth store with state machine (idle → authenticating → authenticated → error)
1010-1111-## Login Flow
1212-1313-- [ ] Create `features/profile/LoginPage.vue` — handle input field + "Sign in" button
1414-- [ ] Implement handle → DID resolution
1515-- [ ] Implement OAuth redirect initiation
1616-- [ ] Create `/oauth/callback` route to handle redirect back
1717-- [ ] Implement token exchange on callback
1818-- [ ] Store session and update auth store
1919-- [ ] Redirect to Profile tab after successful login
2020-- [ ] Handle auth errors: invalid handle, OAuth denied, network failure
2121-2222-## Session Management
2323-2424-- [ ] Implement session restoration on app launch (call `getSession()` for stored DID)
2525-- [ ] Implement automatic token refresh (handled by `@atcute/oauth-browser-client` internally)
2626-- [ ] Implement logout: clear session, reset auth store, redirect to Home
2727-- [ ] Implement account switcher: `listStoredSessions()`, switch between accounts
2828-2929-## Capacitor Deep Links
3030-3131-- [ ] Configure custom URL scheme for OAuth callback on iOS/Android
3232-- [ ] Add `App.addListener('appUrlOpen')` handler to capture callback
3333-- [ ] Test OAuth flow on iOS simulator and Android emulator
3434-3535-## Auth-Aware XRPC Client
3636-3737-- [ ] Create authenticated XRPC client that uses `session.dpopFetch` for requests
3838-- [ ] Service layer: use authenticated client for mutations, public client for queries
3939-- [ ] Handle 401/expired session: trigger re-auth flow
4040-4141-## Social Actions — Star
4242-4343-- [ ] Create `services/tangled/mutations.ts` — mutation functions
4444-- [ ] Implement `starRepo(repoAtUri)` — creates `sh.tangled.feed.star` record on user's PDS
4545-- [ ] Implement `unstarRepo(rkey)` — deletes star record
4646-- [ ] Add star/unstar button to `RepoDetailPage` overview
4747-- [ ] Optimistic update: toggle star state and count immediately, rollback on error
4848-- [ ] Track user's existing stars to show correct initial state
4949-5050-## Social Actions — Follow
5151-5252-- [ ] Implement `followUser(targetDid)` — creates `sh.tangled.graph.follow` record
5353-- [ ] Implement `unfollowUser(rkey)` — deletes follow record
5454-- [ ] Add follow/unfollow button to profile pages and user cards
5555-- [ ] Optimistic update: toggle follow state immediately
5656-- [ ] Track user's existing follows to show correct initial state
5757-5858-## Social Actions — React
5959-6060-- [ ] Implement `addReaction(subjectUri, reaction)` — creates `sh.tangled.feed.reaction` record
6161-- [ ] Implement `removeReaction(rkey)` — deletes reaction record
6262-- [ ] Add reaction button/picker to PR and issue detail views
6363-- [ ] Show reaction counts grouped by type
6464-6565-## Profile Tab (Authenticated)
6666-6767-- [ ] Wire Profile tab to show current user's profile data
6868-- [ ] Show pinned repos, stats, starred repos, following list
6969-- [ ] Add logout button
7070-- [ ] Add account switcher UI
7171-7272-## Personalized Feed
7373-7474-- [ ] When signed in, filter activity feed to show activity from followed users and starred repos
7575-- [ ] Add "For You" / "Global" toggle on Activity tab
7676-- [ ] If appview provides a personalized endpoint, use it; otherwise filter client-side
7777-7878-## Quality
7979-8080-- [ ] Test full OAuth flow: login → browse → star → follow → logout
8181-- [ ] Test session restoration after app restart
8282-- [ ] Test on web, iOS simulator, Android emulator
8383-- [ ] Test error cases: denied OAuth, expired session, failed mutation
8484-- [ ] Verify optimistic updates roll back correctly on mutation failure
-64
docs/app/tasks/phase-5.md
···11-# Phase 5 Tasks — Offline & Performance Polish
22-33-## Query Persistence
44-55-- [ ] Set up `persistQueryClient` with IndexedDB adapter
66-- [ ] Configure persistence: debounced writes, max cache size, TTL per data type
77-- [ ] Hydrate query cache from IndexedDB before first render
88-- [ ] Verify: kill app → relaunch → cached data appears immediately without network
99-1010-## Offline Detection
1111-1212-- [ ] Create `core/network/status.ts` — reactive online/offline state (composable)
1313-- [ ] Show persistent offline banner when `navigator.onLine` is false
1414-- [ ] Disable mutation buttons (star, follow, react) when offline
1515-- [ ] Show toast when network returns: "Back online — refreshing"
1616-1717-## Secure Storage
1818-1919-- [ ] Abstract auth token storage behind `core/storage/secure.ts`
2020-- [ ] Web: encrypted localStorage wrapper
2121-- [ ] Native: Capacitor Secure Storage plugin
2222-- [ ] Verify tokens are never stored in IndexedDB query cache
2323-- [ ] Clear secure storage on logout
2424-2525-## Cache Eviction
2626-2727-- [ ] Implement max-item limits per data type (repos: 200, files: 100, profiles: 100)
2828-- [ ] Implement TTL eviction (remove entries older than their configured TTL)
2929-- [ ] Run eviction on app launch and periodically (every 30 min)
3030-- [ ] Measure IndexedDB size and log warnings if approaching limits
3131-3232-## Navigation Performance
3333-3434-- [ ] Verify Ionic `keep-alive` preserves tab state and scroll position
3535-- [ ] Implement data prefetch on repo card visibility (Intersection Observer)
3636-- [ ] Test tab switch speed — should feel instant with cached data
3737-- [ ] Profile and fix any layout shifts during navigation
3838-3939-## List Virtualization
4040-4141-- [ ] Replace flat lists with virtualized scroll for: repo lists, activity feed, file trees
4242-- [ ] Test with 1000+ item lists — verify smooth scrolling
4343-- [ ] Verify scroll position restoration when navigating back
4444-4545-## Image Optimization
4646-4747-- [ ] Lazy-load all avatars
4848-- [ ] Add initials fallback for missing avatars
4949-- [ ] Use appropriate image sizes from `avatar.tangled.sh`
5050-5151-## Bundle Optimization
5252-5353-- [ ] Add route-level code splitting (lazy imports per feature)
5454-- [ ] Tree-shake unused Ionic components (configure Ionic's component imports)
5555-- [ ] Measure bundle size — target under 500KB initial JS
5656-- [ ] Run Lighthouse audit — target 90+ performance score on mobile
5757-5858-## Quality
5959-6060-- [ ] Test offline → online transition: data refreshes without duplicates
6161-- [ ] Test low-bandwidth (3G throttle): skeleton → content transitions are smooth
6262-- [ ] Test memory usage over extended use: navigate many repos, check heap doesn't grow unbounded
6363-- [ ] Test on real iOS and Android devices (not just simulators)
6464-- [ ] Measure and document cold start time, tab switch time, scroll performance
-85
docs/app/tasks/phase-6.md
···11-# Phase 6 Tasks — Write Features & Project Services
22-33-## Project Services Setup
44-55-- [ ] Decide which write and notification operations belong in `packages/api` versus a separate service
66-- [ ] Implement health and readiness endpoints for all public client-facing services
77-- [ ] Configure CORS for the mobile app's origins
88-- [ ] Document the mobile-facing service contract in `docs/api`
99-1010-## Project Services — Auth Proxy
1111-1212-- [ ] Implement OAuth token exchange endpoint (if moving auth server-side)
1313-- [ ] Implement session endpoint that returns user info
1414-- [ ] Decide: keep client-side OAuth or migrate to service-mediated auth
1515-1616-## Project Services — Search and Graph
1717-1818-- [ ] Implement `GET /search` endpoint for repo/profile discovery
1919-- [ ] Return enough repo/profile metadata for the mobile client to render result cards directly
2020-- [ ] Implement `GET /profiles/:did/summary` for follower/following counts and other graph-derived gaps
2121-- [ ] Wire mobile client's search and profile summary services to these endpoints
2222-2323-## Project Services — Personalized Feed
2424-2525-- [ ] Implement `GET /feed/personalized` — aggregate activity for the user's follows and stars
2626-- [ ] Index relevant events from Jetstream
2727-- [ ] Wire mobile client's feed to the project service endpoint when authenticated
2828-2929-## Create Issue
3030-3131-- [ ] Create `features/repo/CreateIssuePage.vue` — title + body (markdown) form
3232-- [ ] Implement `createIssue()` mutation in `services/tangled/mutations.ts`
3333-- [ ] Create `sh.tangled.repo.issue` record on user's PDS
3434-- [ ] Optimistic update: add issue to local list
3535-- [ ] Navigate to new issue detail on success
3636-3737-## Comment on Issue
3838-3939-- [ ] Add comment input to issue detail view
4040-- [ ] Implement `createIssueComment()` mutation
4141-- [ ] Create `sh.tangled.repo.issue.comment` record with `replyTo` support for threading
4242-- [ ] Optimistic update: append comment to list
4343-4444-## Comment on PR
4545-4646-- [ ] Add comment input to PR detail view
4747-- [ ] Implement `createPRComment()` mutation
4848-- [ ] Create `sh.tangled.repo.pull.comment` record
4949-5050-## Issue State Management
5151-5252-- [ ] Add close/reopen button to issue detail (author and repo owner only)
5353-- [ ] Implement `closeIssue()` / `reopenIssue()` — create `sh.tangled.repo.issue.state` record
5454-- [ ] Optimistic update: toggle state badge
5555-5656-## Edit Profile
5757-5858-- [ ] Create `features/profile/EditProfilePage.vue`
5959-- [ ] Implement avatar upload (max 1MB, png/jpeg) via blob upload + record update
6060-- [ ] Implement bio edit (max 256 graphemes)
6161-- [ ] Implement links edit (max 5 URIs)
6262-- [ ] Implement location, pronouns, pinned repos, stats selection, bluesky toggle
6363-- [ ] Update `sh.tangled.actor.profile` record (key: `self`)
6464-6565-## Scope Upgrade
6666-6767-- [ ] Detect when user's session lacks scopes needed for write operations
6868-- [ ] Prompt user to re-authorize with expanded scopes
6969-- [ ] Handle scope upgrade flow gracefully (no data loss)
7070-7171-## Push Notifications (if services exist)
7272-7373-- [ ] Implement `POST /notifications/register` — register device token
7474-- [ ] Configure Capacitor Push Notifications plugin
7575-- [ ] Register device token on login
7676-- [ ] Services: subscribe to events relevant to user, deliver via APNs/FCM
7777-- [ ] Handle notification tap → deep link to relevant content
7878-7979-## Quality
8080-8181-- [ ] Test issue creation end-to-end: create → verify on tangled.org
8282-- [ ] Test commenting on issues and PRs
8383-- [ ] Test profile editing: avatar upload, bio change
8484-- [ ] Test scope upgrade flow
8585-- [ ] Verify mutations work offline-queued (if implemented) or show appropriate offline errors
-68
docs/app/tasks/phase-7.md
···11-# Phase 7 Tasks — Real-Time Feed & Advanced Features
22-33-## Jetstream Integration
44-55-- [ ] Install `@atcute/jetstream`
66-- [ ] Create `services/atproto/jetstream.ts` — WebSocket connection manager
77-- [ ] Filter events for `sh.tangled.*` collections
88-- [ ] Normalize events into `ActivityItem` domain model
99-- [ ] Merge live events into TanStack Query feed cache
1010-- [ ] Implement connection lifecycle: connect on foreground, disconnect on background
1111-- [ ] Implement reconnection with exponential backoff and cursor tracking
1212-- [ ] Add battery-aware throttling (Capacitor Battery API)
1313-1414-## Live UI Indicators
1515-1616-- [ ] Activity feed: "X new items" pill at top, tap to reveal
1717-- [ ] Repo detail: "New commits available" banner on ref update events
1818-- [ ] PR detail: live status badge updates (open → merged)
1919-- [ ] Issue detail: live comment count updates
2020-2121-## Custom Feeds
2222-2323-- [ ] Create `domain/feed/custom-feed.ts` — feed configuration model
2424-- [ ] Implement feed builder UI: name + filter rules (by repo, user, event type)
2525-- [ ] Store custom feeds in IndexedDB
2626-- [ ] Render custom feed as a selectable tab/option on Activity page
2727-- [ ] "My repos" preset, "Watching" preset, "Team" preset
2828-2929-## Repo Forking
3030-3131-- [ ] Add fork button to repo detail (authenticated only)
3232-- [ ] Implement fork creation via `sh.tangled.repo.create` with `source` field
3333-- [ ] Show fork status badge: up-to-date, fast-forwardable, conflict, missing branch
3434-- [ ] Implement "Sync fork" action via `sh.tangled.repo.forkSync`
3535-3636-## Labels
3737-3838-- [ ] Fetch label definitions for a repo
3939-- [ ] Display color-coded label chips on issues and PRs
4040-- [ ] Implement label filtering on issue/PR lists
4141-- [ ] Add/remove labels on issues and PRs (authenticated, with label scopes)
4242-4343-## Expanded Reactions
4444-4545-- [ ] Add reaction picker component: thumbsup, thumbsdown, laugh, tada, confused, heart, rocket, eyes
4646-- [ ] Show grouped reaction counts on PRs, issues, and comments
4747-- [ ] Add/remove reactions with optimistic updates
4848-4949-## PR Interdiff
5050-5151-- [ ] Detect PR round count
5252-- [ ] Add round selector to PR detail
5353-- [ ] Fetch and display diff between selected rounds
5454-- [ ] Use `sh.tangled.repo.compare` for cross-round comparison
5555-5656-## Knot Info
5757-5858-- [ ] Show knot hostname on repo detail
5959-- [ ] Fetch knot version via `sh.tangled.knot.version`
6060-- [ ] Display knot status/health indicator
6161-6262-## Quality
6363-6464-- [ ] Test Jetstream under network transitions (WiFi → cellular → offline → online)
6565-- [ ] Verify no duplicate events after reconnection with cursor
6666-- [ ] Measure battery impact of WebSocket connection on iOS and Android
6767-- [ ] Test memory usage with long-running Jetstream connection
6868-- [ ] Load test: simulate high-frequency events, verify UI stays responsive
-77
docs/qa.md
···11----
22-title: "QA Checklist"
33-updated: 2026-03-23
44----
55-66-# QA Checklist
77-88-## Ingestion (end-to-end)
99-1010-Walk a record through the full pipeline: Tap event → indexer → store → searchable.
1111-1212-- [ ] Indexer connects to Tap via WebSocket and begins processing events
1313-- [ ] Creating a tracked record on Tangled produces a row in `documents`
1414-- [ ] Updating that record changes the existing row (new CID)
1515-- [ ] Deleting that record tombstones the row (`deleted_at` set)
1616-- [ ] Tombstoned documents do not appear in search results
1717-- [ ] Identity events update the handle cache; new documents show resolved handles
1818-- [ ] Unsupported collections are silently skipped (no errors logged)
1919-- [ ] Connection drop triggers automatic reconnect and resumes from last cursor
2020-2121-## Cursor durability
2222-2323-- [ ] Kill the indexer mid-stream, restart — processing resumes without duplicating documents
2424-- [ ] Redeploy the indexer — cursor is persisted before shutdown, no gap or replay
2525-2626-## Backfill
2727-2828-Run `twister backfill` against a small seed file and verify the discovery graph.
2929-3030-- [ ] Seed file with known Tangled users produces a non-empty discovery graph
3131-- [ ] `--max-hops 1` limits discovery to direct follows/collaborators only
3232-- [ ] `--dry-run` logs the plan but does not call Tap mutation endpoints
3333-- [ ] Already-tracked DIDs are reported and not re-submitted
3434-- [ ] Re-running the same seeds is idempotent
3535-- [ ] After backfill + Tap sync, search returns historical content that wasn't there before
3636-3737-## Search API
3838-3939-- [ ] `GET /search?q=<repo-name>` returns the expected repo as top result
4040-- [ ] Searching by title keyword returns expected documents
4141-- [ ] Searching by author handle returns their content
4242-- [ ] `collection`, `type`, `author`, `repo` filters restrict results correctly
4343-- [ ] Pagination: `offset=0&limit=5` then `offset=5&limit=5` return disjoint result sets
4444-- [ ] Missing `q` param returns 400 with error JSON
4545-- [ ] Unknown query param returns 400
4646-- [ ] `GET /documents/{id}` returns the full document; 404 for missing or tombstoned
4747-- [ ] `GET /healthz` returns 200
4848-- [ ] `GET /readyz` returns 503 when DB is unreachable
4949-5050-## Deployment (Railway)
5151-5252-- [ ] API service healthy and routable at public URL
5353-- [ ] Indexer service healthy on `:9090/health`
5454-- [ ] A new Tangled record ingested post-deploy becomes searchable within seconds
5555-- [ ] Redeploying the API preserves availability (health-check-gated rollout)
5656-- [ ] Restarting the indexer does not lose sync position
5757-- [ ] Environment variables match the documented set in `docs/api/deploy.md`
5858-5959-## Mobile — Navigation & Shell
6060-6161-- [ ] All five tabs render and switch without layout shift
6262-- [ ] Tab-to-tab navigation preserves scroll position and component state
6363-- [ ] Pages show skeleton loaders before data appears
6464-- [ ] iOS and Android builds compile and launch via Capacitor
6565-6666-## Mobile — Live Tangled Browsing
6767-6868-- [ ] Repo detail page loads metadata from PDS + git data from knot
6969-- [ ] README renders via markdown renderer
7070-- [ ] File tree navigates directories; file viewer shows syntax-highlighted content
7171-- [ ] Commit log paginates with cursor
7272-- [ ] Profile page shows avatar, bio, and repos from PDS
7373-- [ ] Issue list filters by state (open/closed); detail shows body + threaded comments
7474-- [ ] PR list filters by status; detail shows source/target branches + comments
7575-- [ ] Stale-while-revalidate: cached data shows immediately, refreshes in background
7676-- [ ] Error states render correctly: 404, network failure, empty repo
7777-- [ ] Slow network: skeleton → content transition is smooth (test with throttled devtools)
+161
docs/reference/api.md
···11+---
22+title: API Service Reference
33+updated: 2026-03-24
44+---
55+66+Twister is a Go service that indexes Tangled content and serves a search API. It connects to the AT Protocol ecosystem via Tap (firehose consumer), XRPC (direct record lookups), and Constellation (backlink queries), storing indexed data in Turso/libSQL with FTS5 full-text search.
77+88+## Architecture
99+1010+The service is a single Go binary with multiple subcommands, each running a different runtime mode. All modes share the same database and configuration layer.
1111+1212+**Runtime modes:**
1313+1414+| Command | Purpose |
1515+| ------------- | ------------------------------------------------------------------- |
1616+| `api` (serve) | HTTP search API server |
1717+| `indexer` | Consumes Tap firehose events, normalizes and indexes records |
1818+| `backfill` | Discovers users from seed files, registers them with Tap |
1919+| `enrich` | Backfills missing metadata (repo names, handles, web URLs) via XRPC |
2020+| `reindex` | Re-syncs all documents into the FTS index |
2121+| `healthcheck` | One-shot liveness probe for container orchestration |
2222+2323+The `embed-worker` and `reembed` commands exist as stubs for the upcoming semantic search pipeline (Nomic Embed Text v1.5 deployed via Railway template).
2424+2525+All commands accept a `--local` flag that switches to a local SQLite file and text-format logging for development.
2626+2727+## HTTP API
2828+2929+The API server binds to `:8080` by default (configurable via `HTTP_BIND_ADDR`). CORS is open (`*` origin, GET/OPTIONS).
3030+3131+### Search
3131+3232+3333+**`GET /search`** — Main search endpoint. Routes to keyword, semantic, or hybrid search based on the `mode` parameter.
3434+3535+**`GET /search/keyword`** — Full-text search via FTS5 with BM25 scoring.
3636+3737+Parameters:
3838+3939+- `q` (required) — Query string
4040+- `limit` (1–100, default 20) — Results per page
4141+- `offset` (default 0) — Pagination offset
4242+- `collection` — Filter by AT Protocol collection NSID
4343+- `type` — Filter by record type (repo, issue, pull, profile, string)
4444+- `author` — Filter by handle or DID
4545+- `repo` — Filter by repo name or DID
4646+- `language` — Filter by primary language
4747+- `from`, `to` — Date range (ISO 8601)
4848+- `state` — Filter issues/PRs by state (open, closed, merged)
4949+- `mode` — Search mode (keyword, semantic, hybrid)
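As a rough client-side illustration of the parameters above, the sketch below assembles a `GET /search` URL. The `SearchParams` struct, the `BuildSearchURL` helper, and the base URL are hypothetical names introduced for this example; only the parameter names and the 1-100 limit range come from the list above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// SearchParams covers a subset of the documented query parameters.
// The struct and its field names are illustrative, not the service's own types.
type SearchParams struct {
	Query  string // q (required)
	Limit  int    // 1-100; 0 leaves the server default (20) in place
	Offset int    // pagination offset; 0 is the default
	Type   string // repo, issue, pull, profile, string
	State  string // open, closed, merged
	Mode   string // keyword, semantic, hybrid
}

// BuildSearchURL validates the parameters and returns a GET /search URL
// rooted at base (for example a local instance on :8080).
func BuildSearchURL(base string, p SearchParams) (string, error) {
	if p.Query == "" {
		return "", fmt.Errorf("q is required")
	}
	if p.Limit < 0 || p.Limit > 100 {
		return "", fmt.Errorf("limit must be between 1 and 100, got %d", p.Limit)
	}
	if p.Offset < 0 {
		return "", fmt.Errorf("offset must not be negative, got %d", p.Offset)
	}

	v := url.Values{}
	v.Set("q", p.Query)
	if p.Limit > 0 {
		v.Set("limit", strconv.Itoa(p.Limit))
	}
	if p.Offset > 0 {
		v.Set("offset", strconv.Itoa(p.Offset))
	}
	if p.Type != "" {
		v.Set("type", p.Type)
	}
	if p.State != "" {
		v.Set("state", p.State)
	}
	if p.Mode != "" {
		v.Set("mode", p.Mode)
	}
	// Encode sorts keys alphabetically and escapes values.
	return base + "/search?" + v.Encode(), nil
}
```

Calling `BuildSearchURL("http://localhost:8080", SearchParams{Query: "knot server", Limit: 5, Type: "repo"})` yields a URL whose query string is `limit=5&q=knot+server&type=repo`.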
5050+5151+Response includes query metadata, total count, and an array of results each containing: ID, collection, record type, title, summary, body snippet (with `<mark>` highlights), score, repo name, author handle, DID, AT-URI, web URL, and timestamps.
5252+5353+**`GET /documents/{id}`** — Fetch a single document by stable ID.
5454+5555+### Health
5656+5757+- **`GET /healthz`** — Liveness probe, always 200
5858+- **`GET /readyz`** — Readiness probe, pings database
5959+6060+### Admin
6161+6262+When `ENABLE_ADMIN_ENDPOINTS=true` with a configured `ADMIN_AUTH_TOKEN`:
6363+6464+- **`POST /admin/reindex`** — Trigger FTS re-sync
6565+6666+### Static Content
6767+6868+The API also serves a search site with live search and API documentation at `/` and `/docs*`, built with Alpine.js (no build step, embedded in `internal/view/`).
6969+7070+## Database
7171+7272+Turso (libSQL) with the following tables:
7373+7474+**documents** — Core search index. Each record gets a stable ID of `did|collection|rkey`. Stores title, body, summary, metadata (repo name, author handle, web URL, language, tags), and timestamps. Soft-deleted via `deleted_at`.
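The `did|collection|rkey` ID format can be sketched with a pair of helpers. The function names are hypothetical; the `at://` form follows the standard AT-URI layout, and the approach relies on `|` not being a legal character in DIDs, NSIDs, or record keys.

```go
package main

import (
	"fmt"
	"strings"
)

// StableID joins the three parts of a record's identity with "|".
func StableID(did, collection, rkey string) string {
	return did + "|" + collection + "|" + rkey
}

// ParseStableID splits a stable ID back into DID, collection, and record key.
func ParseStableID(id string) (did, collection, rkey string, err error) {
	parts := strings.Split(id, "|")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed document ID %q", id)
	}
	return parts[0], parts[1], parts[2], nil
}

// ATURI renders the same three parts as an AT-URI.
func ATURI(did, collection, rkey string) string {
	return "at://" + did + "/" + collection + "/" + rkey
}
```

Because the ID is derived purely from the record's address, re-indexing the same record always maps to the same row, which is what makes upserts safe.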
7575+7676+**documents_fts** — FTS5 virtual table for full-text search over title, body, summary, repo name, author handle, and tags. Uses `unicode61` tokenizer with tuned BM25 weights (title weighted highest at 2.5, then author handle at 2.0, summary at 1.5).
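To illustrate how per-column weights reach FTS5, the sketch below builds a `bm25()` ranking expression. The column names and their order are assumptions, as are the 1.0 weights for body, repo name, and tags; only the 2.5, 2.0, and 1.5 values come from the description above. FTS5 applies weights positionally, in the order the columns were declared on the virtual table.

```go
package main

import (
	"strconv"
	"strings"
)

// columnWeight pairs an FTS column with its BM25 weight.
type columnWeight struct {
	Column string
	Weight float64
}

// ftsWeights lists columns in an ASSUMED declaration order.
// Weights of 1.0 are placeholders for values the reference does not state.
var ftsWeights = []columnWeight{
	{"title", 2.5},
	{"body", 1.0},
	{"summary", 1.5},
	{"repo_name", 1.0},
	{"author_handle", 2.0},
	{"tags", 1.0},
}

// BM25Expr renders a bm25() call such as
// bm25(documents_fts, 2.5, 1.0, ...). FTS5's bm25() returns lower
// values for better matches, so queries order by it ascending.
func BM25Expr(table string, weights []columnWeight) string {
	args := make([]string, 0, len(weights)+1)
	args = append(args, table)
	for _, w := range weights {
		args = append(args, strconv.FormatFloat(w.Weight, 'f', 1, 64))
	}
	return "bm25(" + strings.Join(args, ", ") + ")"
}
```

A query would then use the result in a clause like `ORDER BY <expr>` after a `MATCH` filter on the virtual table.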
7777+7878+**sync_state** — Cursor tracking for the Tap consumer. Stores consumer name, current cursor, high water mark, and last update time. Enables crash-safe resume.
7979+8080+**identity_handles** — DID-to-handle cache. Updated from Tap identity events and XRPC lookups.
8181+8282+**record_state** — Issue and PR state cache (open/closed/merged). Keyed by subject AT-URI.
8383+8484+**document_embeddings** — Vector storage (768-dim F32_BLOB with DiskANN cosine index). Schema ready but not yet populated.
8585+8686+**embedding_jobs** — Async embedding job queue. Schema ready but worker not yet active.
8787+8888+## Indexing Pipeline
8888+8989+9090+The indexer connects to Tap via WebSocket, consuming AT Protocol record events in real time. For each event:
9191+9292+1. Filter against the configured collection allowlist (supports wildcards like `sh.tangled.*`)
9393+2. Route to the appropriate normalizer based on collection
9494+3. Normalize into a document (extract title, body, summary, metadata)
9595+4. Optionally enrich via XRPC (resolve author handle, repo name, web URL)
9696+5. Upsert into the database (auto-syncs FTS)
9797+6. Advance cursor and acknowledge to Tap
9898+9999+The indexer resumes from its last cursor on restart (no duplicate processing). It logs status every 30 seconds and uses exponential backoff (1s–5s) for transient failures.
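
Step 1 of the pipeline (the collection allowlist) can be sketched as follows. This is an illustration in TypeScript, not the indexer's Go code: the function names are hypothetical, and the wildcard semantics (a trailing `*` matches any collection sharing the prefix, and an empty list allows everything) are assumptions based on the `sh.tangled.*` example.

```typescript
// Parse a CSV allowlist such as "sh.tangled.repo.*, sh.tangled.string".
// Whitespace around entries is ignored and empty entries are dropped.
function parseAllowlist(csv: string): string[] {
  return csv
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Assumed semantics: a trailing "*" is a prefix wildcard; any other pattern
// must match the collection exactly. An empty allowlist allows everything.
function isCollectionAllowed(collection: string, patterns: string[]): boolean {
  if (patterns.length === 0) return true;
  return patterns.some((pattern) => {
    if (pattern.charAt(pattern.length - 1) === "*") {
      const prefix = pattern.slice(0, -1);
      return collection.slice(0, prefix.length) === prefix;
    }
    return collection === pattern;
  });
}
```

With the pattern `sh.tangled.*`, every Tangled collection passes the filter while unrelated collections are dropped before normalization.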
100100+101101+## Record Normalizers
102102+103103+Each AT Protocol collection has a dedicated normalizer that extracts searchable content:
104104+105105+| Collection | Record Type | Searchable | Content |
106106+| ------------------------------- | ------------- | ------------------------ | --------------------------- |
107107+| `sh.tangled.repo` | repo | Yes (if named) | Name, description, topics |
108108+| `sh.tangled.repo.issue` | issue | Yes | Title, body, repo reference |
109109+| `sh.tangled.repo.pull` | pull | Yes | Title, body, target branch |
110110+| `sh.tangled.repo.issue.comment` | issue_comment | Yes (if has body) | Comment body |
111111+| `sh.tangled.repo.pull.comment` | pull_comment | Yes (if has body) | Comment body |
112112+| `sh.tangled.string` | string | Yes (if has content) | Filename, contents |
113113+| `sh.tangled.actor.profile` | profile | Yes (if has description) | Profile description |
114114+| `sh.tangled.graph.follow` | follow | No | Graph edge only |
115115+116116+State records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) update the `record_state` table rather than creating documents.
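
The routing described by the table and the state-record note can be sketched as a lookup. This is a TypeScript illustration of the idea only; `routeCollection` and the `Route` type are hypothetical names, and the actual normalizers live in the Go service.

```typescript
type RecordType =
  | "repo" | "issue" | "pull" | "issue_comment"
  | "pull_comment" | "string" | "profile" | "follow";

// Collection-to-record-type mapping, taken from the table above.
const RECORD_TYPES: { [collection: string]: RecordType } = {
  "sh.tangled.repo": "repo",
  "sh.tangled.repo.issue": "issue",
  "sh.tangled.repo.pull": "pull",
  "sh.tangled.repo.issue.comment": "issue_comment",
  "sh.tangled.repo.pull.comment": "pull_comment",
  "sh.tangled.string": "string",
  "sh.tangled.actor.profile": "profile",
  "sh.tangled.graph.follow": "follow",
};

// State records update `record_state` instead of producing documents.
const STATE_COLLECTIONS = ["sh.tangled.repo.issue.state", "sh.tangled.repo.pull.status"];

type Route =
  | { kind: "document"; recordType: RecordType }
  | { kind: "state" }
  | { kind: "skip" };

function routeCollection(collection: string): Route {
  if (STATE_COLLECTIONS.indexOf(collection) !== -1) return { kind: "state" };
  if (Object.prototype.hasOwnProperty.call(RECORD_TYPES, collection)) {
    return { kind: "document", recordType: RECORD_TYPES[collection] };
  }
  return { kind: "skip" }; // no normalizer registered for this collection
}
```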
117117+118118+## XRPC Client
119119+120120+The built-in XRPC client provides typed access to AT Protocol endpoints with caching (1-hour TTL for DID docs and repo names):
121121+122122+- DID resolution via PLC Directory (`did:plc:`) or `.well-known/did.json` (`did:web:`)
123123+- Identity resolution (PDS endpoint + handle from DID document)
124124+- Record fetching (`com.atproto.repo.getRecord`, `com.atproto.repo.listRecords`)
125125+- Repo name resolution from `sh.tangled.repo` records
126126+- Web URL construction for Tangled entities
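
As a rough sketch of the DID resolution step, the function below picks the DID document URL for each supported method. It is written in TypeScript for illustration (the client itself is Go), `didDocumentUrl` is a hypothetical name, and rejecting path-based `did:web` identifiers is an assumption.

```typescript
// Return the URL where the DID document for `did` can be fetched.
// `plcDirectoryUrl` mirrors the PLC_DIRECTORY_URL setting.
function didDocumentUrl(did: string, plcDirectoryUrl: string = "https://plc.directory"): string {
  const plcPrefix = "did:plc:";
  const webPrefix = "did:web:";
  if (did.slice(0, plcPrefix.length) === plcPrefix) {
    // PLC Directory serves the document at /<did>.
    return plcDirectoryUrl.replace(/\/+$/, "") + "/" + did;
  }
  if (did.slice(0, webPrefix.length) === webPrefix) {
    const rest = did.slice(webPrefix.length);
    // Assumption: only hostname-level did:web is handled (no ":"-separated path).
    if (rest.length === 0 || rest.indexOf(":") !== -1) {
      throw new Error("unsupported did:web identifier: " + did);
    }
    // A port is percent-encoded in the DID (e.g. "localhost%3A3000").
    return "https://" + decodeURIComponent(rest) + "/.well-known/did.json";
  }
  throw new Error("unsupported DID method: " + did);
}
```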
127127+128128+## Backfill
129129+130130+The backfill command discovers users from a seed file and registers them with Tap for indexing. Discovery fans out via follow graphs and repo collaborators up to a configurable hop depth (default 2). The command supports dry-run mode and configurable concurrency and batch sizes, and it is idempotent.
131131+132132+## Configuration
133133+134134+All configuration is via environment variables (with `.env` file support):
135135+136136+| Variable | Default | Purpose |
137137+| -------------------------- | ----------------------- | ---------------------------------------------- |
138138+| `TURSO_DATABASE_URL` | — | Database connection (required) |
139139+| `TURSO_AUTH_TOKEN` | — | Auth token (required for remote) |
140140+| `TAP_URL` | — | Tap WebSocket URL |
141141+| `TAP_AUTH_PASSWORD` | — | Tap admin password |
142142+| `INDEXED_COLLECTIONS` | all | Collection allowlist (CSV, supports wildcards) |
143143+| `HTTP_BIND_ADDR` | `:8080` | API server bind address |
144144+| `INDEXER_HEALTH_ADDR` | `:9090` | Indexer health probe address |
145145+| `LOG_LEVEL` | info | debug/info/warn/error |
146146+| `LOG_FORMAT` | json | json or text |
147147+| `ENABLE_ADMIN_ENDPOINTS` | false | Enable admin routes |
148148+| `ADMIN_AUTH_TOKEN` | — | Bearer token for admin |
149149+| `ENABLE_INGEST_ENRICHMENT` | true | XRPC enrichment at ingest time |
150150+| `PLC_DIRECTORY_URL` | `https://plc.directory` | PLC Directory |
151151+| `XRPC_TIMEOUT` | 15s | XRPC HTTP timeout |
152152+153153+## Deployment
154154+155155+Deployed on Railway with three services:
156156+157157+- **api** — HTTP server (port 8080, health at `/healthz`)
158158+- **indexer** — Tap consumer (health at `:9090/healthz`)
159159+- **tap** — Tap instance (external dependency)
160160+161161+All services share the same Turso database. The API and indexer are separate deployments of the same binary with different subcommands.
+78
docs/reference/app.md
···11+---
22+title: Mobile App Reference
33+updated: 2026-03-24
44+---
55+66+Twisted is an Ionic Vue mobile app for browsing Tangled, a git hosting platform built on the AT Protocol. It targets iOS and Android via Capacitor (no web target).
77+88+## Tech Stack
99+1010+- **Vue 3** with TypeScript and Composition API
1111+- **Ionic Vue** for native-feeling UI components
1212+- **Capacitor** for iOS/Android builds
1313+- **Pinia** for state management
1414+- **TanStack Query** for async data with caching
1515+- **@atcute/client** and **@atcute/tangled** for AT Protocol XRPC
1616+1717+TypeScript files use `.js` extensions in imports. Package management via pnpm.
1818+1919+## Architecture
2020+2121+Three-layer design:
2222+2323+**Presentation** — Vue components and pages using Ionic's component library. Five-tab navigation: Home, Explore, Activity, Profile (visible tabs) plus Repo (pushed route). Repo detail uses segmented tabs: Overview, Files, Issues, PRs.
2424+2525+**Domain** — TypeScript types modeling the app's data: UserSummary, RepoSummary, RepoDetail, RepoFile, PullRequestSummary, IssueSummary, ActivityItem. These are app-internal representations, decoupled from API response shapes.
2626+2727+**Data** — Service layer that fetches from external sources and normalizes into domain types. The flow is: Vue component → composable → TanStack Query hook → service function → XRPC call → normalizer → domain model.
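
The normalizer step at the end of that flow might look like the sketch below. The raw record fields follow the `sh.tangled.repo` lexicon; the `RepoSummary` fields shown here are assumptions for illustration, since the app's real domain type is not reproduced in this document.

```typescript
// Raw record as stored on the PDS (fields from the sh.tangled.repo lexicon).
interface RawRepoRecord {
  name: string;
  description?: string;
  createdAt?: string;
  knot?: string;
  topics?: string[];
}

// Hypothetical app-internal shape, decoupled from the API response.
interface RepoSummary {
  uri: string;
  ownerDid: string;
  name: string;
  description: string;
  knot: string | null;
  topics: string[];
  createdAt: Date | null;
}

function normalizeRepo(ownerDid: string, rkey: string, record: RawRepoRecord): RepoSummary {
  const created = record.createdAt ? new Date(record.createdAt) : null;
  return {
    uri: `at://${ownerDid}/sh.tangled.repo/${rkey}`,
    ownerDid,
    name: record.name,
    description: record.description ?? "",
    knot: record.knot ?? null,
    topics: record.topics ?? [],
    // Drop unparseable timestamps instead of surfacing an Invalid Date.
    createdAt: created !== null && !isNaN(created.getTime()) ? created : null,
  };
}
```

Keeping defaults (empty description, empty topics) in the normalizer means components never need to branch on missing optional fields.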
2828+2929+## Directory Structure
3030+3131+```sh
3232+src/
3333+ app/ — App shell, router, global config
3434+ core/ — Shared utilities, constants
3535+ services/ — API clients and data fetching
3636+ atproto/ — @atcute client setup, error handling
3737+ tangled/ — Endpoints, normalizers, TanStack Query hooks
3838+ domain/ — TypeScript type definitions
3939+ features/ — Feature modules (home, explore, repo, etc.)
4040+ components/ — Shared UI components
4141+```
4242+4343+## Data Sources
4444+4545+The app reads from multiple sources depending on what's needed:
4646+4747+- **Knots** (Tangled XRPC servers) — Git data: file trees, blobs, commits, branches, diffs. Each repo is hosted on a specific knot.
4848+- **PDS** (Personal Data Servers) — AT Protocol records: profiles, issues, PRs, comments, stars, follows. Accessed via `com.atproto.repo.getRecord` and `com.atproto.repo.listRecords`.
4949+- **Twister API** — Search and index-backed summaries (when available).
5050+- **Constellation** — Social signal counts and backlinks (stars, followers, reactions).
5151+5252+Knots serve XRPC endpoints for git operations. The appview at `tangled.org` returns HTML only (no JSON API), so the app goes directly to knots for git data and PDS for AT Protocol records.
5353+5454+## Completed Features
5555+5656+### Navigation & Shell (Phase 1)
5757+5858+Five-tab layout with Vue Router, skeleton loaders, placeholder pages. Design system components: RepoCard, UserCard, ActivityCard, FileTreeItem, EmptyState, ErrorBoundary, SkeletonLoader, MarkdownRenderer.
5959+6060+### Public Browsing (Phase 2)
6161+6262+All read-only browsing works without authentication:
6363+6464+**Repository browsing** — Metadata display, README rendering (markdown), file tree navigation, file viewer with syntax context, commit log with pagination, branch listing.
6565+6666+**Profile browsing** — Avatar, bio, links fetched from PDS. User's repos listed.
6767+6868+**Issues** — List view with open/closed state filter, detail view with threaded comments.
6969+7070+**Pull Requests** — List view with status filter (open/closed/merged), detail view with comments.
7171+7272+**Caching** — TanStack Query configured with per-data-type stale times. Persistence via Dexie (IndexedDB) — works in Capacitor's WebView on device and in the browser during local dev.
7373+7474+## Routing
7575+7676+The app resolves identities through AT Protocol: handle → DID (via PDS resolution) → records. For repo git data, the knot hostname is extracted from the repo's DID document.
7777+7878+Home tab currently provides direct handle-based browsing: enter a known handle to view their profile and repos. This works without any index or search dependency.
+138
docs/reference/lexicons.md
···11+---
22+title: Tangled Lexicons
33+updated: 2026-03-24
44+---
55+66+Tangled defines its AT Protocol record types under the `sh.tangled.*` namespace. These are the records stored on users' Personal Data Servers (PDS) and consumed by the indexing pipeline.
77+88+## Searchable Records
99+1010+### sh.tangled.repo
1111+1212+Repository metadata. Created when a user registers a repo with Tangled.
1313+1414+- `name` (string, required) — Repository name
1515+- `description` (string) — Short description
1616+- `createdAt` (datetime) — Creation timestamp
1717+- `knot` (string) — Knot DID hosting the git data
1818+- `topics` (array of strings) — Tags/topics
1919+2020+### sh.tangled.repo.issue
2121+2222+Issue on a repository.
2323+2424+- `repo` (at-uri, required) — Reference to the parent repo record
2525+- `title` (string, required) — Issue title
2626+- `body` (string) — Issue body (markdown)
2727+- `createdAt` (datetime)
2828+2929+### sh.tangled.repo.pull
3030+3131+Pull request on a repository.
3232+3333+- `repo` (at-uri, required) — Reference to the parent repo record
3434+- `title` (string, required) — PR title
3535+- `body` (string) — PR body (markdown)
3636+- `head` (string) — Source branch
3737+- `base` (string) — Target branch
3838+- `createdAt` (datetime)
3939+4040+### sh.tangled.string
4141+4242+Code snippet or gist.
4343+4444+- `filename` (string) — File name with extension
4545+- `contents` (string, required) — Code content
4646+- `language` (string) — Programming language
4747+- `createdAt` (datetime)
4848+4949+### sh.tangled.actor.profile
5050+5151+User profile information.
5252+5353+- `displayName` (string) — Display name
5454+- `description` (string) — Bio/about text
5555+- `avatar` (blob) — Profile image
5656+- `pronouns` (string) — Pronouns
5757+- `location` (string) — Location
5858+- `links` (array of strings) — External links
5959+- `pinnedRepos` (array of at-uri) — Pinned repository references
6060+6161+## Interaction Records
6262+6363+### sh.tangled.feed.star
6464+6565+Star/favorite on a repository.
6666+6767+- `subject` (object, required) — `{ uri: at-uri, cid: cid }` referencing the starred repo
6868+6969+### sh.tangled.graph.follow
7070+7171+Follow relationship between users.
7272+7373+- `subject` (did, required) — DID of the followed user
7474+- `createdAt` (datetime)
7575+7676+### sh.tangled.feed.reaction
7777+7878+Emoji reaction on content.
7979+8080+- `subject` (object, required) — `{ uri: at-uri, cid: cid }` referencing the target
8181+- `emoji` (string, required) — Reaction emoji
8282+8383+## State Records
8484+8585+### sh.tangled.repo.issue.state
8686+8787+Tracks whether an issue is open or closed.
8888+8989+- `issue` (at-uri, required) — Reference to the issue
9090+- `state` (string, required) — `open` or `closed`
9191+9292+### sh.tangled.repo.pull.status
9393+9494+Tracks pull request lifecycle.
9595+9696+- `pull` (at-uri, required) — Reference to the PR
9797+- `status` (string, required) — `open`, `closed`, or `merged`
9898+9999+## Comment Records
100100+101101+### sh.tangled.repo.issue.comment
102102+103103+Comment on an issue.
104104+105105+- `issue` (at-uri, required) — Reference to the parent issue
106106+- `body` (string, required) — Comment body (markdown)
107107+- `parent` (at-uri) — Parent comment for threading
108108+- `createdAt` (datetime)
109109+110110+### sh.tangled.repo.pull.comment
111111+112112+Comment on a pull request.
113113+114114+- `pull` (at-uri, required) — Reference to the parent PR
115115+- `body` (string, required) — Comment body (markdown)
116116+- `parent` (at-uri) — Parent comment for threading
117117+- `createdAt` (datetime)
118118+119119+## Infrastructure Records
120120+121121+### sh.tangled.knot.member
122122+123123+Knot membership record.
124124+125125+- `knot` (did, required) — Knot DID
126126+- `permission` (string) — Permission level
127127+128128+### sh.tangled.knot.version
129129+130130+Knot software version metadata.
131131+132132+## Stable ID Format
133133+134134+Documents in the search index use the stable ID format: `did|collection|rkey` (e.g., `did:plc:abc123|sh.tangled.repo|repo-name`). This ensures idempotent upserts regardless of CID changes.
135135+136136+## AT-URI Format
137137+138138+Records are addressed as `at://did/collection/rkey` (e.g., `at://did:plc:abc123/sh.tangled.repo/repo-name`).
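
Both formats are built from the same three parts, so a small helper can convert between them. This is a minimal TypeScript sketch; the function names are hypothetical.

```typescript
interface RecordKey {
  did: string;
  collection: string;
  rkey: string;
}

// Stable search-index ID: did|collection|rkey
function toStableId(key: RecordKey): string {
  return `${key.did}|${key.collection}|${key.rkey}`;
}

// AT-URI: at://did/collection/rkey
function toAtUri(key: RecordKey): string {
  return `at://${key.did}/${key.collection}/${key.rkey}`;
}

// Split a stable ID back into its parts; rejects malformed input.
function parseStableId(id: string): RecordKey {
  const parts = id.split("|");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error("invalid stable ID: " + id);
  }
  return { did: parts[0], collection: parts[1], rkey: parts[2] };
}
```

Because the ID contains no CID, re-indexing an updated record produces the same key, which is what makes the upsert idempotent.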
+126
docs/roadmap.md
···11+---
22+title: Roadmap
33+updated: 2026-03-24
44+---
55+66+## API: Constellation Integration
77+88+Add a Constellation client to the Go API for enriching search results with social signals.
99+1010+- [ ] Constellation XRPC client (`internal/constellation/`) with `getBacklinksCount` and `getBacklinks`
1111+- [ ] User-agent header with project name and contact
1212+- [ ] Enrich search results with star counts from Constellation
1313+- [ ] Profile summary endpoint (`GET /profiles/{did}/summary`) with follower/following counts from Constellation
1414+- [ ] Cache Constellation responses with short TTL (star/follower counts change infrequently)
1515+1616+## API: Semantic Search Pipeline
1717+1818+Nomic Embed Text v1.5 via Railway template, async embedding pipeline.
1919+2020+- [ ] Deploy nomic-embed Railway template (`POST /api/embeddings` with Bearer auth)
2121+- [ ] Embedding client in Go API (`internal/embedding/`) calling the Nomic service
2222+- [ ] Embed-worker: consume `embedding_jobs` queue, generate 768-dim vectors, store in `document_embeddings`
2323+- [ ] `GET /search/semantic` endpoint using DiskANN vector_top_k
2424+- [ ] Reembed command for bulk re-generation
2525+2626+## API: Hybrid Search
2727+2828+Combine keyword and semantic results.
2929+3030+- [ ] Score normalization (keyword BM25 → [0,1], semantic cosine → [0,1])
3131+- [ ] Weighted merge (0.65 keyword + 0.35 semantic, configurable)
3232+- [ ] Deduplication by document ID
3333+- [ ] `matched_by` metadata in results
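
A minimal sketch of the planned merge, assuming both result lists already carry scores normalized to [0,1] and each list contains a given document at most once. It is written in TypeScript for illustration (the API is Go), and all names are hypothetical.

```typescript
type MatchSource = "keyword" | "semantic";

interface ScoredHit {
  id: string;
  score: number; // assumed already normalized to [0, 1]
}

interface HybridHit {
  id: string;
  score: number;
  matchedBy: MatchSource[];
}

function mergeHybrid(
  keyword: ScoredHit[],
  semantic: ScoredHit[],
  keywordWeight: number = 0.65,
  semanticWeight: number = 0.35,
): HybridHit[] {
  const byId: { [id: string]: HybridHit } = {};
  const merged: HybridHit[] = [];

  const add = (hits: ScoredHit[], weight: number, source: MatchSource): void => {
    for (const hit of hits) {
      // Deduplicate by document ID: one merged entry per document.
      if (!Object.prototype.hasOwnProperty.call(byId, hit.id)) {
        byId[hit.id] = { id: hit.id, score: 0, matchedBy: [] };
        merged.push(byId[hit.id]);
      }
      byId[hit.id].score += weight * hit.score;
      byId[hit.id].matchedBy.push(source);
    }
  };

  add(keyword, keywordWeight, "keyword");
  add(semantic, semanticWeight, "semantic");
  return merged.sort((a, b) => b.score - a.score);
}
```

A document found by both modes accumulates both weighted scores, so it can outrank a document that only one mode matched strongly.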
3434+3535+## API: Search Quality
3636+3737+- [ ] Field weight tuning based on real queries
3838+- [ ] Recency boost for recently updated content
3939+- [ ] Star count ranking signal (via Constellation)
4040+- [ ] State filtering defaults (exclude closed issues)
4141+- [ ] Better snippets with longer context
4242+- [ ] Relevance test fixtures
4343+4444+## API: Observability
4545+4646+- [ ] Structured metrics: ingestion rate, search latency, embedding throughput
4747+- [ ] Dashboard or log-based monitoring
4848+4949+## App: Search & Discovery
5050+5151+Wire the Explore tab to the search API and add activity feed.
5252+5353+**Depends on:** API: Constellation Integration
5454+5555+- [ ] Search service pointing at Twister API
5656+- [ ] Constellation service for star/follower counts
5757+- [ ] Debounced search on Explore tab with segmented results
5858+- [ ] Recent search history (local)
5959+- [ ] Graceful fallback when search API unavailable
6060+- [ ] Activity feed data source investigation (Jetstream vs polling)
6161+- [ ] Activity tab with filters, infinite scroll, pull-to-refresh
6262+- [ ] Home tab: surface recently viewed repos/profiles
6363+6464+## App: Authentication & Social
6565+6666+Bluesky OAuth and authenticated actions.
6767+6868+**Depends on:** App: Search & Discovery (for Constellation service), API: Constellation Integration
6969+7070+- [ ] OAuth setup with `@atcute/oauth-browser-client`
7171+- [ ] Login page, OAuth flow, callback handling
7272+- [ ] Capacitor deep link configuration
7373+- [ ] Session management (restore, refresh, logout, account switcher)
7474+- [ ] Auth-aware XRPC client using dpopFetch
7575+- [ ] Star repos (write to PDS, count from Constellation)
7676+- [ ] Follow users (write to PDS, count from Constellation)
7777+- [ ] React to content (write to PDS, count from Constellation)
7878+- [ ] Authenticated profile tab (pinned repos, stats, starred, following)
7979+- [ ] Personalized feed ("For You" / "Global" toggle)
8080+8181+## App: Write Features
8282+8383+**Depends on:** App: Authentication & Social
8484+8585+- [ ] Create issue (title + markdown body)
8686+- [ ] Comment on issues and PRs (threaded)
8787+- [ ] Close/reopen issues
8888+- [ ] Edit profile (bio, links, avatar, pinned repos)
8989+- [ ] OAuth scope upgrade flow
9090+9191+## App: Offline & Performance
9292+9393+**Depends on:** App: Search & Discovery (for cache persistence of search/feed data)
9494+9595+- [ ] Dexie setup with database schema (query cache + pinned content tables)
9696+- [ ] TanStack Query persister backed by Dexie
9797+- [ ] Pinned content store (save/unsave files for offline reading)
9898+- [ ] Pinned files UI (list, pin/unpin actions on file viewer, last-fetched timestamp)
9999+- [ ] Offline detection and banner
100100+- [ ] Secure token storage (Capacitor Secure Storage)
101101+- [ ] Cache eviction (per-type limits and TTL, pinned content exempt)
102102+- [ ] List virtualization for large datasets
103103+- [ ] Lazy-load avatars, prefetch on hover
104104+- [ ] Code splitting and bundle optimization (target <500KB JS)
105105+106106+## App: Real-Time & Advanced
107107+108108+**Depends on:** App: Authentication & Social, App: Offline & Performance
109109+110110+- [ ] Jetstream integration for live `sh.tangled.*` events
111111+- [ ] Live UI indicators (new commits, new feed items, PR status)
112112+- [ ] Custom feed presets ("My repos", "Watching", "Team")
113113+- [ ] Repo forking
114114+- [ ] Labels (display, filter, manage)
115115+- [ ] Expanded reactions with emoji picker
116116+- [ ] PR interdiff (compare rounds)
117117+- [ ] Knot info display
118118+119119+## App: Push Notifications
120120+121121+**Depends on:** App: Authentication & Social
122122+123123+- [ ] Register device token on login
124124+- [ ] Subscribe to relevant events
125125+- [ ] Deliver via APNs/FCM
126126+- [ ] Handle notification taps (deep link to relevant screen)
+176
docs/specs/app-features.md
···11+---
22+title: App Features
33+updated: 2026-03-24
44+---
55+66+## Search & Discovery
77+88+**Depends on:** Search API (Twister), Constellation API
99+1010+### Search
1111+1212+- Create search service pointing at Twister API
1313+- Debounced search input on Explore tab
1414+- Segmented results: repos, users, issues/PRs
1515+- Recent search history (local storage, clearable)
1616+- Graceful fallback when search API is unavailable
1717+1818+### Discovery Sections
1919+2020+- Explore tab shows search prominently
2121+- Optional: trending repos or recently active repos (if data supports it)
2222+- Profile summaries enriched with Constellation data (star counts, follower counts)
2323+2424+### Home Tab
2525+2626+- Handle-based direct browsing (already works)
2727+- Surface recently viewed repos/profiles from local history
2828+- Optional: personalized suggestions for signed-in users (later)
2929+3030+### Activity Feed
3131+3232+- Investigate data sources: Jetstream, polling PDS, or Twister-aggregated feed
3333+- Activity tab shows recent events from followed users and starred repos
3434+- Filters by event type (commits, issues, PRs, stars)
3535+- Infinite scroll with pull-to-refresh
3636+3737+## Authentication & Social
3838+3939+**Depends on:** Bluesky OAuth, Constellation API
4040+4141+### OAuth Sign-In
4242+4343+- Install `@atcute/oauth-browser-client`
4444+- Host client metadata JSON with required scopes
4545+- Login page: handle input → resolution → OAuth redirect → callback
4646+- Capacitor deep link handling for native redirect
4747+- Session restoration on app launch, automatic token refresh
4848+- Logout, account switcher for multiple accounts
4949+- Auth state: idle → authenticating → authenticated → error
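
The auth state progression in the last bullet can be modeled as a small reducer. This is a sketch under assumptions: the event names and the payload carried by each state are hypothetical, and the real store would likely live in Pinia.

```typescript
type AuthState =
  | { status: "idle" }
  | { status: "authenticating"; handle: string }
  | { status: "authenticated"; did: string }
  | { status: "error"; message: string };

type AuthEvent =
  | { type: "login"; handle: string }
  | { type: "success"; did: string }
  | { type: "failure"; message: string }
  | { type: "logout" };

function authReducer(state: AuthState, event: AuthEvent): AuthState {
  switch (event.type) {
    case "login":
      // Starting a login is allowed from idle, or from error as a retry.
      return state.status === "idle" || state.status === "error"
        ? { status: "authenticating", handle: event.handle }
        : state;
    case "success":
      return state.status === "authenticating"
        ? { status: "authenticated", did: event.did }
        : state;
    case "failure":
      return state.status === "authenticating"
        ? { status: "error", message: event.message }
        : state;
    case "logout":
      return { status: "idle" };
  }
}
```

Events that do not apply to the current state (for example a stray `success` while idle) leave the state unchanged, which keeps late OAuth callbacks from corrupting the session.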
5050+5151+### Social Actions
5252+5353+All social actions are AT Protocol record writes to the user's PDS. Counts come from Constellation.
5454+5555+- **Star:** Create/delete `sh.tangled.feed.star` record. Show star count via Constellation `getBacklinksCount`.
5656+- **Follow:** Create/delete `sh.tangled.graph.follow` record. Show follower count via Constellation.
5757+- **React:** Create `sh.tangled.feed.reaction` record. Show reaction counts via Constellation.
5858+- Optimistic UI updates via TanStack Query mutation + cache invalidation.
5959+6060+### Authenticated Profile
6161+6262+- Profile tab shows current user's data when signed in
6363+- Pinned repos, stats (repos, stars, followers via Constellation)
6464+- Starred repos list
6565+- Following/followers lists (via Constellation `getBacklinks`)
6666+- Settings and logout
6767+6868+### Personalized Feed
6969+7070+- Filter activity feed to followed users and starred repos
7171+- "For You" / "Global" toggle on activity tab
7272+7373+## Write Features
7474+7575+**Depends on:** Authentication
7676+7777+### Issues
7878+7979+- Create issue: title + markdown body, posted as `sh.tangled.repo.issue` record
8080+- Comment on issue: threaded comments as `sh.tangled.repo.issue.comment` records
8181+- Close/reopen: create `sh.tangled.repo.issue.state` record
8282+8383+### Pull Requests
8484+8585+- Comment on PR: `sh.tangled.repo.pull.comment` records
8686+8787+### Profile Editing
8888+8989+- Edit bio, links, location, pronouns, pinned repos
9090+- Avatar upload (max 1MB, png/jpeg)
9191+- Cross-posting toggle
9292+- Posted as updated `sh.tangled.actor.profile` record
9393+9494+### OAuth Scope Upgrade
9595+9696+- Detect when an action requires a scope not yet granted
9797+- Prompt user to re-authorize with expanded scopes
9898+9999+## Offline & Performance
100100+101101+### Local Storage
102102+103103+All local persistence uses **Dexie** over IndexedDB. This works natively in Capacitor's WebView on both iOS and Android, and in the browser during local development — no platform branching or plugins needed.
104104+105105+Four storage layers, each with a distinct purpose:
106106+107107+- **TanStack Query persister (Dexie-backed)** — Automatic cache persistence. Previously-viewed data hydrates on launch and serves from cache when offline. Subject to normal cache eviction (stale times, GC).
108108+- **Pinned content store (Dexie)** — User-initiated "save for offline" storage for files, READMEs, and other reference content. Exempt from cache eviction — only the user removes pinned items. Stores file content, metadata, repo handle, pinned timestamp.
109109+- **Capacitor Preferences** — Small key-value settings (theme, recent search history, feed preferences).
110110+- **Capacitor Secure Storage** — Auth tokens only. Never in Dexie or the query cache.
111111+112112+### Offline Behavior
113113+114114+- TanStack Query serves cached data when offline (stale-while-revalidate)
115115+- Pinned files always available regardless of connectivity
116116+- Offline detection via `navigator.onLine`, persistent banner
117117+- Mutations disabled when offline
118118+- Background refresh when connectivity returns
119119+120120+### Pinned Files
121121+122122+Users can pin files to save them for offline reading:
123123+124124+- Pin action on file viewer saves content + metadata to the Dexie pinned store
125125+- Pinned files list accessible from profile or a dedicated section
126126+- Content persists until the user explicitly unpins
127127+- Pinned items show last-fetched timestamp; refresh when online
128128+129129+### Cache Management
130130+131131+- Per-type limits: repo metadata (200 items/7 days), file trees (50/3 days), profiles (100/7 days), search results (20/1 day)
132132+- Eviction on app launch and periodically
133133+- Pinned content exempt from eviction
134134+- Measure and cap IndexedDB usage
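
The per-type limits above could drive eviction roughly as follows. This is a sketch, not the planned implementation: the entry shape, the `pinned` flag on cache entries, and the function name are assumptions, and reading and deleting rows in Dexie is left out.

```typescript
type CacheKind = "repoMetadata" | "fileTree" | "profile" | "searchResult";

interface CachePolicy {
  maxItems: number;
  maxAgeDays: number;
}

// Limits taken from the list above.
const CACHE_POLICIES: Record<CacheKind, CachePolicy> = {
  repoMetadata: { maxItems: 200, maxAgeDays: 7 },
  fileTree: { maxItems: 50, maxAgeDays: 3 },
  profile: { maxItems: 100, maxAgeDays: 7 },
  searchResult: { maxItems: 20, maxAgeDays: 1 },
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical cache row; `fetchedAt` is a millisecond timestamp.
interface CachedItem {
  key: string;
  kind: CacheKind;
  fetchedAt: number;
  pinned: boolean;
}

// Return the keys to evict: anything past its max age, plus the oldest
// items beyond the per-type count limit. Pinned items are never evicted.
function selectEvictions(items: CachedItem[], now: number): string[] {
  const evict: string[] = [];
  for (const kind of Object.keys(CACHE_POLICIES) as CacheKind[]) {
    const policy = CACHE_POLICIES[kind];
    const newestFirst = items
      .filter((item) => item.kind === kind && !item.pinned)
      .sort((a, b) => b.fetchedAt - a.fetchedAt);
    newestFirst.forEach((item, index) => {
      const expired = now - item.fetchedAt > policy.maxAgeDays * DAY_MS;
      const overLimit = index >= policy.maxItems;
      if (expired || overLimit) evict.push(item.key);
    });
  }
  return evict;
}
```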
135135+136136+### Performance
137137+138138+- Prefetch on hover/visibility for likely navigation targets
139139+- Virtualized lists for large datasets (1000+ items)
140140+- Lazy-load avatars with initials fallback
141141+- Route-level code splitting
142142+- Tree-shake Ionic components
143143+- Target: under 500KB JS, shell first-paint under 2s
144144+145145+## Real-Time & Advanced
146146+147147+**Depends on:** Authentication, Activity Feed
148148+149149+### Jetstream Integration
150150+151151+- Connect to Jetstream for real-time `sh.tangled.*` events
152152+- Filter and normalize into ActivityItem, merge into TanStack Query cache
153153+- Connect on foreground, disconnect on background
154154+- Cursor tracking for gap-free resume
155155+- Battery-aware throttling
156156+157157+### Live UI Indicators
158158+159159+- "New commits" banner in repo detail
160160+- "X new items" pill on activity feed
161161+- Live status updates on PR detail
162162+- Issue comment count updates
163163+164164+### Custom Feeds
165165+166166+- Presets: "My repos", "Watching", "Team"
167167+- Feed builder UI for custom filters
168168+- Local storage in IndexedDB
169169+170170+### Advanced Features
171171+172172+- **Repo forking:** Create repo with source field, show fork status, sync action
173173+- **Labels:** Display color-coded chips, filter by label, add/remove with auth
174174+- **Expanded reactions:** Emoji picker, grouped counts, add/remove
175175+- **PR interdiff:** Compare rounds via `sh.tangled.repo.compare`
176176+- **Knot info:** Show hostname, version, health status on repo detail
+154
docs/specs/data-sources.md
···11+---
22+title: Data Sources & Integration
33+updated: 2026-03-24
44+---
55+66+Twisted pulls data from four external sources and authenticates users via Bluesky OAuth. Each source has a distinct role — no single source is authoritative for everything.
77+88+## Source Overview
99+1010+| Source | What it provides | Access pattern |
1111+| ------------------------ | ------------------------------------------------------------------------------ | ---------------------------------------------------------- |
1212+| **Tangled XRPC (Knots)** | Git data — file trees, blobs, commits, branches, diffs, tags | Direct XRPC calls to the knot hosting each repo |
1313+| **AT Protocol (PDS)** | User records — profiles, repos, issues, PRs, comments, stars, follows | `com.atproto.repo.getRecord` / `listRecords` on user's PDS |
1414+| **Constellation** | Social signals — star counts, follower counts, reaction counts, backlink lists | Public JSON API at `constellation.microcosm.blue` |
1515+| **Tap** | Real-time firehose of AT Protocol record events for indexing | WebSocket consumer, feeds our search index |
1616+1717+## Constellation
1818+1919+[Constellation](https://constellation.microcosm.blue) is a public, self-hosted index of AT Protocol backlinks. It answers "who linked to this?" across the entire network — making it the right source for aggregated social signals instead of maintaining our own counters.
2020+2121+### Key Endpoints
2222+2323+**`GET /xrpc/blue.microcosm.links.getBacklinks`** — Get records linking to a target.
2424+2525+- `subject` (required) — The target (AT-URI, DID, or URL)
2626+- `source` (required) — Collection and path, e.g. `sh.tangled.feed.star:subject.uri`
2727+- `did` — Filter to specific users (repeatable)
2828+- `limit` — Default 16, max 100
2929+- `reverse` — Reverse ordering
3030+3131+**`GET /xrpc/blue.microcosm.links.getBacklinksCount`** — Count of links to a target.
3232+3333+- `subject`, `source` — Same as above
3434+3535+**`GET /xrpc/blue.microcosm.links.getManyToManyCounts`** — Secondary link counts in many-to-many relationships.
3636+3737+- `subject`, `source`, `pathToOther` (required)
3838+- `did`, `otherSubject`, `limit` (optional)
3939+4040+### Usage in Twisted
4141+4242+| Need | Constellation call |
4343+| ------------------------- | ---------------------------------------------------------------------------------------- |
4444+| Star count for a repo | `getBacklinksCount(subject=repo_at_uri, source=sh.tangled.feed.star:subject.uri)` |
4545+| Who starred a repo | `getBacklinks(subject=repo_at_uri, source=sh.tangled.feed.star:subject.uri)` |
4646+| Follower count for a user | `getBacklinksCount(subject=user_did, source=sh.tangled.graph.follow:subject)` |
4747+| Who follows a user | `getBacklinks(subject=user_did, source=sh.tangled.graph.follow:subject)` |
4848+| Reaction count on content | `getBacklinksCount(subject=content_at_uri, source=sh.tangled.feed.reaction:subject.uri)` |
4949+5050+This replaces the need to index and count interaction records ourselves. Our Tap pipeline still indexes interaction records for search and graph discovery, but Constellation is the source of truth for counts and lists.
5151+5252+### Integration Notes
5353+5454+- No authentication required. Constellation asks for a user-agent header with project name and contact.
5555+- Responses are paginated via cursor. Plan for multiple pages when listing (e.g., all followers).
5656+- The API is read-only — social actions (star, follow, react) are still AT Protocol record writes to the user's PDS.
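
As an illustration of the count calls in the usage table, the sketch below builds a `getBacklinksCount` request without sending it. The endpoint path, base URL, and `source` strings come from this document; the function names, the request descriptor shape, and the user-agent value passed by the caller are assumptions.

```typescript
const CONSTELLATION_BASE = "https://constellation.microcosm.blue";

interface ConstellationRequest {
  url: string;
  headers: { [name: string]: string };
}

// Build (but do not send) a getBacklinksCount request.
// `userAgent` should identify the project and a contact, as Constellation asks.
function backlinksCountRequest(subject: string, source: string, userAgent: string): ConstellationRequest {
  const query =
    "subject=" + encodeURIComponent(subject) +
    "&source=" + encodeURIComponent(source);
  return {
    url: `${CONSTELLATION_BASE}/xrpc/blue.microcosm.links.getBacklinksCount?${query}`,
    headers: { "User-Agent": userAgent },
  };
}

// Star count for a repo: links from star records' `subject.uri` field.
function starCountRequest(repoAtUri: string, userAgent: string): ConstellationRequest {
  return backlinksCountRequest(repoAtUri, "sh.tangled.feed.star:subject.uri", userAgent);
}

// Follower count for a user: links from follow records' `subject` field.
function followerCountRequest(userDid: string, userAgent: string): ConstellationRequest {
  return backlinksCountRequest(userDid, "sh.tangled.graph.follow:subject", userAgent);
}
```

Separating request construction from the HTTP call keeps the query encoding easy to test and lets the Go API and the app share the same parameter conventions.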
5757+5858+## Tangled XRPC (Knots)
5959+6060+Knots are Tangled's git hosting servers. Each repo lives on a specific knot, identified by the knot DID in the repo's AT Protocol record.
6161+6262+### Endpoints Used
6363+6464+- `sh.tangled.repo.tree` — File tree for a ref
6565+- `sh.tangled.repo.blob` — File content
6666+- `sh.tangled.repo.log` — Commit history
6767+- `sh.tangled.repo.branches` / `sh.tangled.repo.tags` — Refs
6868+- `sh.tangled.repo.getDefaultBranch` — Default branch name
6969+- `sh.tangled.repo.diff` / `sh.tangled.repo.compare` — Diffs
7070+- `sh.tangled.repo.languages` — Language breakdown
7171+- `sh.tangled.knot.version` — Knot software version
7272+7373+### Routing
7474+7575+The app resolves which knot hosts a repo by reading the repo's AT Protocol record (which contains the knot DID), then resolving the knot DID to its service endpoint. XRPC calls go directly to that knot.
7676+7777+The Tangled appview at `tangled.org` serves HTML only — there is no JSON API at the appview level.
7878+7979+## AT Protocol (PDS)
8080+8181+Standard AT Protocol record access for reading and writing user data.
8282+8383+### Read Operations
8484+8585+- `com.atproto.repo.getRecord` — Fetch a single record by collection + rkey
8686+- `com.atproto.repo.listRecords` — List records in a collection with pagination
8787+8888+Used for: profiles, repo metadata, issues, PRs, comments, stars, follows, reactions.
8989+9090+### Write Operations (Authenticated)
9191+9292+- `com.atproto.repo.createRecord` — Create a new record (star, follow, react, issue, comment)
9393+- `com.atproto.repo.deleteRecord` — Delete a record (unstar, unfollow)
9494+9595+All writes go to the authenticated user's PDS using their OAuth session.
9696+9797+### Identity Resolution
9898+9999+- Handle → DID via `com.atproto.identity.resolveHandle`
100100+- DID → DID document via PLC Directory (`plc.directory`) or `.well-known/did.json`
101101+- DID document → PDS endpoint (from the `#atproto_pds` service entry, type `AtprotoPersonalDataServer`)
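
The last step of that chain can be sketched as below, assuming the standard AT Protocol DID document layout: a service entry with id `#atproto_pds` and type `AtprotoPersonalDataServer`, and the handle published as an `at://` entry in `alsoKnownAs`. The interface and function names are hypothetical.

```typescript
interface DidService {
  id: string;
  type: string;
  serviceEndpoint: string;
}

interface DidDocument {
  id: string;
  alsoKnownAs?: string[];
  service?: DidService[];
}

// Find the PDS endpoint; the service id may be relative ("#atproto_pds")
// or fully qualified ("<did>#atproto_pds").
function pdsEndpoint(doc: DidDocument): string | null {
  for (const service of doc.service ?? []) {
    const idMatches = service.id === "#atproto_pds" || service.id === doc.id + "#atproto_pds";
    if (idMatches && service.type === "AtprotoPersonalDataServer") {
      return service.serviceEndpoint;
    }
  }
  return null;
}

// The handle is the first alsoKnownAs entry using the at:// scheme.
function handleFromDidDocument(doc: DidDocument): string | null {
  const scheme = "at://";
  for (const alias of doc.alsoKnownAs ?? []) {
    if (alias.slice(0, scheme.length) === scheme) return alias.slice(scheme.length);
  }
  return null;
}
```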
102102+103103+## Tap (Firehose)
104104+105105+Tap provides a filtered firehose of AT Protocol events. Our indexer consumes Tap via WebSocket, indexing records into the search database.
106106+107107+### What We Index via Tap
108108+109109+- Repos, issues, PRs, comments, strings, profiles — for full-text search
110110+- Follows — for graph discovery during backfill
111111+- Issue state and PR status changes — for state filtering in search
112112+113113+### What We Don't Need to Count via Tap
114114+115115+Stars, followers, reactions — Constellation handles counts and lists. We still process these events for graph discovery but don't need to maintain our own counters.
116116+117117+### Tap Protocol
118118+119119+- WebSocket connection with cursor-based resume
120120+- Events contain: operation (create/update/delete), DID, collection, rkey, CID, record payload
121121+- Acks required after processing each event
122122+- Backfill via `/repos/add` endpoint to request historical data for specific users
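
To make the event fields concrete, here is a sketch of turning one event into an index action. The `TapEvent` shape below is an assumption derived from the field list above, not Tap's actual wire format, and the indexer itself is written in Go; the names are hypothetical.

```typescript
type TapOperation = "create" | "update" | "delete";

// Assumed event shape based on the fields listed above.
interface TapEvent {
  operation: TapOperation;
  did: string;
  collection: string;
  rkey: string;
  cid?: string;
  record?: unknown;
}

type IndexAction =
  | { kind: "upsert"; id: string; record: unknown }
  | { kind: "softDelete"; id: string };

// Creates and updates both become upserts keyed by the stable ID
// (did|collection|rkey); deletes become soft deletes.
function toIndexAction(tapEvent: TapEvent): IndexAction {
  const id = `${tapEvent.did}|${tapEvent.collection}|${tapEvent.rkey}`;
  if (tapEvent.operation === "delete") {
    return { kind: "softDelete", id };
  }
  return { kind: "upsert", id, record: tapEvent.record };
}
```

Acknowledging the event only after the resulting action has been applied is what lets cursor-based resume restart without losing records.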
123123+124124+## Bluesky OAuth
125125+126126+Authentication uses AT Protocol OAuth via `@atcute/oauth-browser-client`.
127127+128128+### Flow
129129+130130+1. User enters their handle
131131+2. App resolves handle → DID → PDS → authorization server metadata
132132+3. App initiates OAuth with requested scopes
133133+4. User authorizes in browser, redirected back to app
134134+5. App exchanges code for tokens
135135+6. Session provides `dpopFetch` for authenticated XRPC calls
136136+137137+### Scopes
138138+139139+The app requests scopes for:
140140+141141+- `sh.tangled.feed.star` — Star/unstar repos
142142+- `sh.tangled.graph.follow` — Follow/unfollow users
143143+- `sh.tangled.feed.reaction` — Add reactions
144144+- `sh.tangled.actor.profile` — Edit profile
145145+- `sh.tangled.repo.issue` / `sh.tangled.repo.issue.comment` — Create issues and comments
146146+- `sh.tangled.repo.pull.comment` — Comment on PRs
147147+148148+### Capacitor Integration
149149+150150+On native platforms, OAuth callback uses a deep link URL scheme registered with Capacitor. The app listens via `App.addListener('appUrlOpen', ...)` to catch the redirect.
151151+152152+### Session Management
153153+154154+Tokens are stored in secure storage (encrypted localStorage on web, Capacitor Secure Storage on native). Sessions auto-refresh. The app supports multiple accounts with an account switcher.
+116
docs/specs/search.md
···11+---
22+title: Search
33+updated: 2026-03-24
44+---
55+66+Search lets users find repos, issues, PRs, profiles, and code snippets across the Tangled network. The API supports three modes with progressive capability.
77+88+## Modes
99+1010+### Keyword Search (Implemented)
1111+1212+Full-text search powered by SQLite FTS5 with BM25 scoring. Queries are tokenized, matched against title, body, summary, repo name, author handle, and tags. Results are ranked by relevance with field-specific weights (title highest, then author handle, summary, body).
1313+1414+Snippets are generated from the body field with match terms wrapped in `<mark>` tags.
1515+1616+### Semantic Search (Planned)
1717+1818+Vector similarity search using **Nomic Embed Text v1.5**, deployed on Railway via the [nomic-embed template](https://railway.com/deploy/nomic-embed). The template runs Ollama behind an authenticated Caddy proxy.
1919+2020+**Embedding service:**
2121+2222+- Model: `nomic-embed-text:latest` (8192-token context, 768-dimensional vectors, Matryoshka support for variable dimensionality)
2323+- Endpoint: `POST /api/embeddings` with Bearer token auth
2424+- Request: `{ "model": "nomic-embed-text:latest", "prompt": "text to embed" }`
2525+- Deployed as a separate Railway service alongside the API and indexer
2626+2727+**Pipeline:**
2828+2929+- The embed-worker consumes the `embedding_jobs` queue, calls the Nomic Embed service, and stores 768-dim vectors in the `document_embeddings` table
3030+- Documents are embedded asynchronously after indexing — the embed-worker runs independently of the ingestion loop
3131+- Search queries are embedded at request time (single prompt, low latency)
3232+- Vectors are matched via DiskANN cosine similarity index in Turso
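To make the matching step concrete, here is a small sketch of the embedding request body and the cosine similarity measure. In the planned design the similarity search runs inside Turso's vector index, so `cosineSimilarity` is illustrative only, and both function names are hypothetical.

```typescript
// Request body for the embedding service, matching the shape given above.
function embeddingRequestBody(text: string): { model: string; prompt: string } {
  return { model: "nomic-embed-text:latest", prompt: text };
}

// Cosine similarity between two vectors of equal dimension.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch: " + a.length + " vs " + b.length);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction; treat as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```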
3333+3434+### Hybrid Search (Planned)
3535+3636+Weighted combination of keyword and semantic results. Default blend: 0.65 keyword + 0.35 semantic (configurable). Scores are normalized to [0, 1] before blending. Results are deduplicated by document ID with the higher score retained. Each result includes a `matched_by` field indicating which mode(s) contributed.
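One possible reading of the blend is sketched below. It assumes min-max normalization (the spec does not name a method), weights each mode's normalized score, and keeps the higher weighted score when a document appears in both lists. The names `normalize` and `blend` are hypothetical, and the production service is written in Go, so this only illustrates the logic.

```typescript
interface Hit {
  id: string;
  score: number;
}

interface BlendedHit {
  id: string;
  score: number;
  matched_by: string[];
}

// Min-max normalize scores into [0, 1] (assumed normalization method).
function normalize(hits: Hit[]): Hit[] {
  if (hits.length === 0) return [];
  const scores = hits.map((hit) => hit.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min;
  return hits.map((hit) => ({
    id: hit.id,
    score: range === 0 ? 1 : (hit.score - min) / range,
  }));
}

// Blend keyword and semantic hits with the default 0.65 / 0.35 weights.
function blend(keyword: Hit[], semantic: Hit[], keywordWeight = 0.65, semanticWeight = 0.35): BlendedHit[] {
  const merged: Record<string, BlendedHit> = {};
  const add = (hits: Hit[], weight: number, mode: string): void => {
    for (const hit of normalize(hits)) {
      const weighted = hit.score * weight;
      const existing = merged[hit.id];
      if (existing === undefined) {
        merged[hit.id] = { id: hit.id, score: weighted, matched_by: [mode] };
      } else {
        // Deduplicate by document ID, retaining the higher score.
        existing.score = Math.max(existing.score, weighted);
        existing.matched_by.push(mode);
      }
    }
  };
  add(keyword, keywordWeight, "keyword");
  add(semantic, semanticWeight, "semantic");
  return Object.keys(merged)
    .map((id) => merged[id])
    .sort((a, b) => b.score - a.score);
}
```

Min-max normalization pins the weakest hit in each list to 0, which may be too harsh for short result lists; a rank-based or softer scaling would be a reasonable alternative.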
3737+3838+## API Contract
3939+4040+**`GET /search`** — Unified endpoint, routes by `mode` parameter.
4141+4242+### Parameters
4343+4444+| Param | Required | Default | Description |
4545+| ------------ | -------- | ------- | ------------------------------------- |
4646+| `q` | Yes | — | Query string |
4747+| `mode` | No | keyword | keyword, semantic, or hybrid |
4848+| `limit` | No | 20 | Results per page (1–100) |
4949+| `offset` | No | 0 | Pagination offset |
5050+| `collection` | No | — | Filter by collection NSID |
5151+| `type` | No | — | Filter by record type |
5252+| `author` | No | — | Filter by handle or DID |
5353+| `repo` | No | — | Filter by repo name or DID |
5454+| `language` | No | — | Filter by primary language |
5555+| `from` | No | — | Created after (ISO 8601) |
5656+| `to` | No | — | Created before (ISO 8601) |
5757+| `state` | No | — | Issue/PR state (open, closed, merged) |
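A client-side sketch of the table above: the hypothetical `buildSearchPath` helper applies the documented defaults, clamps `limit` to 1–100, and omits unset filters. Sending the defaults explicitly is a choice made here for readability, not something the API requires.

```typescript
type SearchMode = "keyword" | "semantic" | "hybrid";

interface SearchParams {
  q: string;
  mode?: SearchMode;
  limit?: number;
  offset?: number;
  collection?: string;
  type?: string;
  author?: string;
  repo?: string;
  language?: string;
  from?: string; // ISO 8601
  to?: string; // ISO 8601
  state?: "open" | "closed" | "merged";
}

// Build the request path for GET /search from typed parameters.
function buildSearchPath(params: SearchParams): string {
  if (params.q.trim() === "") {
    throw new Error("q is required");
  }
  const limit = Math.min(100, Math.max(1, Math.floor(params.limit ?? 20)));
  const offset = Math.max(0, Math.floor(params.offset ?? 0));
  const pairs: [string, string | undefined][] = [
    ["q", params.q],
    ["mode", params.mode ?? "keyword"],
    ["limit", String(limit)],
    ["offset", String(offset)],
    ["collection", params.collection],
    ["type", params.type],
    ["author", params.author],
    ["repo", params.repo],
    ["language", params.language],
    ["from", params.from],
    ["to", params.to],
    ["state", params.state],
  ];
  const query = pairs
    .filter((pair) => pair[1] !== undefined) // drop unset filters
    .map((pair) => encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1] as string))
    .join("&");
  return "/search?" + query;
}
```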
5858+5959+### Response
6060+6161+```json
6262+{
6363+ "query": "tangled vue",
6464+ "mode": "keyword",
6565+ "total": 42,
6666+ "limit": 20,
6767+ "offset": 0,
6868+ "results": [
6969+ {
7070+ "id": "did:plc:abc|sh.tangled.repo|my-repo",
7171+ "collection": "sh.tangled.repo",
7272+ "record_type": "repo",
7373+ "title": "my-repo",
7474+ "summary": "A Vue component library",
7575+ "body_snippet": "...building <mark>Vue</mark> components for <mark>Tangled</mark>...",
7676+ "score": 4.82,
7777+ "matched_by": ["keyword"],
7878+ "repo_name": "my-repo",
7979+ "author_handle": "alice.bsky.social",
8080+ "did": "did:plc:abc",
8181+ "at_uri": "at://did:plc:abc/sh.tangled.repo/my-repo",
8282+ "web_url": "https://tangled.sh/alice.bsky.social/my-repo",
8383+ "created_at": "2026-01-15T10:00:00Z",
8484+ "updated_at": "2026-03-20T14:30:00Z"
8585+ }
8686+ ]
8787+}
8888+```
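For the mobile client, the response could be typed roughly as below. The field list is taken from the example; which fields may be null or absent for non-repo records is not specified, so treating them all as required is an assumption. The two helpers are hypothetical: `atUriFromId` relies on the `did|collection|rkey` ID layout inferred from the example, and `nextOffset` derives the next page from `total`, `limit`, and `offset`.

```typescript
interface SearchResult {
  id: string;
  collection: string;
  record_type: string;
  title: string;
  summary: string;
  body_snippet: string;
  score: number;
  matched_by: string[];
  repo_name: string;
  author_handle: string;
  did: string;
  at_uri: string;
  web_url: string;
  created_at: string;
  updated_at: string;
}

interface SearchResponse {
  query: string;
  mode: string;
  total: number;
  limit: number;
  offset: number;
  results: SearchResult[];
}

// Rebuild the AT URI from a document ID of the form "did|collection|rkey".
function atUriFromId(id: string): string {
  const parts = id.split("|");
  if (parts.length !== 3) {
    throw new Error("unexpected document id: " + id);
  }
  return "at://" + parts[0] + "/" + parts[1] + "/" + parts[2];
}

// Offset for the next page, or null when the current page is the last one.
function nextOffset(page: { total: number; limit: number; offset: number }): number | null {
  const next = page.offset + page.limit;
  return next < page.total ? next : null;
}
```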
8989+9090+## Pragmatic Search Strategy
9191+9292+Indexing via Tap is useful, but it has proven unreliable for maintaining complete, up-to-date coverage. Search is therefore layered as follows:
9393+9494+1. **Keyword search is the foundation.** It works now and covers the primary use case — finding repos, issues, and people by name or content.
9595+9696+2. **Constellation supplements search results.** Star counts and follower counts from Constellation can be used as ranking signals without needing to index interaction records ourselves.
9797+9898+3. **Semantic search is additive.** It improves discovery for vague queries but isn't required for the app to be useful. It ships when the embedding pipeline is stable.
9999+100100+4. **Graceful degradation.** The mobile app treats the search API as optional. If Twister is unavailable, handle-based direct browsing still works. Search results link into the same browsing screens.
101101+102102+## Quality Improvements (Planned)
103103+104104+- Field weight tuning based on real query patterns
105105+- Recency boost for recently updated content
106106+- Collection-aware ranking (repos weighted higher for short queries)
107107+- Star count as a ranking signal (via Constellation)
108108+- State filtering (exclude closed issues by default)
109109+- Better snippet generation with longer context windows
110110+- Relevance test fixtures for regression testing
111111+112112+## Mobile Integration
113113+114114+The app calls the search API from the Explore tab. Results are displayed in segmented views (repos, users, issues/PRs). Each result links to the corresponding browsing screen (repo detail, profile, issue detail).
115115+116116+When the search API is unavailable, the Explore tab shows an explicit unavailable state rather than failing. The Home tab's handle-based browsing is fully independent of search.
+7
docs/todo.md
···11+---
22+title: Parking Lot
33+updated: 2026-03-24
44+---
55+66+- Constellation requests would be a good opportunity to dispatch a job to index what
77+ was requested.