a love letter to tangled (android, iOS, and a search API)

docs: consolidate and reorganize

+972 -4454
+16 -8
docs/README.md
··· 1 1 # Twisted Documentation 2 2 3 - Documentation is organized by project: 3 + ## Reference 4 4 5 - - [`app/`](app/) for the Ionic/Vue client 6 - - [`api/`](api/) for the Go Tap/index/search service 5 + Completed work — architecture, APIs, and data models as built. 7 6 8 - ## Quick Links 7 + - [`reference/api.md`](reference/api.md) — Go search API service 8 + - [`reference/app.md`](reference/app.md) — Ionic Vue mobile app 9 + - [`reference/lexicons.md`](reference/lexicons.md) — Tangled AT Protocol record types 9 10 10 - - App spec index: [`app/specs/README.md`](app/specs/README.md) 11 - - App task index: [`app/tasks/phase-6.md`](app/tasks/phase-6.md) 12 - - API spec index: [`api/specs/README.md`](api/specs/README.md) 13 - - API task index: [`api/tasks/README.md`](api/tasks/README.md) 11 + ## Specs 12 + 13 + Forward-looking designs for remaining work. 14 + 15 + - [`specs/data-sources.md`](specs/data-sources.md) — Constellation, Tangled XRPC, Tap, AT Protocol, Bluesky OAuth 16 + - [`specs/search.md`](specs/search.md) — Keyword, semantic, and hybrid search 17 + - [`specs/app-features.md`](specs/app-features.md) — Remaining mobile app features 18 + 19 + ## Roadmap 20 + 21 + - [`roadmap.md`](roadmap.md) — All remaining milestones and tasks
-13
docs/TODO.md
··· 1 - --- 2 - title: To-Dos 3 - updated: 2026-03-23 4 - --- 5 - 6 - A catch-all for ideas, issues/bugs, and future work that doesn't fit into the current specs or tasks. This is a "parking lot." 7 - 8 - ## App 9 - 10 - - Repo stars, forks, etc. are not properly parsed from JSON. 11 - - Atom/RSS feed link for repos (`tangled.org/{did}/{repo}/feed.atom`) 12 - 13 - ## API
-100
docs/api/deploy.md
··· 1 - --- 2 - title: "Deployment Guide" 3 - updated: 2026-03-23 4 - --- 5 - 6 - # Railway Deployment Guide 7 - 8 - Deploy the Twister API and indexer as Railway services alongside the existing Tap instance. 9 - 10 - ## Prerequisites 11 - 12 - - Railway project with Tap already deployed 13 - - Turso database created with auth token 14 - - GitHub repository connected to Railway 15 - 16 - ## Service Layout 17 - 18 - | Service | Start Command | Health Check | Public | Port | 19 - | ------- | ----------------- | -------------- | ------ | ---- | 20 - | tap | (pre-existing) | `GET /health` | no | — | 21 - | api | `twister api` | `GET /healthz` | yes | 8080 | 22 - | indexer | `twister indexer` | `GET /health` | no | 9090 | 23 - 24 - All services use the same Docker image. Railway overrides `CMD` with the per-service start command. 25 - 26 - ## Step 1 — Create Services 27 - 28 - In the Railway dashboard, create two new services from the same GitHub repo: 29 - 30 - 1. **api** — set start command to `twister api` 31 - 2. **indexer** — set start command to `twister indexer` 32 - 33 - Both services build from `packages/api/Dockerfile`. 
34 - 35 - ## Step 2 — Set Environment Variables 36 - 37 - ### Shared (set on both services) 38 - 39 - ```sh 40 - TURSO_DATABASE_URL=libsql://twister-prod-<org>.turso.io 41 - TURSO_AUTH_TOKEN=<turso-jwt> 42 - LOG_LEVEL=info 43 - LOG_FORMAT=json 44 - ``` 45 - 46 - ### API only 47 - 48 - ```sh 49 - HTTP_BIND_ADDR=:8080 50 - SEARCH_DEFAULT_LIMIT=20 51 - SEARCH_MAX_LIMIT=100 52 - ``` 53 - 54 - ### Indexer only 55 - 56 - ```sh 57 - TAP_URL=wss://${{tap.RAILWAY_PRIVATE_DOMAIN}}/channel 58 - TAP_AUTH_PASSWORD=<tap-admin-password> 59 - INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull.status,sh.tangled.feed.star 60 - INDEXER_HEALTH_ADDR=:9090 61 - ``` 62 - 63 - Use `${{tap.RAILWAY_PRIVATE_DOMAIN}}` to reference Tap's internal hostname. This keeps traffic on Railway's private network. 64 - 65 - ## Step 3 — Configure Health Checks 66 - 67 - In the Railway dashboard, configure per-service: 68 - 69 - - **api**: HTTP health check on path `/healthz`, port `8080` 70 - - **indexer**: HTTP health check on path `/health`, port `9090` 71 - 72 - Railway uses these to gate deployment rollouts and restart unhealthy containers. 73 - 74 - ## Step 4 — Configure Autodeploy 75 - 76 - Connect the GitHub repository in the Railway dashboard. Railway will build and deploy on every push to the configured branch. 77 - 78 - The Dockerfile uses multi-stage builds with `CGO_ENABLED=0` for a static binary on Alpine. 79 - 80 - ## Step 5 — Deploy and Verify 81 - 82 - After the first deploy: 83 - 84 - 1. Confirm API is healthy: `curl https://<api-domain>/healthz` 85 - 2. Confirm API readiness: `curl https://<api-domain>/readyz` 86 - 3. 
Check indexer health in Railway logs (health check on `:9090/health`) 87 - 88 - ## Step 6 — Bootstrap Content 89 - 90 - Run graph backfill to populate initial content from seed users: 91 - 92 - ```bash 93 - twister backfill --seeds=docs/api/seeds.txt --max-hops=2 94 - ``` 95 - 96 - Wait for Tap to finish historical sync, then verify search returns results: 97 - 98 - ```bash 99 - curl "https://<api-domain>/search?q=tangled" 100 - ```
-9
docs/api/seeds.txt
··· 1 - # Example seed handles for Twister graph backfill 2 - # One DID or handle per line. Comments and blank lines are ignored. 3 - 4 - anirudh.fi 5 - atprotocol.dev 6 - zzstoatzz.io 7 - oppi.li 8 - desertthunder.dev 9 - tangled.org
-313
docs/api/specs/01-architecture.md
··· 1 - --- 2 - title: "Spec 01 — Architecture" 3 - updated: 2026-03-22 4 - --- 5 - 6 - ## 1. Purpose 7 - 8 - Build a Go-based search service for Tangled content on AT Protocol that: 9 - 10 - - ingests Tangled records through **Tap** (already deployed on Railway) 11 - - denormalizes them into internal search documents 12 - - indexes them in **Turso/libSQL** 13 - - exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes 14 - - exposes index-backed summary APIs for data the public Tangled APIs do not answer efficiently, such as followers 15 - 16 - ## 2. Functional Goals 17 - 18 - The system shall: 19 - 20 - - index Tangled-specific ATProto collections under the `sh.tangled.*` namespace 21 - - support initial backfill and continuous incremental sync via Tap 22 - - support lexical retrieval using Turso's Tantivy-backed FTS 23 - - support semantic retrieval using vector embeddings 24 - - support hybrid ranking combining lexical and semantic signals 25 - - expose stable HTTP APIs for search, document lookup, and graph/profile summaries 26 - - support deployment on **Railway** 27 - 28 - ## 3. Non-Functional Goals 29 - 30 - The system shall prioritize: 31 - 32 - - **correctness of sync** — cursors never advance ahead of committed data 33 - - **operational simplicity** — single binary, subcommand-driven 34 - - **incremental delivery** — keyword search ships before embeddings 35 - - **small deployable services** — process groups, not microservices 36 - - **reindexability** — any document or collection can be re-normalized and re-indexed 37 - - **low coupling** — sync, indexing, and serving are independent concerns 38 - 39 - ## 4. Out of Scope (v1) 40 - 41 - - code-aware symbol search 42 - - sourcegraph-style structural search 43 - - personalized ranking 44 - - access control beyond public/private visibility flags in indexed records 45 - - full analytics pipeline 46 - - custom ANN infrastructure outside Turso/libSQL 47 - 48 - ## 5. 
Design Principles 49 - 50 - 1. **Tap owns synchronization correctness.** The application does not consume the raw firehose. Tap handles connection, cryptographic verification, backfill, and filtering. 51 - 52 - 2. **The indexer owns denormalization.** Raw ATProto records are never queried directly by the public API. 53 - 54 - 3. **The public API serves denormalized projections.** Search ranking and graph summaries depend on the indexed document model, not transport. 55 - 56 - 4. **Keyword search is the baseline.** Semantic and hybrid search are layered on top. 57 - 58 - 5. **Embeddings are asynchronous.** Ingestion is never blocked on vector generation unless explicitly configured. 59 - 60 - 6. **Twister complements public Tangled APIs.** Repo detail stays on knots/PDSes; the index adds discovery and cross-network summaries. 61 - 62 - ## 6. External Systems 63 - 64 - - **AT Protocol network** — source of all Tangled content 65 - - **Tap** — filtered event delivery from the AT Protocol firehose (deployed on Railway) 66 - - **Turso/libSQL** — relational storage, Tantivy-backed FTS, and native vector search 67 - - **Ollama** — local embedding model server (nomic-embed-text or EmbeddingGemma); deployed as a Railway sidecar service 68 - - **Railway** — deployment platform for Twister services, Tap, and Ollama 69 - 70 - ## 7. 
Architecture Summary 71 - 72 - ```text 73 - ATProto Firehose / PDS 74 - 75 - 76 - Tap (Railway) 77 - │ WebSocket / webhook JSON events 78 - 79 - Go Indexer Service 80 - ├─ decode Tap events 81 - ├─ normalize records → documents 82 - ├─ upsert documents 83 - ├─ schedule embeddings 84 - └─ persist sync cursor 85 - 86 - 87 - Turso/libSQL 88 - ├─ documents table 89 - ├─ document_embeddings table 90 - ├─ FTS index (Tantivy-backed) 91 - ├─ vector index (DiskANN) 92 - └─ sync_state table 93 - 94 - 95 - Go Search API 96 - ├─ keyword search (fts_match / fts_score) 97 - ├─ semantic search (vector_top_k) 98 - ├─ hybrid search (weighted merge) 99 - ├─ profile and graph summaries 100 - └─ document fetch 101 - ``` 102 - 103 - ## 8. Runtime Units 104 - 105 - | Unit | Role | Deployment | 106 - | -------------- | -------------------------------------------- | -------------------------- | 107 - | `api` | HTTP search, graph summary, and document API | Railway service (public) | 108 - | `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) | 109 - | `embed-worker` | Async embedding generation via Ollama | Optional Railway service | 110 - | `ollama` | Local embedding model server | Railway service (internal) | 111 - | `tap` | ATProto sync | Railway (already deployed) | 112 - 113 - ## 9. Repository Structure 114 - 115 - ```text 116 - main.go 117 - 118 - internal/ 119 - api/ # HTTP handlers, middleware, routes 120 - config/ # Config struct, env parsing 121 - embed/ # Embedding provider abstraction, worker 122 - index/ # FTS and vector index management 123 - ingest/ # Tap event consumer, ingestion loop 124 - normalize/ # Per-collection record → document adapters 125 - observability/# Structured logging, metrics 126 - ranking/ # Score normalization, hybrid merge 127 - search/ # Search orchestration (keyword, semantic, hybrid) 128 - store/ # DB access layer, migrations, domain types 129 - tapclient/ # Tap WebSocket/webhook client 130 - ``` 131 - 132 - ## 10. 
Binary Subcommands 133 - 134 - ```bash 135 - twister api # Start HTTP search API 136 - twister indexer # Start Tap consumer / indexer 137 - twister embed-worker # Start async embedding worker 138 - twister reindex # Re-normalize and upsert documents 139 - twister reembed # Re-generate embeddings 140 - twister backfill # Bootstrap index from seed users 141 - twister healthcheck # One-shot health probe 142 - ``` 143 - 144 - ## 11. Technology Choices 145 - 146 - ### Embedding: Ollama (self-hosted) 147 - 148 - Embeddings are generated locally via Ollama rather than an external API service. This eliminates per-token costs, external service dependencies, and data egress concerns. 149 - 150 - **Recommended models (in order of preference):** 151 - 152 - | Model | Parameters | Dimensions | Quantized Size | Notes | 153 - |-------|-----------|------------|----------------|-------| 154 - | nomic-embed-text-v1.5 | 137M | 768 (Matryoshka: 64–768) | ~262 MB (F16) | 8192 context, battle-tested, Railway template exists | 155 - | EmbeddingGemma | 308M | 768 | <200 MB (quantized) | Best-in-class MTEB for size, released Sept 2025 | 156 - | all-minilm | 23M | 384 | ~46 MB | Budget option, lower quality | 157 - 158 - **Go integration:** Use the official Ollama Go client (`github.com/ollama/ollama/api`) with the `Embed()` method. The embed-worker calls Ollama over Railway's internal network (`ollama.railway.internal:11434`). 159 - 160 - **Railway deployment:** Ollama runs as a separate Railway service (~1–2 GB RAM, 1–2 vCPU, ~$10–30/mo). The nomic-embed Railway template provides a proven starting point. No cold starts on always-on services; model loads in 2–10 seconds on first request after deploy. 161 - 162 - ### Language: Go 163 - 164 - Go is the implementation language for the API server, indexer, embedding worker, and CLI commands. Rationale: straightforward long-running services, excellent HTTP support, good concurrency model, small container footprint. 
165 - 166 - ### Sync Layer: Tap 167 - 168 - Tap is the only supported sync source in v1. It handles firehose connection, cryptographic verification, backfill, and filtering, then delivers simple JSON events via WebSocket or webhook. 169 - 170 - **Tap is already deployed on Railway.** Twister connects to it as a WebSocket client. 171 - 172 - #### Tap Capabilities 173 - 174 - - Validates repo structure, MST integrity, and identity signatures 175 - - Automatic backfill fetches full repo history from PDS when repos are added 176 - - Filtered output by DID list, collection, or full network mode 177 - - Ordering guarantees: historical events (`live: false`) delivered before live events (`live: true`) 178 - 179 - #### Tap Delivery Modes 180 - 181 - | Mode | Config | Behavior | 182 - | -------------------------- | ----------------------- | ------------------------------------------------- | 183 - | WebSocket + acks (default) | — | Client acks each event; no data loss | 184 - | Fire-and-forget | `TAP_DISABLE_ACKS=true` | Events marked acked on receipt; simpler but lossy | 185 - | Webhook | `TAP_WEBHOOK_URL=...` | Events POSTed as JSON; acked on HTTP 200 | 186 - 187 - #### Tap API Endpoints (reference) 188 - 189 - | Endpoint | Method | Purpose | 190 - | --------------------- | ------ | ------------------------------------- | 191 - | `/health` | GET | Health check | 192 - | `/channel` | WS | WebSocket event stream | 193 - | `/repos/add` | POST | Add DIDs to track | 194 - | `/repos/remove` | POST | Stop tracking a repo | 195 - | `/info/:did` | GET | Repo state, rev, record count, errors | 196 - | `/stats/repo-count` | GET | Total tracked repos | 197 - | `/stats/record-count` | GET | Total tracked records | 198 - | `/stats/cursors` | GET | Firehose and list repos cursors | 199 - 200 - #### Key Tap Configuration 201 - 202 - | Variable | Default | Purpose | 203 - | ------------------------ | ------- | 
---------------------------------------------------------------------------------- | 204 - | `TAP_SIGNAL_COLLECTION` | — | Auto-track repos with records in this collection | 205 - | `TAP_COLLECTION_FILTERS` | — | Comma-separated collection filters (e.g., `sh.tangled.repo,sh.tangled.repo.issue`) | 206 - | `TAP_ADMIN_PASSWORD` | — | Basic auth for API access | 207 - | `TAP_DISABLE_ACKS` | `false` | Fire-and-forget mode | 208 - | `TAP_WEBHOOK_URL` | — | Webhook delivery URL | 209 - 210 - ### Storage and Search: Turso/libSQL 211 - 212 - Turso/libSQL is used for relational metadata storage, Tantivy-backed full-text search, and native vector search. 213 - 214 - #### Go SDK Options 215 - 216 - | Package | CGo | Embedded Replicas | Remote | 217 - | -------------------------------------------------- | --- | ----------------- | ------ | 218 - | `github.com/tursodatabase/go-libsql` | Yes | Yes | Yes | 219 - | `github.com/tursodatabase/libsql-client-go/libsql` | No | No | Yes | 220 - 221 - Both register as `database/sql` drivers under `"libsql"`. They cannot be imported in the same binary. 222 - 223 - **Recommendation:** Use `libsql-client-go` (pure Go, remote-only) unless embedded replicas are needed for local read performance. 224 - 225 - #### Connection Patterns 226 - 227 - ```go 228 - // Remote only (pure Go, no CGo) 229 - import _ "github.com/tursodatabase/libsql-client-go/libsql" 230 - db, err := sql.Open("libsql", "libsql://your-db.turso.io?authToken=TOKEN") 231 - 232 - // Embedded replica (CGo required) 233 - import "github.com/tursodatabase/go-libsql" 234 - connector, err := libsql.NewEmbeddedReplicaConnector( 235 - "local.db", "libsql://your-db.turso.io", 236 - libsql.WithAuthToken("TOKEN"), 237 - libsql.WithSyncInterval(time.Minute), 238 - ) 239 - db := sql.OpenDB(connector) 240 - ``` 241 - 242 - #### Full-Text Search (Tantivy-backed) 243 - 244 - Turso FTS is **not** standard SQLite FTS5. It uses Tantivy under the hood. 
245 - 246 - ```sql 247 - -- Create FTS index with per-column tokenizers and weights 248 - CREATE INDEX idx_docs_fts ON documents USING fts ( 249 - title WITH tokenizer=default, 250 - body WITH tokenizer=default, 251 - summary WITH tokenizer=default, 252 - repo_name WITH tokenizer=simple, 253 - author_handle WITH tokenizer=raw 254 - ) WITH (weights='title=3.0,repo_name=2.5,author_handle=2.0,summary=1.5,body=1.0'); 255 - 256 - -- Filter by match 257 - SELECT id, title FROM documents 258 - WHERE fts_match(title, body, summary, repo_name, author_handle, 'search query'); 259 - 260 - -- BM25 scoring 261 - SELECT id, title, fts_score(title, body, summary, repo_name, author_handle, 'search query') AS score 262 - FROM documents 263 - ORDER BY score DESC; 264 - 265 - -- Highlighting 266 - SELECT fts_highlight(title, '<b>', '</b>', 'search query') AS highlighted 267 - FROM documents; 268 - ``` 269 - 270 - **Available tokenizers:** `default` (Unicode-aware), `raw` (exact match), `simple` (whitespace+punctuation), `whitespace`, `ngram` (2-3 char n-grams). 271 - 272 - **Query syntax (Tantivy):** `database AND search`, `database NOT nosql`, `"exact phrase"`, `data*` (prefix), `title:database` (field-specific), `title:database^2` (boosting). 273 - 274 - **Limitations:** No snippet function (use highlighting). No automatic segment merging (manual `OPTIMIZE INDEX` required). 275 - No read-your-writes within a transaction. No MATCH operator (use `fts_match()` function). 276 - 277 - #### Vector Search 278 - 279 - ```sql 280 - -- Vector column type 281 - embedding F32_BLOB(768) 282 - 283 - -- Insert 284 - INSERT INTO document_embeddings (document_id, embedding, ...) 285 - VALUES (?, vector32(?), ...); -- ? 
is JSON array '[0.1, 0.2, ...]' 286 - 287 - -- Brute-force similarity search 288 - SELECT d.id, vector_distance_cos(e.embedding, vector32(?)) AS distance 289 - FROM documents d 290 - JOIN document_embeddings e ON d.id = e.document_id 291 - ORDER BY distance ASC LIMIT 20; 292 - 293 - -- Create ANN index (DiskANN) 294 - CREATE INDEX idx_embeddings ON document_embeddings( 295 - libsql_vector_idx(embedding, 'metric=cosine') 296 - ); 297 - 298 - -- ANN search via index 299 - SELECT d.id, d.title 300 - FROM vector_top_k('idx_embeddings', vector32(?), 20) AS v 301 - JOIN document_embeddings e ON e.rowid = v.id 302 - JOIN documents d ON d.id = e.document_id; 303 - ``` 304 - 305 - **Vector types:** `F32_BLOB` (recommended), `F16_BLOB`, `F64_BLOB`, `F8_BLOB`, `F1BIT_BLOB`. 306 - 307 - **Distance functions:** `vector_distance_cos` (cosine), `vector_distance_l2` (Euclidean). 308 - 309 - **Max dimensions:** 65,536. Dimension is fixed at table creation. 310 - 311 - ### Deployment: Railway 312 - 313 - Railway is the deployment platform. It supports health checks, autodeploy, per-service scaling, and internal networking. Tap is already deployed here. Twister deploys as separate Railway services (api, indexer, embed-worker) within the same project.
-192
docs/api/specs/02-tangled-lexicons.md
··· 1 - --- 2 - title: "Spec 02 — Tangled Lexicons" 3 - updated: 2026-03-22 4 - source: https://github.com/mary-ext/atcute/tree/trunk/packages/definitions/tangled/lexicons/sh/tangled 5 - --- 6 - 7 - All Tangled records use the `sh.tangled.*` namespace. Records use TID keys unless noted otherwise. 8 - 9 - ## 1. Searchable Record Types 10 - 11 - These are the primary records Twister indexes for search. 12 - 13 - ### sh.tangled.repo 14 - 15 - Repository metadata. Key: `tid`. 16 - 17 - | Field | Type | Required | Description | 18 - | ------------- | -------- | -------- | ---------------------------------------------- | 19 - | `name` | string | yes | Repository name | 20 - | `knot` | string | yes | Knot (hosting node) where the repo was created | 21 - | `spindle` | string | no | CI runner for jobs | 22 - | `description` | string | no | 1–140 graphemes | 23 - | `website` | uri | no | Related URI | 24 - | `topics` | string[] | no | Up to 50 topic tags, each 1–50 chars | 25 - | `source` | uri | no | Upstream source | 26 - | `labels` | at-uri[] | no | Label definitions this repo subscribes to | 27 - | `createdAt` | datetime | yes | | 28 - 29 - ### sh.tangled.repo.issue 30 - 31 - Issue on a repository. Key: `tid`. 32 - 33 - | Field | Type | Required | Description | 34 - | ------------ | -------- | -------- | -------------------------------- | 35 - | `repo` | at-uri | yes | AT-URI of the parent repo record | 36 - | `title` | string | yes | Issue title | 37 - | `body` | string | no | Issue body (markdown) | 38 - | `createdAt` | datetime | yes | | 39 - | `mentions` | did[] | no | Mentioned users | 40 - | `references` | at-uri[] | no | Referenced records | 41 - 42 - ### sh.tangled.repo.pull 43 - 44 - Pull request. Key: `tid`. 
45 - 46 - | Field | Type | Required | Description | 47 - | ------------ | -------- | -------- | -------------------------------------------------- | 48 - | `target` | object | yes | `{repo: at-uri, branch: string}` | 49 - | `title` | string | yes | PR title | 50 - | `body` | string | no | PR description (markdown) | 51 - | `patchBlob` | blob | yes | Patch content (`text/x-patch`) | 52 - | `source` | object | no | `{branch: string, sha: string(40), repo?: at-uri}` | 53 - | `createdAt` | datetime | yes | | 54 - | `mentions` | did[] | no | Mentioned users | 55 - | `references` | at-uri[] | no | Referenced records | 56 - 57 - ### sh.tangled.string 58 - 59 - Code snippet / gist. Key: `tid`. 60 - 61 - | Field | Type | Required | Description | 62 - | ------------- | -------- | -------- | ------------------- | 63 - | `filename` | string | yes | 1–140 graphemes | 64 - | `description` | string | yes | Up to 280 graphemes | 65 - | `createdAt` | datetime | yes | | 66 - | `contents` | string | yes | Snippet content | 67 - 68 - ### sh.tangled.actor.profile 69 - 70 - User profile. Key: `literal:self` (singleton per account). 71 - 72 - | Field | Type | Required | Description | 73 - | -------------------- | -------- | -------- | ---------------------------- | 74 - | `avatar` | blob | no | PNG/JPEG, max 1MB | 75 - | `description` | string | no | Bio, up to 256 graphemes | 76 - | `links` | uri[] | no | Up to 5 social/website links | 77 - | `stats` | string[] | no | Up to 2 vanity stat types | 78 - | `bluesky` | boolean | yes | Show Bluesky link | 79 - | `location` | string | no | Up to 40 graphemes | 80 - | `pinnedRepositories` | at-uri[] | no | Up to 6 pinned repos | 81 - | `pronouns` | string | no | Up to 40 chars | 82 - 83 - ## 2. Interaction Record Types 84 - 85 - These records represent social interactions. They may be indexed for counts/signals but are lower priority for text search. 86 - 87 - ### sh.tangled.feed.star 88 - 89 - Star/favorite on a record. Key: `tid`. 
90 - 91 - | Field | Type | Required | 92 - | ----------- | -------- | -------- | 93 - | `subject` | at-uri | yes | 94 - | `createdAt` | datetime | yes | 95 - 96 - ### sh.tangled.feed.reaction 97 - 98 - Emoji reaction on a record. Key: `tid`. 99 - 100 - | Field | Type | Required | Description | 101 - | ----------- | -------- | -------- | ------------------------------- | 102 - | `subject` | at-uri | yes | | 103 - | `reaction` | string | yes | One of: 👍 👎 😆 🎉 🫤 ❤️ 🚀 👀 | 104 - | `createdAt` | datetime | yes | | 105 - 106 - ### sh.tangled.graph.follow 107 - 108 - Follow a user. Key: `tid`. 109 - 110 - | Field | Type | Required | 111 - | ----------- | -------- | -------- | 112 - | `subject` | did | yes | 113 - | `createdAt` | datetime | yes | 114 - 115 - ## 3. State Record Types 116 - 117 - These records track mutable state of issues and PRs. 118 - 119 - ### sh.tangled.repo.issue.state 120 - 121 - | Field | Type | Required | Description | 122 - | ------- | ------ | -------- | -------------------------------------------------------------------------- | 123 - | `issue` | at-uri | yes | | 124 - | `state` | string | yes | `sh.tangled.repo.issue.state.open` or `sh.tangled.repo.issue.state.closed` | 125 - 126 - ### sh.tangled.repo.pull.status 127 - 128 - | Field | Type | Required | Description | 129 - | -------- | ------ | -------- | ----------------------------------------------------------- | 130 - | `pull` | at-uri | yes | | 131 - | `status` | string | yes | `sh.tangled.repo.pull.status.open`, `.closed`, or `.merged` | 132 - 133 - ## 4. 
Comment Record Types 134 - 135 - ### sh.tangled.repo.issue.comment 136 - 137 - | Field | Type | Required | Description | 138 - | ------------ | -------- | -------- | ------------------------------ | 139 - | `issue` | at-uri | yes | Parent issue | 140 - | `body` | string | yes | Comment body | 141 - | `createdAt` | datetime | yes | | 142 - | `replyTo` | at-uri | no | Parent comment (for threading) | 143 - | `mentions` | did[] | no | | 144 - | `references` | at-uri[] | no | | 145 - 146 - ### sh.tangled.repo.pull.comment 147 - 148 - | Field | Type | Required | Description | 149 - | ------------ | -------- | -------- | ------------ | 150 - | `pull` | at-uri | yes | Parent PR | 151 - | `body` | string | yes | Comment body | 152 - | `createdAt` | datetime | yes | | 153 - | `mentions` | did[] | no | | 154 - | `references` | at-uri[] | no | | 155 - 156 - ## 5. Infrastructure Record Types 157 - 158 - These are not indexed for search but may be consumed for operational context. 159 - 160 - | Collection | Description | 161 - | ----------------------------- | ---------------------------------------------------- | 162 - | `sh.tangled.label.definition` | Label definitions with name, valueType, scope, color | 163 - | `sh.tangled.label.op` | Label application operations | 164 - | `sh.tangled.git.refUpdate` | Git reference update events | 165 - | `sh.tangled.knot.member` | Knot membership | 166 - | `sh.tangled.spindle.member` | Spindle (CI runner) membership | 167 - | `sh.tangled.pipeline.status` | CI pipeline status | 168 - 169 - ## 6. 
Collection Priority for v1 Indexing 170 - 171 - | Priority | Collection | Rationale | 172 - | -------- | ------------------------------- | ------------------------------------ | 173 - | P0 | `sh.tangled.repo` | Core searchable content | 174 - | P0 | `sh.tangled.repo.issue` | High-signal text content | 175 - | P0 | `sh.tangled.repo.pull` | High-signal text content | 176 - | P1 | `sh.tangled.string` | Searchable code snippets | 177 - | P1 | `sh.tangled.actor.profile` | User/org discovery | 178 - | P2 | `sh.tangled.repo.issue.comment` | Body text, high volume | 179 - | P2 | `sh.tangled.repo.pull.comment` | Body text, high volume | 180 - | P2 | `sh.tangled.repo.issue.state` | State for filtering, not text search | 181 - | P2 | `sh.tangled.repo.pull.status` | State for filtering, not text search | 182 - | P3 | `sh.tangled.feed.star` | Ranking signal (star count) | 183 - | P3 | `sh.tangled.feed.reaction` | Ranking signal | 184 - | P3 | `sh.tangled.graph.follow` | Ranking signal | 185 - 186 - ### Tap Collection Filter for v1 187 - 188 - ```sh 189 - TAP_COLLECTION_FILTERS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.issue.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull,sh.tangled.repo.pull.comment,sh.tangled.repo.pull.status,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.feed.star 190 - 191 - # or sh.tangled.* 192 - ```
-184
docs/api/specs/03-data-model.md
··· 1 - --- 2 - title: "Spec 03 — Data Model" 3 - updated: 2026-03-22 4 - --- 5 - 6 - ## 1. Search Document 7 - 8 - A **search document** is the internal denormalized representation used for retrieval. It is derived from one or more ATProto records via normalization. 9 - 10 - ### Stable Identifier 11 - 12 - ```sh 13 - id = did + "|" + collection + "|" + rkey 14 - ``` 15 - 16 - Example: `did:plc:abc123|sh.tangled.repo|3kb3fge5lm32x` 17 - 18 - ### Required Fields 19 - 20 - | Field | Type | Description | 21 - | --------------- | ------- | -------------------------------------------------------------------------- | 22 - | `id` | TEXT PK | Stable composite identifier | 23 - | `did` | TEXT | Author DID | 24 - | `collection` | TEXT | ATProto collection NSID | 25 - | `rkey` | TEXT | Record key (TID) | 26 - | `at_uri` | TEXT | Full AT-URI | 27 - | `cid` | TEXT | Content identifier (hash) | 28 - | `record_type` | TEXT | Normalized type label (e.g., `repo`, `issue`, `pull`, `string`, `profile`) | 29 - | `title` | TEXT | Normalized title | 30 - | `body` | TEXT | Normalized body text | 31 - | `summary` | TEXT | Short summary / description | 32 - | `repo_did` | TEXT | DID of the repo owner (resolved from at-uri for issues/PRs) | 33 - | `repo_name` | TEXT | Repository name (resolved) | 34 - | `author_handle` | TEXT | Author handle (resolved via identity) | 35 - | `tags_json` | TEXT | JSON array of tags/topics | 36 - | `language` | TEXT | Detected or declared language | 37 - | `created_at` | TEXT | Record creation timestamp (ISO 8601) | 38 - | `updated_at` | TEXT | Last record update timestamp | 39 - | `indexed_at` | TEXT | When this document was last indexed | 40 - | `deleted_at` | TEXT | Soft-delete timestamp (tombstone) | 41 - 42 - ### Derived Fields (not stored in documents table) 43 - 44 - | Field | Location | Description | 45 - | ---------------- | -------------------------------------- | ------------------------------ | 46 - | Embedding vector | `document_embeddings` table 
| F32_BLOB(N) | 47 - | FTS index | Turso FTS index | Tantivy-backed full-text index | 48 - | Star count | Aggregated from `sh.tangled.feed.star` | Ranking signal | 49 - 50 - ## 2. Core Documents Table 51 - 52 - ```sql 53 - CREATE TABLE documents ( 54 - id TEXT PRIMARY KEY, 55 - did TEXT NOT NULL, 56 - collection TEXT NOT NULL, 57 - rkey TEXT NOT NULL, 58 - at_uri TEXT NOT NULL, 59 - cid TEXT NOT NULL, 60 - record_type TEXT NOT NULL, 61 - title TEXT, 62 - body TEXT, 63 - summary TEXT, 64 - repo_did TEXT, 65 - repo_name TEXT, 66 - author_handle TEXT, 67 - tags_json TEXT, 68 - language TEXT, 69 - created_at TEXT, 70 - updated_at TEXT, 71 - indexed_at TEXT NOT NULL, 72 - deleted_at TEXT 73 - ); 74 - 75 - CREATE INDEX idx_documents_did ON documents(did); 76 - CREATE INDEX idx_documents_collection ON documents(collection); 77 - CREATE INDEX idx_documents_record_type ON documents(record_type); 78 - CREATE INDEX idx_documents_repo_did ON documents(repo_did); 79 - CREATE INDEX idx_documents_created_at ON documents(created_at); 80 - CREATE INDEX idx_documents_deleted_at ON documents(deleted_at); 81 - ``` 82 - 83 - ## 3. FTS Index 84 - 85 - ```sql 86 - CREATE INDEX idx_documents_fts ON documents USING fts ( 87 - title WITH tokenizer=default, 88 - body WITH tokenizer=default, 89 - summary WITH tokenizer=default, 90 - repo_name WITH tokenizer=simple, 91 - author_handle WITH tokenizer=raw, 92 - tags_json WITH tokenizer=simple 93 - ) WITH (weights='title=3.0,repo_name=2.5,author_handle=2.0,summary=1.5,tags_json=1.2,body=1.0'); 94 - ``` 95 - 96 - ### FTS Maintenance 97 - 98 - Turso's Tantivy-backed FTS uses `NoMergePolicy` — segment count grows with writes and is never automatically compacted. This increases query fan-out over time. 99 - 100 - **Required maintenance:** Run `OPTIMIZE INDEX idx_documents_fts;` periodically (e.g., daily cron or after bulk backfill). This merges segments and reclaims space. 
101 - 102 - **Known limitations:** 103 - - No read-your-writes within a transaction — FTS queries see a pre-commit snapshot 104 - - No snippet function (use `fts_highlight()` for highlighting) 105 - - FTS is experimental in Turso; requires the `fts` feature flag 106 - 107 - ## 4. Embeddings Table 108 - 109 - ```sql 110 - CREATE TABLE document_embeddings ( 111 - document_id TEXT PRIMARY KEY REFERENCES documents(id), 112 - embedding F32_BLOB(768), 113 - embedding_model TEXT NOT NULL, 114 - embedded_at TEXT NOT NULL 115 - ); 116 - 117 - CREATE INDEX idx_embeddings_vec ON document_embeddings( 118 - libsql_vector_idx(embedding, 'metric=cosine') 119 - ); 120 - ``` 121 - 122 - The vector dimension (768) matches nomic-embed-text-v1.5 and EmbeddingGemma defaults. Changing models may require a new column or table migration if the dimension changes. 123 - 124 - ### Vector Index Tuning 125 - 126 - The DiskANN index accepts tuning parameters at creation time: 127 - 128 - ```sql 129 - CREATE INDEX idx_embeddings_vec ON document_embeddings( 130 - libsql_vector_idx(embedding, 'metric=cosine', 'max_neighbors=50', 'search_l=200') 131 - ); 132 - ``` 133 - 134 - | Parameter | Default | Description | 135 - |-----------|---------|-------------| 136 - | `max_neighbors` | 3*sqrt(D) | Graph connectivity; higher = better recall, more storage | 137 - | `search_l` | 200 | Neighbors visited during search; higher = better recall, slower | 138 - | `insert_l` | 70 | Neighbors visited during insert | 139 - | `alpha` | 1.2 | Graph sparsity factor | 140 - | `compress_neighbors` | — | Quantize neighbor vectors for storage savings | 141 - 142 - Start with defaults and tune after measuring recall on representative queries. 143 - 144 - ## 5. 
Sync State Table 145 - 146 - ```sql 147 - CREATE TABLE sync_state ( 148 - consumer_name TEXT PRIMARY KEY, 149 - cursor TEXT NOT NULL, 150 - high_water_mark TEXT, 151 - updated_at TEXT NOT NULL 152 - ); 153 - ``` 154 - 155 - Stores the Tap event ID that has been successfully committed. On restart, the indexer resumes from this cursor. 156 - 157 - ## 6. Embedding Jobs Table 158 - 159 - ```sql 160 - CREATE TABLE embedding_jobs ( 161 - document_id TEXT PRIMARY KEY REFERENCES documents(id), 162 - status TEXT NOT NULL, -- 'pending', 'processing', 'completed', 'failed' 163 - attempts INTEGER NOT NULL DEFAULT 0, 164 - last_error TEXT, 165 - scheduled_at TEXT NOT NULL, 166 - updated_at TEXT NOT NULL 167 - ); 168 - 169 - CREATE INDEX idx_embedding_jobs_status ON embedding_jobs(status); 170 - ``` 171 - 172 - ## 7. Issue/PR State Cache (optional) 173 - 174 - To support filtering search results by issue state or PR status without joining back to the raw records: 175 - 176 - ```sql 177 - CREATE TABLE record_state ( 178 - subject_uri TEXT PRIMARY KEY, -- at-uri of the issue or PR 179 - state TEXT NOT NULL, -- 'open', 'closed', 'merged' 180 - updated_at TEXT NOT NULL 181 - ); 182 - ``` 183 - 184 - Updated when `sh.tangled.repo.issue.state` or `sh.tangled.repo.pull.status` events are ingested.
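The `id` primary key used across these tables is composed from the record coordinates — the `did|collection|rkey` scheme that the pipeline spec's duplicate-handling rules rely on for idempotent upserts. A minimal sketch, with an illustrative function name:

```go
package main

import (
	"fmt"
	"strings"
)

// documentID builds the stable primary key for the documents table from
// the record coordinates, matching the `did|collection|rkey` scheme used
// for idempotent upserts.
func documentID(did, collection, rkey string) string {
	return strings.Join([]string{did, collection, rkey}, "|")
}

func main() {
	fmt.Println(documentID("did:plc:abc", "sh.tangled.repo", "3kb3fge5lm32x"))
}
```

Because the key excludes the CID, re-delivered create/update events for the same record collapse onto one row, and CID comparison can then distinguish true no-ops from real updates.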
-364
docs/api/specs/04-data-pipeline.md
··· 1 - --- 2 - title: "Spec 04 — Data Pipeline" 3 - updated: 2026-03-22 4 - --- 5 - 6 - Covers the full data path: Tap event ingestion, record normalization, and failure handling. 7 - 8 - ## 1. Tap Event Format 9 - 10 - ### Record Events 11 - 12 - ```json 13 - { 14 - "id": 12345, 15 - "type": "record", 16 - "record": { 17 - "live": true, 18 - "rev": "3kb3fge5lm32x", 19 - "did": "did:plc:abc123", 20 - "collection": "sh.tangled.repo", 21 - "rkey": "3kb3fge5lm32x", 22 - "action": "create", 23 - "cid": "bafyreig...", 24 - "record": { 25 - "$type": "sh.tangled.repo", 26 - "name": "my-project", 27 - "knot": "knot.tangled.org", 28 - "description": "A cool project", 29 - "topics": ["go", "search"], 30 - "createdAt": "2026-03-22T12:00:00.000Z" 31 - } 32 - } 33 - } 34 - ``` 35 - 36 - Key fields: 37 - 38 - - `id` — monotonic event ID, used as cursor 39 - - `type` — `"record"` or `"identity"` 40 - - `record.live` — `true` for real-time events, `false` for backfill 41 - - `record.action` — `"create"`, `"update"`, or `"delete"` 42 - - `record.did` — author DID 43 - - `record.collection` — ATProto collection NSID 44 - - `record.rkey` — record key 45 - - `record.cid` — content identifier 46 - - `record.record` — the full ATProto record payload (absent on delete) 47 - 48 - ### Identity Events 49 - 50 - ```json 51 - { 52 - "id": 12346, 53 - "type": "identity", 54 - "identity": { 55 - "did": "did:plc:abc123", 56 - "handle": "alice.tangled.org", 57 - "isActive": true, 58 - "status": "active" 59 - } 60 - } 61 - ``` 62 - 63 - Identity events are always delivered for tracked repos, regardless of collection filters. 64 - 65 - ## 2. WebSocket Protocol 66 - 67 - ### Connection 68 - 69 - Connect to `wss://<tap-host>/channel` (or `ws://` for local dev). 70 - 71 - If `TAP_ADMIN_PASSWORD` is set, authenticate with HTTP Basic auth (`admin:<password>`). 
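For WebSocket clients that accept custom handshake headers (a `Dial(url, header)`-style API), the Basic auth header can be built with the standard library alone. A sketch; the `tapDialHeader` helper name is illustrative:

```go
package main

import (
	"fmt"
	"net/http"
)

// tapDialHeader builds the HTTP header for the Tap WebSocket handshake.
// When TAP_ADMIN_PASSWORD is set on Tap, the client authenticates with
// HTTP Basic auth as user "admin".
func tapDialHeader(password string) http.Header {
	h := http.Header{}
	if password != "" {
		// Reuse net/http's Basic auth encoding via a throwaway request
		// that shares the same header map.
		req := &http.Request{Header: h}
		req.SetBasicAuth("admin", password)
	}
	return h
}

func main() {
	fmt.Println(tapDialHeader("secret").Get("Authorization"))
}
```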
72 - 73 - ### Acknowledgment Protocol 74 - 75 - Default mode requires the client to ack each event by sending the event `id` back over the WebSocket. Events are retried after `TAP_RETRY_TIMEOUT` (default 60s) if unacked. 76 - 77 - For simpler development, set `TAP_DISABLE_ACKS=true` on Tap for fire-and-forget delivery. 78 - 79 - ### Ordering Guarantees 80 - 81 - Events are ordered **per-repo** (per-DID), not globally: 82 - 83 - - **Historical events** (`live: false`) may be sent concurrently within a repo 84 - - **Live events** (`live: true`) are synchronization barriers — all prior events for that repo must complete before a live event is sent 85 - - No ordering guarantee across different repos 86 - 87 - Example sequence for one repo: `H1, H2, L1, H3, H4, L2` 88 - 89 - - H1 and H2 sent concurrently 90 - - Wait for completion, send L1 alone 91 - - Wait for L1, send H3 and H4 concurrently 92 - - Wait for completion, send L2 alone 93 - 94 - ### Delivery Guarantee 95 - 96 - Events are delivered **at least once**. Duplicates may occur on crashes or ack timeouts. The indexer must handle idempotent upserts. 97 - 98 - ## 3. Ingestion Contract 99 - 100 - For each event, the indexer: 101 - 102 - 1. Validates `type` is `"record"` (identity events are handled separately) 103 - 2. Checks `record.collection` against the allowlist 104 - 3. Maps `record.action` to an operation: 105 - - `create` → upsert document 106 - - `update` → upsert document 107 - - `delete` → tombstone document (`deleted_at = now`) 108 - 4. Decodes `record.record` into the collection-specific struct 109 - 5. Normalizes to internal `Document` 110 - 6. Upserts into the documents table 111 - 7. Schedules embedding job if eligible 112 - 8. 
Persists cursor (`event.id`) **only after successful DB commit** 113 - 114 - ### Cursor Persistence Rules 115 - 116 - - If DB commit fails → cursor does not advance → event will be retried 117 - - After successful DB writes, ack Tap first, then persist cursor for operator-visible resume 118 - - If ack fails → cursor does not advance 119 - - If ack succeeds but cursor persistence fails → retry cursor persistence until successful or process exit 120 - - If normalization fails → log error, optionally dead-letter, skip → cursor advances 121 - - If embedding scheduling fails → document remains keyword-searchable → cursor advances 122 - 123 - ## 4. Backfill Behavior 124 - 125 - When a repo is added to Tap (via `/repos/add`, signal collection, or full network mode): 126 - 127 - 1. Tap fetches full repo history from PDS via `com.atproto.sync.getRepo` 128 - 2. Firehose events for that repo are buffered during backfill 129 - 3. Historical events (`live: false`) are delivered first 130 - 4. After backfill completes, buffered live events drain 131 - 5. New firehose events stream normally (`live: true`) 132 - 133 - ### Application-Level Backfill Support 134 - 135 - The indexer also supports: 136 - 137 - - Full reindex from existing corpus (re-normalize all stored documents) 138 - - Targeted reindex by collection 139 - - Targeted reindex by DID 140 - 141 - These do not involve Tap — they re-process documents already in the database. 142 - 143 - ## 5. Normalization 144 - 145 - Normalization converts heterogeneous `sh.tangled.*` records into the common `Document` shape defined in [03-data-model.md](03-data-model.md). 
146 - 147 - ### Adapter Interface 148 - 149 - Each indexed collection provides an adapter: 150 - 151 - ```go 152 - type RecordAdapter interface { 153 - Collection() string 154 - RecordType() string 155 - Normalize(event TapRecordEvent) (*Document, error) 156 - Searchable(record map[string]any) bool 157 - } 158 - ``` 159 - 160 - ### Per-Collection Normalization 161 - 162 - #### sh.tangled.repo → `repo` 163 - 164 - | Document Field | Source | 165 - | -------------- | -------------------------------- | 166 - | `title` | `record.name` | 167 - | `body` | `record.description` | 168 - | `summary` | `record.description` (truncated) | 169 - | `repo_name` | `record.name` | 170 - | `repo_did` | `event.did` | 171 - | `tags_json` | `json(record.topics)` | 172 - | `created_at` | `record.createdAt` | 173 - 174 - **Searchable:** Always (unless empty name). 175 - 176 - #### sh.tangled.repo.issue → `issue` 177 - 178 - | Document Field | Source | 179 - | -------------- | ------------------------------------------- | 180 - | `title` | `record.title` | 181 - | `body` | `record.body` | 182 - | `summary` | First ~200 chars of `record.body` | 183 - | `repo_did` | Extracted from `record.repo` AT-URI | 184 - | `repo_name` | Resolved from repo AT-URI | 185 - | `tags_json` | `[]` (labels resolved separately if needed) | 186 - | `created_at` | `record.createdAt` | 187 - 188 - **Searchable:** Always. 189 - 190 - #### sh.tangled.repo.pull → `pull` 191 - 192 - | Document Field | Source | 193 - | -------------- | ------------------------------------------ | 194 - | `title` | `record.title` | 195 - | `body` | `record.body` | 196 - | `summary` | First ~200 chars of `record.body` | 197 - | `repo_did` | Extracted from `record.target.repo` AT-URI | 198 - | `repo_name` | Resolved from target repo AT-URI | 199 - | `tags_json` | `[]` | 200 - | `created_at` | `record.createdAt` | 201 - 202 - **Searchable:** Always. 
203 - 204 - #### sh.tangled.string → `string` 205 - 206 - | Document Field | Source | 207 - | -------------- | -------------------- | 208 - | `title` | `record.filename` | 209 - | `body` | `record.contents` | 210 - | `summary` | `record.description` | 211 - | `repo_name` | — | 212 - | `repo_did` | — | 213 - | `tags_json` | `[]` | 214 - | `created_at` | `record.createdAt` | 215 - 216 - **Searchable:** Always (content is required). 217 - 218 - #### sh.tangled.actor.profile → `profile` 219 - 220 - | Document Field | Source | 221 - | -------------- | ---------------------------------------------------- | 222 - | `title` | Author handle (resolved from DID) | 223 - | `body` | `record.description` | 224 - | `summary` | `record.description` (truncated) + `record.location` | 225 - | `repo_name` | — | 226 - | `repo_did` | — | 227 - | `tags_json` | `[]` | 228 - | `created_at` | — (profiles don't have createdAt) | 229 - 230 - **Searchable:** If `description` is non-empty. 231 - 232 - #### sh.tangled.repo.issue.comment → `issue_comment` 233 - 234 - | Document Field | Source | 235 - | -------------- | ----------------------------------------- | 236 - | `title` | — (derived: "Comment on {issue title}") | 237 - | `body` | `record.body` | 238 - | `summary` | First ~200 chars of `record.body` | 239 - | `repo_did` | Resolved from `record.issue` AT-URI chain | 240 - | `repo_name` | Resolved | 241 - | `created_at` | `record.createdAt` | 242 - 243 - **Searchable:** If body is non-empty. 244 - 245 - #### sh.tangled.repo.pull.comment → `pull_comment` 246 - 247 - Same pattern as issue comments, using `record.pull` instead of `record.issue`. 248 - 249 - ### State Event Handling 250 - 251 - State and status records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) do **not** produce new search documents. Instead, they update the `record_state` cache table (see [03-data-model.md](03-data-model.md)). 
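The "first ~200 chars" summaries used by the issue, pull, and comment adapters can be sketched as a rune-safe truncation; the exact length and the trailing ellipsis are assumptions, not normative:

```go
package main

import "fmt"

// summarize returns roughly the first maxRunes characters of body for
// the document summary field, cutting on rune boundaries so multi-byte
// text is never split mid-character. The ellipsis is an assumed
// convention.
func summarize(body string, maxRunes int) string {
	runes := []rune(body)
	if len(runes) <= maxRunes {
		return body
	}
	return string(runes[:maxRunes]) + "…"
}

func main() {
	fmt.Println(summarize("A short issue body stays untouched.", 200))
}
```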
252 - 253 - ### Interaction Event Handling 254 - 255 - Stars (`sh.tangled.feed.star`) and reactions (`sh.tangled.feed.reaction`) do not produce search documents. They may be aggregated for ranking signals in later phases. 256 - 257 - ### Embedding Input Text 258 - 259 - For documents eligible for embedding, compose the input as: 260 - 261 - ```sh 262 - {title}\n{repo_name}\n{author_handle}\n{tags}\n{summary}\n{body} 263 - ``` 264 - 265 - Fields are joined with newlines. Empty fields are omitted. 266 - 267 - ### Repo Name Resolution 268 - 269 - Issues, PRs, and comments reference their parent repo via AT-URI (e.g., `at://did:plc:abc/sh.tangled.repo/tid`). Resolving the repo name requires either: 270 - 271 - 1. Looking up the repo document in the local `documents` table 272 - 2. Caching repo metadata in a lightweight lookup table 273 - 274 - Option 1 is preferred for v1. If the repo document hasn't been indexed yet, `repo_name` is left empty and backfilled on the next reindex pass. 275 - 276 - ## 6. Identity Event Handling 277 - 278 - Identity events should be used to maintain an author handle cache: 279 - 280 - ```sh 281 - did → handle mapping 282 - ``` 283 - 284 - When an identity event arrives with a new handle, update `author_handle` on all documents with that DID. This ensures search by handle returns current results. 285 - 286 - ## 7. Repo Management 287 - 288 - To add repos for tracking, POST to Tap's `/repos/add` endpoint: 289 - 290 - ```bash 291 - curl -u admin:PASSWORD -X POST https://tap-host/repos/add \ 292 - -H "Content-Type: application/json" \ 293 - -d '{"dids": ["did:plc:abc123", "did:plc:def456"]}' 294 - ``` 295 - 296 - Alternatively, use `TAP_SIGNAL_COLLECTION=sh.tangled.repo` to auto-track any repo that has Tangled repo records. 297 - 298 - ## 8. 
Failure Handling 299 - 300 - ### Ingestion Failures 301 - 302 - If Tap event processing fails before DB commit: 303 - 304 - - Log the failure with event ID, DID, collection, rkey, and error class 305 - - Retry with exponential backoff (for transient errors like DB timeouts) 306 - - Do **not** advance cursor — the event will be re-delivered by Tap 307 - - After max retries for a persistent error, log and skip (cursor advances) 308 - 309 - ### Normalization Failures 310 - 311 - If a record cannot be normalized: 312 - 313 - - Log collection, DID, rkey, CID, and error class 314 - - Do not crash the process 315 - - Skip the event and advance cursor 316 - - Optionally insert into a `dead_letter` table for manual inspection 317 - 318 - ### Embedding Failures 319 - 320 - If embedding generation fails: 321 - 322 - - The document remains keyword-searchable 323 - - The embedding job is marked `failed` with `last_error` and incremented `attempts` 324 - - Jobs are retried with exponential backoff up to a max attempt count 325 - - After max attempts, the job enters `dead` state 326 - - The embed-worker exposes failed job count as a metric 327 - - If Ollama is unreachable (sidecar down), all pending jobs pause until connectivity is restored 328 - 329 - ### DB Failures 330 - 331 - If Turso/libSQL is unreachable: 332 - 333 - - **API** returns `503` for search endpoints; `/healthz` still returns 200 (liveness), `/readyz` returns 503 334 - - **Indexer** pauses event processing and retries DB connection with backoff; cursor does not advance 335 - - **Embed-worker** pauses job processing and retries 336 - 337 - ### Tap Connection Failures 338 - 339 - If the WebSocket connection to Tap drops: 340 - 341 - - Reconnect with exponential backoff 342 - - Resume from the last persisted cursor 343 - - Log reconnection attempts and success 344 - 345 - Tap itself handles firehose reconnection independently — a Tap restart does not require indexer intervention beyond reconnecting the WebSocket. 
346 - 347 - ### Duplicate Event Handling 348 - 349 - Tap delivers events **at least once**. Duplicates are handled by: 350 - 351 - - Using `id = did|collection|rkey` as the primary key 352 - - All writes are upserts (`INSERT OR REPLACE` / `ON CONFLICT ... DO UPDATE`) 353 - - CID comparison can detect true no-ops (same content) vs. actual updates 354 - 355 - ### Startup Recovery 356 - 357 - On indexer startup: 358 - 359 - 1. Read `cursor` from `sync_state` table 360 - 2. Connect to Tap WebSocket 361 - 3. Tap replays events from the stored cursor position 362 - 4. Processing resumes normally 363 - 364 - If no cursor exists (first run), Tap delivers all historical events from backfill.
-306
docs/api/specs/05-search.md
··· 1 - --- 2 - title: "Spec 05 — Search" 3 - updated: 2026-03-22 4 - --- 5 - 6 - Covers all search modes, the public search API contract, scoring, and filtering. 7 - 8 - ## 1. Search Modes 9 - 10 - | Mode | Backing | Available | 11 - | ---------- | ------------------------------------ | --------- | 12 - | `keyword` | Turso Tantivy-backed FTS | MVP | 13 - | `semantic` | Vector similarity (DiskANN index) | Phase 2 | 14 - | `hybrid` | Weighted merge of keyword + semantic | Phase 3 | 15 - 16 - ## 2. Keyword Search 17 - 18 - ### Implementation 19 - 20 - Uses Turso's `fts_score()` function for BM25 ranking: 21 - 22 - ```sql 23 - SELECT 24 - d.id, d.title, d.summary, d.repo_name, d.author_handle, 25 - d.collection, d.record_type, d.updated_at, 26 - fts_score(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) AS score 27 - FROM documents d 28 - WHERE fts_match(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) 29 - AND d.deleted_at IS NULL 30 - ORDER BY score DESC 31 - LIMIT ? OFFSET ?; 32 - ``` 33 - 34 - ### Field Weights 35 - 36 - Configured in the FTS index definition: 37 - 38 - | Field | Weight | Rationale | 39 - | --------------- | ------ | ------------------------------------ | 40 - | `title` | 3.0 | Highest signal for relevance | 41 - | `repo_name` | 2.5 | Exact repo lookups should rank first | 42 - | `author_handle` | 2.0 | Author search is common | 43 - | `summary` | 1.5 | More focused than body | 44 - | `tags_json` | 1.2 | Topic matching | 45 - | `body` | 1.0 | Baseline | 46 - 47 - ### Query Features 48 - 49 - Tantivy query syntax is exposed to users: 50 - 51 - - Boolean: `go AND search`, `rust NOT unsafe` 52 - - Phrase: `"pull request"` 53 - - Prefix: `tang*` 54 - - Field-specific: `title:parser` 55 - 56 - ### Snippets 57 - 58 - Use `fts_highlight()` to generate highlighted snippets: 59 - 60 - ```sql 61 - fts_highlight(d.body, '<mark>', '</mark>', ?) 
AS body_snippet 62 - ``` 63 - 64 - ### FTS Operational Notes 65 - 66 - - **Segment merging:** Turso FTS uses Tantivy's `NoMergePolicy`. Run `OPTIMIZE INDEX idx_documents_fts;` after bulk writes (backfill) and periodically in production to keep query performance stable. 67 - - **Read-your-writes:** FTS queries within the same transaction see a pre-commit snapshot. If a document is written and immediately searched in the same transaction, FTS will not find it. The indexer and API are separate processes, so this is not a concern in normal operation. 68 - - **Feature flag:** Turso FTS requires the `fts` feature flag to be enabled on the database. 69 - 70 - ## 3. Semantic Search 71 - 72 - ### Query Flow 73 - 74 - 1. Convert user query text to embedding via Ollama (self-hosted) 75 - 2. Query `vector_top_k` for nearest neighbors 76 - 3. Join back to `documents` to get metadata 77 - 4. Filter out deleted/hidden documents 78 - 5. Return results with distance as score 79 - 80 - ```sql 81 - SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle, 82 - d.collection, d.record_type, d.updated_at 83 - FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v 84 - JOIN document_embeddings e ON e.rowid = v.id 85 - JOIN documents d ON d.id = e.document_id 86 - WHERE d.deleted_at IS NULL; 87 - ``` 88 - 89 - ### Score Normalization 90 - 91 - Cosine distance ranges from 0 (identical) to 2 (opposite). Normalize to a 0–1 relevance score: 92 - 93 - ```text 94 - semantic_score = 1.0 - (distance / 2.0) 95 - ``` 96 - 97 - ## 4. Hybrid Search 98 - 99 - ### v1: Weighted Score Blending 100 - 101 - ```text 102 - hybrid_score = 0.65 * keyword_score_normalized + 0.35 * semantic_score_normalized 103 - ``` 104 - 105 - ### Score Normalization for Blending 106 - 107 - Keyword (BM25) scores are unbounded. 
Normalize using min-max within the result set: 108 - 109 - ```text 110 - keyword_normalized = (score - min_score) / (max_score - min_score) 111 - ``` 112 - 113 - Semantic scores are already bounded after the distance-to-relevance conversion. 114 - 115 - ### Merge Strategy 116 - 117 - 1. Fetch top N keyword results (e.g., N=50) 118 - 2. Fetch top N semantic results 119 - 3. Merge on `document_id` 120 - 4. For documents appearing in both sets, combine scores 121 - 5. For documents in only one set, use that score (with 0 for the missing signal) 122 - 6. Sort by `hybrid_score` descending 123 - 7. Deduplicate 124 - 8. Apply limit/offset 125 - 126 - ### v2: Reciprocal Rank Fusion (future) 127 - 128 - If keyword and semantic score scales prove unstable under weighted blending, replace with RRF: 129 - 130 - ```text 131 - rrf_score = Σ 1 / (k + rank_i) 132 - ``` 133 - 134 - where `k` is a constant (typically 60) and `rank_i` is the document's rank in each result list. 135 - 136 - ## 5. Filtering 137 - 138 - All search modes support these filters, applied as SQL WHERE clauses: 139 - 140 - | Filter | Parameter | SQL | 141 - | ----------- | ------------ | ------------------------------------------- | 142 - | Collection | `collection` | `d.collection = ?` | 143 - | Author | `author` | `d.author_handle = ?` or `d.did = ?` | 144 - | Repo | `repo` | `d.repo_name = ?` or `d.repo_did = ?` | 145 - | Record type | `type` | `d.record_type = ?` | 146 - | Language | `language` | `d.language = ?` | 147 - | Date range | `from`, `to` | `d.created_at >= ?` and `d.created_at <= ?` | 148 - | State | `state` | Join to `record_state` table | 149 - 150 - ## 6. 
Embedding Eligibility 151 - 152 - A document is eligible for embedding if: 153 - 154 - - `deleted_at IS NULL` 155 - - `record_type` is one of: `repo`, `issue`, `pull`, `string`, `profile` 156 - - At least one of `title`, `body`, or `summary` is non-empty 157 - - Total text length exceeds a minimum threshold (e.g., 20 characters) 158 - 159 - ## 7. API Endpoints 160 - 161 - ### Health 162 - 163 - | Method | Path | Description | 164 - | ------ | ---------- | -------------------------------- | 165 - | GET | `/healthz` | Liveness — process is responsive | 166 - | GET | `/readyz` | Readiness — DB is reachable | 167 - 168 - ### Search 169 - 170 - | Method | Path | Description | 171 - | ------ | ------------------ | ------------------------------------------------ | 172 - | GET | `/search` | Search with configurable mode (default: keyword) | 173 - | GET | `/search/keyword` | Keyword-only search | 174 - | GET | `/search/semantic` | Semantic-only search | 175 - | GET | `/search/hybrid` | Hybrid search | 176 - 177 - ### Documents 178 - 179 - | Method | Path | Description | 180 - | ------ | ----------------- | ----------------------------- | 181 - | GET | `/documents/{id}` | Fetch a single document by ID | 182 - 183 - ### Admin 184 - 185 - | Method | Path | Description | 186 - | ------ | ---------------- | -------------------- | 187 - | POST | `/admin/reindex` | Trigger reindex | 188 - | POST | `/admin/reembed` | Trigger re-embedding | 189 - 190 - Admin endpoints are disabled by default. Enable with `ENABLE_ADMIN_ENDPOINTS=true`. 191 - 192 - ## 8. 
Query Parameters 193 - 194 - | Parameter | Type | Default | Description | 195 - | ------------ | ------ | --------- | -------------------------------------------------------------------- | 196 - | `q` | string | required | Search query | 197 - | `mode` | string | `keyword` | `keyword`, `semantic`, or `hybrid` | 198 - | `limit` | int | 20 | Results per page (max: `SEARCH_MAX_LIMIT`) | 199 - | `offset` | int | 0 | Pagination offset | 200 - | `collection` | string | — | Filter by `sh.tangled.*` collection | 201 - | `type` | string | — | Filter by record type (`repo`, `issue`, `pull`, `string`, `profile`) | 202 - | `author` | string | — | Filter by author handle or DID | 203 - | `repo` | string | — | Filter by repo name or repo DID | 204 - | `language` | string | — | Filter by language | 205 - | `from` | string | — | Created after (ISO 8601) | 206 - | `to` | string | — | Created before (ISO 8601) | 207 - | `state` | string | — | Filter by state (`open`, `closed`, `merged`) | 208 - 209 - ## 9. 
Search Response 210 - 211 - ```json 212 - { 213 - "query": "rust markdown tui", 214 - "mode": "hybrid", 215 - "total": 142, 216 - "limit": 20, 217 - "offset": 0, 218 - "results": [ 219 - { 220 - "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x", 221 - "collection": "sh.tangled.repo", 222 - "record_type": "repo", 223 - "title": "glow-rs", 224 - "body_snippet": "A TUI markdown viewer inspired by <mark>Glow</mark>...", 225 - "summary": "Rust TUI markdown viewer", 226 - "repo_name": "glow-rs", 227 - "author_handle": "desertthunder.dev", 228 - "score": 0.842, 229 - "matched_by": ["keyword", "semantic"], 230 - "created_at": "2026-03-20T10:00:00Z", 231 - "updated_at": "2026-03-22T15:03:11Z" 232 - } 233 - ] 234 - } 235 - ``` 236 - 237 - ### Result Fields 238 - 239 - | Field | Type | Description | 240 - | ------------------ | -------- | ------------------------------------------- | 241 - | `id` | string | Document stable ID | 242 - | `collection` | string | ATProto collection NSID | 243 - | `record_type` | string | Normalized type label | 244 - | `title` | string | Document title | 245 - | `body_snippet` | string | Highlighted body excerpt | 246 - | `summary` | string | Short description | 247 - | `repo_name` | string | Repository name (if applicable) | 248 - | `author_handle` | string | Author handle | 249 - | `did` | string | Author DID when available | 250 - | `at_uri` | string | Canonical AT URI when available | 251 - | `primary_language` | string | Primary language for repo results | 252 - | `stars` | number | Indexed star count for repo results | 253 - | `follower_count` | number | Indexed follower count for profile results | 254 - | `following_count` | number | Indexed following count for profile results | 255 - | `score` | float | Relevance score (0–1) | 256 - | `matched_by` | string[] | Which search modes produced this result | 257 - | `created_at` | string | ISO 8601 creation timestamp | 258 - | `updated_at` | string | ISO 8601 last update timestamp | 259 - 260 - 
## 10. Document Response 261 - 262 - `GET /documents/{id}` returns the full document: 263 - 264 - ```json 265 - { 266 - "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x", 267 - "did": "did:plc:abc", 268 - "collection": "sh.tangled.repo", 269 - "rkey": "3kb3fge5lm32x", 270 - "at_uri": "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x", 271 - "cid": "bafyreig...", 272 - "record_type": "repo", 273 - "title": "glow-rs", 274 - "body": "A TUI markdown viewer inspired by Glow, written in Rust.", 275 - "summary": "Rust TUI markdown viewer", 276 - "repo_name": "glow-rs", 277 - "author_handle": "desertthunder.dev", 278 - "tags_json": "[\"rust\", \"tui\", \"markdown\"]", 279 - "language": "en", 280 - "created_at": "2026-03-20T10:00:00Z", 281 - "updated_at": "2026-03-22T15:03:11Z", 282 - "indexed_at": "2026-03-22T15:05:00Z", 283 - "has_embedding": true 284 - } 285 - ``` 286 - 287 - ## 11. Error Responses 288 - 289 - | Status | Condition | 290 - | ------ | ------------------------------------------------------------------ | 291 - | 400 | Missing `q` parameter, invalid `limit`/`offset`, malformed filters | 292 - | 404 | Document not found | 293 - | 503 | DB unreachable (readiness failure) | 294 - 295 - ```json 296 - { "error": "invalid_parameter", "message": "limit must be between 1 and 100" } 297 - ``` 298 - 299 - ## 12. API Behavior 300 - 301 - - `keyword` returns only lexical matches via `fts_match`/`fts_score` 302 - - `semantic` returns only embedding-backed matches via `vector_top_k` 303 - - `hybrid` merges both result sets and reranks 304 - - All modes exclude documents with `deleted_at IS NOT NULL` by default 305 - - Pagination uses `limit`/`offset` (cursor-based pagination deferred) 306 - - Mobile clients may use `type=repo` and `type=profile` to render repo/profile search directly
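The scoring rules in this spec — distance-to-relevance conversion, min-max normalization of BM25 scores, and the 0.65/0.35 weighted blend — reduce to a few pure functions. A sketch; normalizing a single-score result set to 1.0 is an assumed convention the spec leaves open:

```go
package main

import "fmt"

// semanticScore converts a cosine distance (0 = identical, 2 = opposite)
// into a 0-1 relevance score.
func semanticScore(distance float64) float64 {
	return 1.0 - distance/2.0
}

// minMaxNormalize rescales unbounded BM25 scores into 0-1 within one
// result set. When all scores are equal, everything normalizes to 1.0
// (an assumption).
func minMaxNormalize(scores []float64) []float64 {
	if len(scores) == 0 {
		return nil
	}
	lo, hi := scores[0], scores[0]
	for _, s := range scores {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	out := make([]float64, len(scores))
	for i, s := range scores {
		if hi == lo {
			out[i] = 1.0
			continue
		}
		out[i] = (s - lo) / (hi - lo)
	}
	return out
}

// hybridScore blends normalized keyword and semantic scores with the
// spec's default 0.65/0.35 weights.
func hybridScore(keyword, semantic float64) float64 {
	return 0.65*keyword + 0.35*semantic
}

func main() {
	kw := minMaxNormalize([]float64{12.4, 3.1, 7.8})
	fmt.Println(hybridScore(kw[0], semanticScore(0.3)))
}
```

Documents appearing in only one result set use 0 for the missing signal, per the merge strategy above.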
-434
docs/api/specs/06-operations.md
··· 1 - --- 2 - title: "Spec 06 — Operations" 3 - updated: 2026-03-23 4 - --- 5 - 6 - Covers configuration, observability, security, and deployment. 7 - 8 - ## 0. Quick Setup 9 - 10 - Tap is already deployed. For a new environment, the minimum operator work is: 11 - 12 - 1. Create or choose a Turso database for that environment 13 - 2. Generate a Turso auth token for that database 14 - 3. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at that database 15 - 4. Create Railway services for `api` and `indexer` 16 - 5. Point `TAP_URL` at the existing Tap deployment 17 - 6. Run migrations/start the services 18 - 7. Run `twister backfill` before treating the environment as search-ready 19 - 20 - No separate `*_DEV` or `*_PROD` variables are required. Each environment keeps using the same variable names and simply points them at the appropriate Turso database. 21 - 22 - ## 1. Configuration 23 - 24 - All configuration is via environment variables. 25 - 26 - ### Required 27 - 28 - | Variable | Description | 29 - | --------------------- | ----------------------------------------------------------- | 30 - | `TAP_URL` | Tap WebSocket URL (e.g., `wss://tap.example.com/channel`) | 31 - | `TAP_AUTH_PASSWORD` | Tap admin password for Basic auth (if set on Tap) | 32 - | `TURSO_DATABASE_URL` | Turso connection URL (e.g., `libsql://db-name.turso.io`) | 33 - | `TURSO_AUTH_TOKEN` | Turso JWT auth token | 34 - | `INDEXED_COLLECTIONS` | Comma-separated list of `sh.tangled.*` collections to index | 35 - 36 - ### Search 37 - 38 - | Variable | Default | Description | 39 - | ---------------------- | --------- | ------------------------ | 40 - | `SEARCH_DEFAULT_LIMIT` | `20` | Default results per page | 41 - | `SEARCH_MAX_LIMIT` | `100` | Maximum results per page | 42 - | `SEARCH_DEFAULT_MODE` | `keyword` | Default search mode | 43 - 44 - ### Embedding (Ollama — self-hosted) 45 - 46 - | Variable | Default | Description | 47 - | ---------------------- | 
------------------------------------------ | ---------------------------------------------- | 48 - | `OLLAMA_URL` | `http://ollama.railway.internal:11434` | Ollama server URL | 49 - | `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model name | 50 - | `EMBEDDING_DIM` | `768` | Vector dimensionality (must match model) | 51 - | `EMBEDDING_BATCH_SIZE` | `32` | Documents per embedding batch | 52 - 53 - ### Hybrid Search 54 - 55 - | Variable | Default | Description | 56 - | ------------------------ | ------- | --------------------------------------- | 57 - | `HYBRID_KEYWORD_WEIGHT` | `0.65` | Keyword score weight in hybrid ranking | 58 - | `HYBRID_SEMANTIC_WEIGHT` | `0.35` | Semantic score weight in hybrid ranking | 59 - 60 - ### Server 61 - 62 - | Variable | Default | Description | 63 - | ------------------------ | ------- | ------------------------------------------- | 64 - | `HTTP_BIND_ADDR` | `:8080` | API server bind address | 65 - | `LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` | 66 - | `LOG_FORMAT` | `json` | Log format: `json` or `text` | 67 - | `ENABLE_ADMIN_ENDPOINTS` | `false` | Enable `/admin/*` endpoints | 68 - | `ADMIN_AUTH_TOKEN` | — | Bearer token for admin endpoints | 69 - 70 - ### Example `.env` 71 - 72 - ```bash 73 - # Tap (deployed on Railway) 74 - TAP_URL=wss://tap-instance.up.railway.app/channel 75 - TAP_AUTH_PASSWORD=your-tap-admin-password 76 - 77 - # Turso 78 - TURSO_DATABASE_URL=libsql://twister-db.turso.io 79 - TURSO_AUTH_TOKEN=eyJhbGci... 
80 - 81 - # Collections 82 - INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile,sh.tangled.repo.issue.comment,sh.tangled.repo.pull.comment,sh.tangled.repo.issue.state,sh.tangled.repo.pull.status,sh.tangled.feed.star 83 - 84 - # Search 85 - SEARCH_DEFAULT_LIMIT=20 86 - SEARCH_MAX_LIMIT=100 87 - 88 - # Embedding — Ollama (Phase 2) 89 - # OLLAMA_URL=http://ollama.railway.internal:11434 90 - # EMBEDDING_MODEL=nomic-embed-text 91 - # EMBEDDING_DIM=768 92 - 93 - # Server 94 - HTTP_BIND_ADDR=:8080 95 - LOG_LEVEL=info 96 - ENABLE_ADMIN_ENDPOINTS=false 97 - ``` 98 - 99 - ### Environment Selection 100 - 101 - Use the same variable names in every environment: 102 - 103 - - local development can point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at `twister-dev` 104 - - production can point those same variables at `twister-prod` 105 - 106 - The application should not care which database it is talking to; only the environment wiring changes. 107 - 108 - ## 1.5. Turso Setup 109 - 110 - ### Recommended Databases 111 - 112 - Use one Turso database per environment, for example: 113 - 114 - - `twister-dev` 115 - - `twister-prod` 116 - 117 - Keep the app config identical across environments and swap only these values: 118 - 119 - - `TURSO_DATABASE_URL` 120 - - `TURSO_AUTH_TOKEN` 121 - 122 - ### Basic Flow 123 - 124 - Using the Turso dashboard or CLI: 125 - 126 - 1. Create the database for the target environment 127 - 2. Capture its libSQL URL 128 - 3. Create an auth token for the service 129 - 4. Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` in that environment 130 - 131 - Example values: 132 - 133 - ```bash 134 - # Development environment 135 - TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io 136 - TURSO_AUTH_TOKEN=... 137 - 138 - # Production environment 139 - TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io 140 - TURSO_AUTH_TOKEN=... 
141 - ``` 142 - 143 - ### Practical Rule 144 - 145 - Do not introduce `TURSO_DATABASE_URL_DEV`, `TURSO_DATABASE_URL_PROD`, or similar split variables. Railway environments, local shells, and CI should all set the same names with environment-specific values. 146 - 147 - ## 1.6. Railway Setup 148 - 149 - ### Project Layout 150 - 151 - Create or reuse one Railway project containing: 152 - 153 - - existing `tap` service 154 - - `api` service running `twister api` 155 - - `indexer` service running `twister indexer` 156 - 157 - ### Basic Steps 158 - 159 - 1. Connect the monorepo to Railway 160 - 2. Create the `api` and `indexer` services from the same source repo/Docker image 161 - 3. Set shared variables on both services: 162 - - `TURSO_DATABASE_URL` 163 - - `TURSO_AUTH_TOKEN` 164 - - `LOG_LEVEL` 165 - - `LOG_FORMAT` 166 - 4. Set API-specific variables: 167 - - `HTTP_BIND_ADDR` 168 - - `SEARCH_DEFAULT_LIMIT` 169 - - `SEARCH_MAX_LIMIT` 170 - 5. Set indexer-specific variables: 171 - - `TAP_URL` 172 - - `TAP_AUTH_PASSWORD` 173 - - `INDEXED_COLLECTIONS` 174 - 6. Configure health checks 175 - 7. Deploy 176 - 8. Run backfill against the environment before public validation 177 - 178 - ### Dev vs Production on Railway 179 - 180 - If you use multiple Railway environments, keep the same service definitions and variable names in each one. Only the values change: 181 - 182 - - dev Railway environment -> `TURSO_DATABASE_URL=...twister-dev...` 183 - - prod Railway environment -> `TURSO_DATABASE_URL=...twister-prod...` 184 - 185 - This keeps deployment logic simple and avoids conditional application config. 186 - 187 - ## 2. Observability 188 - 189 - ### Structured Logging 190 - 191 - Use Go's `slog` with JSON output. 
Every log entry includes: 192 - 193 - | Field | Description | 194 - | --------- | ----------------------------------- | 195 - | `ts` | Timestamp (RFC 3339) | 196 - | `level` | Log level | 197 - | `service` | `api`, `indexer`, or `embed-worker` | 198 - | `msg` | Human-readable message | 199 - 200 - #### Context Fields (where applicable) 201 - 202 - | Field | When | 203 - | ------------- | ------------------------ | 204 - | `event_name` | Tap event processing | 205 - | `event_id` | Tap event ID | 206 - | `document_id` | Document operations | 207 - | `did` | Any DID-scoped operation | 208 - | `collection` | Record processing | 209 - | `rkey` | Record processing | 210 - | `cursor` | Cursor persistence | 211 - | `error_class` | Error handling | 212 - | `duration_ms` | Timed operations | 213 - 214 - ### Metrics 215 - 216 - Recommended counters and gauges (via logs, Prometheus, or platform metrics): 217 - 218 - #### Ingestion 219 - 220 - | Metric | Type | Description | 221 - | ------------------------------ | --------- | ---------------------------------- | 222 - | `events_processed_total` | counter | Total Tap events processed | 223 - | `events_failed_total` | counter | Events that failed processing | 224 - | `normalization_failures_total` | counter | Normalization errors by collection | 225 - | `upsert_duration_ms` | histogram | DB upsert latency | 226 - | `cursor_position` | gauge | Current Tap cursor position | 227 - 228 - #### Embedding 229 - 230 - | Metric | Type | Description | 231 - | -------------------------- | --------- | ------------------------------ | 232 - | `embedding_queue_depth` | gauge | Pending embedding jobs | 233 - | `embedding_failures_total` | counter | Failed embedding attempts | 234 - | `embedding_duration_ms` | histogram | Per-document embedding latency | 235 - 236 - #### Search 237 - 238 - | Metric | Type | Description | 239 - | ----------------------- | --------- | -------------------------- | 240 - | `search_requests_total` | counter | 
Requests by mode | 241 - | `search_duration_ms` | histogram | Query latency by mode | 242 - | `search_results_count` | histogram | Results returned per query | 243 - 244 - ### Health Checks 245 - 246 - #### API Process 247 - 248 - | Endpoint | Check | Healthy | 249 - | -------------- | --------------------- | ------------------- | 250 - | `GET /healthz` | Process is responsive | Always (liveness) | 251 - | `GET /readyz` | DB connection works | `SELECT 1` succeeds | 252 - 253 - #### Indexer Process 254 - 255 - The indexer exposes a top-level health probe, served on its own port outside the API's HTTP router, that reports healthy when: 256 - 257 - - Tap WebSocket connected or reconnecting 258 - - Cursor advancing or intentionally idle 259 - - DB reachable 260 - 261 - On Railway, this is an HTTP health check endpoint on port 9090. 262 - 263 - #### Embed Worker 264 - 265 - - DB reachable 266 - - Embedding provider reachable (periodic test call) 267 - - Job queue not stalled (jobs processing within expected timeframe) 268 - 269 - ## 3. Security 270 - 271 - ### Secrets Management 272 - 273 - Secrets are injected through platform secret management: 274 - 275 - - **Railway:** Environment variables in the dashboard or `railway variables` 276 - 277 - Secrets are never stored in code, config files, or Docker images.
278 - 279 - Required secrets: 280 - 281 - | Secret | Purpose | 282 - | ------------------- | --------------------------------- | 283 - | `TURSO_AUTH_TOKEN` | Turso database authentication | 284 - | `TAP_AUTH_PASSWORD` | Tap admin API authentication | 285 - | `OLLAMA_URL` | Ollama sidecar connection (no secret if internal networking) | 286 - | `ADMIN_AUTH_TOKEN` | Admin endpoint authentication | 287 - 288 - ### Admin Endpoints 289 - 290 - Admin endpoints (`/admin/reindex`, `/admin/reembed`) are: 291 - 292 - - Disabled by default (`ENABLE_ADMIN_ENDPOINTS=false`) 293 - - When enabled, protected by bearer token (`ADMIN_AUTH_TOKEN`) 294 - - Alternatively, exposed only on internal networking (Railway private networking) 295 - 296 - ### Input Validation 297 - 298 - The search API shall: 299 - 300 - - Validate `limit` is between 1 and `SEARCH_MAX_LIMIT` 301 - - Validate `offset` is non-negative 302 - - Reject unknown or malformed filter parameters with 400 303 - - Sanitize query strings before passing to FTS (Tantivy query parser handles this, but validate basic structure) 304 - - Bound hybrid requests (limit concurrent vector searches) 305 - 306 - ### Tap Authentication 307 - 308 - The indexer authenticates to Tap using HTTP Basic auth (`admin:<TAP_AUTH_PASSWORD>`). The WebSocket upgrade request includes the auth header. 309 - 310 - ### Data Privacy 311 - 312 - - All indexed content is public ATProto data 313 - - No private or authenticated content is ingested 314 - - Deleted records are tombstoned (`deleted_at` set) and excluded from search results 315 - - Tombstoned documents are periodically purged (configurable retention) 316 - 317 - ## 4. Deployment 318 - 319 - ### Railway (Primary) 320 - 321 - All Twister services deploy as separate Railway services within the same project. Tap is already deployed here. 
322 - 323 - #### Service Layout 324 - 325 - | Service | Start Command | Health Check | Public | 326 - | ------------ | ---------------------- | ------------------ | ------ | 327 - | tap | (already deployed) | `GET /health` | no | 328 - | api | `twister api` | `GET /healthz` | yes | 329 - | indexer | `twister indexer` | `GET :9090/health` | no | 330 - | embed-worker | `twister embed-worker` | `GET :9091/health` | no | 331 - | ollama | (Railway template) | `GET /api/tags` | no | 332 - 333 - All services share the same Docker image. Railway uses the start command to select the subcommand. 334 - 335 - #### Environment Variables 336 - 337 - Set per-service in the Railway dashboard or via `railway variables`: 338 - 339 - ```bash 340 - # Shared across services 341 - TURSO_DATABASE_URL=libsql://twister-db.turso.io 342 - TURSO_AUTH_TOKEN=eyJ... 343 - LOG_LEVEL=info 344 - LOG_FORMAT=json 345 - 346 - # API service 347 - HTTP_BIND_ADDR=:8080 348 - SEARCH_DEFAULT_LIMIT=20 349 - SEARCH_MAX_LIMIT=100 350 - ENABLE_ADMIN_ENDPOINTS=false 351 - 352 - # Indexer service 353 - TAP_URL=wss://${{tap.RAILWAY_PUBLIC_DOMAIN}}/channel # Railway service reference 354 - TAP_AUTH_PASSWORD=... 355 - INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile 356 - 357 - # Embed-worker + Ollama (Phase 2) 358 - # OLLAMA_URL=http://ollama.railway.internal:11434 359 - # EMBEDDING_MODEL=nomic-embed-text 360 - ``` 361 - 362 - Railway supports referencing other services' variables with `${{service.VAR}}` syntax, which is useful for linking the indexer to Tap's domain. 363 - 364 - #### First-Time Bootstrap Checklist 365 - 366 - After the first successful deploy of a new environment: 367 - 368 - 1. Confirm API readiness on `/readyz` 369 - 2. Confirm indexer health and Tap connectivity 370 - 3. Run graph backfill with the environment's seed file 371 - 4. Wait for Tap historical sync to settle 372 - 5. 
Verify that search returns known historical repos/profiles 373 - 374 - #### Health Checks 375 - 376 - Railway activates deployments based on health check responses. Configure per-service: 377 - 378 - - **api:** HTTP health check on `/healthz` port 8080 379 - - **indexer:** HTTP health check on `/health` port 9090 380 - - **embed-worker:** HTTP health check on `/health` port 9091 381 - 382 - #### Autodeploy 383 - 384 - Connect the GitHub repository for automatic deployments on push. Railway builds from the Dockerfile and uses the start command configured per service. 385 - 386 - #### Internal Networking 387 - 388 - Railway services within the same project can communicate over private networking using `service.railway.internal` hostnames. The indexer connects to Tap via this internal network when both are in the same project. 389 - 390 - ### Dockerfile 391 - 392 - ```dockerfile 393 - FROM golang:1.24-alpine AS builder 394 - 395 - WORKDIR /app 396 - 397 - COPY go.mod go.sum ./ 398 - RUN go mod download 399 - 400 - COPY . . 401 - 402 - RUN CGO_ENABLED=0 GOOS=linux go build \ 403 - -ldflags="-s -w" \ 404 - -o /app/twister \ 405 - ./main.go 406 - 407 - FROM alpine:3.21 408 - 409 - RUN apk add --no-cache ca-certificates tzdata 410 - 411 - COPY --from=builder /app/twister /usr/local/bin/twister 412 - 413 - EXPOSE 8080 9090 9091 414 - 415 - CMD ["twister", "api"] 416 - ``` 417 - 418 - Notes: 419 - 420 - - `CGO_ENABLED=0` for static binary (required if using `libsql-client-go`; not compatible with `go-libsql` which needs CGo) 421 - - Railway overrides `CMD` with the start command configured per service 422 - - Multiple ports exposed: 8080 (API), 9090 (indexer health), 9091 (embed-worker health) 423 - 424 - ### Graceful Shutdown 425 - 426 - All processes handle `SIGTERM` and `SIGINT`: 427 - 428 - 1. Stop accepting new requests/events 429 - 2. Drain in-flight work (with timeout) 430 - 3. Persist current cursor (indexer) 431 - 4. Close DB connections 432 - 5. 
Exit 0 433 - 434 - Railway sends `SIGTERM` during deployments and restarts.
-142
docs/api/specs/07-graph-backfill.md
··· 1 - --- 2 - title: "Spec 07 — Graph Backfill" 3 - updated: 2026-03-22 4 - --- 5 - 6 - ## 1. Purpose 7 - 8 - Bootstrap the search index with existing Tangled content by discovering users from a seed set and triggering Tap backfill for their repositories. Without this, the index only captures new events after deployment. 9 - 10 - ## 2. Seed Set 11 - 12 - A manually curated list of known Tangled users (DIDs or handles), stored in a plain text file: 13 - 14 - ```text 15 - # Known active Tangled users 16 - did:plc:abc123 17 - did:plc:def456 18 - alice.tangled.sh 19 - bob.tangled.sh 20 - # Add more as discovered 21 - ``` 22 - 23 - Format: 24 - 25 - - One entry per line 26 - - Lines starting with `#` are comments 27 - - Blank lines are ignored 28 - - Entries can be DIDs (`did:plc:...`) or handles (`alice.tangled.sh`) 29 - - Handles are resolved to DIDs before processing 30 - 31 - ## 3. Fan-Out Strategy 32 - 33 - From each seed user, discover connected users to expand the crawl set: 34 - 35 - ### Discovery Sources 36 - 37 - 1. **Follows**: Fetch `sh.tangled.graph.follow` records for the user → extract `subject` DIDs 38 - 2. **Collaborators**: For repos owned by the user, identify other users who have created issues, PRs, or comments → extract their DIDs 39 - 40 - ### Depth Limit 41 - 42 - Fan-out is configurable with a max hops parameter (default: 2): 43 - 44 - - **Hop 0**: Seed users themselves 45 - - **Hop 1**: Direct follows and collaborators of seed users 46 - - **Hop 2**: Follows and collaborators of hop-1 users 47 - 48 - Higher hop counts discover more users but increase time and may pull in loosely related accounts. Start with 2 hops and adjust based on the size of the Tangled network. 49 - 50 - ### Crawl Queue 51 - 52 - Discovered DIDs are added to a queue, deduplicated by DID. Each entry tracks: 53 - 54 - - DID 55 - - Discovery hop (distance from seed) 56 - - Source (which seed/user led to discovery) 57 - 58 - ## 4. 
Backfill Mechanism 59 - 60 - For each discovered user: 61 - 62 - 1. **Check Tap status**: Query Tap's `/info/:did` endpoint and classify by status: 63 - - tracked + backfilled: skip 64 - - tracked + backfilling/in-progress: skip and let current backfill finish 65 - - untracked or tracked-without-backfill-state: submit to `/repos/add` 66 - 2. **Register with Tap**: POST to `/repos/add` with the DID — Tap handles the actual repo export and event delivery 67 - 3. **Tap backfill flow**: Tap fetches full repo history from PDS via `com.atproto.sync.getRepo`, then delivers historical events (`live: false`) through the normal WebSocket channel 68 - 4. **Indexer processes normally**: The indexer's existing ingestion loop handles backfill events the same as live events — no special backfill code path needed 69 - 70 - ### Rate Limiting 71 - 72 - - Batch `/repos/add` calls (e.g., 10 DIDs per request) 73 - - Add configurable delay between batches to avoid overwhelming Tap 74 - - Respect Tap's processing capacity — monitor `/stats/repo-count` to track progress 75 - 76 - ## 5. Deduplication 77 - 78 - - **User-level**: Maintain a visited set of DIDs during fan-out; skip already-seen DIDs 79 - - **Tap-level**: Tap's `/repos/add` is idempotent — adding an already-tracked DID is a no-op 80 - - **Record-level**: The indexer's upsert logic (keyed on `did|collection|rkey`) handles duplicate events naturally 81 - 82 - ## 6. 
CLI Interface 83 - 84 - ```bash 85 - # Basic backfill from seed file 86 - twister backfill --seeds seeds.txt 87 - 88 - # Limit fan-out depth 89 - twister backfill --seeds seeds.txt --max-hops 1 90 - 91 - # Preview discovered users without triggering backfill 92 - twister backfill --seeds seeds.txt --dry-run 93 - 94 - # Control parallelism 95 - twister backfill --seeds seeds.txt --concurrency 5 96 - ``` 97 - 98 - ### Flags 99 - 100 - | Flag | Default | Description | 101 - | --------------- | -------- | ----------------------------------------------- | 102 - | `--seeds` | required | Seed source: file path or comma-separated list | 103 - | `--max-hops` | `2` | Max fan-out depth from seed users | 104 - | `--dry-run` | `false` | List discovered users without submitting to Tap | 105 - | `--concurrency` | `5` | Parallel discovery workers | 106 - | `--batch-size` | `10` | DIDs per `/repos/add` call | 107 - | `--batch-delay` | `1s` | Delay between batches | 108 - 109 - ### Output 110 - 111 - Progress is logged to stdout: 112 - 113 - ```text 114 - [hop 0] Processing 5 seed users... 115 - [hop 0] did:plc:abc123 → 12 follows, 3 collaborators 116 - [hop 0] did:plc:def456 → 8 follows, 1 collaborator 117 - [hop 1] Processing 24 discovered users (18 new)... 118 - ... 119 - [done] Discovered 142 unique users across 2 hops 120 - [done] Submitted 98 new DIDs to Tap (44 already tracked) 121 - ``` 122 - 123 - ## 7. Idempotency 124 - 125 - The entire backfill process is safe to re-run: 126 - 127 - - Seed file parsing is stateless 128 - - Fan-out discovery is deterministic for a given network state 129 - - Tap's `/repos/add` is idempotent 130 - - The indexer's upsert logic handles re-delivered events 131 - - No local state is persisted between runs (the crawl queue is in-memory) 132 - 133 - ## 8. 
Configuration 134 - 135 - | Variable | Default | Description | 136 - | -------------------- | ---------- | ----------------------------- | 137 - | `TAP_URL` | (existing) | Tap base URL for API calls | 138 - | `TAP_AUTH_PASSWORD` | (existing) | Tap admin auth | 139 - | `TURSO_DATABASE_URL` | (existing) | For checking existing records | 140 - | `TURSO_AUTH_TOKEN` | (existing) | DB auth | 141 - 142 - No new environment variables are needed — backfill reuses existing Tap and DB configuration.
-89
docs/api/specs/08-app-integration.md
··· 1 - --- 2 - title: "Spec 08 — App Integration" 3 - updated: 2026-03-23 4 - --- 5 - 6 - ## 1. Purpose 7 - 8 - Define the mobile-facing Twister API surface. 9 - 10 - The Twisted app should keep using Tangled's public knot and PDS APIs for canonical repo/profile detail. Twister is responsible for: 11 - 12 - - cross-network discovery via search 13 - - index-backed summaries for data gaps such as followers 14 - 15 - ## 2. Client Boundary 16 - 17 - The mobile client uses Twister only for: 18 - 19 - - Explore search 20 - - index-backed profile summaries 21 - - future feed and notification features 22 - 23 - The mobile client does not use Twister for: 24 - 25 - - repo tree/blob/detail reads 26 - - direct profile record reads 27 - - issue/PR detail reads 28 - 29 - Those remain on Tangled's public APIs. 30 - 31 - ## 3. Search Contract 32 - 33 - `GET /search` 34 - 35 - Required query parameters: 36 - 37 - - `q` 38 - 39 - Optional query parameters: 40 - 41 - - `mode=keyword|semantic|hybrid` 42 - - `type=repo|profile` 43 - - `limit` 44 - - `offset` 45 - 46 - For mobile clients, repo and profile results should include: 47 - 48 - - `did` 49 - - `at_uri` 50 - - `record_type` 51 - - `title` 52 - - `summary` 53 - - `repo_name` 54 - - `author_handle` 55 - - `updated_at` 56 - - `primary_language` for repos when known 57 - - `stars` for repos when known 58 - - `follower_count` and `following_count` for profiles when known 59 - 60 - ## 4. Profile Summary Contract 61 - 62 - `GET /profiles/{did}/summary` 63 - 64 - Response: 65 - 66 - ```json 67 - { 68 - "did": "did:plc:abc123", 69 - "handle": "desertthunder.dev", 70 - "follower_count": 128, 71 - "following_count": 84, 72 - "indexed_at": "2026-03-23T10:15:00Z" 73 - } 74 - ``` 75 - 76 - This endpoint exists because follower counts and follower lists are derived from indexed graph state, not from a single direct public Tangled API call. 77 - 78 - ## 5. 
Failure Handling 79 - 80 - If Twister is unavailable: 81 - 82 - - the app should keep direct known-handle browsing working 83 - - Explore should show a clear "index unavailable" state 84 - - profile pages should omit index-backed follower counts rather than fail entirely 85 - 86 - ## 6. Ownership 87 - 88 - - Twister owns search ranking, document normalization, and graph summary derivation 89 - - The app owns result presentation, route transitions, and fallback behavior
-166
docs/api/specs/09-search-site.md
··· 1 - --- 2 - title: "Spec 09 — Search Site" 3 - updated: 2026-03-23 4 - --- 5 - 6 - A minimal static site that serves as both the public Twister API documentation and a live search showcase. Dark mode only, no framework or build step. 7 - 8 - ## 1. Purpose 9 - 10 - - Give developers a browsable reference for the Twister search API 11 - - Give anyone a way to try search against live indexed Tangled content 12 - - Provide a shareable public URL before the mobile app ships 13 - 14 - ## 2. Scope 15 - 16 - In scope: 17 - 18 - - Static HTML/CSS/JS (Alpine.js, no bundler) 19 - - API reference pages generated from the spec docs 20 - - Live search input wired to `GET /search` 21 - - Result rendering with type-aware cards (repo, issue, PR, profile, string) 22 - - Filter controls for collection, type, author, language, state 23 - - Pagination 24 - - Responsive layout (mobile-friendly, single breakpoint) 25 - 26 - Out of scope: 27 - 28 - - Auth, OAuth, or any write operations 29 - - Semantic or hybrid mode toggle (keyword only for MVP) 30 - - Server-side rendering or static-site generator 31 - - Analytics or telemetry 32 - 33 - ## 3. Pages 34 - 35 - | Route | Content | 36 - | ----------------- | --------------------------------------------------------------------------- | 37 - | `/` | Search input + results (the homepage is the search page) | 38 - | `/docs` | API overview: base URL, auth (none for public), rate limits, response shape | 39 - | `/docs/search` | `GET /search` — parameters, filters, response contract, examples | 40 - | `/docs/documents` | `GET /documents/{id}` — request/response, examples | 41 - | `/docs/health` | `GET /healthz`, `GET /readyz` — purpose and expected responses | 42 - 43 - ## 4. Search Page Behavior 44 - 45 - 1. Text input with a submit button. No debounce search-as-you-type for MVP. 46 - 2. On submit, fetch `GET {API_BASE}/search?q={query}&limit=20` (plus any active filters). 47 - 3. Render results as a vertical list of cards. 48 - 4. 
Each card shows: `record_type` badge, `title`, `body_snippet` (with `<mark>` highlights preserved), `author_handle`, `repo_name` (when present), `updated_at` relative time. 49 - 5. Clicking a result opens the canonical Tangled URL (`https://tangled.org/{handle}/{repo}` for repos, etc.) in a new tab. 50 - 6. "Load more" button appends the next page (`offset += limit`). 51 - 7. Empty state: "No results" message. 52 - 8. Error state: inline message if the API is unreachable. 53 - 9. Filter bar above results: dropdowns/inputs for `type`, `language`, `author`. Filters are query params so URLs are shareable. 54 - 55 - ## 5. API Docs Pages 56 - 57 - Hand-written HTML mirroring the contracts in spec 05 (search) and spec 08 (app integration). Each page includes: 58 - 59 - - Endpoint signature (method, path) 60 - - Parameter table (name, type, default, description) 61 - - Example request (curl) 62 - - Example response (JSON block with syntax highlighting via `<pre><code>`) 63 - 64 - No generated docs tooling. The pages are static and updated manually when the API changes. 65 - 66 - ## 6. Styling 67 - 68 - Minimal CSS, no utility framework. 69 - 70 - ### Tokens 71 - 72 - ```css 73 - :root { 74 - --bg: #0e0e0e; 75 - --surface: #1a1a1a; 76 - --border: #2a2a2a; 77 - --text: #e0e0e0; 78 - --text-dim: #888; 79 - --accent: #7aa2f7; 80 - --mark-bg: #7aa2f733; 81 - --mono: "Google Sans Mono", monospace; 82 - --sans: "Google Sans", sans-serif; 83 - --radius: 6px; 84 - } 85 - ``` 86 - 87 - ### Rules 88 - 89 - - Dark theming. 90 - - `Google Sans` for body text. `Google Sans Mono` for code, JSON, and badges. 91 - - Fonts loaded via Google Fonts `<link>`. System fallbacks: `sans-serif`, `monospace`. 92 - - Max content width: `720px`, centered. 93 - - Cards: `var(--surface)` background, `var(--border)` border, `var(--radius)` corners. 94 - - `<mark>` tags in snippets styled with `var(--mark-bg)` background and `var(--accent)` text. 
95 - - Code blocks: `var(--surface)` background, horizontal scroll, no wrapping. 96 - - Links: `var(--accent)`, no underline, underline on hover. 97 - - Inputs and buttons: `var(--surface)` background, `var(--border)` border, `var(--text)` text. 98 - - One breakpoint at `640px` for mobile: full-width cards, stacked filter bar. 99 - 100 - ## 7. Package Design 101 - 102 - The site lives in `internal/view/` as a self-contained Go package. It owns the templates, static assets, and HTTP handlers. The `api` package mounts `view.Handler()` into its router — nothing else leaks out. 103 - 104 - ### Exports 105 - 106 - The package exposes a single constructor: 107 - 108 - ```go 109 - // Handler returns an http.Handler that serves the site pages and static assets. 110 - func Handler() http.Handler 111 - ``` 112 - 113 - The `api` package calls `view.Handler()` and mounts it as a fallback after API routes. 114 - 115 - ### Package Structure 116 - 117 - ```text 118 - internal/view/ 119 - view.go # Handler(), route setup, embed directives 120 - templates/ 121 - layout.html # Shared shell (head, nav, footer) 122 - index.html # Search page 123 - docs/ 124 - index.html # API overview 125 - search.html # GET /search docs 126 - documents.html # GET /documents/{id} docs 127 - health.html # Health endpoints docs 128 - static/ 129 - style.css # All styles, single file 130 - search.js # Search fetch, render, pagination, filters 131 - ``` 132 - 133 - ### Embedding 134 - 135 - `view.go` uses `//go:embed` to bundle `templates/` and `static/`. Templates are parsed once at init. Static assets are served under `/static/` via `http.FileServer`. 
136 - 137 - ### Routing 138 - 139 - `view.Handler()` returns a mux that handles: 140 - 141 - | Pattern | Handler | 142 - | --- | --- | 143 - | `GET /` | Render `index.html` | 144 - | `GET /docs` | Render `docs/index.html` | 145 - | `GET /docs/search` | Render `docs/search.html` | 146 - | `GET /docs/documents` | Render `docs/documents.html` | 147 - | `GET /docs/health` | Render `docs/health.html` | 148 - | `GET /static/*` | Serve embedded CSS/JS files | 149 - 150 - ## 8. Configuration 151 - 152 - Since the site is served by the same origin as the API, search requests use relative paths (`/search?q=...`). No `API_BASE` config needed — the browser's origin is the API. 153 - 154 - ## 9. Local Development 155 - 156 - Run `twister api` locally. The site is served at `http://localhost:8080/` alongside the API endpoints. No separate dev server or file server required. 157 - 158 - The API docs pages render without any indexed data. The search page needs a running indexer and populated database to return results. 159 - 160 - ## 10. Constraints 161 - 162 - - No dependencies besides Alpine via CDN. 163 - - Total site weight target: under 50 KB excluding fonts. 164 - - Works in modern browsers (last 2 versions of Chrome, Firefox, Safari). 165 - - All fetch calls include error handling for network failures and non-200 responses. 166 - - No CORS concerns — the site and API share an origin.
-23
docs/api/specs/README.md
··· 1 - --- 2 - title: "Twister — Technical Specification Index" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Twister Technical Specifications 7 - 8 - Twister is a Go-based index and search service for [Tangled](https://tangled.org) content on AT Protocol. 9 - It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents and graph summaries, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes public APIs for search and index-backed data gaps. 10 - 11 - ## Specifications 12 - 13 - | # | Document | Description | 14 - | --- | ------------------------------------------ | --------------------------------------------------------------- | 15 - | 1 | [Architecture](01-architecture.md) | Purpose, goals, design principles, system context, tech choices | 16 - | 2 | [Tangled Lexicons](02-tangled-lexicons.md) | `sh.tangled.*` record schemas and fields | 17 - | 3 | [Data Model](03-data-model.md) | Database schema, search documents, sync state | 18 - | 4 | [Data Pipeline](04-data-pipeline.md) | Tap integration, normalization, failure handling | 19 - | 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering | 20 - | 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment | 21 - | 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill | 22 - | 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries | 23 - | 9 | [Search Site](09-search-site.md) | Static site for API docs and live search |
-41
docs/api/tasks/README.md
··· 1 - --- 2 - title: "Twister — Task Index" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Twister Tasks 7 - 8 - Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment. 9 - 10 - ## Delivery Strategy 11 - 12 - Build in four phases: 13 - 14 - 1. **MVP** — ingestion, graph backfill, keyword search, deployment, operational tooling 15 - 2. **Semantic Search** — embeddings, vector retrieval 16 - 3. **Hybrid Search** — weighted merge of keyword + semantic 17 - 4. **Quality Polish** — ranking refinement, advanced filters, analytics 18 - 19 - Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior. 20 - Within MVP, run graph backfill before calling the environment search-ready for users. 21 - 22 - ## Phases 23 - 24 - | Phase | Title | Document | Status | 25 - | ----- | --------------- | ------------------------------------------ | --------------------------------------------------------------------- | 26 - | 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete; backfill scheduled before public launch) | 27 - | 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started | 28 - | 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started | 29 - | 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started | 30 - 31 - ## MVP Complete When 32 - 33 - - Tap ingests tracked `sh.tangled.*` records 34 - - Documents normalize into a stable store 35 - - Keyword search works publicly 36 - - Index-backed profile summaries can fill public API gaps such as followers 37 - - API and indexer are deployed on Railway 38 - - Restart does not lose sync position 39 - - Reindex exists for repair 40 - - Graph backfill populates initial content from seed users 41 - - A static search site with API docs is publicly accessible
-388
docs/api/tasks/phase-1-mvp.md
··· 1 - --- 2 - title: "Phase 1 — MVP" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Phase 1 — MVP 7 - 8 - Get a searchable product online: ingestion, keyword search, deployment, and operational tooling. 9 - 10 - ## MVP Complete When 11 - 12 - - Tap ingests tracked `sh.tangled.*` records 13 - - Documents normalize into a stable store 14 - - Keyword search works publicly 15 - - API and indexer are deployed on Railway 16 - - Restart does not lose sync position 17 - - Reindex exists for repair 18 - - Graph backfill populates initial content from seed users 19 - - A static search site with API docs is publicly accessible 20 - 21 - ## M0 — Repository Bootstrap ✅ 22 - 23 - Executable layout, local tooling, and development conventions (completed 2026-03-22). 24 - 25 - ## M1 — Database Schema and Store Layer ✅ 26 - 27 - refs: [specs/03-data-model.md](../specs/03-data-model.md) 28 - 29 - Implemented the Turso/libSQL schema and Go store package for document persistence. 30 - 31 - ## M2 — Normalization Layer ✅ 32 - 33 - refs: [specs/02-tangled-lexicons.md](../specs/02-tangled-lexicons.md), [specs/04-data-pipeline.md](../specs/04-data-pipeline.md) 34 - 35 - Translate `sh.tangled.*` records into internal search documents. 36 - 37 - ## M3 — Tap Client and Ingestion Loop 38 - 39 - refs: [specs/04-data-pipeline.md](../specs/04-data-pipeline.md), [specs/01-architecture.md](../specs/01-architecture.md) 40 - 41 - ### Goal 42 - 43 - Connect the indexer to Tap (on Railway) and process live events into the store. 
44 - 45 - ### Deliverables 46 - 47 - - Tap WebSocket client package (`internal/tapclient/`) 48 - - Event decode layer (record events + identity events) 49 - - Ingestion loop with retry/backoff 50 - - Cursor persistence coupled to successful DB commits 51 - - Identity event handler (DID → handle cache) 52 - 53 - ### Tasks 54 - 55 - - [x] Define Tap event DTOs matching the documented event shape: 56 - 57 - ```go 58 - type TapEvent struct { 59 - ID int64 `json:"id"` 60 - Type string `json:"type"` // "record" or "identity" 61 - Record *TapRecord `json:"record"` 62 - Identity *TapIdentity `json:"identity"` 63 - } 64 - type TapRecord struct { 65 - Live bool `json:"live"` 66 - Rev string `json:"rev"` 67 - DID string `json:"did"` 68 - Collection string `json:"collection"` 69 - RKey string `json:"rkey"` 70 - Action string `json:"action"` // "create", "update", "delete" 71 - CID string `json:"cid"` 72 - Record json.RawMessage `json:"record"` 73 - } 74 - type TapIdentity struct { 75 - DID string `json:"did"` 76 - Handle string `json:"handle"` 77 - IsActive bool `json:"isActive"` 78 - Status string `json:"status"` 79 - } 80 - ``` 81 - 82 - - [x] Implement WebSocket client: 83 - - Connect to `TAP_URL` (e.g., `wss://tap.railway.internal/channel`) 84 - - HTTP Basic auth with `admin:TAP_AUTH_PASSWORD` 85 - - Auto-reconnect with exponential backoff 86 - - Ack protocol: send event `id` back after successful processing 87 - - [x] Implement ingestion loop: 88 - 1. Receive event from WebSocket 89 - 2. If `type == "identity"` → update handle cache, ack, continue 90 - 3. If `type == "record"` → check collection allowlist 91 - 4. Map `action` to operation (create/update → upsert, delete → tombstone) 92 - 5. Decode `record.record` via adapter registry 93 - 6. Normalize to `Document` 94 - 7. Upsert to store 95 - 8. Schedule embedding job if eligible ([Phase 2](phase-2-semantic.md)) 96 - 9. Persist cursor (event ID) after successful DB commit 97 - 10. 
Ack the event 98 - - [x] Implement collection allowlist from `INDEXED_COLLECTIONS` config 99 - - [x] Handle state events (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) → update `record_state` 100 - - [x] Handle normalization failures: log, skip, advance cursor 101 - - [x] Handle DB failures: retry with backoff, do not advance cursor 102 - 103 - ### Exit Criteria 104 - 105 - The system continuously ingests and persists `sh.tangled.*` records from Tap. 106 - 107 - ## M4 — Graph Backfill from Seed Users 108 - 109 - refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md) 110 - 111 - ### Goal 112 - 113 - Bootstrap the index with historical Tangled content by discovering and backfilling users from a curated seed set. 114 - 115 - ### Deliverables 116 - 117 - - `twister backfill` CLI command 118 - - Seed file parser and documented seed-file format 119 - - Graph fan-out discovery (follows and collaborators) 120 - - Tap `/repos/add` integration for discovered users 121 - - Deduplication against already-tracked repos 122 - - Dry-run mode and progress logging 123 - - Basic operator runbook for first bootstrap and repeat runs 124 - 125 - ### Tasks 126 - 127 - - [x] Implement `backfill` subcommand with flags: 128 - - `--seeds <file>` — required seed file path 129 - - `--max-hops <n>` — depth limit for fan-out (default: 2) 130 - - `--dry-run` — print the discovery plan without mutating Tap 131 - - `--concurrency <n>` — parallel discovery workers (default: 5) 132 - - `--batch-size <n>` — DIDs per `/repos/add` request 133 - - `--batch-delay <duration>` — delay between Tap registration batches 134 - - [x] Implement seed file parsing: 135 - - One DID or handle per line 136 - - `#` comments allowed 137 - - Blank lines ignored 138 - - Handles resolved to DIDs before graph expansion 139 - - [x] Decide and document the initial seed file location for operators: 140 - - Repository-managed example file for format/reference 141 - - Deployment-specific runtime file 
or mounted secret for real runs 142 - - Implemented: `docs/api/seeds.txt` and `packages/api/internal/backfill/doc.go` 143 - - [x] Implement graph discovery: 144 - 1. Start from hop-0 seed users 145 - 2. Fetch `sh.tangled.graph.follow` records and collect subject DIDs 146 - 3. Fetch repo collaborators by inspecting repos, issues, PRs, and comments 147 - 4. Enqueue newly discovered DIDs with hop metadata 148 - 5. Stop expanding beyond `max-hops` 149 - - [x] Track discovery metadata for logs: 150 - - source DID 151 - - hop depth 152 - - discovery reason (`seed`, `follow`, `collaborator`) 153 - - [x] Integrate with Tap admin endpoints: 154 - - `GET /info/:did` to skip already-tracked repos when practical 155 - - `POST /repos/add` to register new DIDs for backfill 156 - - [x] Make the command safe to re-run: 157 - - in-memory visited DID set during crawl 158 - - tolerate duplicate `/repos/add` 159 - - rely on index upsert idempotency for re-delivered records 160 - - [x] Add operator-friendly logging: 161 - - seed count 162 - - users discovered per hop 163 - - already-tracked vs newly-submitted DIDs 164 - - batch progress 165 - - final totals 166 - - [x] Add a short runbook covering: 167 - - first bootstrap against an empty database 168 - - repeat run after expanding the seed list 169 - - dry-run before production mutation 170 - - Implemented: `packages/api/internal/backfill/doc.go` 171 - 172 - ### Exit Criteria 173 - 174 - Operators can bootstrap an empty environment to a usable historical baseline before public rollout. 175 - 176 - ## M5 — Keyword Search API 177 - 178 - refs: [specs/05-search.md](../specs/05-search.md) 179 - 180 - ### Goal 181 - 182 - Expose a usable public search API built on Turso's Tantivy-backed FTS. 
183 - 184 - ### Deliverables 185 - 186 - - HTTP server (net/http) 187 - - `GET /healthz` — liveness 188 - - `GET /readyz` — readiness (DB connectivity) 189 - - `GET /search` — keyword search with configurable mode 190 - - `GET /search/keyword` — keyword-only search 191 - - `GET /documents/{id}` — document lookup 192 - - Search repository layer (FTS queries isolated from handlers) 193 - - Pagination, filtering, snippets 194 - 195 - ### Tasks 196 - 197 - - [x] Set up HTTP server with net/http router 198 - - [x] Implement `/healthz` (always 200) and `/readyz` (SELECT 1 against DB) 199 - - [x] Implement search repository with FTS queries: 200 - 201 - ```sql 202 - SELECT id, title, summary, repo_name, author_handle, collection, record_type, 203 - created_at, updated_at, 204 - fts_score(title, body, summary, repo_name, author_handle, tags_json, ?) AS score, 205 - fts_highlight(body, '<mark>', '</mark>', ?) AS body_snippet 206 - FROM documents 207 - WHERE fts_match(title, body, summary, repo_name, author_handle, tags_json, ?) 208 - AND deleted_at IS NULL 209 - ORDER BY score DESC 210 - LIMIT ? OFFSET ?; 211 - ``` 212 - 213 - - [x] Implement request validation: 214 - - `q` required, non-empty 215 - - `limit` 1–100, default 20 216 - - `offset` >= 0, default 0 217 - - Reject unknown parameters with 400 218 - - [x] Implement filters (as WHERE clauses): 219 - - `collection` → `d.collection = ?` 220 - - `type` → `d.record_type = ?` 221 - - `author` → `d.author_handle = ?` or `d.did = ?` 222 - - `repo` → `d.repo_name = ?` 223 - - [x] Implement `/documents/{id}` — full document response 224 - - [x] Implement stable JSON response contract (see spec 05-search.md) 225 - - [x] Exclude tombstoned documents (`deleted_at IS NOT NULL`) by default 226 - - [x] Add request logging middleware (method, path, status, duration) 227 - - [x] Add CORS headers if needed 228 - 229 - ### Exit Criteria 230 - 231 - A user can search Tangled content reliably with keyword search. 
232 - 233 - ## M5a — Search Site ✅ 234 - 235 - refs: [specs/09-search-site.md](../specs/09-search-site.md) 236 - 237 - ### Goal 238 - 239 - Ship a static site that doubles as public API documentation and a live search demo. Alpine.js via CDN for reactivity, no build step. 240 - 241 - ### Deliverables 242 - 243 - - `internal/view/` package exporting `Handler() http.Handler` 244 - - Embedded templates (`templates/`) and static assets (`static/`) via `//go:embed` 245 - - Search page (`/`) wired to `GET /search` with result cards, filters, and pagination 246 - - API docs pages (`/docs/*`) covering search, documents, and health endpoints 247 - - Dark-mode-only styling with Google Sans fonts and minimal CSS tokens 248 - 249 - ### Tasks 250 - 251 - - [x] Create `internal/view/` package with `view.go`, `templates/`, and `static/` directories 252 - - [x] Implement `Handler()` that returns an `http.Handler` with routes for all pages and `/static/*` 253 - - [x] Embed templates and static assets via `//go:embed`; parse templates once at init 254 - - [x] Use a shared `layout.html` template for the shell (head, nav, footer) 255 - - [x] Mount `view.Handler()` in the `api` package router as a fallback after API routes 256 - - [x] Build search page: 257 - - Text input + submit 258 - - Fetch `GET /search` with relative path (same origin) 259 - - Render result cards with type badge, title, snippet (preserve `<mark>`), author, repo, relative time 260 - - "Load more" pagination via offset 261 - - Filter bar: type, language, author (reflected in URL query params) 262 - - Empty and error states 263 - - [x] Build API docs pages: 264 - - `/docs` — overview (base URL, response shape, no auth) 265 - - `/docs/search` — `GET /search` params, filters, example curl, example response 266 - - `/docs/documents` — `GET /documents/{id}` request/response 267 - - `/docs/health` — `GET /healthz`, `GET /readyz` 268 - - [x] Implement `style.css` with design tokens (`--bg`, `--surface`, `--border`, 
`--accent`, etc.) 269 - - [x] Load Google Sans and Google Sans Mono via Google Fonts `<link>` 270 - - [x] Result card links open canonical Tangled URLs in new tab 271 - - [x] Verify total site weight under 50 KB (excluding fonts and Alpine CDN) — 21 KB total 272 - 273 - ### Exit Criteria 274 - 275 - A user can search Tangled content and read API docs from a public URL without installing anything. 276 - 277 - ## M6 — Railway Deployment ✅ 278 - 279 - refs: [specs/06-operations.md](../specs/06-operations.md), [deploy.md](../deploy.md) 280 - 281 - ### Goal 282 - 283 - Deploy the API and indexer as Railway services alongside Tap. 284 - 285 - ### Deliverables 286 - 287 - - Finalized Dockerfile 288 - - Railway project with services: `api`, `indexer` 289 - - Health checks configured per service 290 - - Secrets/env vars set 291 - - Production startup commands documented 292 - 293 - ### Tasks 294 - 295 - - [x] Finalize Dockerfile (multi-stage, CGO_ENABLED=0, Alpine runtime) 296 - - [x] Create Railway services: 297 - - `api` — start command: `twister api` 298 - - `indexer` — start command: `twister indexer` 299 - - [x] Configure environment variables per service: 300 - - Shared: `TURSO_DATABASE_URL`, `TURSO_AUTH_TOKEN`, `LOG_LEVEL`, `LOG_FORMAT` 301 - - API: `HTTP_BIND_ADDR`, `SEARCH_DEFAULT_LIMIT`, `SEARCH_MAX_LIMIT` 302 - - Indexer: `TAP_URL` (reference Tap service domain), `TAP_AUTH_PASSWORD`, `INDEXED_COLLECTIONS` 303 - - [x] Configure health checks: 304 - - API: HTTP check on `/healthz` port 8080 305 - - Indexer: HTTP check on `/health` port 9090 306 - - [x] Use Railway internal networking for indexer → Tap connection 307 - - [x] Connect GitHub repo for autodeploy 308 - - [x] Test graceful shutdown on redeploy (SIGTERM handling) 309 - - [x] Document deploy steps 310 - 311 - ### Exit Criteria 312 - 313 - The system runs as a deployed service with health-checked processes on Railway. 
314 - 315 - ## M7 — Reindex and Repair ✅ 316 - 317 - refs: [specs/05-search.md](../specs/05-search.md) 318 - 319 - ### Goal 320 - 321 - Make the system recoverable and operable with repair tools. 322 - 323 - ### Deliverables 324 - 325 - - `twister reindex` command with scoping options 326 - - Dry-run mode 327 - - Admin reindex endpoint 328 - - Progress logging and error summary 329 - 330 - ### Tasks 331 - 332 - - [x] Implement `reindex` subcommand with flags: 333 - - `--collection` — reindex one collection 334 - - `--did` — reindex one DID's documents 335 - - `--document` — reindex one document by ID 336 - - `--dry-run` — show intended work without writes 337 - - No flags → reindex all 338 - - [x] Implement reindex logic: 339 - 1. Select documents matching scope 340 - 2. For each document, re-run normalization from stored fields (or re-fetch if source available) 341 - 3. Update FTS-relevant fields 342 - 4. Upsert back to store 343 - 5. Run `OPTIMIZE INDEX idx_documents_fts` after bulk reindex to merge Tantivy segments 344 - 6. Log progress (N/total, errors) 345 - - [x] Implement `POST /admin/reindex` endpoint (behind `ENABLE_ADMIN_ENDPOINTS` + `ADMIN_AUTH_TOKEN`) 346 - - [x] Add error summary output on completion 347 - - [x] Exit non-zero on unrecoverable failures 348 - 349 - ### Exit Criteria 350 - 351 - Operators can repair bad indexes without rebuilding everything manually. 352 - 353 - ## M8 — Observability 354 - 355 - refs: [specs/06-operations.md](../specs/06-operations.md) 356 - 357 - ### Goal 358 - 359 - Make the system diagnosable in production. 
360 - 361 - ### Deliverables 362 - 363 - - Structured slog fields across all services 364 - - Error classification 365 - - Ingestion lag visibility 366 - - Periodic state logs 367 - - Operator documentation 368 - 369 - ### Tasks 370 - 371 - - [ ] Standardize slog fields across all packages: 372 - - `service`, `event_name`, `event_id`, `did`, `collection`, `rkey`, `document_id`, `cursor`, `error_class`, `duration_ms` 373 - - [ ] Add error classification (normalize_error, db_error, tap_error, embed_error) 374 - - [ ] Add periodic state logs in indexer: 375 - - Current cursor position 376 - - Events processed since last log 377 - - Documents in store (count) 378 - - [ ] Add request logging in API (method, path, status, duration, query) 379 - - [ ] Add search latency logging per query mode 380 - - [ ] Write operator documentation: 381 - - Restart procedure 382 - - Reindex procedure 383 - - Backfill notes 384 - - Failure triage guide 385 - 386 - ### Exit Criteria 387 - 388 - The system is maintainable without guesswork.
-167
docs/api/tasks/phase-2-semantic.md
··· 1 - --- 2 - title: "Phase 2 — Semantic Search" 3 - updated: 2026-03-23 4 - --- 5 - 6 - # Phase 2 — Semantic Search 7 - 8 - Add embedding generation and vector-based retrieval on top of the keyword baseline, using self-hosted Ollama for embeddings instead of external API services. 9 - 10 - ## M8 — Ollama Sidecar and Embedding Pipeline 11 - 12 - refs: [specs/01-architecture.md](../specs/01-architecture.md), [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md) 13 - 14 - ### Goal 15 - 16 - Deploy Ollama as a Railway sidecar and add asynchronous embedding generation without blocking ingestion. 17 - 18 - ### Deliverables 19 - 20 - - Ollama Railway service running nomic-embed-text-v1.5 (or EmbeddingGemma) 21 - - `embedding_jobs` table operational (schema from M1) 22 - - `embed-worker` subcommand 23 - - Ollama-backed embedding provider (with interface for future alternatives) 24 - - Retry and dead-letter behavior 25 - - `twister reembed` command 26 - 27 - ### Tasks 28 - 29 - - [ ] Deploy Ollama on Railway: 30 - - Use the nomic-embed Railway template as a starting point 31 - - Configure as internal service (no public URL) 32 - - Pre-pull `nomic-embed-text` model on startup 33 - - Health check: `GET /api/tags` on port 11434 34 - - Resource budget: 1–2 GB RAM, 1–2 vCPU 35 - - [ ] Define embedding provider interface: 36 - 37 - ```go 38 - type EmbeddingProvider interface { 39 - Embed(ctx context.Context, texts []string) ([][]float32, error) 40 - Model() string 41 - Dimension() int 42 - } 43 - ``` 44 - 45 - - [ ] Implement Ollama provider using the official Go client: 46 - 47 - ```go 48 - import "github.com/ollama/ollama/api" 49 - 50 - // OllamaProvider calls Ollama's /api/embed endpoint 51 - // over Railway internal networking (ollama.railway.internal:11434) 52 - type OllamaProvider struct { 53 - client *api.Client 54 - model string // "nomic-embed-text" 55 - dim int // 768 56 - } 57 - ``` 58 - 59 - - Configure via `OLLAMA_URL` 
env var (default: `http://ollama.railway.internal:11434`) 60 - - Support batch embedding (Ollama accepts multiple inputs per request) 61 - - Timeout per request (default: 30s) 62 - - Connection health check on startup 63 - - [ ] Implement embedding input text composition (see spec 04-data-pipeline.md, section 5): 64 - `title\nrepo_name\nauthor_handle\ntags\nsummary\nbody` 65 - - [ ] Add job enqueueing: on document upsert, insert `embedding_jobs` row with `status=pending` 66 - - [ ] Implement `embed-worker` loop: 67 - 1. Poll for `pending` jobs (batch by `EMBEDDING_BATCH_SIZE`, default: 32) 68 - 2. Compose input text per document 69 - 3. Call Ollama provider 70 - 4. Store vectors in `document_embeddings` with `vector32(?)` 71 - 5. Mark job `completed` 72 - 6. On failure: increment `attempts`, set `last_error`, backoff 73 - 7. After max attempts: mark `dead` 74 - - [ ] Create DiskANN vector index (see spec 03 for tuning params): 75 - ```sql 76 - CREATE INDEX idx_embeddings_vec ON document_embeddings( 77 - libsql_vector_idx(embedding, 'metric=cosine') 78 - ); 79 - ``` 80 - - [ ] Implement `reembed` command (re-generate all embeddings, useful for model migration) 81 - - [ ] Skip deleted documents in embedding pipeline 82 - - [ ] Add health check endpoint for embed-worker (port 9091) 83 - - [ ] Add Ollama connectivity check to embed-worker readiness probe 84 - 85 - ### Model Selection Notes 86 - 87 - **nomic-embed-text-v1.5** is the default recommendation: 88 - - 137M parameters, 768-dimension vectors 89 - - Matryoshka support (can truncate to 64/128/256/512 dims for storage tradeoff) 90 - - 8192 token context window 91 - - ~262 MB at F16 quantization, ~500 MB RAM at runtime 92 - - Battle-tested with llama.cpp/Ollama, Railway template exists 93 - 94 - **EmbeddingGemma** is the quality alternative: 95 - - 308M parameters, 768-dimension vectors 96 - - Best MTEB scores for models under 500M parameters 97 - - <200 MB quantized, similar RAM footprint 98 - - Released Sept 
2025, less deployment track record 99 - 100 - **all-minilm** is the budget fallback: 101 - - 23M parameters, 384-dimension vectors (requires schema change) 102 - - ~46 MB model, minimal resources 103 - - Suitable for testing or cost-constrained environments 104 - 105 - ### Verification 106 - 107 - - [ ] Ollama service starts on Railway and responds to health checks 108 - - [ ] Creating a new searchable document enqueues an embedding job 109 - - [ ] Worker processes the job and stores a vector in `document_embeddings` 110 - - [ ] Failed embedding calls retry with bounded attempts 111 - - [ ] Keyword search still works when embed-worker or Ollama is down 112 - - [ ] `reembed` regenerates embeddings for all eligible documents 113 - - [ ] Ollama connectivity failure is surfaced in embed-worker health check 114 - 115 - ### Exit Criteria 116 - 117 - Embeddings are produced asynchronously via self-hosted Ollama and stored durably in Turso. 118 - 119 - ## M9 — Semantic Search 120 - 121 - refs: [specs/05-search.md](../specs/05-search.md) 122 - 123 - ### Goal 124 - 125 - Expose vector-based semantic retrieval. 126 - 127 - ### Deliverables 128 - 129 - - `GET /search/semantic` endpoint 130 - - Query-time embedding (convert query text → vector via Ollama) 131 - - Vector similarity search via `vector_top_k` 132 - - Response parity with keyword search 133 - 134 - ### Tasks 135 - 136 - - [ ] Implement query embedding: call Ollama provider with user's query text 137 - - [ ] Cache query embeddings for identical queries within a short TTL (optional, reduces Ollama load) 138 - - [ ] Implement semantic search repository: 139 - 140 - ```sql 141 - SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle, 142 - d.collection, d.record_type, d.created_at, d.updated_at 143 - FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) 
AS v 144 - JOIN document_embeddings e ON e.rowid = v.id 145 - JOIN documents d ON d.id = e.document_id 146 - WHERE d.deleted_at IS NULL; 147 - ``` 148 - 149 - - [ ] Normalize distance to relevance score: `score = 1.0 - (distance / 2.0)` 150 - - [ ] Apply same filters as keyword search (collection, author, repo, type) 151 - - [ ] Add timeout and cost controls (limit vector search to reasonable K) 152 - - [ ] Wire `/search/semantic` handler 153 - - [ ] Return `matched_by: ["semantic"]` in results 154 - - [ ] Graceful degradation: if Ollama is unreachable, return 503 for semantic search while keyword search remains available 155 - 156 - ### Verification 157 - 158 - - [ ] Semantically similar queries retrieve expected documents even with little lexical overlap 159 - - [ ] Documents without embeddings are omitted from semantic results 160 - - [ ] Semantic search returns the same JSON schema as keyword search 161 - - [ ] Latency is acceptable under small test load 162 - - [ ] Filters work correctly with semantic results 163 - - [ ] Semantic search degrades gracefully when Ollama is down 164 - 165 - ### Exit Criteria 166 - 167 - The API supports true semantic search over Tangled documents, powered entirely by self-hosted infrastructure.
-51
docs/api/tasks/phase-3-hybrid.md
··· 1 - --- 2 - title: "Phase 3 — Hybrid Search" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Phase 3 — Hybrid Search 7 - 8 - Merge lexical and semantic search into the default high-quality retrieval mode. 9 - 10 - ## M10 — Hybrid Search 11 - 12 - refs: [specs/05-search.md](../specs/05-search.md) 13 - 14 - ### Deliverables 15 - 16 - - `GET /search/hybrid` endpoint 17 - - Weighted score blending (keyword 0.65 + semantic 0.35) 18 - - Score normalization 19 - - Result deduplication 20 - - `matched_by` metadata showing which modes contributed 21 - 22 - ### Tasks 23 - 24 - - [ ] Implement hybrid search orchestrator: 25 - 1. Fetch top N keyword results (N=50 or configurable) 26 - 2. Fetch top N semantic results 27 - 3. Normalize keyword scores (min-max within result set) 28 - 4. Semantic scores already normalized (0–1) 29 - 5. Merge on `document_id` 30 - 6. For documents in both sets: `hybrid_score = 0.65 * keyword + 0.35 * semantic` 31 - 7. For documents in one set: use available score (other = 0) 32 - 8. Sort by hybrid_score descending 33 - 9. Deduplicate 34 - 10. Apply limit/offset 35 - - [ ] Populate `matched_by` field: `["keyword"]`, `["semantic"]`, or `["keyword", "semantic"]` 36 - - [ ] Make weights configurable via `HYBRID_KEYWORD_WEIGHT` / `HYBRID_SEMANTIC_WEIGHT` 37 - - [ ] Wire `/search/hybrid` handler 38 - - [ ] Make `/search?mode=hybrid` work 39 - 40 - ### Verification 41 - 42 - - [ ] Hybrid returns documents found by either source 43 - - [ ] Duplicates are merged correctly (no duplicate IDs in results) 44 - - [ ] Exact-match queries still favor lexical relevance 45 - - [ ] Exploratory natural-language queries improve over keyword-only results 46 - - [ ] Score ordering is stable across repeated runs on the same corpus 47 - - [ ] `matched_by` accurately reflects which modes produced each result 48 - 49 - ### Exit Criteria 50 - 51 - Hybrid search becomes the preferred default search mode.
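The orchestrator steps above can be sketched as a pure merge function. Names and shapes are assumed, and the weights are passed in rather than read from `HYBRID_KEYWORD_WEIGHT`/`HYBRID_SEMANTIC_WEIGHT`:

```go
package main

import (
	"fmt"
	"sort"
)

type hit struct {
	ID    string
	Score float64
}

type merged struct {
	ID        string
	Score     float64
	MatchedBy []string
}

// mergeHybrid min-max normalizes keyword scores, takes semantic scores
// as already 0–1, blends with the given weights, merges on ID, and
// records which modes matched each document.
func mergeHybrid(keyword, semantic []hit, wk, ws float64) []merged {
	norm := map[string]float64{}
	if len(keyword) > 0 {
		lo, hi := keyword[0].Score, keyword[0].Score
		for _, h := range keyword {
			if h.Score < lo {
				lo = h.Score
			}
			if h.Score > hi {
				hi = h.Score
			}
		}
		for _, h := range keyword {
			if hi > lo {
				norm[h.ID] = (h.Score - lo) / (hi - lo)
			} else {
				norm[h.ID] = 1 // all keyword scores equal
			}
		}
	}
	out := map[string]*merged{}
	for id, s := range norm {
		out[id] = &merged{ID: id, Score: wk * s, MatchedBy: []string{"keyword"}}
	}
	for _, h := range semantic {
		if m, ok := out[h.ID]; ok {
			m.Score += ws * h.Score
			m.MatchedBy = append(m.MatchedBy, "semantic")
		} else {
			out[h.ID] = &merged{ID: h.ID, Score: ws * h.Score, MatchedBy: []string{"semantic"}}
		}
	}
	res := make([]merged, 0, len(out))
	for _, m := range out {
		res = append(res, *m)
	}
	sort.Slice(res, func(i, j int) bool { return res[i].Score > res[j].Score })
	return res
}

func main() {
	kw := []hit{{"doc1", 12}, {"doc2", 4}}
	sem := []hit{{"doc1", 0.9}, {"doc3", 0.8}}
	for _, m := range mergeHybrid(kw, sem, 0.65, 0.35) {
		fmt.Printf("%s %.3f %v\n", m.ID, m.Score, m.MatchedBy)
	}
}
```

Documents found by both modes accumulate both weighted contributions, which is what pushes them to the top.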
-47
docs/api/tasks/phase-4-quality.md
··· 1 - --- 2 - title: "Phase 4 — Ranking and Quality Polish" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Phase 4 — Ranking and Quality Polish 7 - 8 - Improve search quality without changing the core architecture. 9 - 10 - ## M11 — Ranking and Quality Polish 11 - 12 - refs: [specs/05-search.md](../specs/05-search.md) 13 - 14 - ### Deliverables 15 - 16 - - Boosted field weighting refinement 17 - - Recency boost 18 - - Collection-aware ranking 19 - - Better snippets/highlights 20 - - Issue/PR state filtering 21 - - Star count as ranking signal 22 - - Optional query analytics 23 - 24 - ### Tasks 25 - 26 - - [ ] Tune FTS index weights based on real query results 27 - - [ ] Add small recency boost to ranking (e.g., decay function on `created_at`) 28 - - [ ] Add collection-aware ranking adjustments (repos ranked differently from comments) 29 - - [ ] Index `sh.tangled.repo.issue.comment` and `sh.tangled.repo.pull.comment` (P2 collections) 30 - - [ ] Aggregate `sh.tangled.feed.star` counts per repo and use as ranking signal 31 - - [ ] Implement `state` filter (open/closed/merged) using `record_state` table 32 - - [ ] Improve snippets: better truncation, multi-field highlights 33 - - [ ] Add curated relevance test fixtures (expected queries → expected top results) 34 - - [ ] Run `OPTIMIZE INDEX idx_documents_fts` as maintenance task 35 - - [ ] Optional: log queries for analytics (anonymized) 36 - 37 - ### Verification 38 - 39 - - [ ] Exact repo lookups reliably rank the repo first 40 - - [ ] Recent active content gets a reasonable small boost without overwhelming exact relevance 41 - - [ ] Snippets show useful matched context 42 - - [ ] Ranking regression tests catch obvious degradations 43 - - [ ] State filter correctly excludes closed/merged items when requested 44 - 45 - ### Exit Criteria 46 - 47 - Search quality is noticeably improved and more predictable.
-80
docs/app/specs/README.md
··· 1 - # Twisted — Tangled Mobile Companion 2 - 3 - A mobile-first Tangled client for iOS, Android, and web. Built with Ionic Vue, Capacitor, and the `@atcute` AT Protocol client stack. 4 - 5 - ## What is Tangled 6 - 7 - [Tangled](https://tangled.org) is a Git hosting and collaboration platform built on the [AT Protocol](https://atproto.com). Identity, social graph (follows, stars, reactions), repos, issues, and PRs are all AT Protocol records stored on users' Personal Data Servers. Git hosting runs on **knots** — headless servers exposing XRPC APIs. The **appview** at `tangled.org` aggregates and renders the network view. 8 - 9 - - Docs: <https://docs.tangled.org> 10 - - Lexicon namespace: `sh.tangled.*` 11 - - Source: <https://tangled.org/tangled.org/core> 12 - 13 - ## What Twisted Does 14 - 15 - **Reader and social companion** for Tangled. Focused on direct browsing, indexed discovery, and lightweight interactions. 16 - 17 - - Browse repos, files, READMEs, issues, PRs 18 - - Jump to profiles and repos from a known AT Protocol handle 19 - - Search indexed repos and profiles through the Twister API 20 - - Use index-backed graph summaries where the public API is incomplete 21 - - Sign in via AT Protocol OAuth 22 - - Star repos, follow users, react to content 23 - - Offline-capable with cached data 24 - 25 - Out of scope: repo creation, git push/pull, CI/CD, full code review authoring. 
26 - 27 - ## Technology 28 - 29 - | Layer | Choice | 30 - | ----------- | --------------------------------------------------------------------------------------------- | 31 - | Framework | Vue 3 + TypeScript | 32 - | UI | Ionic Vue | 33 - | Native | Capacitor (iOS, Android, Web) | 34 - | State | Pinia | 35 - | Async data | TanStack Query (Vue) | 36 - | AT Protocol | `@atcute/client` (XRPC), `@atcute/oauth-browser-client` (OAuth), `@atcute/tangled` (lexicons) | 37 - 38 - ## Architecture 39 - 40 - Three layers, strict dependency direction (presentation → domain → data): 41 - 42 - **Presentation** — Ionic pages, Vue components, composables, Pinia stores. 43 - **Domain** — Normalized models (`UserSummary`, `RepoDetail`, `ActivityItem`, etc.), action policies, pagination. 44 - **Data** — `@atcute/client` XRPC calls, `@atcute/tangled` type definitions, local cache, and the Twister API for search/index-backed summaries. 45 - 46 - Protocol isolation: no Vue component imports `@atcute/*` directly. All API access flows through `src/services/`. 47 - 48 - ## Tangled API Surface 49 - 50 - Three distinct data hosts: 51 - 52 - | Host | Protocol | Data | 53 - | ---------------------------------- | ---------------------------------- | ----------------------------------------------------------------- | 54 - | Knots (`us-west.tangled.sh`, etc.) | XRPC at `/xrpc/sh.tangled.*` | Git data: trees, blobs, commits, branches, diffs, tags | 55 - | User's PDS | XRPC at `/xrpc/com.atproto.repo.*` | AT Protocol records: repos, issues, PRs, stars, follows, profiles | 56 - | Twister API | HTTP JSON | Global search and index-backed graph/profile summaries | 57 - 58 - The appview (`tangled.org`) serves HTML — it's the web UI, not a JSON API. The mobile client talks to knots and PDS servers directly for canonical detail and uses the Twister API for cross-network discovery. 59 - 60 - Repo param format: `did:plc:xxx/repoName`. 
61 - 62 - ## Phases 63 - 64 - | Phase | Focus | Spec | Tasks | 65 - | ----- | ------------------------------------------------------------------------ | ------------------------------------ | ------------------------------------ | 66 - | 1 | Project shell, tabs, mock data, design system | [phase-1.md](phase-1.md) | [../tasks/phase-1.md](../tasks/phase-1.md) | 67 - | 2 | Public browsing — repos, files, profiles, issues, PRs | [phase-2.md](phase-2.md) | [../tasks/phase-2.md](../tasks/phase-2.md) | 68 - | 3 | Index-backed search and handle-first public browsing | [phase-3.md](phase-3.md) | [../tasks/phase-3.md](../tasks/phase-3.md) | 69 - | 4 | OAuth sign-in, star, follow, react, personalized feed | [phase-4.md](phase-4.md) | [../tasks/phase-4.md](../tasks/phase-4.md) | 70 - | 5 | Offline persistence, performance, bundle optimization | [phase-5.md](phase-5.md) | [../tasks/phase-5.md](../tasks/phase-5.md) | 71 - | 6 | Write features, project service integration, push notifications | [phase-6.md](phase-6.md) | [../tasks/phase-6.md](../tasks/phase-6.md) | 72 - | 7 | Real-time Jetstream feed, custom feeds, forking, labels, interdiff | [phase-7.md](phase-7.md) | [../tasks/phase-7.md](../tasks/phase-7.md) | 73 - 74 - ## Key Design Decisions 75 - 76 - 1. **`@atcute` end-to-end** for all AT Protocol interaction — no mixing client stacks. 77 - 2. **Tangled lexicon handling in one module boundary** (`src/services/tangled/`) — don't scatter `sh.tangled.*` awareness across pages. 78 - 3. **Read-first** — the primary product is a fast reader. Social mutations are a controlled second layer. 79 - 4. **Use the project API sparingly and intentionally.** Search and index-backed graph gaps belong there; canonical repo detail stays on Tangled's public APIs. 80 - 5. **Mobile-first, not desktop-forge-first** — prioritize readability, direct browsing, and small focused actions before broader discovery surfaces.
-180
docs/app/specs/phase-1.md
··· 1 - # Phase 1 — Project Shell & Design System 2 - 3 - ## Goal 4 - 5 - Scaffold the Ionic Vue project with tab navigation, placeholder pages, mock data, and reusable UI primitives. Nothing touches the network. The result is a clickable prototype that validates navigation, layout, and component design before any API integration. 6 - 7 - ## Technology Stack 8 - 9 - | Layer          | Choice                  | 10 - | -------------- | ----------------------- | 11 - | Framework      | Vue 3 + TypeScript      | 12 - | UI kit         | Ionic Vue               | 13 - | Native runtime | Capacitor               | 14 - | State          | Pinia                   | 15 - | Async data     | TanStack Query (Vue)    | 16 - | Routing        | Vue Router (Ionic tabs) | 17 - 18 - ## Navigation Structure 19 - 20 - Five top-level destinations (the tab bar itself shows four): 21 - 22 - 1. **Home** — trending repos, recent activity, personalized content (auth) 23 - 2. **Explore** — search repos/users, filters 24 - 3. **Repo** — deep-link target for repository detail (not a persistent tab icon — navigated to from Home/Explore/Activity) 25 - 4. **Activity** — global feed (anon), social graph feed (auth) 26 - 5. **Profile** — auth state, user card, follows, starred repos, settings 27 - 28 - > Repo is a routed detail destination, not a standing tab. The tab bar shows Home, Explore, Activity, Profile. Repo pages are pushed onto the Home/Explore/Activity stacks. 
29 - 30 - ## Directory Layout 31 - 32 - ```sh 33 - src/ 34 - app/ 35 - router/ # route definitions, tab guards 36 - boot/ # app-level setup (query client, plugins) 37 - providers/ # provide/inject wrappers 38 - core/ 39 - config/ # env, feature flags 40 - errors/ # error types and normalization 41 - storage/ # storage abstraction (IndexedDB / Capacitor Secure Storage) 42 - query/ # TanStack Query client config, persister setup 43 - auth/ # auth state machine, session store 44 - services/ 45 - atproto/ # @atcute/client wrapper, identity helpers 46 - tangled/ # Tangled API: endpoints, adapters, normalizers, queries, mutations 47 - domain/ 48 - models/ # UserSummary, RepoSummary, RepoDetail, etc. 49 - feed/ # feed-specific types and helpers 50 - repo/ # repo-specific types and helpers 51 - profile/ # profile-specific types and helpers 52 - features/ 53 - home/ 54 - explore/ 55 - repo/ 56 - activity/ 57 - profile/ 58 - components/ 59 - common/ # cards, buttons, loaders, empty states, error boundaries 60 - repo/ # repo card, file tree item, README viewer 61 - feed/ # activity card, feed list 62 - profile/ # user card, follow button 63 - ``` 64 - 65 - ## Domain Models 66 - 67 - ```ts 68 - export type UserSummary = { 69 - did: string; 70 - handle: string; 71 - displayName?: string; 72 - avatar?: string; 73 - bio?: string; 74 - followerCount?: number; 75 - followingCount?: number; 76 - }; 77 - 78 - export type RepoSummary = { 79 - atUri: string; 80 - ownerDid: string; 81 - ownerHandle: string; 82 - name: string; 83 - description?: string; 84 - primaryLanguage?: string; 85 - stars?: number; 86 - forks?: number; 87 - updatedAt?: string; 88 - knot: string; 89 - }; 90 - 91 - export type RepoDetail = RepoSummary & { 92 - readme?: string; 93 - defaultBranch?: string; 94 - languages?: Record<string, number>; 95 - collaborators?: UserSummary[]; 96 - topics?: string[]; 97 - }; 98 - 99 - export type RepoFile = { 100 - path: string; 101 - name: string; 102 - type: "file" | "dir" | 
"submodule"; 103 - size?: number; 104 - lastCommitMessage?: string; 105 - }; 106 - 107 - export type PullRequestSummary = { 108 - atUri: string; 109 - title: string; 110 - authorDid: string; 111 - authorHandle: string; 112 - status: "open" | "merged" | "closed"; 113 - createdAt: string; 114 - updatedAt?: string; 115 - sourceBranch: string; 116 - targetBranch: string; 117 - roundCount?: number; 118 - }; 119 - 120 - export type IssueSummary = { 121 - atUri: string; 122 - title: string; 123 - authorDid: string; 124 - authorHandle: string; 125 - state: "open" | "closed"; 126 - createdAt: string; 127 - commentCount?: number; 128 - }; 129 - 130 - export type ActivityItem = { 131 - id: string; 132 - kind: 133 - | "repo_created" 134 - | "repo_starred" 135 - | "user_followed" 136 - | "pr_opened" 137 - | "pr_merged" 138 - | "issue_opened" 139 - | "issue_closed"; 140 - actorDid: string; 141 - actorHandle: string; 142 - targetUri?: string; 143 - targetName?: string; 144 - createdAt: string; 145 - }; 146 - ``` 147 - 148 - ## Repo Detail Page Structure 149 - 150 - Segmented tab layout within the repo detail view: 151 - 152 - | Segment | Content | 153 - | -------- | ------------------------------------------------------------------------------------ | 154 - | Overview | owner/repo header, description, topics, social action buttons, README preview, stats | 155 - | Files | directory tree, file viewer (syntax-highlighted) | 156 - | Issues | issue list with state filters | 157 - | PRs | pull request list with status filters | 158 - 159 - ## Design System Primitives 160 - 161 - Build these reusable components during this phase: 162 - 163 - - **RepoCard** — compact repo summary for lists 164 - - **UserCard** — avatar + handle + bio snippet 165 - - **ActivityCard** — icon + actor + verb + target + timestamp 166 - - **FileTreeItem** — icon (file/dir) + name + last commit message 167 - - **EmptyState** — icon + message + optional action button 168 - - **ErrorBoundary** — catch + retry UI 
169 - - **SkeletonLoader** — content placeholder shimmer for each card type 170 - - **MarkdownRenderer** — render README content (Phase 2 will wire to real data) 171 - 172 - ## Mock Data 173 - 174 - Create `src/mocks/` with factory functions returning typed domain models. All Phase 1 screens render from these factories. Mock data must be realistic — use real-looking handles (`alice.tngl.sh`), repo names, and timestamps. 175 - 176 - ## Performance Targets 177 - 178 - - Shell first-paint under 2s on mid-range device 179 - - Tab switches feel instant (no layout shift) 180 - - Skeleton loaders shown within 100ms of navigation
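The mock-data factories the spec calls for can be sketched against one of the domain models defined above. This is a minimal illustration using the `IssueSummary` type from this spec; the factory name and default values are illustrative, not part of the spec:

```typescript
// Domain model as defined in the phase-1 spec.
type IssueSummary = {
  atUri: string;
  title: string;
  authorDid: string;
  authorHandle: string;
  state: "open" | "closed";
  createdAt: string;
  commentCount?: number;
};

// Factory with realistic defaults; callers override any field.
// Timestamps fall within the last 30 days, per the mock-data guidance.
function makeIssueSummary(overrides: Partial<IssueSummary> = {}): IssueSummary {
  const daysAgo = Math.floor(Math.random() * 30);
  return {
    atUri: "at://did:plc:abc123/sh.tangled.repo.issue/3kabc",
    title: "Fix star count parsing",
    authorDid: "did:plc:abc123",
    authorHandle: "alice.tngl.sh",
    state: "open",
    createdAt: new Date(Date.now() - daysAgo * 86_400_000).toISOString(),
    ...overrides,
  };
}
```

Each file in `src/mocks/` would export one such factory per domain model, so Phase 1 screens can render lists via `Array.from({ length: 10 }, () => makeIssueSummary())`.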
-120
docs/app/specs/phase-2.md
··· 1 - # Phase 2 — Public Tangled Browsing 2 - 3 - ## Goal 4 - 5 - Replace mock data on the shippable public-browsing surface with live Tangled API calls. Users can browse repos, profiles, file trees, README content, issues, and pull requests without signing in. Public entry points are intentionally scoped down for now: Home is a known-handle jump surface, while Explore and Activity remain clearly labeled placeholders until their dedicated work lands. 6 - 7 - ## Protocol Stack 8 - 9 - | Package | Version | Role | 10 - | ----------------- | ------- | ---------------------------------------------- | 11 - | `@atcute/client` | ^4.2.1 | XRPC HTTP client — `query()` and `procedure()` | 12 - | `@atcute/tangled` | ^1.0.17 | `sh.tangled.*` lexicon type definitions | 13 - 14 - All protocol access goes through `src/services/tangled/`. No Vue component may import `@atcute/*` directly. 15 - 16 - ## Architecture: Protocol Isolation 17 - 18 - ```sh 19 - Vue component 20 - → composable (useRepoDetail, useFileTree, ...) 21 - → TanStack Query hook 22 - → service function (services/tangled/queries.ts) 23 - → @atcute/client XRPC call 24 - → normalizer (services/tangled/normalizers.ts) 25 - → domain model 26 - ``` 27 - 28 - ### Service Layer Responsibilities 29 - 30 - **`services/atproto/client.ts`** — singleton `XRPC` client instance, base URL config, error interceptor. 
31 - 32 - **`services/tangled/endpoints.ts`** — typed wrappers around XRPC queries: 33 - 34 - | Endpoint | Params | Returns | 35 - | ---------------------------------- | ------------------------------------------- | ------------------- | 36 - | `sh.tangled.repo.tree` | `repo: did:plc:xxx/name`, `ref`, `path?` | directory listing | 37 - | `sh.tangled.repo.blob` | `repo`, `ref`, `path` | file content | 38 - | `sh.tangled.repo.log` | `repo`, `ref`, `path?`, `limit?`, `cursor?` | commit history | 39 - | `sh.tangled.repo.branches` | `repo`, `limit?`, `cursor?` | branch list | 40 - | `sh.tangled.repo.tags` | `repo` | tag list | 41 - | `sh.tangled.repo.getDefaultBranch` | `repo` | default branch name | 42 - | `sh.tangled.repo.diff` | `repo`, `ref` | diff output | 43 - | `sh.tangled.repo.compare` | `repo`, `rev1`, `rev2` | comparison | 44 - | `sh.tangled.repo.languages` | `repo` | language breakdown | 45 - 46 - The `repo` param format is `did:plc:xxx/repoName`. The XRPC calls go to the repo's **knot** hostname (e.g., `us-west.tangled.sh`), not to `tangled.org`. 47 - 48 - **`services/tangled/normalizers.ts`** — transform raw lexicon responses into domain models (`RepoSummary`, `RepoDetail`, `RepoFile`, etc.). 49 - 50 - **`services/tangled/queries.ts`** — TanStack Query wrapper functions with cache keys, stale times, and error handling. 51 - 52 - ## Appview vs Knot Routing 53 - 54 - Tangled has two API surfaces: 55 - 56 - | Surface | Host | Protocol | Used for | 57 - | ------- | -------------------------- | --------------------------- | ------------------------------------------------- | 58 - | Appview | `tangled.org` | HTTP (HTML, HTMX) | Profile pages, repo listings, timeline, search | 59 - | Knots | `us-west.tangled.sh`, etc. | XRPC (`/xrpc/sh.tangled.*`) | Git data — trees, blobs, commits, branches, diffs | 60 - 61 - For Phase 2, git data comes from knots via XRPC. 
Profile and repo metadata come from PDS records queried through `com.atproto.repo.getRecord` and `com.atproto.repo.listRecords`, not from the HTML appview. The service layer must route requests to the correct host based on the operation. 62 - 63 - ## Features 64 - 65 - ### Repository Browsing 66 - 67 - - List repos for a user (from their PDS records or appview) 68 - - Repo overview: metadata, description, topics, default branch, language stats 69 - - README rendering: fetch blob for `README.md` from default branch, render markdown 70 - - File tree: navigate directories, open files 71 - - File viewer: syntax-highlighted source display 72 - - Commit log: paginated history for a ref/path 73 - - Branch list with default branch indicator 74 - 75 - ### Profile Browsing 76 - 77 - - View user profile: avatar, bio, links, pronouns, location, pinned repos 78 - - Profile data comes from `sh.tangled.actor.profile` record (key: `self`) on the user's PDS 79 - - List user's repos 80 - 81 - ### Public Discovery (scoped down) 82 - 83 - - Home acts as the temporary public entry point: enter a known AT Protocol handle, then jump to profile or browse that handle's repos 84 - - Explore remains visible as a placeholder for future search work, but should not pretend global search already exists 85 - - Activity remains visible as a placeholder for future feed work, but should not pretend a public timeline already exists 86 - - Unsupported global search/trending behavior should be omitted or clearly labeled as future work, never filled with silent mock data 87 - 88 - ### Pull Requests (read-only) 89 - 90 - - List PRs for a repo with status filter (open/closed/merged) 91 - - PR detail: title, body, author, source/target branches, round count 92 - - PR comments list 93 - 94 - ### Issues (read-only) 95 - 96 - - List issues for a repo with state filter (open/closed) 97 - - Issue detail: title, body, author 98 - - Issue comments (threaded — `replyTo` field) 99 - 100 - ## Caching Strategy 101 - 
102 - | Data | Stale time | Cache time | 103 - | ------------- | ---------- | ---------- | 104 - | Repo metadata | 5 min | 30 min | 105 - | File tree | 2 min | 10 min | 106 - | File content | 5 min | 30 min | 107 - | Commit log | 2 min | 10 min | 108 - | Profile | 10 min | 60 min | 109 - | README | 5 min | 30 min | 110 - 111 - Use TanStack Query's `staleTime` and `gcTime`. Add a query persister (IndexedDB-backed) for offline reads. 112 - 113 - ## Error Handling 114 - 115 - Normalize these failure modes at the service layer: 116 - 117 - - Network unreachable → offline banner, serve from cache 118 - - 404 from knot → "Repository not found" or "File not found" 119 - - XRPC error responses → map to typed app errors 120 - - Malformed response → log + generic error state
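The caching table above translates directly into the `staleTime`/`gcTime` values TanStack Query expects (milliseconds). A sketch of how `services/tangled/queries.ts` might centralize it, with the `cachePolicy` name being an assumption:

```typescript
const MINUTE = 60_000;

// Per-data-type cache policy from the table above, in milliseconds.
const cachePolicy = {
  repoMetadata: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
  fileTree: { staleTime: 2 * MINUTE, gcTime: 10 * MINUTE },
  fileContent: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
  commitLog: { staleTime: 2 * MINUTE, gcTime: 10 * MINUTE },
  profile: { staleTime: 10 * MINUTE, gcTime: 60 * MINUTE },
  readme: { staleTime: 5 * MINUTE, gcTime: 30 * MINUTE },
} as const;

type CacheKind = keyof typeof cachePolicy;

// Query wrappers would spread this into their options, e.g.:
//   useQuery({ queryKey, queryFn, ...policyFor("repoMetadata") })
function policyFor(kind: CacheKind) {
  return cachePolicy[kind];
}
```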
-68
docs/app/specs/phase-3.md
··· 1 - # Phase 3 — Indexed Search and Honest Discovery 2 - 3 - ## Goal 4 - 5 - Introduce global discovery through the Twister project index while preserving honest product boundaries. Home continues to support direct known-handle browsing, Explore becomes index-backed search, and Activity remains a clearly labeled in-progress surface. 6 - 7 - ## Current Product Shape 8 - 9 - ### Home 10 - 11 - Home is the temporary public entry point for unauthenticated browsing: 12 - 13 - - Enter a known AT Protocol handle 14 - - Open that user's profile directly 15 - - Resolve the handle to DID + PDS via AT Protocol identity 16 - - List that user's public Tangled repos inline and open one directly 17 - 18 - This keeps public browsing fully real while still giving the app a lightweight direct-entry path. 19 - 20 - ### Explore 21 - 22 - Explore becomes the network-level discovery surface: 23 - 24 - - Global repo search via the Twister index 25 - - Global profile search via the Twister index 26 - - Empty state should clearly distinguish "index unavailable" from "no results" 27 - - Search results route into the existing profile and repo detail screens 28 - 29 - ### Activity 30 - 31 - Activity also remains a tab-level placeholder: 32 - 33 - - No public timeline yet 34 - - No curated public feed fallback 35 - - Empty state should explicitly say activity is in progress 36 - 37 - ## Identity and Routing 38 - 39 - The app now uses two read paths: 40 - 41 - 1. **Direct handle browsing** 42 - Resolve `handle -> DID` via `com.atproto.identity.resolveHandle` 43 - Fetch the DID document and extract the PDS endpoint 44 - Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords` 45 - 2. **Indexed discovery** 46 - Query the Twister API for global search results 47 - Open the selected profile or repo in the existing screens 48 - Continue detail fetching from Tangled's public APIs 49 - 50 - The Twister API is additive, not authoritative for repo detail. 
It fills discovery and graph gaps; knots and PDSes remain the source of truth for detail screens. 51 - 52 - ## UI Expectations 53 - 54 - - Home shows one handle input plus explicit actions for profile jump and repo browsing 55 - - Home shows loading, invalid-handle, no-repos, and resolved-repo-list states 56 - - Explore shows a working search form, loading state, index-unavailable state, and no-results state 57 - - Activity shows a static in-progress empty state 58 - - Profile may show index-backed follower/following summaries when available 59 - 60 - ## Deferred Work 61 - 62 - The following work is intentionally deferred out of this phase: 63 - 64 - - Trending or suggested discovery sections 65 - - Public activity feed ingestion, pagination, and caching 66 - - Jetstream or appview timeline investigation 67 - 68 - These capabilities will be revisited after the baseline search and graph-summary integration is stable.
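The direct-handle read path above (resolve handle, fetch the DID document, extract the PDS endpoint) can be sketched as two small pure helpers. The entryway host in the default parameter is an assumption (any host serving `com.atproto.identity.resolveHandle` works); the DID-document service shape is standard AT Protocol:

```typescript
// Step 1: build the XRPC URL that resolves a handle to a DID.
// The default host here is an assumption; swap in any public entryway.
function resolveHandleUrl(
  handle: string,
  host = "https://public.api.bsky.app",
): string {
  return `${host}/xrpc/com.atproto.identity.resolveHandle?handle=${encodeURIComponent(handle)}`;
}

// Minimal DID-document shape needed for step 2.
type DidDocument = {
  service?: { id: string; type: string; serviceEndpoint: string }[];
};

// Step 2: pull the PDS endpoint out of the DID document, so
// sh.tangled.repo records can be listed from it via listRecords.
function pdsEndpoint(doc: DidDocument): string | undefined {
  return doc.service?.find((s) => s.type === "AtprotoPersonalDataServer")
    ?.serviceEndpoint;
}
```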
-167
docs/app/specs/phase-4.md
··· 1 - # Phase 4 — OAuth & Social Features 2 - 3 - ## Goal 4 - 5 - Add AT Protocol OAuth sign-in and authenticated social actions: follow, star, react. Signed-in users get a personalized feed. 6 - 7 - ## Authentication 8 - 9 - ### Package 10 - 11 - `@atcute/oauth-browser-client` ^3.0.0 — minimal browser OAuth client for AT Protocol. 12 - 13 - ### OAuth Flow 14 - 15 - 1. User enters handle or DID 16 - 2. Resolve handle → DID → PDS → authorization server metadata 17 - 3. Initiate OAuth with PKCE + DPoP (P-256) 18 - 4. Redirect to authorization server 19 - 5. Callback with auth code 20 - 6. Exchange code for access + refresh tokens 21 - 7. Store session, bind to XRPC client 22 - 23 - ### Key Functions 24 - 25 - | Function | Purpose | 26 - | -------------------------- | ------------------------------------------------------------ | 27 - | `configureOAuth(opts)` | One-time setup: client metadata URL, redirect URI | 28 - | `getSession(did)` | Resume existing session (returns `Session` with `dpopFetch`) | 29 - | `listStoredSessions()` | List all stored accounts | 30 - | `deleteStoredSession(did)` | Remove stored session | 31 - 32 - ### Session Object 33 - 34 - A `Session` provides: 35 - 36 - - `did` — authenticated user's DID 37 - - `dpopFetch` — a `fetch` wrapper that auto-attaches DPoP + access token headers 38 - - Token refresh is handled internally 39 - 40 - ### Client Metadata 41 - 42 - The mobile app needs its own OAuth client metadata hosted at a public URL: 43 - 44 - ```json 45 - { 46 - "client_id": "https://your-app-domain/oauth/client-metadata.json", 47 - "client_name": "Twisted", 48 - "client_uri": "https://your-app-domain", 49 - "redirect_uris": ["https://your-app-domain/oauth/callback"], 50 - "grant_types": ["authorization_code", "refresh_token"], 51 - "response_types": ["code"], 52 - "token_endpoint_auth_method": "none", 53 - "application_type": "web", 54 - "dpop_bound_access_tokens": true, 55 - "scope": "atproto repo:sh.tangled.graph.follow 
repo:sh.tangled.feed.star repo:sh.tangled.feed.reaction repo:sh.tangled.actor.profile" 56 - } 57 - ``` 58 - 59 - Request only the scopes needed for Phase 4 social features. Expand scopes in later phases as write features are added. 60 - 61 - ### Capacitor Considerations 62 - 63 - - Web: standard redirect flow works 64 - - iOS/Android via Capacitor: use `App.addListener('appUrlOpen')` to capture the OAuth callback via deep link or custom URL scheme 65 - - Session storage: abstract behind `core/storage/` — use `localStorage` on web, Capacitor Secure Storage plugin on native 66 - 67 - ### Auth State Machine 68 - 69 - ```sh 70 - idle → authenticating → authenticated 71 - → error 72 - authenticated → refreshing → authenticated 73 - → expired → idle 74 - authenticated → logging_out → idle 75 - ``` 76 - 77 - Store in Pinia (`core/auth/`). Expose via `useAuth()` composable. 78 - 79 - ## Social Actions 80 - 81 - All social actions create or delete AT Protocol records on the user's PDS via the XRPC `com.atproto.repo.createRecord` / `com.atproto.repo.deleteRecord` procedures. The `dpopFetch` from the session handles auth. 82 - 83 - ### Star a Repo 84 - 85 - Create record: 86 - 87 - ```json 88 - { 89 - "repo": "did:plc:user", 90 - "collection": "sh.tangled.feed.star", 91 - "record": { 92 - "$type": "sh.tangled.feed.star", 93 - "subject": "at://did:plc:owner/sh.tangled.repo/tid", 94 - "createdAt": "2026-03-22T00:00:00Z" 95 - } 96 - } 97 - ``` 98 - 99 - Unstar: delete the record by its `rkey`. 100 - 101 - ### Follow a User 102 - 103 - Create record: 104 - 105 - ```json 106 - { 107 - "repo": "did:plc:user", 108 - "collection": "sh.tangled.graph.follow", 109 - "record": { 110 - "$type": "sh.tangled.graph.follow", 111 - "subject": "did:plc:target", 112 - "createdAt": "2026-03-22T00:00:00Z" 113 - } 114 - } 115 - ``` 116 - 117 - Unfollow: delete the record by its `rkey`. 
118 - 119 - ### React to Content 120 - 121 - Create record: 122 - 123 - ```json 124 - { 125 - "repo": "did:plc:user", 126 - "collection": "sh.tangled.feed.reaction", 127 - "record": { 128 - "$type": "sh.tangled.feed.reaction", 129 - "subject": "at://did:plc:owner/sh.tangled.repo.pull/tid", 130 - "reaction": "thumbsup", 131 - "createdAt": "2026-03-22T00:00:00Z" 132 - } 133 - } 134 - ``` 135 - 136 - Available reactions: `thumbsup`, `thumbsdown`, `laugh`, `tada`, `confused`, `heart`, `rocket`, `eyes`. 137 - 138 - ### Optimistic Updates 139 - 140 - All mutations use TanStack Query's `useMutation` with optimistic updates: 141 - 142 - 1. Immediately update the cache (star count +1, follow state toggled) 143 - 2. Fire the mutation 144 - 3. On error, roll back the cache and show a toast 145 - 146 - ## Personalized Feed 147 - 148 - When signed in, the Activity tab shows a filtered feed based on: 149 - 150 - - Users the signed-in user follows 151 - - Repos the signed-in user has starred 152 - 153 - Implementation depends on what the appview provides. If no personalized endpoint exists, filter the global feed client-side based on the user's follow/star records. 154 - 155 - ## Profile Tab (Authenticated) 156 - 157 - When signed in, the Profile tab shows: 158 - 159 - - User's avatar, handle, bio, location, pronouns, links 160 - - Pinned repos 161 - - Stats (selected from: merged PRs, open PRs, open issues, repo count, star count) 162 - - Starred repos list 163 - - Following/followers lists 164 - - Edit profile (avatar, bio, links, pinned repos) 165 - - Settings 166 - - Logout 167 - - Account switcher (multiple account support via `listStoredSessions`)
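The three-step optimistic-update flow described above can be sketched framework-free; in the app it maps onto `useMutation`'s `onMutate`/`onError` hooks, but the rollback logic is the same. The `StarState` shape and function names are illustrative:

```typescript
type StarState = { starred: boolean; starCount: number };

// 1. Update the cache immediately; 2. fire the mutation
// (createRecord / deleteRecord on the PDS); 3. roll back on error.
async function toggleStar(
  cache: Map<string, StarState>,
  repoUri: string,
  mutate: () => Promise<void>,
): Promise<void> {
  const previous = cache.get(repoUri) ?? { starred: false, starCount: 0 };
  cache.set(repoUri, {
    starred: !previous.starred,
    starCount: previous.starCount + (previous.starred ? -1 : 1),
  });
  try {
    await mutate();
  } catch (err) {
    // Restore the snapshot; the caller surfaces a toast.
    cache.set(repoUri, previous);
    throw err;
  }
}
```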
-73
docs/app/specs/phase-5.md
··· 1 - # Phase 5 — Offline & Performance Polish 2 - 3 - ## Goal 4 - 5 - Make the app feel native. Cached data loads instantly, offline mode is graceful, and navigation is smooth on mid-range devices. 6 - 7 - ## Offline Strategy 8 - 9 - ### Query Persistence 10 - 11 - Use TanStack Query's `persistQueryClient` with an IndexedDB adapter: 12 - 13 - - Persist all query cache to IndexedDB on each update (debounced) 14 - - On app launch, hydrate TanStack Query cache from IndexedDB before rendering 15 - - Stale-while-revalidate: show persisted data immediately, refresh in background 16 - 17 - ### What to Persist 18 - 19 - | Data | Max cached items | TTL | 20 - | ------------------------------ | ---------------- | ------ | 21 - | Repo metadata | 200 | 7 days | 22 - | File trees | 50 | 3 days | 23 - | File content (recently viewed) | 100 | 3 days | 24 - | README content | 100 | 7 days | 25 - | User profiles | 100 | 7 days | 26 - | Activity feed pages | 10 pages | 1 day | 27 - | Search results | 20 queries | 1 day | 28 - 29 - ### Offline Detection 30 - 31 - - Listen to `navigator.onLine` + `online`/`offline` events 32 - - Show a persistent banner when offline: "You're offline — showing cached data" 33 - - Disable mutation buttons (star, follow) when offline 34 - - Queue mutations for retry when back online (optional, simple queue) 35 - 36 - ### Sensitive Data 37 - 38 - - Auth tokens: Capacitor Secure Storage on native, encrypted `localStorage` wrapper on web 39 - - Never persist tokens in IndexedDB alongside query cache 40 - - Clear auth storage on logout 41 - 42 - ## Performance Optimizations 43 - 44 - ### Navigation 45 - 46 - - Prefetch repo detail data on repo card hover/long-press 47 - - Keep previous tab's scroll position and data in memory (Ionic's `ion-router-outlet` + `keep-alive`) 48 - - Use `<ion-virtual-scroll>` or a virtualized list for long lists (repos, activity feed) 49 - 50 - ### Images 51 - 52 - - Lazy-load avatars with `loading="lazy"` or Intersection 
Observer 53 - - Use `avatar.tangled.sh` CDN URLs with size params if available 54 - - Placeholder avatar component with initials fallback 55 - 56 - ### Bundle 57 - 58 - - Route-level code splitting per feature folder 59 - - Tree-shake unused Ionic components 60 - - Measure and optimize with Lighthouse 61 - 62 - ### Rendering 63 - 64 - - Skeleton screens for every data-driven view (already built in Phase 1) 65 - - Debounce search input (already in Phase 3) 66 - - Throttle scroll-based pagination triggers 67 - 68 - ## Testing Focus 69 - 70 - - Offline → online transition: verify data refreshes without duplicates 71 - - Large repo file trees: ensure virtual scroll handles 1000+ items 72 - - Low-bandwidth simulation: verify skeleton → content transitions 73 - - Memory pressure: verify cache eviction works and app doesn't grow unbounded
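The optional "queue mutations for retry when back online" item can stay as simple as the spec suggests. A sketch of such a queue, flushed from the `online` event handler; the class and field names are illustrative:

```typescript
type PendingMutation = { label: string; run: () => Promise<void> };

// FIFO queue for mutations attempted while offline. On flush, a
// failure stops processing and re-queues the remainder in order.
class MutationQueue {
  private pending: PendingMutation[] = [];

  enqueue(m: PendingMutation): void {
    this.pending.push(m);
  }

  get size(): number {
    return this.pending.length;
  }

  async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    for (const [i, m] of batch.entries()) {
      try {
        await m.run();
      } catch {
        // Still offline (or the call failed): keep order for next flush.
        this.pending = [...batch.slice(i), ...this.pending];
        return;
      }
    }
  }
}
```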
-75
docs/app/specs/phase-6.md
··· 1 - # Phase 6 — Write Features & Project Services 2 - 3 - ## Goal 4 - 5 - Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and extend the Twister project services only where the client should not or cannot do the work directly. 6 - 7 - ## Why Project Services 8 - 9 - Some operations are awkward or unsafe from a browser client: 10 - 11 - - **Token hardening**: DPoP keys in browser storage are less secure than server-held credentials 12 - - **Unstable procedures**: Tangled's API may change — a backend adapter isolates the mobile client from churn 13 - - **Push notifications**: require server-side registration and delivery 14 - - **Personalized feeds**: server-side aggregation is more efficient than client-side filtering 15 - - **Graph gaps**: follower lists/counts and other cross-network summaries may require index-backed derivation 16 - - **Rate limiting**: backend can batch and deduplicate requests 17 - 18 - ### Service Scope 19 - 20 - Thin service layer — not a replacement for Tangled's public APIs. Use it for cross-network aggregation, search, notifications, and operations the SPA should not own. 
21 - 22 - | Endpoint | Purpose | 23 - | ------------------------------------ | --------------------------------------------------- | 24 - | `POST /auth/session` | OAuth token exchange and session management | 25 - | `GET /feed/personalized` | Pre-filtered activity feed for the user | 26 - | `GET /search`, `GET /profiles/:did/summary` | Search and index-backed graph/profile summaries | 27 - | `POST /notifications/register` | Push notification device registration | 28 - | Passthrough for stable XRPC calls | Avoid duplicating what the client already does well | 29 - 30 - ## Write Features 31 - 32 - ### Create Issue 33 - 34 - - Screen: issue creation form within repo detail 35 - - Fields: title (required), body (markdown), mentions 36 - - Creates `sh.tangled.repo.issue` record on user's PDS 37 - - Optimistic: add to local issue list, remove on failure 38 - 39 - ### Comment on Issue / PR 40 - 41 - - Screen: comment input at bottom of issue/PR detail 42 - - Creates `sh.tangled.repo.issue.comment` or `sh.tangled.repo.pull.comment` record 43 - - Supports `replyTo` for threaded issue comments 44 - - Supports `mentions` (DID array) and `references` (AT-URI array) 45 - 46 - ### Edit Profile 47 - 48 - - Screen: profile edit form 49 - - Updates `sh.tangled.actor.profile` record (key: `self`) 50 - - Fields: avatar (image upload, max 1MB, png/jpeg), bio (max 256 graphemes), links (max 5 URIs), location (max 40 graphemes), pronouns (max 40 chars), pinned repos (max 6 AT-URIs), display stats (max 2 from: merged-pr-count, closed-pr-count, open-pr-count, open-issue-count, closed-issue-count, repo-count, star-count), bluesky cross-posting toggle 51 - 52 - ### Issue State Management 53 - 54 - - Close/reopen issues by creating `sh.tangled.repo.issue.state` records 55 - - State values: `sh.tangled.repo.issue.state.open`, `sh.tangled.repo.issue.state.closed` 56 - 57 - ## Push Notifications 58 - 59 - - Register device token with project services 60 - - Project services subscribe to 
Jetstream or indexed events relevant to the user 61 - - Deliver via APNs (iOS) / FCM (Android) 62 - - Notification types: PR activity on your repos, issue comments, new followers, stars 63 - 64 - ## Expanded OAuth Scopes 65 - 66 - Phase 6 requires additional scopes beyond Phase 4: 67 - 68 - ```sh 69 - repo:sh.tangled.repo.issue 70 - repo:sh.tangled.repo.issue.comment 71 - repo:sh.tangled.repo.issue.state 72 - repo:sh.tangled.repo.pull.comment 73 - ``` 74 - 75 - Handle scope upgrades gracefully — re-authorize if the user's existing session lacks required scopes.
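The create-issue write path above goes through `com.atproto.repo.createRecord`. A hypothetical payload builder is sketched below; the collection name comes from this spec, but the record's inner field names (`repo`, `title`, `body`, `mentions`) are assumptions, not confirmed against the published lexicon:

```typescript
type CreateIssueInput = {
  authorDid: string;
  repoAtUri: string;
  title: string;
  body?: string;
  mentions?: string[];
};

// Builds the createRecord payload for a new issue. Records live on
// the *author's* PDS, so `repo` at the top level is the author's DID.
// Inner record field names are assumed for illustration.
function buildCreateIssuePayload(input: CreateIssueInput) {
  if (!input.title.trim()) throw new Error("title is required");
  return {
    repo: input.authorDid,
    collection: "sh.tangled.repo.issue",
    record: {
      $type: "sh.tangled.repo.issue",
      repo: input.repoAtUri,
      title: input.title,
      body: input.body,
      mentions: input.mentions,
      createdAt: new Date().toISOString(),
    },
  };
}
```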
-85
docs/app/specs/phase-7.md
··· 1 - # Phase 7 — Real-Time Feed & Advanced Features 2 - 3 - ## Goal 4 - 5 - Add real-time event streaming, custom feed logic, and advanced social coding features. This phase makes the app feel alive. 6 - 7 - ## Jetstream Integration 8 - 9 - ### Package 10 - 11 - `@atcute/jetstream` — subscribe to the AT Protocol event stream. 12 - 13 - ### Architecture 14 - 15 - Connect to a Jetstream relay and filter for `sh.tangled.*` collections: 16 - 17 - ```sh 18 - Jetstream WebSocket 19 - → filter: sh.tangled.* events 20 - → normalize into ActivityItem 21 - → merge into TanStack Query feed cache 22 - → reactive UI update 23 - ``` 24 - 25 - ### Connection Management 26 - 27 - - Connect on app foreground, disconnect on background 28 - - Reconnect with exponential backoff 29 - - Track cursor position for gap-fill on reconnect 30 - - Battery-aware: reduce polling frequency on low battery (Capacitor Battery API) 31 - 32 - ### Live Indicators 33 - 34 - - Repo detail: show "new commits" banner when ref updates arrive 35 - - Activity feed: show "X new items" pill, tap to scroll to top and reveal 36 - - PR detail: live status updates (open → merged) 37 - 38 - ## Custom Feeds 39 - 40 - Allow users to create saved feed configurations: 41 - 42 - - "My repos" — activity on repos I own 43 - - "Watching" — activity on repos I starred 44 - - "Team" — activity from users I follow 45 - - Custom filters: by repo, by user, by event type 46 - 47 - Feeds are stored locally in IndexedDB. If project services exist, they can optionally sync server-side for push notification filtering. 
48 - 49 - ## Advanced Features 50 - 51 - ### Repo Forking 52 - 53 - - Fork button on repo detail (requires `rpc:sh.tangled.repo.create` scope) 54 - - Fork status indicator via `sh.tangled.repo.forkStatus` (up-to-date, fast-forwardable, conflict, missing branch) 55 - - Sync fork via `sh.tangled.repo.forkSync` 56 - 57 - ### Label Support 58 - 59 - - Display labels on issues and PRs 60 - - Apply/remove labels (requires label scopes) 61 - - Color-coded label chips 62 - 63 - ### Reaction Picker 64 - 65 - - Expand reaction support beyond star/follow 66 - - Emoji picker for: thumbsup, thumbsdown, laugh, tada, confused, heart, rocket, eyes 67 - - Show reaction counts on PRs, issues, comments 68 - 69 - ### PR Interdiff 70 - 71 - - View diff between PR rounds (round N vs round N+1) 72 - - Useful for code review on mobile — see what changed since last review 73 - 74 - ### Knot Information 75 - 76 - - Show which knot hosts a repo 77 - - Knot version and status 78 - - Useful for debugging and transparency 79 - 80 - ## Testing 81 - 82 - - WebSocket reliability under network transitions (WiFi → cellular) 83 - - Feed deduplication when Jetstream replays events 84 - - Memory usage with long-running WebSocket connections 85 - - Battery impact measurement on mobile
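The "reconnect with exponential backoff" requirement above reduces to one deterministic function; the base and cap values here are assumptions, and production code would typically add jitter:

```typescript
// Delay before reconnect attempt N: doubles from `base` per attempt,
// capped at `max`. Attempt 0 waits 1s; attempt 6+ is capped at 60s.
function reconnectDelayMs(attempt: number, base = 1_000, max = 60_000): number {
  return Math.min(base * 2 ** attempt, max);
}
```

The Jetstream connection manager would call this after each failed connect, reset `attempt` to 0 on a successful handshake, and resume from the tracked cursor to gap-fill missed `sh.tangled.*` events.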
-64
docs/app/tasks/phase-1.md
··· 1 - # Phase 1 Tasks — Project Shell & Design System 2 - 3 - ## Scaffold 4 - 5 - - [x] Create Ionic Vue project with TypeScript (`ionic start twisted tabs --type vue`) 6 - - [x] Configure Capacitor for iOS and Android 7 - - [x] Set up path aliases (`@/` → `src/`) 8 - - [x] Install and configure Pinia 9 - - [x] Install and configure TanStack Query for Vue 10 - - [x] Create the directory structure per spec (`app/`, `core/`, `services/`, `domain/`, `features/`, `components/`) 11 - 12 - ## Routing & Navigation 13 - 14 - - [x] Define five-tab layout: Home, Explore, Activity, Profile (visible tabs) + Repo (pushed route) 15 - - [x] Configure Vue Router with Ionic tab routing 16 - - [x] Add route definitions for all Phase 1 placeholder pages 17 - - [ ] Verify tab-to-tab navigation preserves scroll position and component state 18 - 19 - ## Domain Models 20 - 21 - - [x] Create `domain/models/user.ts` — `UserSummary` type 22 - - [x] Create `domain/models/repo.ts` — `RepoSummary`, `RepoDetail`, `RepoFile` types 23 - - [x] Create `domain/models/pull-request.ts` — `PullRequestSummary` type 24 - - [x] Create `domain/models/issue.ts` — `IssueSummary` type 25 - - [x] Create `domain/models/activity.ts` — `ActivityItem` type 26 - 27 - ## Mock Data 28 - 29 - - [x] Use realistic data: fetch `desertthunder.dev` to create mock data for repo names, timestamps within last 30 days 30 - - [x] Create `src/mocks/users.ts` — factory for `UserSummary` instances 31 - - [x] Create `src/mocks/repos.ts` — factory for `RepoSummary` and `RepoDetail` instances 32 - - [x] Create `src/mocks/pull-requests.ts` — factory for `PullRequestSummary` instances 33 - - [x] Create `src/mocks/issues.ts` — factory for `IssueSummary` instances 34 - - [x] Create `src/mocks/activity.ts` — factory for `ActivityItem` instances 35 - 36 - ## Design System Components 37 - 38 - - [x] `components/common/RepoCard.vue` — compact repo summary (name, owner, description, language, stars) 39 - - [x] 
`components/common/UserCard.vue` — avatar + handle + bio snippet 40 - - [x] `components/common/ActivityCard.vue` — icon + actor + verb + target + relative timestamp 41 - - [x] `components/common/EmptyState.vue` — icon + message + optional action button 42 - - [x] `components/common/ErrorBoundary.vue` — catch errors, show retry UI 43 - - [x] `components/common/SkeletonLoader.vue` — shimmer placeholders (variants: card, list-item, profile) 44 - - [x] `components/repo/FileTreeItem.vue` — file/dir icon + name 45 - - [x] `components/repo/MarkdownRenderer.vue` — render markdown to HTML (stub with basic styling) 46 - 47 - ## Feature Pages (placeholder with mock data) 48 - 49 - - [x] `features/home/HomePage.vue` — trending repos list, recent activity list 50 - - [x] `features/explore/ExplorePage.vue` — search bar (non-functional), repo/user tabs, repo list 51 - - [x] `features/repo/RepoDetailPage.vue` — segmented layout: Overview, Files, Issues, PRs 52 - - [x] `features/repo/RepoOverview.vue` — header, description, README placeholder, stats 53 - - [x] `features/repo/RepoFiles.vue` — file tree list from mock data 54 - - [x] `features/repo/RepoIssues.vue` — issue list from mock data 55 - - [x] `features/repo/RepoPRs.vue` — PR list from mock data 56 - - [x] `features/activity/ActivityPage.vue` — filter chips + activity card list 57 - - [x] `features/profile/ProfilePage.vue` — sign-in prompt (unauthenticated state) 58 - 59 - ## Quality 60 - 61 - - [ ] Verify all pages render with skeleton loaders before mock data appears 62 - - [ ] Verify tab switches don't cause layout shift 63 - - [ ] Run Lighthouse on the web build — target first-paint under 2s 64 - - [ ] Verify iOS and Android builds compile and launch via Capacitor
-67
docs/app/tasks/phase-2.md
··· 1 - # Phase 2 Tasks — Public Tangled Browsing 2 - 3 - ## Protocol Setup 4 - 5 - - [x] Install `@atcute/client` and `@atcute/tangled` 6 - - [x] Create `services/atproto/client.ts` — singleton XRPC client with configurable base URL 7 - - [x] Add error interceptor that normalizes XRPC errors into typed app errors 8 - - [x] Create `core/errors/tangled.ts` — error types: NotFound, NetworkError, MalformedResponse, RateLimited 9 - 10 - ## API Validation 11 - 12 - - [x] Probe `tangled.org` for JSON API endpoints — returns HTML only (no JSON API); all metadata via PDS 13 - - [x] Confirm knot XRPC endpoints work from browser (CORS check against knot) — `Access-Control-Allow-Origin: *` confirmed on `knot1.tangled.sh`; knot hostname comes from `sh.tangled.repo` PDS record, not a fixed host 14 - - [x] Document which data comes from knots vs appview vs PDS (see endpoints.ts header comment) 15 - - [x] Test `com.atproto.repo.getRecord` for fetching user profiles and repo records from PDS — confirmed working on `bsky.social` 16 - 17 - ## Service Layer 18 - 19 - - [x] Create `services/tangled/endpoints.ts` — typed wrappers for each XRPC query 20 - - [x] Create `services/tangled/normalizers.ts` — transform raw responses → domain models 21 - - [x] Create `services/tangled/queries.ts` — TanStack Query hooks with cache keys and stale times 22 - - [x] Implement knot routing: determine correct knot hostname for a given repo 23 - 24 - ## Repository Browsing 25 - 26 - - [x] Wire `RepoDetailPage` to live repo data (metadata from PDS record + git data from knot) 27 - - [x] Implement repo overview: description, topics, default branch, language breakdown 28 - - [x] Implement README fetch: `sh.tangled.repo.blob` for `README.md` on default branch 29 - - [x] Wire `MarkdownRenderer` to render real README content 30 - - [x] Implement file tree: `sh.tangled.repo.tree` → navigate directories 31 - - [x] Implement file viewer: `sh.tangled.repo.blob` → syntax-highlighted display 32 - - [x] Implement 
commit log: `sh.tangled.repo.log` with cursor pagination 33 - - [x] Implement branch list: `sh.tangled.repo.branches` 34 - 35 - ## Profile Browsing 36 - 37 - - [x] Fetch user profile from PDS: `com.atproto.repo.getRecord` for `sh.tangled.actor.profile` 38 - - [x] Display profile: avatar (via `avatar.tangled.sh`), bio, links, location, pronouns, pinned repos 39 - - [x] List user's repos: fetch `sh.tangled.repo` records from user's PDS 40 - - [x] Wire `UserCard` component to real data 41 - 42 - ## Issues (read-only) 43 - 44 - - [x] Fetch issues for a repo from PDS records (`listIssueRecords` + `listIssueStateRecords` from owner's PDS) 45 - - [x] Display issue list with state filter (open/closed) 46 - - [x] Issue detail view: title, body, author, state 47 - - [x] Issue comments: fetch `sh.tangled.repo.issue.comment` records, render threaded 48 - 49 - ## Pull Requests (read-only) 50 - 51 - - [x] Fetch PRs for a repo from PDS records (`listPullRecords` + `listPullStatusRecords` from owner's PDS) 52 - - [x] Display PR list with status filter (open/closed/merged) 53 - - [x] PR detail view: title, body, author, source/target branches 54 - - [x] PR comments: fetch `sh.tangled.repo.pull.comment` records 55 - 56 - ## Caching 57 - 58 - - [x] Configure TanStack Query stale/gc times per data type (see spec) 59 - - [x] Set up IndexedDB query persister for offline reads 60 - 61 - ## Quality 62 - 63 - - [x] Replace default mock-backed Home/Explore/Activity surfaces with scoped-down curated live discovery/activity 64 - - [ ] Verify stale-while-revalidate behavior: cached data shows immediately, refreshes in background 65 - - [ ] Test with real Tangled repos (e.g., `tangled.org/core`) 66 - - [ ] Verify error states render correctly: 404, network failure, empty repos 67 - - [ ] Test on slow network (throttled devtools) — verify skeleton → content transition
-60
docs/app/tasks/phase-3.md
··· 1 - # Phase 3 Tasks — Search & Activity Feed 2 - 3 - ## Search API Discovery 4 - 5 - - [ ] Probe `tangled.org` for search endpoints (try `/search?q=`, `/api/search`, check network tab on live site) 6 - - [ ] If JSON search exists, document the request/response format 7 - - [ ] If no JSON search, decide: curated discovery now + backend search later, or HTML scraping 8 - 9 - ## Search Implementation 10 - 11 - - [ ] Create `services/tangled/search.ts` — search service (real endpoint or fallback) 12 - - [ ] Implement debounced search input (300ms) in Explore tab 13 - - [ ] Implement segmented search results: Repos tab, Users tab 14 - - [ ] Implement search result rendering with `RepoCard` and `UserCard` 15 - - [ ] Implement empty search state with suggestions 16 - - [ ] Persist recent searches in local storage (max 20) 17 - - [ ] Clear search history action 18 - 19 - ## Discovery Sections (if search API unavailable) 20 - 21 - - [ ] Implement "Trending repos" section (source TBD — may require appview scraping or curated list) 22 - - [ ] Implement "Recently created repos" section 23 - - [ ] Implement "Suggested users" section 24 - - [ ] Wire discovery sections into Home and Explore tabs 25 - 26 - ## Activity Feed — Data Source 27 - 28 - - [ ] Investigate `tangled.org/timeline` for JSON variant (check with Accept headers) 29 - - [ ] If no JSON timeline, evaluate `@atcute/jetstream` for real-time feed 30 - - [ ] If neither works, implement polling-based feed from known users' PDS records 31 - - [ ] Document chosen approach and any limitations 32 - 33 - ## Activity Feed — Implementation 34 - 35 - - [ ] Create `services/tangled/feed.ts` — feed data source 36 - - [ ] Create normalizer: raw AT Protocol events → `ActivityItem` domain model 37 - - [ ] Implement `ActivityPage` with real feed data 38 - - [ ] Implement filter chips: All, Repos, PRs, Issues, Social 39 - - [ ] Implement infinite scroll with cursor-based pagination 40 - - [ ] Implement pull-to-refresh 41 - - [ ] 
Implement tap-to-navigate: activity card → repo/profile/PR/issue detail 42 - 43 - ## Feed Caching 44 - 45 - - [ ] Cache last 100 feed items in IndexedDB via query persister 46 - - [ ] Show cached feed immediately on tab switch 47 - - [ ] Stale time: 1 minute 48 - - [ ] Verify feed persists across app restarts 49 - 50 - ## Home Tab 51 - 52 - - [ ] Wire Home tab to real data: trending repos + recent activity 53 - - [ ] Add "personalized" section placeholder (shows sign-in prompt when unauthenticated) 54 - 55 - ## Quality 56 - 57 - - [ ] Test search with various queries — verify results are relevant 58 - - [ ] Test activity feed with pull-to-refresh and pagination 59 - - [ ] Test offline: cached feed shows, search degrades gracefully 60 - - [ ] Verify no duplicate items in feed after refresh
-84
docs/app/tasks/phase-4.md
··· 1 - # Phase 4 Tasks — OAuth & Social Features 2 - 3 - ## OAuth Setup 4 - 5 - - [ ] Install `@atcute/oauth-browser-client` 6 - - [ ] Host OAuth client metadata JSON at a public URL & configure for local dev 7 - - [ ] Create `core/auth/oauth.ts` — call `configureOAuth()` with client metadata URL and redirect URI 8 - - [ ] Create `core/auth/session.ts` — session management: get, list, delete stored sessions 9 - - [ ] Create `core/auth/store.ts` — Pinia auth store with state machine (idle → authenticating → authenticated → error) 10 - 11 - ## Login Flow 12 - 13 - - [ ] Create `features/profile/LoginPage.vue` — handle input field + "Sign in" button 14 - - [ ] Implement handle → DID resolution 15 - - [ ] Implement OAuth redirect initiation 16 - - [ ] Create `/oauth/callback` route to handle redirect back 17 - - [ ] Implement token exchange on callback 18 - - [ ] Store session and update auth store 19 - - [ ] Redirect to Profile tab after successful login 20 - - [ ] Handle auth errors: invalid handle, OAuth denied, network failure 21 - 22 - ## Session Management 23 - 24 - - [ ] Implement session restoration on app launch (call `getSession()` for stored DID) 25 - - [ ] Implement automatic token refresh (handled by `@atcute/oauth-browser-client` internally) 26 - - [ ] Implement logout: clear session, reset auth store, redirect to Home 27 - - [ ] Implement account switcher: `listStoredSessions()`, switch between accounts 28 - 29 - ## Capacitor Deep Links 30 - 31 - - [ ] Configure custom URL scheme for OAuth callback on iOS/Android 32 - - [ ] Add `App.addListener('appUrlOpen')` handler to capture callback 33 - - [ ] Test OAuth flow on iOS simulator and Android emulator 34 - 35 - ## Auth-Aware XRPC Client 36 - 37 - - [ ] Create authenticated XRPC client that uses `session.dpopFetch` for requests 38 - - [ ] Service layer: use authenticated client for mutations, public client for queries 39 - - [ ] Handle 401/expired session: trigger re-auth flow 40 - 41 - ## Social Actions 
— Star 42 - 43 - - [ ] Create `services/tangled/mutations.ts` — mutation functions 44 - - [ ] Implement `starRepo(repoAtUri)` — creates `sh.tangled.feed.star` record on user's PDS 45 - - [ ] Implement `unstarRepo(rkey)` — deletes star record 46 - - [ ] Add star/unstar button to `RepoDetailPage` overview 47 - - [ ] Optimistic update: toggle star state and count immediately, rollback on error 48 - - [ ] Track user's existing stars to show correct initial state 49 - 50 - ## Social Actions — Follow 51 - 52 - - [ ] Implement `followUser(targetDid)` — creates `sh.tangled.graph.follow` record 53 - - [ ] Implement `unfollowUser(rkey)` — deletes follow record 54 - - [ ] Add follow/unfollow button to profile pages and user cards 55 - - [ ] Optimistic update: toggle follow state immediately 56 - - [ ] Track user's existing follows to show correct initial state 57 - 58 - ## Social Actions — React 59 - 60 - - [ ] Implement `addReaction(subjectUri, reaction)` — creates `sh.tangled.feed.reaction` record 61 - - [ ] Implement `removeReaction(rkey)` — deletes reaction record 62 - - [ ] Add reaction button/picker to PR and issue detail views 63 - - [ ] Show reaction counts grouped by type 64 - 65 - ## Profile Tab (Authenticated) 66 - 67 - - [ ] Wire Profile tab to show current user's profile data 68 - - [ ] Show pinned repos, stats, starred repos, following list 69 - - [ ] Add logout button 70 - - [ ] Add account switcher UI 71 - 72 - ## Personalized Feed 73 - 74 - - [ ] When signed in, filter activity feed to show activity from followed users and starred repos 75 - - [ ] Add "For You" / "Global" toggle on Activity tab 76 - - [ ] If appview provides a personalized endpoint, use it; otherwise filter client-side 77 - 78 - ## Quality 79 - 80 - - [ ] Test full OAuth flow: login → browse → star → follow → logout 81 - - [ ] Test session restoration after app restart 82 - - [ ] Test on web, iOS simulator, Android emulator 83 - - [ ] Test error cases: denied OAuth, expired session, failed 
mutation 84 - - [ ] Verify optimistic updates roll back correctly on mutation failure
-64
docs/app/tasks/phase-5.md
··· 1 - # Phase 5 Tasks — Offline & Performance Polish 2 - 3 - ## Query Persistence 4 - 5 - - [ ] Set up `persistQueryClient` with IndexedDB adapter 6 - - [ ] Configure persistence: debounced writes, max cache size, TTL per data type 7 - - [ ] Hydrate query cache from IndexedDB before first render 8 - - [ ] Verify: kill app → relaunch → cached data appears immediately without network 9 - 10 - ## Offline Detection 11 - 12 - - [ ] Create `core/network/status.ts` — reactive online/offline state (composable) 13 - - [ ] Show persistent offline banner when `navigator.onLine` is false 14 - - [ ] Disable mutation buttons (star, follow, react) when offline 15 - - [ ] Show toast when network returns: "Back online — refreshing" 16 - 17 - ## Secure Storage 18 - 19 - - [ ] Abstract auth token storage behind `core/storage/secure.ts` 20 - - [ ] Web: encrypted localStorage wrapper 21 - - [ ] Native: Capacitor Secure Storage plugin 22 - - [ ] Verify tokens are never stored in IndexedDB query cache 23 - - [ ] Clear secure storage on logout 24 - 25 - ## Cache Eviction 26 - 27 - - [ ] Implement max-item limits per data type (repos: 200, files: 100, profiles: 100) 28 - - [ ] Implement TTL eviction (remove entries older than their configured TTL) 29 - - [ ] Run eviction on app launch and periodically (every 30 min) 30 - - [ ] Measure IndexedDB size and log warnings if approaching limits 31 - 32 - ## Navigation Performance 33 - 34 - - [ ] Verify Ionic `keep-alive` preserves tab state and scroll position 35 - - [ ] Implement data prefetch on repo card visibility (Intersection Observer) 36 - - [ ] Test tab switch speed — should feel instant with cached data 37 - - [ ] Profile and fix any layout shifts during navigation 38 - 39 - ## List Virtualization 40 - 41 - - [ ] Replace flat lists with virtualized scroll for: repo lists, activity feed, file trees 42 - - [ ] Test with 1000+ item lists — verify smooth scrolling 43 - - [ ] Verify scroll position restoration when navigating back 44 - 45 - 
## Image Optimization 46 - 47 - - [ ] Lazy-load all avatars 48 - - [ ] Add initials fallback for missing avatars 49 - - [ ] Use appropriate image sizes from `avatar.tangled.sh` 50 - 51 - ## Bundle Optimization 52 - 53 - - [ ] Add route-level code splitting (lazy imports per feature) 54 - - [ ] Tree-shake unused Ionic components (configure Ionic's component imports) 55 - - [ ] Measure bundle size — target under 500KB initial JS 56 - - [ ] Run Lighthouse audit — target 90+ performance score on mobile 57 - 58 - ## Quality 59 - 60 - - [ ] Test offline → online transition: data refreshes without duplicates 61 - - [ ] Test low-bandwidth (3G throttle): skeleton → content transitions are smooth 62 - - [ ] Test memory usage over extended use: navigate many repos, check heap doesn't grow unbounded 63 - - [ ] Test on real iOS and Android devices (not just simulators) 64 - - [ ] Measure and document cold start time, tab switch time, scroll performance
-85
docs/app/tasks/phase-6.md
··· 1 - # Phase 6 Tasks — Write Features & Project Services 2 - 3 - ## Project Services Setup 4 - 5 - - [ ] Decide which write and notification operations belong in `packages/api` versus a separate service 6 - - [ ] Implement health and readiness endpoints for all public client-facing services 7 - - [ ] Configure CORS for the mobile app's origins 8 - - [ ] Document the mobile-facing service contract in `docs/api` 9 - 10 - ## Project Services — Auth Proxy 11 - 12 - - [ ] Implement OAuth token exchange endpoint (if moving auth server-side) 13 - - [ ] Implement session endpoint that returns user info 14 - - [ ] Decide: keep client-side OAuth or migrate to service-mediated auth 15 - 16 - ## Project Services — Search and Graph 17 - 18 - - [ ] Implement `GET /search` endpoint for repo/profile discovery 19 - - [ ] Return enough repo/profile metadata for the mobile client to render result cards directly 20 - - [ ] Implement `GET /profiles/:did/summary` for follower/following counts and other graph-derived gaps 21 - - [ ] Wire mobile client's search and profile summary services to these endpoints 22 - 23 - ## Project Services — Personalized Feed 24 - 25 - - [ ] Implement `GET /feed/personalized` — aggregate activity for the user's follows and stars 26 - - [ ] Index relevant events from Jetstream 27 - - [ ] Wire mobile client's feed to the project service endpoint when authenticated 28 - 29 - ## Create Issue 30 - 31 - - [ ] Create `features/repo/CreateIssuePage.vue` — title + body (markdown) form 32 - - [ ] Implement `createIssue()` mutation in `services/tangled/mutations.ts` 33 - - [ ] Create `sh.tangled.repo.issue` record on user's PDS 34 - - [ ] Optimistic update: add issue to local list 35 - - [ ] Navigate to new issue detail on success 36 - 37 - ## Comment on Issue 38 - 39 - - [ ] Add comment input to issue detail view 40 - - [ ] Implement `createIssueComment()` mutation 41 - - [ ] Create `sh.tangled.repo.issue.comment` record with `replyTo` support for threading 42 - - 
[ ] Optimistic update: append comment to list 43 - 44 - ## Comment on PR 45 - 46 - - [ ] Add comment input to PR detail view 47 - - [ ] Implement `createPRComment()` mutation 48 - - [ ] Create `sh.tangled.repo.pull.comment` record 49 - 50 - ## Issue State Management 51 - 52 - - [ ] Add close/reopen button to issue detail (author and repo owner only) 53 - - [ ] Implement `closeIssue()` / `reopenIssue()` — create `sh.tangled.repo.issue.state` record 54 - - [ ] Optimistic update: toggle state badge 55 - 56 - ## Edit Profile 57 - 58 - - [ ] Create `features/profile/EditProfilePage.vue` 59 - - [ ] Implement avatar upload (max 1MB, png/jpeg) via blob upload + record update 60 - - [ ] Implement bio edit (max 256 graphemes) 61 - - [ ] Implement links edit (max 5 URIs) 62 - - [ ] Implement location, pronouns, pinned repos, stats selection, bluesky toggle 63 - - [ ] Update `sh.tangled.actor.profile` record (key: `self`) 64 - 65 - ## Scope Upgrade 66 - 67 - - [ ] Detect when user's session lacks scopes needed for write operations 68 - - [ ] Prompt user to re-authorize with expanded scopes 69 - - [ ] Handle scope upgrade flow gracefully (no data loss) 70 - 71 - ## Push Notifications (if services exist) 72 - 73 - - [ ] Implement `POST /notifications/register` — register device token 74 - - [ ] Configure Capacitor Push Notifications plugin 75 - - [ ] Register device token on login 76 - - [ ] Services: subscribe to events relevant to user, deliver via APNs/FCM 77 - - [ ] Handle notification tap → deep link to relevant content 78 - 79 - ## Quality 80 - 81 - - [ ] Test issue creation end-to-end: create → verify on tangled.org 82 - - [ ] Test commenting on issues and PRs 83 - - [ ] Test profile editing: avatar upload, bio change 84 - - [ ] Test scope upgrade flow 85 - - [ ] Verify mutations work offline-queued (if implemented) or show appropriate offline errors
-68
docs/app/tasks/phase-7.md
··· 1 - # Phase 7 Tasks — Real-Time Feed & Advanced Features 2 - 3 - ## Jetstream Integration 4 - 5 - - [ ] Install `@atcute/jetstream` 6 - - [ ] Create `services/atproto/jetstream.ts` — WebSocket connection manager 7 - - [ ] Filter events for `sh.tangled.*` collections 8 - - [ ] Normalize events into `ActivityItem` domain model 9 - - [ ] Merge live events into TanStack Query feed cache 10 - - [ ] Implement connection lifecycle: connect on foreground, disconnect on background 11 - - [ ] Implement reconnection with exponential backoff and cursor tracking 12 - - [ ] Add battery-aware throttling (Capacitor Battery API) 13 - 14 - ## Live UI Indicators 15 - 16 - - [ ] Activity feed: "X new items" pill at top, tap to reveal 17 - - [ ] Repo detail: "New commits available" banner on ref update events 18 - - [ ] PR detail: live status badge updates (open → merged) 19 - - [ ] Issue detail: live comment count updates 20 - 21 - ## Custom Feeds 22 - 23 - - [ ] Create `domain/feed/custom-feed.ts` — feed configuration model 24 - - [ ] Implement feed builder UI: name + filter rules (by repo, user, event type) 25 - - [ ] Store custom feeds in IndexedDB 26 - - [ ] Render custom feed as a selectable tab/option on Activity page 27 - - [ ] "My repos" preset, "Watching" preset, "Team" preset 28 - 29 - ## Repo Forking 30 - 31 - - [ ] Add fork button to repo detail (authenticated only) 32 - - [ ] Implement fork creation via `sh.tangled.repo.create` with `source` field 33 - - [ ] Show fork status badge: up-to-date, fast-forwardable, conflict, missing branch 34 - - [ ] Implement "Sync fork" action via `sh.tangled.repo.forkSync` 35 - 36 - ## Labels 37 - 38 - - [ ] Fetch label definitions for a repo 39 - - [ ] Display color-coded label chips on issues and PRs 40 - - [ ] Implement label filtering on issue/PR lists 41 - - [ ] Add/remove labels on issues and PRs (authenticated, with label scopes) 42 - 43 - ## Expanded Reactions 44 - 45 - - [ ] Add reaction picker component: thumbsup, thumbsdown, 
laugh, tada, confused, heart, rocket, eyes 46 - - [ ] Show grouped reaction counts on PRs, issues, and comments 47 - - [ ] Add/remove reactions with optimistic updates 48 - 49 - ## PR Interdiff 50 - 51 - - [ ] Detect PR round count 52 - - [ ] Add round selector to PR detail 53 - - [ ] Fetch and display diff between selected rounds 54 - - [ ] Use `sh.tangled.repo.compare` for cross-round comparison 55 - 56 - ## Knot Info 57 - 58 - - [ ] Show knot hostname on repo detail 59 - - [ ] Fetch knot version via `sh.tangled.knot.version` 60 - - [ ] Display knot status/health indicator 61 - 62 - ## Quality 63 - 64 - - [ ] Test Jetstream under network transitions (WiFi → cellular → offline → online) 65 - - [ ] Verify no duplicate events after reconnection with cursor 66 - - [ ] Measure battery impact of WebSocket connection on iOS and Android 67 - - [ ] Test memory usage with long-running Jetstream connection 68 - - [ ] Load test: simulate high-frequency events, verify UI stays responsive
-77
docs/qa.md
··· 1 - --- 2 - title: "QA Checklist" 3 - updated: 2026-03-23 4 - --- 5 - 6 - # QA Checklist 7 - 8 - ## Ingestion (end-to-end) 9 - 10 - Walk a record through the full pipeline: Tap event → indexer → store → searchable. 11 - 12 - - [ ] Indexer connects to Tap via WebSocket and begins processing events 13 - - [ ] Creating a tracked record on Tangled produces a row in `documents` 14 - - [ ] Updating that record changes the existing row (new CID) 15 - - [ ] Deleting that record tombstones the row (`deleted_at` set) 16 - - [ ] Tombstoned documents do not appear in search results 17 - - [ ] Identity events update the handle cache; new documents show resolved handles 18 - - [ ] Unsupported collections are silently skipped (no errors logged) 19 - - [ ] Connection drop triggers automatic reconnect and resumes from last cursor 20 - 21 - ## Cursor durability 22 - 23 - - [ ] Kill the indexer mid-stream, restart — processing resumes without duplicating documents 24 - - [ ] Redeploy the indexer — cursor is persisted before shutdown, no gap or replay 25 - 26 - ## Backfill 27 - 28 - Run `twister backfill` against a small seed file and verify the discovery graph. 
29 - 30 - - [ ] Seed file with known Tangled users produces a non-empty discovery graph 31 - - [ ] `--max-hops 1` limits discovery to direct follows/collaborators only 32 - - [ ] `--dry-run` logs the plan but does not call Tap mutation endpoints 33 - - [ ] Already-tracked DIDs are reported and not re-submitted 34 - - [ ] Re-running the same seeds is idempotent 35 - - [ ] After backfill + Tap sync, search returns historical content that wasn't there before 36 - 37 - ## Search API 38 - 39 - - [ ] `GET /search?q=<repo-name>` returns the expected repo as top result 40 - - [ ] Searching by title keyword returns expected documents 41 - - [ ] Searching by author handle returns their content 42 - - [ ] `collection`, `type`, `author`, `repo` filters restrict results correctly 43 - - [ ] Pagination: `offset=0&limit=5` then `offset=5&limit=5` return disjoint result sets 44 - - [ ] Missing `q` param returns 400 with error JSON 45 - - [ ] Unknown query param returns 400 46 - - [ ] `GET /documents/{id}` returns the full document; 404 for missing or tombstoned 47 - - [ ] `GET /healthz` returns 200 48 - - [ ] `GET /readyz` returns 503 when DB is unreachable 49 - 50 - ## Deployment (Railway) 51 - 52 - - [ ] API service healthy and routable at public URL 53 - - [ ] Indexer service healthy on `:9090/health` 54 - - [ ] A new Tangled record ingested post-deploy becomes searchable within seconds 55 - - [ ] Redeploying the API preserves availability (health-check-gated rollout) 56 - - [ ] Restarting the indexer does not lose sync position 57 - - [ ] Environment variables match the documented set in `docs/api/deploy.md` 58 - 59 - ## Mobile — Navigation & Shell 60 - 61 - - [ ] All five tabs render and switch without layout shift 62 - - [ ] Tab-to-tab navigation preserves scroll position and component state 63 - - [ ] Pages show skeleton loaders before data appears 64 - - [ ] iOS and Android builds compile and launch via Capacitor 65 - 66 - ## Mobile — Live Tangled Browsing 67 - 68 - - [ ] 
Repo detail page loads metadata from PDS + git data from knot 69 - - [ ] README renders via markdown renderer 70 - - [ ] File tree navigates directories; file viewer shows syntax-highlighted content 71 - - [ ] Commit log paginates with cursor 72 - - [ ] Profile page shows avatar, bio, and repos from PDS 73 - - [ ] Issue list filters by state (open/closed); detail shows body + threaded comments 74 - - [ ] PR list filters by status; detail shows source/target branches + comments 75 - - [ ] Stale-while-revalidate: cached data shows immediately, refreshes in background 76 - - [ ] Error states render correctly: 404, network failure, empty repo 77 - - [ ] Slow network: skeleton → content transition is smooth (test with throttled devtools)
+161
docs/reference/api.md
··· 1 + --- 2 + title: API Service Reference 3 + updated: 2026-03-24 4 + --- 5 + 6 + Twister is a Go service that indexes Tangled content and serves a search API. It connects to the AT Protocol ecosystem via Tap (firehose consumer), XRPC (direct record lookups), and Constellation (backlink queries), storing indexed data in Turso/libSQL with FTS5 full-text search. 7 + 8 + ## Architecture 9 + 10 + The service is a single Go binary with multiple subcommands, each running a different runtime mode. All modes share the same database and configuration layer. 11 + 12 + **Runtime modes:** 13 + 14 + | Command | Purpose | 15 + | ------------- | ------------------------------------------------------------------- | 16 + | `api` (serve) | HTTP search API server | 17 + | `indexer` | Consumes Tap firehose events, normalizes and indexes records | 18 + | `backfill` | Discovers users from seed files, registers them with Tap | 19 + | `enrich` | Backfills missing metadata (repo names, handles, web URLs) via XRPC | 20 + | `reindex` | Re-syncs all documents into the FTS index | 21 + | `healthcheck` | One-shot liveness probe for container orchestration | 22 + 23 + The `embed-worker` and `reembed` commands exist as stubs for the upcoming semantic search pipeline (Nomic Embed Text v1.5 deployed via Railway template). 24 + 25 + All commands accept a `--local` flag that switches to a local SQLite file and text-format logging for development. 26 + 27 + ## HTTP API 28 + 29 + The API server binds to `:8080` by default (configurable via `HTTP_BIND_ADDR`). CORS is open (`*` origin, GET/OPTIONS). 30 + 31 + ### Search 32 + 33 + **`GET /search`** — Main search endpoint. Routes to keyword, semantic, or hybrid based on `mode` parameter. 34 + 35 + **`GET /search/keyword`** — Full-text search via FTS5 with BM25 scoring. 
36 + 37 + Parameters: 38 + 39 + - `q` (required) — Query string 40 + - `limit` (1–100, default 20) — Results per page 41 + - `offset` (default 0) — Pagination offset 42 + - `collection` — Filter by AT Protocol collection NSID 43 + - `type` — Filter by record type (repo, issue, pull, profile, string) 44 + - `author` — Filter by handle or DID 45 + - `repo` — Filter by repo name or DID 46 + - `language` — Filter by primary language 47 + - `from`, `to` — Date range (ISO 8601) 48 + - `state` — Filter issues/PRs by state (open, closed, merged) 49 + - `mode` — Search mode (keyword, semantic, hybrid) 50 + 51 + Response includes query metadata, total count, and an array of results each containing: ID, collection, record type, title, summary, body snippet (with `<mark>` highlights), score, repo name, author handle, DID, AT-URI, web URL, and timestamps. 52 + 53 + **`GET /documents/{id}`** — Fetch a single document by stable ID. 54 + 55 + ### Health 56 + 57 + - **`GET /healthz`** — Liveness probe, always 200 58 + - **`GET /readyz`** — Readiness probe, pings database 59 + 60 + ### Admin 61 + 62 + When `ENABLE_ADMIN_ENDPOINTS=true` with a configured `ADMIN_AUTH_TOKEN`: 63 + 64 + - **`POST /admin/reindex`** — Trigger FTS re-sync 65 + 66 + ### Static Content 67 + 68 + The API also serves a search site with live search and API documentation at `/` and `/docs*`, built with Alpine.js (no build step, embedded in `internal/view/`). 69 + 70 + ## Database 71 + 72 + Turso (libSQL) with the following tables: 73 + 74 + **documents** — Core search index. Each record gets a stable ID of `did|collection|rkey`. Stores title, body, summary, metadata (repo name, author handle, web URL, language, tags), and timestamps. Soft-deleted via `deleted_at`. 75 + 76 + **documents_fts** — FTS5 virtual table for full-text search over title, body, summary, repo name, author handle, and tags. 
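As a sketch of how these two tables fit together, the snippet below builds the stable `did|collection|rkey` document ID described above and shows a plausible query-time `bm25()` call. The SQL string, join shape, and positional column order are illustrative assumptions, not the service's actual query:

```go
package main

import "fmt"

// docID builds the stable document ID described above: did|collection|rkey.
func docID(did, collection, rkey string) string {
	return did + "|" + collection + "|" + rkey
}

// rankedSearch sketches query-time BM25 weighting in FTS5. Weights are
// positional per column; the order assumed here (title, body, summary,
// repo name, author handle, tags) mirrors the weights described above.
// FTS5's bm25() returns more-negative values for better matches, so an
// ascending ORDER BY ranks best results first.
const rankedSearch = `
SELECT d.id, bm25(documents_fts, 2.5, 1.0, 1.5, 1.0, 2.0, 1.0) AS score
FROM documents_fts f
JOIN documents d ON d.rowid = f.rowid
WHERE documents_fts MATCH ?
  AND d.deleted_at IS NULL
ORDER BY score
LIMIT ? OFFSET ?`

func main() {
	fmt.Println(docID("did:plc:abc", "sh.tangled.repo", "3kexample"))
	// → did:plc:abc|sh.tangled.repo|3kexample
}
```

Because the ID is deterministic, re-ingesting the same record upserts the same row rather than creating a duplicate.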
Uses `unicode61` tokenizer with tuned BM25 weights (title weighted highest at 2.5, then author handle at 2.0, summary at 1.5). 77 + 78 + **sync_state** — Cursor tracking for the Tap consumer. Stores consumer name, current cursor, high water mark, and last update time. Enables crash-safe resume. 79 + 80 + **identity_handles** — DID-to-handle cache. Updated from Tap identity events and XRPC lookups. 81 + 82 + **record_state** — Issue and PR state cache (open/closed/merged). Keyed by subject AT-URI. 83 + 84 + **document_embeddings** — Vector storage (768-dim F32_BLOB with DiskANN cosine index). Schema ready but not yet populated. 85 + 86 + **embedding_jobs** — Async embedding job queue. Schema ready but worker not yet active. 87 + 88 + ## Indexing Pipeline 89 + 90 + The indexer connects to Tap via WebSocket, consuming AT Protocol record events in real-time. For each event: 91 + 92 + 1. Filter against the configured collection allowlist (supports wildcards like `sh.tangled.*`) 93 + 2. Route to the appropriate normalizer based on collection 94 + 3. Normalize into a document (extract title, body, summary, metadata) 95 + 4. Optionally enrich via XRPC (resolve author handle, repo name, web URL) 96 + 5. Upsert into the database (auto-syncs FTS) 97 + 6. Advance cursor and acknowledge to Tap 98 + 99 + The indexer resumes from its last cursor on restart (no duplicate processing). It logs status every 30 seconds and uses exponential backoff (1s–5s) for transient failures. 
100 + 101 + ## Record Normalizers 102 + 103 + Each AT Protocol collection has a dedicated normalizer that extracts searchable content: 104 + 105 + | Collection | Record Type | Searchable | Content | 106 + | ------------------------------- | ------------- | ------------------------ | --------------------------- | 107 + | `sh.tangled.repo` | repo | Yes (if named) | Name, description, topics | 108 + | `sh.tangled.repo.issue` | issue | Yes | Title, body, repo reference | 109 + | `sh.tangled.repo.pull` | pull | Yes | Title, body, target branch | 110 + | `sh.tangled.repo.issue.comment` | issue_comment | Yes (if has body) | Comment body | 111 + | `sh.tangled.repo.pull.comment` | pull_comment | Yes (if has body) | Comment body | 112 + | `sh.tangled.string` | string | Yes (if has content) | Filename, contents | 113 + | `sh.tangled.actor.profile` | profile | Yes (if has description) | Profile description | 114 + | `sh.tangled.graph.follow` | follow | No | Graph edge only | 115 + 116 + State records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) update the `record_state` table rather than creating documents. 117 + 118 + ## XRPC Client 119 + 120 + The built-in XRPC client provides typed access to AT Protocol endpoints with caching (1-hour TTL for DID docs and repo names): 121 + 122 + - DID resolution via PLC Directory (`did:plc:`) or `.well-known/did.json` (`did:web:`) 123 + - Identity resolution (PDS endpoint + handle from DID document) 124 + - Record fetching (`com.atproto.repo.getRecord`, `com.atproto.repo.listRecords`) 125 + - Repo name resolution from `sh.tangled.repo` records 126 + - Web URL construction for Tangled entities 127 + 128 + ## Backfill 129 + 130 + The backfill command discovers users from a seed file and registers them with Tap for indexing. Discovery fans out via follow graphs and repo collaborators up to a configurable hop depth (default 2). Supports dry-run mode, configurable concurrency and batch sizes, and is idempotent. 
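A hop-limited, idempotent fan-out can be sketched like this, with a static `follows` map standing in for the real follow-graph and collaborator lookups over XRPC:

```go
package main

import "fmt"

// discover performs breadth-first discovery from the seed DIDs, following
// edges in the follows map up to maxHops. The seen set makes re-runs with
// the same seeds idempotent: already-visited DIDs are never re-emitted.
func discover(seeds []string, follows map[string][]string, maxHops int) []string {
	seen := map[string]bool{}
	var out []string
	for _, d := range seeds {
		seen[d] = true
	}
	out = append(out, seeds...)
	frontier := seeds
	for hop := 0; hop < maxHops; hop++ {
		var next []string
		for _, d := range frontier {
			for _, n := range follows[d] {
				if !seen[n] {
					seen[n] = true
					out = append(out, n)
					next = append(next, n)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	g := map[string][]string{
		"did:plc:a": {"did:plc:b"},
		"did:plc:b": {"did:plc:c"},
	}
	// --max-hops 1 limits discovery to direct follows only.
	fmt.Println(discover([]string{"did:plc:a"}, g, 1))
	// → [did:plc:a did:plc:b]
}
```

The real command additionally batches Tap registrations and skips DIDs Tap already tracks; this sketch covers only the traversal.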
131 + 132 + ## Configuration 133 + 134 + All configuration is via environment variables (with `.env` file support): 135 + 136 + | Variable | Default | Purpose | 137 + | -------------------------- | ----------------------- | ---------------------------------------------- | 138 + | `TURSO_DATABASE_URL` | — | Database connection (required) | 139 + | `TURSO_AUTH_TOKEN` | — | Auth token (required for remote) | 140 + | `TAP_URL` | — | Tap WebSocket URL | 141 + | `TAP_AUTH_PASSWORD` | — | Tap admin password | 142 + | `INDEXED_COLLECTIONS` | all | Collection allowlist (CSV, supports wildcards) | 143 + | `HTTP_BIND_ADDR` | `:8080` | API server bind address | 144 + | `INDEXER_HEALTH_ADDR` | `:9090` | Indexer health probe address | 145 + | `LOG_LEVEL` | info | debug/info/warn/error | 146 + | `LOG_FORMAT` | json | json or text | 147 + | `ENABLE_ADMIN_ENDPOINTS` | false | Enable admin routes | 148 + | `ADMIN_AUTH_TOKEN` | — | Bearer token for admin | 149 + | `ENABLE_INGEST_ENRICHMENT` | true | XRPC enrichment at ingest time | 150 + | `PLC_DIRECTORY_URL` | `https://plc.directory` | PLC Directory | 151 + | `XRPC_TIMEOUT` | 15s | XRPC HTTP timeout | 152 + 153 + ## Deployment 154 + 155 + Deployed on Railway with three services: 156 + 157 + - **api** — HTTP server (port 8080, health at `/healthz`) 158 + - **indexer** — Tap consumer (health at `:9090/healthz`) 159 + - **tap** — Tap instance (external dependency) 160 + 161 + All services share the same Turso database. The API and indexer are separate deployments of the same binary with different subcommands.
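A minimal sketch of how env-with-default resolution typically looks for the table above (variable names and defaults come from the table; the helper itself is an assumption, not the service's actual configuration code):

```go
package main

import (
	"fmt"
	"os"
)

// getenv returns an environment variable's value, or the given default
// when the variable is unset or empty.
func getenv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

func main() {
	bind := getenv("HTTP_BIND_ADDR", ":8080")
	plc := getenv("PLC_DIRECTORY_URL", "https://plc.directory")
	// With neither variable set, this prints the documented defaults.
	fmt.Println(bind, plc)
}
```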
+78
docs/reference/app.md
··· 1 + --- 2 + title: Mobile App Reference 3 + updated: 2026-03-24 4 + --- 5 + 6 + Twisted is an Ionic Vue mobile app for browsing Tangled, a git hosting platform built on the AT Protocol. It targets iOS and Android via Capacitor (no web target). 7 + 8 + ## Tech Stack 9 + 10 + - **Vue 3** with TypeScript and Composition API 11 + - **Ionic Vue** for native-feeling UI components 12 + - **Capacitor** for iOS/Android builds 13 + - **Pinia** for state management 14 + - **TanStack Query** for async data with caching 15 + - **@atcute/client** and **@atcute/tangled** for AT Protocol XRPC 16 + 17 + TypeScript files use `.js` extensions in imports. Package management via pnpm. 18 + 19 + ## Architecture 20 + 21 + Three-layer design: 22 + 23 + **Presentation** — Vue components and pages using Ionic's component library. Five-tab navigation: Home, Explore, Activity, Profile (visible tabs) plus Repo (pushed route). Repo detail uses segmented tabs: Overview, Files, Issues, PRs. 24 + 25 + **Domain** — TypeScript types modeling the app's data: UserSummary, RepoSummary, RepoDetail, RepoFile, PullRequestSummary, IssueSummary, ActivityItem. These are app-internal representations, decoupled from API response shapes. 26 + 27 + **Data** — Service layer that fetches from external sources and normalizes into domain types. The flow is: Vue component → composable → TanStack Query hook → service function → XRPC call → normalizer → domain model. 28 + 29 + ## Directory Structure 30 + 31 + ```sh 32 + src/ 33 + app/ — App shell, router, global config 34 + core/ — Shared utilities, constants 35 + services/ — API clients and data fetching 36 + atproto/ — @atcute client setup, error handling 37 + tangled/ — Endpoints, normalizers, TanStack Query hooks 38 + domain/ — TypeScript type definitions 39 + features/ — Feature modules (home, explore, repo, etc.) 
40 + components/ — Shared UI components 41 + ``` 42 + 43 + ## Data Sources 44 + 45 + The app reads from multiple sources depending on what's needed: 46 + 47 + - **Knots** (Tangled XRPC servers) — Git data: file trees, blobs, commits, branches, diffs. Each repo is hosted on a specific knot. 48 + - **PDS** (Personal Data Servers) — AT Protocol records: profiles, issues, PRs, comments, stars, follows. Accessed via `com.atproto.repo.getRecord` and `com.atproto.repo.listRecords`. 49 + - **Twister API** — Search and index-backed summaries (when available). 50 + - **Constellation** — Social signal counts and backlinks (stars, followers, reactions). 51 + 52 + Knots serve XRPC endpoints for git operations. The appview at `tangled.org` returns HTML only (no JSON API), so the app goes directly to knots for git data and PDS for AT Protocol records. 53 + 54 + ## Completed Features 55 + 56 + ### Navigation & Shell (Phase 1) 57 + 58 + Five-tab layout with Vue Router, skeleton loaders, placeholder pages. Design system components: RepoCard, UserCard, ActivityCard, FileTreeItem, EmptyState, ErrorBoundary, SkeletonLoader, MarkdownRenderer. 59 + 60 + ### Public Browsing (Phase 2) 61 + 62 + All read-only browsing works without authentication: 63 + 64 + **Repository browsing** — Metadata display, README rendering (markdown), file tree navigation, file viewer with syntax highlighting, commit log with pagination, branch listing. 65 + 66 + **Profile browsing** — Avatar, bio, links fetched from PDS. User's repos listed. 67 + 68 + **Issues** — List view with open/closed state filter, detail view with threaded comments. 69 + 70 + **Pull Requests** — List view with status filter (open/closed/merged), detail view with comments. 71 + 72 + **Caching** — TanStack Query configured with per-data-type stale times. Persistence via Dexie (IndexedDB) — works in Capacitor's WebView on device and in the browser during local dev. 
73 + 74 + ## Routing 75 + 76 + The app resolves identities through AT Protocol: handle → DID → PDS → records. For repo git data, the knot DID is read from the repo's AT Protocol record, then resolved to the knot's service endpoint. 77 + 78 + The Home tab currently provides direct handle-based browsing: enter a known handle to view that user's profile and repos. This works without any index or search dependency.
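The DID-document step in that resolution chain is the same mechanics for PDS and knot lookups. A sketch of extracting a service endpoint, assuming a standard DID document shape and matching on the service `type` (the sample document in the usage note is illustrative):

```typescript
// Minimal slice of a DID document's service list.
interface DidService {
  id: string;
  type: string;
  serviceEndpoint: string;
}

interface DidDocument {
  id: string;
  service?: DidService[];
}

// Pick a service endpoint by type, e.g. "AtprotoPersonalDataServer"
// for the PDS. Returns the endpoint URL, or null if absent.
function serviceEndpoint(doc: DidDocument, type: string): string | null {
  const svc = doc.service?.find((s) => s.type === type);
  return svc ? svc.serviceEndpoint : null;
}
```

For `did:plc` identities the document itself is fetched from the PLC Directory, as noted in the data-sources spec.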
+138
docs/reference/lexicons.md
··· 1 + --- 2 + title: Tangled Lexicons 3 + updated: 2026-03-24 4 + --- 5 + 6 + Tangled defines its AT Protocol record types under the `sh.tangled.*` namespace. These are the records stored on users' Personal Data Servers (PDS) and consumed by the indexing pipeline. 7 + 8 + ## Searchable Records 9 + 10 + ### sh.tangled.repo 11 + 12 + Repository metadata. Created when a user registers a repo with Tangled. 13 + 14 + - `name` (string, required) — Repository name 15 + - `description` (string) — Short description 16 + - `createdAt` (datetime) — Creation timestamp 17 + - `knot` (string) — Knot DID hosting the git data 18 + - `topics` (array of strings) — Tags/topics 19 + 20 + ### sh.tangled.repo.issue 21 + 22 + Issue on a repository. 23 + 24 + - `repo` (at-uri, required) — Reference to the parent repo record 25 + - `title` (string, required) — Issue title 26 + - `body` (string) — Issue body (markdown) 27 + - `createdAt` (datetime) 28 + 29 + ### sh.tangled.repo.pull 30 + 31 + Pull request on a repository. 32 + 33 + - `repo` (at-uri, required) — Reference to the parent repo record 34 + - `title` (string, required) — PR title 35 + - `body` (string) — PR body (markdown) 36 + - `head` (string) — Source branch 37 + - `base` (string) — Target branch 38 + - `createdAt` (datetime) 39 + 40 + ### sh.tangled.string 41 + 42 + Code snippet or gist. 43 + 44 + - `filename` (string) — File name with extension 45 + - `contents` (string, required) — Code content 46 + - `language` (string) — Programming language 47 + - `createdAt` (datetime) 48 + 49 + ### sh.tangled.actor.profile 50 + 51 + User profile information. 
52 + 53 + - `displayName` (string) — Display name 54 + - `description` (string) — Bio/about text 55 + - `avatar` (blob) — Profile image 56 + - `pronouns` (string) — Pronouns 57 + - `location` (string) — Location 58 + - `links` (array of strings) — External links 59 + - `pinnedRepos` (array of at-uri) — Pinned repository references 60 + 61 + ## Interaction Records 62 + 63 + ### sh.tangled.feed.star 64 + 65 + Star/favorite on a repository. 66 + 67 + - `subject` (object, required) — `{ uri: at-uri, cid: cid }` referencing the starred repo 68 + 69 + ### sh.tangled.graph.follow 70 + 71 + Follow relationship between users. 72 + 73 + - `subject` (did, required) — DID of the followed user 74 + - `createdAt` (datetime) 75 + 76 + ### sh.tangled.feed.reaction 77 + 78 + Emoji reaction on content. 79 + 80 + - `subject` (object, required) — `{ uri: at-uri, cid: cid }` referencing the target 81 + - `emoji` (string, required) — Reaction emoji 82 + 83 + ## State Records 84 + 85 + ### sh.tangled.repo.issue.state 86 + 87 + Tracks whether an issue is open or closed. 88 + 89 + - `issue` (at-uri, required) — Reference to the issue 90 + - `state` (string, required) — `open` or `closed` 91 + 92 + ### sh.tangled.repo.pull.status 93 + 94 + Tracks pull request lifecycle. 95 + 96 + - `pull` (at-uri, required) — Reference to the PR 97 + - `status` (string, required) — `open`, `closed`, or `merged` 98 + 99 + ## Comment Records 100 + 101 + ### sh.tangled.repo.issue.comment 102 + 103 + Comment on an issue. 104 + 105 + - `issue` (at-uri, required) — Reference to the parent issue 106 + - `body` (string, required) — Comment body (markdown) 107 + - `parent` (at-uri) — Parent comment for threading 108 + - `createdAt` (datetime) 109 + 110 + ### sh.tangled.repo.pull.comment 111 + 112 + Comment on a pull request. 
113 + 114 + - `pull` (at-uri, required) — Reference to the parent PR 115 + - `body` (string, required) — Comment body (markdown) 116 + - `parent` (at-uri) — Parent comment for threading 117 + - `createdAt` (datetime) 118 + 119 + ## Infrastructure Records 120 + 121 + ### sh.tangled.knot.member 122 + 123 + Knot membership record. 124 + 125 + - `knot` (did, required) — Knot DID 126 + - `permission` (string) — Permission level 127 + 128 + ### sh.tangled.knot.version 129 + 130 + Knot software version metadata. 131 + 132 + ## Stable ID Format 133 + 134 + Documents in the search index use the stable ID format: `did|collection|rkey` (e.g., `did:plc:abc123|sh.tangled.repo|repo-name`). This ensures idempotent upserts regardless of CID changes. 135 + 136 + ## AT-URI Format 137 + 138 + Records are addressed as `at://did/collection/rkey` (e.g., `at://did:plc:abc123/sh.tangled.repo/repo-name`).
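Both formats compose from the same three parts, so they can be built with small helpers. A sketch (the helper names are hypothetical):

```typescript
// The three components shared by both ID formats.
type RecordRef = { did: string; collection: string; rkey: string };

// Stable index ID: did|collection|rkey — independent of CID,
// so re-indexing the same record upserts rather than duplicates.
function stableId({ did, collection, rkey }: RecordRef): string {
  return `${did}|${collection}|${rkey}`;
}

// AT-URI: at://did/collection/rkey
function atUri({ did, collection, rkey }: RecordRef): string {
  return `at://${did}/${collection}/${rkey}`;
}
```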
+126
docs/roadmap.md
··· 1 + --- 2 + title: Roadmap 3 + updated: 2026-03-24 4 + --- 5 + 6 + ## API: Constellation Integration 7 + 8 + Add a Constellation client to the Go API for enriching search results with social signals. 9 + 10 + - [ ] Constellation XRPC client (`internal/constellation/`) with `getBacklinksCount` and `getBacklinks` 11 + - [ ] User-agent header with project name and contact 12 + - [ ] Enrich search results with star counts from Constellation 13 + - [ ] Profile summary endpoint (`GET /profiles/{did}/summary`) with follower/following counts from Constellation 14 + - [ ] Cache Constellation responses with short TTL (star/follower counts change infrequently) 15 + 16 + ## API: Semantic Search Pipeline 17 + 18 + Nomic Embed Text v1.5 via Railway template, async embedding pipeline. 19 + 20 + - [ ] Deploy nomic-embed Railway template (`POST /api/embeddings` with Bearer auth) 21 + - [ ] Embedding client in Go API (`internal/embedding/`) calling the Nomic service 22 + - [ ] Embed-worker: consume `embedding_jobs` queue, generate 768-dim vectors, store in `document_embeddings` 23 + - [ ] `GET /search/semantic` endpoint using DiskANN vector_top_k 24 + - [ ] Reembed command for bulk re-generation 25 + 26 + ## API: Hybrid Search 27 + 28 + Combine keyword and semantic results. 
29 + 30 + - [ ] Score normalization (keyword BM25 → [0,1], semantic cosine → [0,1]) 31 + - [ ] Weighted merge (0.65 keyword + 0.35 semantic, configurable) 32 + - [ ] Deduplication by document ID 33 + - [ ] `matched_by` metadata in results 34 + 35 + ## API: Search Quality 36 + 37 + - [ ] Field weight tuning based on real queries 38 + - [ ] Recency boost for recently updated content 39 + - [ ] Star count ranking signal (via Constellation) 40 + - [ ] State filtering defaults (exclude closed issues) 41 + - [ ] Better snippets with longer context 42 + - [ ] Relevance test fixtures 43 + 44 + ## API: Observability 45 + 46 + - [ ] Structured metrics: ingestion rate, search latency, embedding throughput 47 + - [ ] Dashboard or log-based monitoring 48 + 49 + ## App: Search & Discovery 50 + 51 + Wire the Explore tab to the search API and add activity feed. 52 + 53 + **Depends on:** API: Constellation Integration 54 + 55 + - [ ] Search service pointing at Twister API 56 + - [ ] Constellation service for star/follower counts 57 + - [ ] Debounced search on Explore tab with segmented results 58 + - [ ] Recent search history (local) 59 + - [ ] Graceful fallback when search API unavailable 60 + - [ ] Activity feed data source investigation (Jetstream vs polling) 61 + - [ ] Activity tab with filters, infinite scroll, pull-to-refresh 62 + - [ ] Home tab: surface recently viewed repos/profiles 63 + 64 + ## App: Authentication & Social 65 + 66 + Bluesky OAuth and authenticated actions. 
67 + 68 + **Depends on:** App: Search & Discovery (for Constellation service), API: Constellation Integration 69 + 70 + - [ ] OAuth setup with `@atcute/oauth-browser-client` 71 + - [ ] Login page, OAuth flow, callback handling 72 + - [ ] Capacitor deep link configuration 73 + - [ ] Session management (restore, refresh, logout, account switcher) 74 + - [ ] Auth-aware XRPC client using dpopFetch 75 + - [ ] Star repos (write to PDS, count from Constellation) 76 + - [ ] Follow users (write to PDS, count from Constellation) 77 + - [ ] React to content (write to PDS, count from Constellation) 78 + - [ ] Authenticated profile tab (pinned repos, stats, starred, following) 79 + - [ ] Personalized feed ("For You" / "Global" toggle) 80 + 81 + ## App: Write Features 82 + 83 + **Depends on:** App: Authentication & Social 84 + 85 + - [ ] Create issue (title + markdown body) 86 + - [ ] Comment on issues and PRs (threaded) 87 + - [ ] Close/reopen issues 88 + - [ ] Edit profile (bio, links, avatar, pinned repos) 89 + - [ ] OAuth scope upgrade flow 90 + 91 + ## App: Offline & Performance 92 + 93 + **Depends on:** App: Search & Discovery (for cache persistence of search/feed data) 94 + 95 + - [ ] Dexie setup with database schema (query cache + pinned content tables) 96 + - [ ] TanStack Query persister backed by Dexie 97 + - [ ] Pinned content store (save/unsave files for offline reading) 98 + - [ ] Pinned files UI (list, pin/unpin actions on file viewer, last-fetched timestamp) 99 + - [ ] Offline detection and banner 100 + - [ ] Secure token storage (Capacitor Secure Storage) 101 + - [ ] Cache eviction (per-type limits and TTL, pinned content exempt) 102 + - [ ] List virtualization for large datasets 103 + - [ ] Lazy-load avatars, prefetch on hover 104 + - [ ] Code splitting and bundle optimization (target <500KB JS) 105 + 106 + ## App: Real-Time & Advanced 107 + 108 + **Depends on:** App: Authentication & Social, App: Offline & Performance 109 + 110 + - [ ] Jetstream integration for 
live `sh.tangled.*` events 111 + - [ ] Live UI indicators (new commits, new feed items, PR status) 112 + - [ ] Custom feed presets ("My repos", "Watching", "Team") 113 + - [ ] Repo forking 114 + - [ ] Labels (display, filter, manage) 115 + - [ ] Expanded reactions with emoji picker 116 + - [ ] PR interdiff (compare rounds) 117 + - [ ] Knot info display 118 + 119 + ## App: Push Notifications 120 + 121 + **Depends on:** App: Authentication & Social 122 + 123 + - [ ] Register device token on login 124 + - [ ] Subscribe to relevant events 125 + - [ ] Deliver via APNs/FCM 126 + - [ ] Handle notification taps (deep link to relevant screen)
+176
docs/specs/app-features.md
··· 1 + --- 2 + title: App Features 3 + updated: 2026-03-24 4 + --- 5 + 6 + ## Search & Discovery 7 + 8 + **Depends on:** Search API (Twister), Constellation API 9 + 10 + ### Search 11 + 12 + - Create search service pointing at Twister API 13 + - Debounced search input on Explore tab 14 + - Segmented results: repos, users, issues/PRs 15 + - Recent search history (local storage, clearable) 16 + - Graceful fallback when search API is unavailable 17 + 18 + ### Discovery Sections 19 + 20 + - Explore tab shows search prominently 21 + - Optional: trending repos or recently active repos (if data supports it) 22 + - Profile summaries enriched with Constellation data (star counts, follower counts) 23 + 24 + ### Home Tab 25 + 26 + - Handle-based direct browsing (already works) 27 + - Surface recently viewed repos/profiles from local history 28 + - Optional: personalized suggestions for signed-in users (later) 29 + 30 + ### Activity Feed 31 + 32 + - Investigate data sources: Jetstream, polling PDS, or Twister-aggregated feed 33 + - Activity tab shows recent events from followed users and starred repos 34 + - Filters by event type (commits, issues, PRs, stars) 35 + - Infinite scroll with pull-to-refresh 36 + 37 + ## Authentication & Social 38 + 39 + **Depends on:** Bluesky OAuth, Constellation API 40 + 41 + ### OAuth Sign-In 42 + 43 + - Install `@atcute/oauth-browser-client` 44 + - Host client metadata JSON with required scopes 45 + - Login page: handle input → resolution → OAuth redirect → callback 46 + - Capacitor deep link handling for native redirect 47 + - Session restoration on app launch, automatic token refresh 48 + - Logout, account switcher for multiple accounts 49 + - Auth state: idle → authenticating → authenticated → error 50 + 51 + ### Social Actions 52 + 53 + All social actions are AT Protocol record writes to the user's PDS. Counts come from Constellation. 54 + 55 + - **Star:** Create/delete `sh.tangled.feed.star` record. 
Show star count via Constellation `getBacklinksCount`. 56 + - **Follow:** Create/delete `sh.tangled.graph.follow` record. Show follower count via Constellation. 57 + - **React:** Create `sh.tangled.feed.reaction` record. Show reaction counts via Constellation. 58 + - Optimistic UI updates via TanStack Query mutation + cache invalidation. 59 + 60 + ### Authenticated Profile 61 + 62 + - Profile tab shows current user's data when signed in 63 + - Pinned repos, stats (repos, stars, followers via Constellation) 64 + - Starred repos list 65 + - Following/followers lists (via Constellation `getBacklinks`) 66 + - Settings and logout 67 + 68 + ### Personalized Feed 69 + 70 + - Filter activity feed to followed users and starred repos 71 + - "For You" / "Global" toggle on activity tab 72 + 73 + ## Write Features 74 + 75 + **Depends on:** Authentication 76 + 77 + ### Issues 78 + 79 + - Create issue: title + markdown body, posted as `sh.tangled.repo.issue` record 80 + - Comment on issue: threaded comments as `sh.tangled.repo.issue.comment` records 81 + - Close/reopen: create `sh.tangled.repo.issue.state` record 82 + 83 + ### Pull Requests 84 + 85 + - Comment on PR: `sh.tangled.repo.pull.comment` records 86 + 87 + ### Profile Editing 88 + 89 + - Edit bio, links, location, pronouns, pinned repos 90 + - Avatar upload (max 1MB, png/jpeg) 91 + - Cross-posting toggle 92 + - Posted as updated `sh.tangled.actor.profile` record 93 + 94 + ### OAuth Scope Upgrade 95 + 96 + - Detect when an action requires a scope not yet granted 97 + - Prompt user to re-authorize with expanded scopes 98 + 99 + ## Offline & Performance 100 + 101 + ### Local Storage 102 + 103 + All local persistence uses **Dexie** over IndexedDB. This works natively in Capacitor's WebView on both iOS and Android, and in the browser during local development — no platform branching or plugins needed. 
104 + 105 + Four storage layers, each with a distinct purpose: 106 + 107 + - **TanStack Query persister (Dexie-backed)** — Automatic cache persistence. Previously-viewed data hydrates on launch and serves from cache when offline. Subject to normal cache eviction (stale times, GC). 108 + - **Pinned content store (Dexie)** — User-initiated "save for offline" storage for files, READMEs, and other reference content. Exempt from cache eviction — only the user removes pinned items. Stores file content, metadata, repo handle, pinned timestamp. 109 + - **Capacitor Preferences** — Small key-value settings (theme, recent search history, feed preferences). 110 + - **Capacitor Secure Storage** — Auth tokens only. Never in Dexie or the query cache. 111 + 112 + ### Offline Behavior 113 + 114 + - TanStack Query serves cached data when offline (stale-while-revalidate) 115 + - Pinned files always available regardless of connectivity 116 + - Offline detection via `navigator.onLine`, persistent banner 117 + - Mutations disabled when offline 118 + - Background refresh when connectivity returns 119 + 120 + ### Pinned Files 121 + 122 + Users can pin files for offline reading: 123 + 124 + - Pin action on file viewer saves content + metadata to the Dexie pinned store 125 + - Pinned files list accessible from profile or a dedicated section 126 + - Content persists until the user explicitly unpins 127 + - Pinned items show last-fetched timestamp; refresh when online 128 + 129 + ### Cache Management 130 + 131 + - Per-type limits: repo metadata (200 items/7 days), file trees (50/3 days), profiles (100/7 days), search results (20/1 day) 132 + - Eviction on app launch and periodically 133 + - Pinned content exempt from eviction 134 + - Measure and cap IndexedDB usage 135 + 136 + ### Performance 137 + 138 + - Prefetch on hover/visibility for likely navigation targets 139 + - Virtualized lists for large datasets (1000+ items) 140 + - Lazy-load avatars with initials fallback
141 + - Route-level code splitting 142 + - Tree-shake Ionic components 143 + - Target: under 500KB JS, shell first-paint under 2s 144 + 145 + ## Real-Time & Advanced 146 + 147 + **Depends on:** Authentication, Activity Feed 148 + 149 + ### Jetstream Integration 150 + 151 + - Connect to Jetstream for real-time `sh.tangled.*` events 152 + - Filter and normalize into ActivityItem, merge into TanStack Query cache 153 + - Connect on foreground, disconnect on background 154 + - Cursor tracking for gap-free resume 155 + - Battery-aware throttling 156 + 157 + ### Live UI Indicators 158 + 159 + - "New commits" banner in repo detail 160 + - "X new items" pill on activity feed 161 + - Live status updates on PR detail 162 + - Issue comment count updates 163 + 164 + ### Custom Feeds 165 + 166 + - Presets: "My repos", "Watching", "Team" 167 + - Feed builder UI for custom filters 168 + - Local storage in IndexedDB 169 + 170 + ### Advanced Features 171 + 172 + - **Repo forking:** Create repo with source field, show fork status, sync action 173 + - **Labels:** Display color-coded chips, filter by label, add/remove with auth 174 + - **Expanded reactions:** Emoji picker, grouped counts, add/remove 175 + - **PR interdiff:** Compare rounds via `sh.tangled.repo.compare` 176 + - **Knot info:** Show hostname, version, health status on repo detail
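The filter-and-normalize step in the Jetstream integration above can be a pure function. The event shape below is a simplified assumption about Jetstream commit messages, and `ActivityItem` is trimmed to illustrative fields:

```typescript
// Simplified, assumed shape of a Jetstream commit event.
interface JetstreamEvent {
  did: string;
  kind: string; // "commit" for record writes
  commit?: {
    operation: "create" | "update" | "delete";
    collection: string;
    rkey: string;
  };
}

// Stand-in for the app's ActivityItem domain type.
interface ActivityItem {
  did: string;
  collection: string;
  rkey: string;
  operation: string;
}

// Keep only sh.tangled.* record creations; drop everything else.
function toActivityItem(ev: JetstreamEvent): ActivityItem | null {
  if (ev.kind !== "commit" || !ev.commit) return null;
  if (!ev.commit.collection.startsWith("sh.tangled.")) return null;
  if (ev.commit.operation !== "create") return null;
  return {
    did: ev.did,
    collection: ev.commit.collection,
    rkey: ev.commit.rkey,
    operation: ev.commit.operation,
  };
}
```

Non-null results would then be merged into the TanStack Query cache for the activity feed.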
+154
docs/specs/data-sources.md
··· 1 + --- 2 + title: Data Sources & Integration 3 + updated: 2026-03-24 4 + --- 5 + 6 + Twisted pulls data from four external sources and authenticates users via Bluesky OAuth. Each source has a distinct role — no single source is authoritative for everything. 7 + 8 + ## Source Overview 9 + 10 + | Source | What it provides | Access pattern | 11 + | ------------------------ | ------------------------------------------------------------------------------ | ---------------------------------------------------------- | 12 + | **Tangled XRPC (Knots)** | Git data — file trees, blobs, commits, branches, diffs, tags | Direct XRPC calls to the knot hosting each repo | 13 + | **AT Protocol (PDS)** | User records — profiles, repos, issues, PRs, comments, stars, follows | `com.atproto.repo.getRecord` / `listRecords` on user's PDS | 14 + | **Constellation** | Social signals — star counts, follower counts, reaction counts, backlink lists | Public JSON API at `constellation.microcosm.blue` | 15 + | **Tap** | Real-time firehose of AT Protocol record events for indexing | WebSocket consumer, feeds our search index | 16 + 17 + ## Constellation 18 + 19 + [Constellation](https://constellation.microcosm.blue) is a public, self-hosted index of AT Protocol backlinks. It answers "who linked to this?" across the entire network — making it the right source for aggregated social signals instead of maintaining our own counters. 20 + 21 + ### Key Endpoints 22 + 23 + **`GET /xrpc/blue.microcosm.links.getBacklinks`** — Get records linking to a target. 24 + 25 + - `subject` (required) — The target (AT-URI, DID, or URL) 26 + - `source` (required) — Collection and path, e.g. `sh.tangled.feed.star:subject.uri` 27 + - `did` — Filter to specific users (repeatable) 28 + - `limit` — Default 16, max 100 29 + - `reverse` — Reverse ordering 30 + 31 + **`GET /xrpc/blue.microcosm.links.getBacklinksCount`** — Count of links to a target. 
32 + 33 + - `subject`, `source` — Same as above 34 + 35 + **`GET /xrpc/blue.microcosm.links.getManyToManyCounts`** — Secondary link counts in many-to-many relationships. 36 + 37 + - `subject`, `source`, `pathToOther` (required) 38 + - `did`, `otherSubject`, `limit` (optional) 39 + 40 + ### Usage in Twisted 41 + 42 + | Need | Constellation call | 43 + | ------------------------- | ---------------------------------------------------------------------------------------- | 44 + | Star count for a repo | `getBacklinksCount(subject=repo_at_uri, source=sh.tangled.feed.star:subject.uri)` | 45 + | Who starred a repo | `getBacklinks(subject=repo_at_uri, source=sh.tangled.feed.star:subject.uri)` | 46 + | Follower count for a user | `getBacklinksCount(subject=user_did, source=sh.tangled.graph.follow:subject)` | 47 + | Who follows a user | `getBacklinks(subject=user_did, source=sh.tangled.graph.follow:subject)` | 48 + | Reaction count on content | `getBacklinksCount(subject=content_at_uri, source=sh.tangled.feed.reaction:subject.uri)` | 49 + 50 + This replaces the need to index and count interaction records ourselves. Our Tap pipeline still indexes interaction records for search and graph discovery, but Constellation is the source of truth for counts and lists. 51 + 52 + ### Integration Notes 53 + 54 + - No authentication required. Constellation asks for a user-agent header with project name and contact. 55 + - Responses are paginated via cursor. Plan for multiple pages when listing (e.g., all followers). 56 + - The API is read-only — social actions (star, follow, react) are still AT Protocol record writes to the user's PDS. 57 + 58 + ## Tangled XRPC (Knots) 59 + 60 + Knots are Tangled's git hosting servers. Each repo lives on a specific knot, identified by the knot DID in the repo's AT Protocol record. 
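Once the hosting knot is resolved, git reads are plain XRPC GET calls to it. A sketch of the URL construction, assuming the standard `/xrpc/{nsid}` path convention; the query parameter names used in the test are hypothetical:

```typescript
// Build an XRPC GET URL on a knot host, e.g. for sh.tangled.repo.tree.
// Query parameters are encoded via URLSearchParams.
function knotXrpcUrl(
  host: string,
  nsid: string,
  params: Record<string, string>,
): string {
  const url = new URL(`/xrpc/${nsid}`, `https://${host}`);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}
```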
61 + 62 + ### Endpoints Used 63 + 64 + - `sh.tangled.repo.tree` — File tree for a ref 65 + - `sh.tangled.repo.blob` — File content 66 + - `sh.tangled.repo.log` — Commit history 67 + - `sh.tangled.repo.branches` / `sh.tangled.repo.tags` — Refs 68 + - `sh.tangled.repo.getDefaultBranch` — Default branch name 69 + - `sh.tangled.repo.diff` / `sh.tangled.repo.compare` — Diffs 70 + - `sh.tangled.repo.languages` — Language breakdown 71 + - `sh.tangled.knot.version` — Knot software version 72 + 73 + ### Routing 74 + 75 + The app resolves which knot hosts a repo by reading the repo's AT Protocol record (which contains the knot DID), then resolving the knot DID to its service endpoint. XRPC calls go directly to that knot. 76 + 77 + The Tangled appview at `tangled.org` serves HTML only — there is no JSON API at the appview level. 78 + 79 + ## AT Protocol (PDS) 80 + 81 + Standard AT Protocol record access for reading and writing user data. 82 + 83 + ### Read Operations 84 + 85 + - `com.atproto.repo.getRecord` — Fetch a single record by collection + rkey 86 + - `com.atproto.repo.listRecords` — List records in a collection with pagination 87 + 88 + Used for: profiles, repo metadata, issues, PRs, comments, stars, follows, reactions. 89 + 90 + ### Write Operations (Authenticated) 91 + 92 + - `com.atproto.repo.createRecord` — Create a new record (star, follow, react, issue, comment) 93 + - `com.atproto.repo.deleteRecord` — Delete a record (unstar, unfollow) 94 + 95 + All writes go to the authenticated user's PDS using their OAuth session. 96 + 97 + ### Identity Resolution 98 + 99 + - Handle → DID via `com.atproto.identity.resolveHandle` 100 + - DID → DID document via PLC Directory (`plc.directory`) or `.well-known/did.json` 101 + - DID document → PDS endpoint (from `#atprotoPersonalDataServer` service) 102 + 103 + ## Tap (Firehose) 104 + 105 + Tap provides a filtered firehose of AT Protocol events. Our indexer consumes Tap via WebSocket, indexing records into the search database. 
106 + 107 + ### What We Index via Tap 108 + 109 + - Repos, issues, PRs, comments, strings, profiles — for full-text search 110 + - Follows — for graph discovery during backfill 111 + - Issue state and PR status changes — for state filtering in search 112 + 113 + ### What We Don't Need to Count via Tap 114 + 115 + Stars, followers, reactions — Constellation handles counts and lists. We still process these events for graph discovery but don't need to maintain our own counters. 116 + 117 + ### Tap Protocol 118 + 119 + - WebSocket connection with cursor-based resume 120 + - Events contain: operation (create/update/delete), DID, collection, rkey, CID, record payload 121 + - Acks required after processing each event 122 + - Backfill via `/repos/add` endpoint to request historical data for specific users 123 + 124 + ## Bluesky OAuth 125 + 126 + Authentication uses AT Protocol OAuth via `@atcute/oauth-browser-client`. 127 + 128 + ### Flow 129 + 130 + 1. User enters their handle 131 + 2. App resolves handle → DID → PDS → authorization server metadata 132 + 3. App initiates OAuth with requested scopes 133 + 4. User authorizes in browser, redirected back to app 134 + 5. App exchanges code for tokens 135 + 6. Session provides `dpopFetch` for authenticated XRPC calls 136 + 137 + ### Scopes 138 + 139 + The app requests scopes for: 140 + 141 + - `sh.tangled.feed.star` — Star/unstar repos 142 + - `sh.tangled.graph.follow` — Follow/unfollow users 143 + - `sh.tangled.feed.reaction` — Add reactions 144 + - `sh.tangled.actor.profile` — Edit profile 145 + - `sh.tangled.repo.issue` / `sh.tangled.repo.issue.comment` — Create issues and comments 146 + - `sh.tangled.repo.pull.comment` — Comment on PRs 147 + 148 + ### Capacitor Integration 149 + 150 + On native platforms, OAuth callback uses a deep link URL scheme registered with Capacitor. The app listens via `App.addListener('appUrlOpen', ...)` to catch the redirect. 
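Inside the `appUrlOpen` listener, the app needs to pull the authorization response out of the deep-link URL. A sketch of that parsing step — the `twisted://` scheme in the test is a placeholder, not necessarily the registered one:

```typescript
// Parse code and state out of a deep-link OAuth callback URL.
// Returns null when the URL is malformed or missing either value.
function parseOAuthCallback(
  url: string,
): { code: string; state: string } | null {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a parseable URL
  }
  const code = parsed.searchParams.get("code");
  const state = parsed.searchParams.get("state");
  return code && state ? { code, state } : null;
}
```

A null result would surface the auth "error" state rather than crashing the listener.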
151 + 152 + ### Session Management 153 + 154 + Tokens are stored in secure storage (encrypted localStorage on web, Capacitor Secure Storage on native). Sessions auto-refresh. The app supports multiple accounts with an account switcher.
+116
docs/specs/search.md
··· 1 + --- 2 + title: Search 3 + updated: 2026-03-24 4 + --- 5 + 6 + Search lets users find repos, issues, PRs, profiles, and code snippets across the Tangled network. The API supports three modes with progressive capability. 7 + 8 + ## Modes 9 + 10 + ### Keyword Search (Implemented) 11 + 12 + Full-text search powered by SQLite FTS5 with BM25 scoring. Queries are tokenized, matched against title, body, summary, repo name, author handle, and tags. Results are ranked by relevance with field-specific weights (title highest, then author handle, summary, body). 13 + 14 + Snippets are generated from the body field with match terms wrapped in `<mark>` tags. 15 + 16 + ### Semantic Search (Planned) 17 + 18 + Vector similarity search using **Nomic Embed Text v1.5**, deployed on Railway via the [nomic-embed template](https://railway.com/deploy/nomic-embed). The template runs Ollama behind an authenticated Caddy proxy. 19 + 20 + **Embedding service:** 21 + 22 + - Model: `nomic-embed-text:latest` (8192-token context, 768-dimensional vectors, Matryoshka support for variable dimensionality) 23 + - Endpoint: `POST /api/embeddings` with Bearer token auth 24 + - Request: `{ "model": "nomic-embed-text:latest", "prompt": "text to embed" }` 25 + - Deployed as a separate Railway service alongside the API and indexer 26 + 27 + **Pipeline:** 28 + 29 + - The embed-worker consumes the `embedding_jobs` queue, calls the Nomic Embed service, and stores 768-dim vectors in the `document_embeddings` table 30 + - Documents are embedded asynchronously after indexing — the embed-worker runs independently of the ingestion loop 31 + - Search queries are embedded at request time (single prompt, low latency) 32 + - Vectors are matched via DiskANN cosine similarity index in Turso 33 + 34 + ### Hybrid Search (Planned) 35 + 36 + Weighted combination of keyword and semantic results. Default blend: 0.65 keyword + 0.35 semantic (configurable). Scores are normalized to [0, 1] before blending. 
Results are deduplicated by document ID with the higher score retained. Each result includes a `matched_by` field indicating which mode(s) contributed. 37 + 38 + ## API Contract 39 + 40 + **`GET /search`** — Unified endpoint, routes by `mode` parameter. 41 + 42 + ### Parameters 43 + 44 + | Param | Required | Default | Description | 45 + | ------------ | -------- | ------- | ------------------------------------- | 46 + | `q` | Yes | — | Query string | 47 + | `mode` | No | keyword | keyword, semantic, or hybrid | 48 + | `limit` | No | 20 | Results per page (1–100) | 49 + | `offset` | No | 0 | Pagination offset | 50 + | `collection` | No | — | Filter by collection NSID | 51 + | `type` | No | — | Filter by record type | 52 + | `author` | No | — | Filter by handle or DID | 53 + | `repo` | No | — | Filter by repo name or DID | 54 + | `language` | No | — | Filter by primary language | 55 + | `from` | No | — | Created after (ISO 8601) | 56 + | `to` | No | — | Created before (ISO 8601) | 57 + | `state` | No | — | Issue/PR state (open, closed, merged) | 58 + 59 + ### Response 60 + 61 + ```json 62 + { 63 + "query": "tangled vue", 64 + "mode": "keyword", 65 + "total": 42, 66 + "limit": 20, 67 + "offset": 0, 68 + "results": [ 69 + { 70 + "id": "did:plc:abc|sh.tangled.repo|my-repo", 71 + "collection": "sh.tangled.repo", 72 + "record_type": "repo", 73 + "title": "my-repo", 74 + "summary": "A Vue component library", 75 + "body_snippet": "...building <mark>Vue</mark> components for <mark>Tangled</mark>...", 76 + "score": 4.82, 77 + "matched_by": ["keyword"], 78 + "repo_name": "my-repo", 79 + "author_handle": "alice.bsky.social", 80 + "did": "did:plc:abc", 81 + "at_uri": "at://did:plc:abc/sh.tangled.repo/my-repo", 82 + "web_url": "https://tangled.sh/alice.bsky.social/my-repo", 83 + "created_at": "2026-01-15T10:00:00Z", 84 + "updated_at": "2026-03-20T14:30:00Z" 85 + } 86 + ] 87 + } 88 + ``` 89 + 90 + ## Pragmatic Search Strategy 91 + 92 + Indexing via Tap is useful but has proven 
unreliable for maintaining complete, up-to-date coverage. The approach: 93 + 94 + 1. **Keyword search is the foundation.** It works now and covers the primary use case — finding repos, issues, and people by name or content. 95 + 96 + 2. **Constellation supplements search results.** Star counts and follower counts from Constellation can be used as ranking signals without needing to index interaction records ourselves. 97 + 98 + 3. **Semantic search is additive.** It improves discovery for vague queries but isn't required for the app to be useful. It ships when the embedding pipeline is stable. 99 + 100 + 4. **Graceful degradation.** The mobile app treats the search API as optional. If Twister is unavailable, handle-based direct browsing still works. Search results link into the same browsing screens. 101 + 102 + ## Quality Improvements (Planned) 103 + 104 + - Field weight tuning based on real query patterns 105 + - Recency boost for recently updated content 106 + - Collection-aware ranking (repos weighted higher for short queries) 107 + - Star count as a ranking signal (via Constellation) 108 + - State filtering (exclude closed issues by default) 109 + - Better snippet generation with longer context windows 110 + - Relevance test fixtures for regression testing 111 + 112 + ## Mobile Integration 113 + 114 + The app calls the search API from the Explore tab. Results are displayed in segmented views (repos, users, issues/PRs). Each result links to the corresponding browsing screen (repo detail, profile, issue detail). 115 + 116 + When the search API is unavailable, the Explore tab shows an appropriate state rather than breaking. The Home tab's handle-based browsing is fully independent of search.
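The weighting, deduplication, and `matched_by` bookkeeping described under Hybrid Search can be sketched as one merge function. This is a sketch of the logic only (the service itself is Go), following the spec's "higher score retained" reading for documents matched by both modes, with scores assumed pre-normalized to [0, 1]:

```typescript
interface Hit {
  id: string; // stable document ID: did|collection|rkey
  score: number; // already normalized to [0, 1]
}

interface MergedHit extends Hit {
  matched_by: string[];
}

// Weight each list, deduplicate by ID keeping the higher weighted
// score, record which mode(s) matched, and sort by blended score.
function hybridMerge(
  keyword: Hit[],
  semantic: Hit[],
  wKeyword = 0.65,
  wSemantic = 0.35,
): MergedHit[] {
  const merged = new Map<string, MergedHit>();
  const add = (hits: Hit[], weight: number, mode: string) => {
    for (const hit of hits) {
      const weighted = hit.score * weight;
      const prev = merged.get(hit.id);
      if (!prev) {
        merged.set(hit.id, { id: hit.id, score: weighted, matched_by: [mode] });
      } else {
        prev.score = Math.max(prev.score, weighted);
        prev.matched_by.push(mode);
      }
    }
  };
  add(keyword, wKeyword, "keyword");
  add(semantic, wSemantic, "semantic");
  return Array.from(merged.values()).sort((a, b) => b.score - a.score);
}
```

Summing instead of taking the max on overlap is the other defensible reading; it would reward documents both modes agree on.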
+7
docs/todo.md
··· 1 + --- 2 + title: Parking Lot 3 + updated: 2026-03-24 4 + --- 5 + 6 + - Each Constellation request is a good opportunity to dispatch a job indexing the 7 + requested subject.