
Compare changes


README.md (+13 -32)

````diff
- # pub search
+ # leaflet-search

  by [@zzstoatzz.io](https://bsky.app/profile/zzstoatzz.io)

- search ATProto publishing platforms ([leaflet](https://leaflet.pub), [pckt](https://pckt.blog), and others using [standard.site](https://standard.site)).
-
- **live:** [pub-search.waow.tech](https://pub-search.waow.tech)
-
- > formerly "leaflet-search" - generalized to support multiple publishing platforms
+ search for [leaflet](https://leaflet.pub).
+
+ **live:** [leaflet-search.pages.dev](https://leaflet-search.pages.dev)

  ## how it works

- 1. **tap** syncs content from ATProto firehose (signals on `pub.leaflet.document`, filters `pub.leaflet.*` + `site.standard.*`)
+ 1. **tap** syncs leaflet content from the network
  2. **backend** indexes content into SQLite FTS5 via [Turso](https://turso.tech), serves search API
  3. **site** static frontend on Cloudflare Pages
···
  search is also exposed as an MCP server for AI agents like Claude Code:

  ```bash
- claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
+ claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}'
  ```

  see [mcp/README.md](mcp/README.md) for local setup and usage details.
···
  ## api

  ```
- GET /search?q=<query>&tag=<tag>&platform=<platform>  # full-text search
- GET /similar?uri=<at-uri>                            # find similar documents
- GET /tags                                            # list all tags with counts
- GET /popular                                         # popular search queries
- GET /stats                                           # document/publication counts
- GET /health                                          # health check
+ GET /search?q=<query>&tag=<tag>  # full-text search with query, tag, or both
+ GET /similar?uri=<at-uri>        # find similar documents via vector embeddings
+ GET /tags                        # list all tags with counts
+ GET /popular                     # popular search queries
+ GET /stats                       # document/publication counts
+ GET /health                      # health check
  ```

- search returns three entity types: `article` (document in a publication), `looseleaf` (standalone document), `publication` (newsletter itself). each result includes a `platform` field (leaflet, pckt, etc). tag and platform filtering apply to documents only.
+ search returns three entity types: `article` (document in a publication), `looseleaf` (standalone document), `publication` (newsletter itself). tag filtering applies to documents only.

  `/similar` uses [Voyage AI](https://voyageai.com) embeddings with brute-force cosine similarity (~0.15s for 3500 docs).

- ## configuration
-
- the backend is fully configurable via environment variables:
-
- | variable | default | description |
- |----------|---------|-------------|
- | `APP_NAME` | `leaflet-search` | name shown in startup logs |
- | `DASHBOARD_URL` | `https://pub-search.waow.tech/dashboard.html` | redirect target for `/dashboard` |
- | `TAP_HOST` | `leaflet-search-tap.fly.dev` | TAP websocket host |
- | `TAP_PORT` | `443` | TAP websocket port |
- | `PORT` | `3000` | HTTP server port |
- | `TURSO_URL` | - | Turso database URL (required) |
- | `TURSO_TOKEN` | - | Turso auth token (required) |
- | `VOYAGE_API_KEY` | - | Voyage AI API key (for embeddings) |
-
- the backend indexes multiple ATProto platforms - currently `pub.leaflet.*` and `site.standard.*` collections. platform is stored per-document and returned in search results.
-
  ## [stack](https://bsky.app/profile/zzstoatzz.io/post/3mbij5ip4ws2a)

  - [Fly.io](https://fly.io) hosts backend + tap
  - [Turso](https://turso.tech) cloud SQLite with vector support
  - [Voyage AI](https://voyageai.com) embeddings (voyage-3-lite)
- - [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) syncs content from ATProto firehose
+ - [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) syncs leaflet content from ATProto firehose
  - [Zig](https://ziglang.org) HTTP server, search API, content indexing
  - [Cloudflare Pages](https://pages.cloudflare.com) static frontend
````
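
the `/search` endpoint above takes `q` and `tag` as plain query parameters. a minimal sketch of composing a request URL against a locally running backend (hypothetical `buildSearchUrl` helper; the base URL is a placeholder assuming the default port 3000, and `q` is assumed to be already percent-encoded):

```zig
const std = @import("std");

// hypothetical helper: compose a /search URL with an optional tag filter.
// assumes `q` is already percent-encoded.
fn buildSearchUrl(buf: []u8, base: []const u8, q: []const u8, tag: ?[]const u8) ![]const u8 {
    if (tag) |t| return std.fmt.bufPrint(buf, "{s}/search?q={s}&tag={s}", .{ base, q, t });
    return std.fmt.bufPrint(buf, "{s}/search?q={s}", .{ base, q });
}

test "buildSearchUrl composes query and tag filters" {
    var buf: [128]u8 = undefined;
    const url = try buildSearchUrl(&buf, "http://localhost:3000", "zig", "atproto");
    try std.testing.expectEqualStrings("http://localhost:3000/search?q=zig&tag=atproto", url);
}
```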
backend/build.zig.zon (+2 -2)

```diff
···
          .hash = "zql-0.0.1-alpha-xNRI4IRNAABUb9gLat5FWUaZDD5HvxAxet_-elgR_A_y",
      },
      .zat = .{
-         .url = "https://tangled.sh/zat.dev/zat/archive/main",
-         .hash = "zat-0.1.0-5PuC7heIAQA4j2UVmJT-oivQh5AwZTrFQ-NC4CJi2-_R",
+         .url = "https://tangled.sh/zzstoatzz.io/zat/archive/main",
+         .hash = "zat-0.1.0-5PuC7ntmAQA9_8rALQwWad2riXWTY9p_ohVOD54_Y-2c",
      },
  },
  .paths = .{
```
backend/src/dashboard.zig (+18 -27)

```diff
···
  const TagJson = struct { tag: []const u8, count: i64 };
  const TimelineJson = struct { date: []const u8, count: i64 };
  const PubJson = struct { name: []const u8, basePath: []const u8, count: i64 };
- const PlatformJson = struct { platform: []const u8, count: i64 };

  /// All data needed to render the dashboard
  pub const Data = struct {
      started_at: i64,
      searches: i64,
      publications: i64,
-     documents: i64,
+     articles: i64,
+     looseleafs: i64,
      tags_json: []const u8,
      timeline_json: []const u8,
      top_pubs_json: []const u8,
-     platforms_json: []const u8,
  };

  // all dashboard queries batched into one request
···
      \\ (SELECT service_started_at FROM stats WHERE id = 1) as started_at
  ;

- const PLATFORMS_SQL =
-     \\SELECT platform, COUNT(*) as count
+ const DOC_TYPES_SQL =
+     \\SELECT
+     \\  SUM(CASE WHEN publication_uri != '' THEN 1 ELSE 0 END) as articles,
+     \\  SUM(CASE WHEN publication_uri = '' OR publication_uri IS NULL THEN 1 ELSE 0 END) as looseleafs
      \\FROM documents
-     \\GROUP BY platform
-     \\ORDER BY count DESC
  ;

  const TAGS_SQL =
···
      // batch all 5 queries into one HTTP request
      var batch = client.queryBatch(&.{
          .{ .sql = STATS_SQL },
-         .{ .sql = PLATFORMS_SQL },
+         .{ .sql = DOC_TYPES_SQL },
          .{ .sql = TAGS_SQL },
          .{ .sql = TIMELINE_SQL },
          .{ .sql = TOP_PUBS_SQL },
···
      const started_at = if (stats_row) |r| r.int(4) else 0;
      const searches = if (stats_row) |r| r.int(2) else 0;
      const publications = if (stats_row) |r| r.int(1) else 0;
-     const documents = if (stats_row) |r| r.int(0) else 0;
+
+     // extract doc types (query 1)
+     const doc_row = batch.getFirst(1);
+     const articles = if (doc_row) |r| r.int(0) else 0;
+     const looseleafs = if (doc_row) |r| r.int(1) else 0;

      return .{
          .started_at = started_at,
          .searches = searches,
          .publications = publications,
-         .documents = documents,
+         .articles = articles,
+         .looseleafs = looseleafs,
          .tags_json = try formatTagsJson(alloc, batch.get(2)),
          .timeline_json = try formatTimelineJson(alloc, batch.get(3)),
          .top_pubs_json = try formatPubsJson(alloc, batch.get(4)),
-         .platforms_json = try formatPlatformsJson(alloc, batch.get(1)),
      };
  }
···
      return try output.toOwnedSlice();
  }

- fn formatPlatformsJson(alloc: Allocator, rows: []const db.Row) ![]const u8 {
-     var output: std.Io.Writer.Allocating = .init(alloc);
-     errdefer output.deinit();
-     var jw: json.Stringify = .{ .writer = &output.writer };
-     try jw.beginArray();
-     for (rows) |row| try jw.write(PlatformJson{ .platform = row.text(0), .count = row.int(1) });
-     try jw.endArray();
-     return try output.toOwnedSlice();
- }
-
  /// Generate dashboard data as JSON for API endpoint
  pub fn toJson(alloc: Allocator, data: Data) ![]const u8 {
      var output: std.Io.Writer.Allocating = .init(alloc);
···
      try jw.objectField("publications");
      try jw.write(data.publications);

-     try jw.objectField("documents");
-     try jw.write(data.documents);
+     try jw.objectField("articles");
+     try jw.write(data.articles);

-     try jw.objectField("platforms");
-     try jw.beginWriteRaw();
-     try jw.writer.writeAll(data.platforms_json);
-     jw.endWriteRaw();
+     try jw.objectField("looseleafs");
+     try jw.write(data.looseleafs);

      // use beginWriteRaw/endWriteRaw for pre-formatted JSON arrays
      try jw.objectField("tags");
```
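
the `DOC_TYPES_SQL` change above replaces the single documents count with an article/looseleaf split keyed on `publication_uri`. a tiny Zig analogue of that classification rule (illustrative only; the real split happens in the SQL `CASE` expressions):

```zig
const std = @import("std");

// mirror of the SQL rule: a document with a publication_uri is an "article",
// one with an empty publication_uri is a standalone "looseleaf".
fn isArticle(publication_uri: []const u8) bool {
    return publication_uri.len != 0;
}

test "article vs looseleaf split" {
    const uris = [_][]const u8{ "at://did:plc:xyz/pub.leaflet.publication/abc", "", "" };
    var articles: usize = 0;
    var looseleafs: usize = 0;
    for (uris) |u| {
        if (isArticle(u)) {
            articles += 1;
        } else {
            looseleafs += 1;
        }
    }
    try std.testing.expectEqual(@as(usize, 1), articles);
    try std.testing.expectEqual(@as(usize, 2), looseleafs);
}
```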
backend/src/db/schema.zig (+1 -39)

```diff
···
  \\CREATE VIRTUAL TABLE IF NOT EXISTS publications_fts USING fts5(
  \\ uri UNINDEXED,
  \\ name,
- \\ description,
- \\ base_path
+ \\ description
  \\)
  , &.{});
···
  client.exec("UPDATE documents SET platform = 'leaflet' WHERE platform IS NULL", &.{}) catch {};
  client.exec("UPDATE documents SET source_collection = 'pub.leaflet.document' WHERE source_collection IS NULL", &.{}) catch {};

- // multi-platform support for publications
- client.exec("ALTER TABLE publications ADD COLUMN platform TEXT DEFAULT 'leaflet'", &.{}) catch {};
- client.exec("ALTER TABLE publications ADD COLUMN source_collection TEXT DEFAULT 'pub.leaflet.publication'", &.{}) catch {};
- client.exec("UPDATE publications SET platform = 'leaflet' WHERE platform IS NULL", &.{}) catch {};
- client.exec("UPDATE publications SET source_collection = 'pub.leaflet.publication' WHERE source_collection IS NULL", &.{}) catch {};
-
  // vector embeddings column already added by backfill script
-
- // dedupe index: same (did, rkey) across collections = same document
- // e.g., pub.leaflet.document/abc and site.standard.document/abc are the same content
- client.exec("CREATE UNIQUE INDEX IF NOT EXISTS idx_documents_did_rkey ON documents(did, rkey)", &.{}) catch {};
- client.exec("CREATE UNIQUE INDEX IF NOT EXISTS idx_publications_did_rkey ON publications(did, rkey)", &.{}) catch {};
-
- // backfill platform from source_collection for records indexed before platform detection fix
- client.exec("UPDATE documents SET platform = 'leaflet' WHERE platform = 'unknown' AND source_collection LIKE 'pub.leaflet.%'", &.{}) catch {};
- client.exec("UPDATE documents SET platform = 'pckt' WHERE platform = 'unknown' AND source_collection LIKE 'blog.pckt.%'", &.{}) catch {};
-
- // detect platform from publication basePath (site.standard.* is a lexicon, not a platform)
- // pckt uses site.standard.* lexicon but basePath contains pckt.blog
- client.exec(
-     \\UPDATE documents SET platform = 'pckt'
-     \\WHERE platform IN ('standardsite', 'unknown')
-     \\AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%pckt.blog%')
- , &.{}) catch {};
-
- // leaflet also uses site.standard.* lexicon, detect by basePath
- client.exec(
-     \\UPDATE documents SET platform = 'leaflet'
-     \\WHERE platform IN ('standardsite', 'unknown')
-     \\AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%leaflet.pub%')
- , &.{}) catch {};
-
- // URL path field for documents (e.g., "/001" for zat.dev)
- // used to build full URL: publication.url + document.path
- client.exec("ALTER TABLE documents ADD COLUMN path TEXT", &.{}) catch {};
-
- // note: publications_fts was rebuilt with base_path column via scripts/rebuild-pub-fts
- // new publications will include base_path via insertPublication in indexer.zig
  }
```
backend/src/extractor.zig (+33 -46)

```diff
···
  const Allocator = mem.Allocator;
  const zat = @import("zat");

- /// Detected platform from collection name
- /// Note: pckt and other platforms use site.standard.* collections.
- /// Platform detection from collection only distinguishes leaflet (custom lexicon)
- /// from site.standard users. Actual platform (pckt vs others) is detected later
- /// from publication basePath.
+ /// Detected platform from content.$type
  pub const Platform = enum {
      leaflet,
-     standardsite, // pckt and others using site.standard.* lexicon
+     pckt,
+     offprint,
      unknown,

-     pub fn fromCollection(collection: []const u8) Platform {
-         if (mem.startsWith(u8, collection, "pub.leaflet.")) return .leaflet;
-         if (mem.startsWith(u8, collection, "site.standard.")) return .standardsite;
+     pub fn fromContentType(content_type: []const u8) Platform {
+         if (mem.startsWith(u8, content_type, "pub.leaflet.")) return .leaflet;
+         if (mem.startsWith(u8, content_type, "blog.pckt.")) return .pckt;
+         if (mem.startsWith(u8, content_type, "app.offprint.")) return .offprint;
          return .unknown;
      }

-     /// Internal name (for DB storage)
      pub fn name(self: Platform) []const u8 {
          return @tagName(self);
      }
-
-     /// Display name (for UI)
-     pub fn displayName(self: Platform) []const u8 {
-         return @tagName(self);
-     }
  };

  /// Extracted document data ready for indexing.
···
      tags: [][]const u8,
      platform: Platform,
      source_collection: []const u8,
-     path: ?[]const u8, // URL path from record (e.g., "/001" for zat.dev)

      pub fn deinit(self: *ExtractedDocument) void {
          self.allocator.free(self.content);
···
      .{ "pub.leaflet.blocks.code", {} },
  });

- /// Detect platform from collection name
- pub fn detectPlatform(collection: []const u8) Platform {
-     return Platform.fromCollection(collection);
+ /// Detect platform from record's content.$type field
+ pub fn detectPlatform(record: json.ObjectMap) Platform {
+     const content = record.get("content") orelse return .unknown;
+     if (content != .object) return .unknown;
+
+     const type_val = content.object.get("$type") orelse return .unknown;
+     if (type_val != .string) return .unknown;
+
+     return Platform.fromContentType(type_val.string);
  }

  /// Extract document content from a record.
···
      collection: []const u8,
  ) !ExtractedDocument {
      const record_val: json.Value = .{ .object = record };
-     const platform = detectPlatform(collection);
+     const platform = detectPlatform(record);

      // extract required fields
      const title = zat.json.getString(record_val, "title") orelse return error.MissingTitle;
···
      // extract optional fields
      const created_at = zat.json.getString(record_val, "publishedAt") orelse
          zat.json.getString(record_val, "createdAt");
-
-     // publication/site can be a string (direct URI) or strongRef object ({uri, cid})
-     // zat.json.getString supports paths like "publication.uri"
      const publication_uri = zat.json.getString(record_val, "publication") orelse
-         zat.json.getString(record_val, "publication.uri") orelse
-         zat.json.getString(record_val, "site") orelse
-         zat.json.getString(record_val, "site.uri");
-
-     // extract URL path (site.standard.document uses "path" field like "/001")
-     const path = zat.json.getString(record_val, "path");
+         zat.json.getString(record_val, "site"); // site.standard uses "site"

      // extract tags - allocate owned slice
      const tags = try extractTags(allocator, record_val);
···
          .tags = tags,
          .platform = platform,
          .source_collection = collection,
-         .path = path,
      };
  }
···
  // --- tests ---

- test "Platform.fromCollection: leaflet" {
-     try std.testing.expectEqual(Platform.leaflet, Platform.fromCollection("pub.leaflet.document"));
-     try std.testing.expectEqual(Platform.leaflet, Platform.fromCollection("pub.leaflet.publication"));
+ test "Platform.fromContentType: leaflet" {
+     try std.testing.expectEqual(Platform.leaflet, Platform.fromContentType("pub.leaflet.content"));
+     try std.testing.expectEqual(Platform.leaflet, Platform.fromContentType("pub.leaflet.blocks.text"));
  }

- test "Platform.fromCollection: standardsite" {
-     // pckt and others use site.standard.* collections
-     try std.testing.expectEqual(Platform.standardsite, Platform.fromCollection("site.standard.document"));
-     try std.testing.expectEqual(Platform.standardsite, Platform.fromCollection("site.standard.publication"));
+ test "Platform.fromContentType: pckt" {
+     try std.testing.expectEqual(Platform.pckt, Platform.fromContentType("blog.pckt.content"));
+     try std.testing.expectEqual(Platform.pckt, Platform.fromContentType("blog.pckt.blocks.whatever"));
+ }
+
+ test "Platform.fromContentType: offprint" {
+     try std.testing.expectEqual(Platform.offprint, Platform.fromContentType("app.offprint.content"));
  }

- test "Platform.fromCollection: unknown" {
-     try std.testing.expectEqual(Platform.unknown, Platform.fromCollection("something.else"));
-     try std.testing.expectEqual(Platform.unknown, Platform.fromCollection(""));
+ test "Platform.fromContentType: unknown" {
+     try std.testing.expectEqual(Platform.unknown, Platform.fromContentType("something.else"));
+     try std.testing.expectEqual(Platform.unknown, Platform.fromContentType(""));
  }

  test "Platform.name" {
      try std.testing.expectEqualStrings("leaflet", Platform.leaflet.name());
-     try std.testing.expectEqualStrings("standardsite", Platform.standardsite.name());
+     try std.testing.expectEqualStrings("pckt", Platform.pckt.name());
+     try std.testing.expectEqualStrings("offprint", Platform.offprint.name());
      try std.testing.expectEqualStrings("unknown", Platform.unknown.name());
  }
-
- test "Platform.displayName" {
-     try std.testing.expectEqualStrings("leaflet", Platform.leaflet.displayName());
-     try std.testing.expectEqualStrings("standardsite", Platform.standardsite.displayName());
- }
```
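
the new `detectPlatform` reads the nested `content.$type` instead of the repo collection. a standalone sketch of that traversal with `std.json` (hypothetical record body; only the nested `$type` matters here):

```zig
const std = @import("std");

test "platform detection keys off content.$type" {
    const alloc = std.testing.allocator;
    // hypothetical record body - the outer collection no longer decides the platform
    const parsed = try std.json.parseFromSlice(std.json.Value, alloc,
        \\{"title":"hello","content":{"$type":"blog.pckt.content"}}
    , .{});
    defer parsed.deinit();

    const content = parsed.value.object.get("content").?;
    const type_val = content.object.get("$type").?;
    // Platform.fromContentType would map this prefix to .pckt
    try std.testing.expect(std.mem.startsWith(u8, type_val.string, "blog.pckt."));
}
```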
backend/src/indexer.zig (+5 -34)

```diff
···
      tags: []const []const u8,
      platform: []const u8,
      source_collection: []const u8,
-     path: ?[]const u8,
  ) !void {
      const c = db.getClient() orelse return error.NotInitialized;

-     // dedupe: if (did, rkey) exists with different uri, clean up old record first
-     // this handles cross-collection duplicates (e.g., pub.leaflet.document + site.standard.document)
-     if (c.query("SELECT uri FROM documents WHERE did = ? AND rkey = ?", &.{ did, rkey })) |result_val| {
-         var result = result_val;
-         defer result.deinit();
-         if (result.first()) |row| {
-             const old_uri = row.text(0);
-             if (!std.mem.eql(u8, old_uri, uri)) {
-                 c.exec("DELETE FROM documents_fts WHERE uri = ?", &.{old_uri}) catch {};
-                 c.exec("DELETE FROM document_tags WHERE document_uri = ?", &.{old_uri}) catch {};
-                 c.exec("DELETE FROM documents WHERE uri = ?", &.{old_uri}) catch {};
-             }
-         }
-     } else |_| {}
-
      try c.exec(
-         "INSERT OR REPLACE INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection, path) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
-         &.{ uri, did, rkey, title, content, created_at orelse "", publication_uri orelse "", platform, source_collection, path orelse "" },
+         "INSERT OR REPLACE INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
+         &.{ uri, did, rkey, title, content, created_at orelse "", publication_uri orelse "", platform, source_collection },
      );

      // update FTS index
···
  ) !void {
      const c = db.getClient() orelse return error.NotInitialized;

-     // dedupe: if (did, rkey) exists with different uri, clean up old record first
-     if (c.query("SELECT uri FROM publications WHERE did = ? AND rkey = ?", &.{ did, rkey })) |result_val| {
-         var result = result_val;
-         defer result.deinit();
-         if (result.first()) |row| {
-             const old_uri = row.text(0);
-             if (!std.mem.eql(u8, old_uri, uri)) {
-                 c.exec("DELETE FROM publications_fts WHERE uri = ?", &.{old_uri}) catch {};
-                 c.exec("DELETE FROM publications WHERE uri = ?", &.{old_uri}) catch {};
-             }
-         }
-     } else |_| {}
-
      try c.exec(
          "INSERT OR REPLACE INTO publications (uri, did, rkey, name, description, base_path) VALUES (?, ?, ?, ?, ?, ?)",
          &.{ uri, did, rkey, name, description orelse "", base_path orelse "" },
      );

-     // update FTS index (includes base_path for subdomain search)
+     // update FTS index
      c.exec("DELETE FROM publications_fts WHERE uri = ?", &.{uri}) catch {};
      c.exec(
-         "INSERT INTO publications_fts (uri, name, description, base_path) VALUES (?, ?, ?, ?)",
-         &.{ uri, name, description orelse "", base_path orelse "" },
+         "INSERT INTO publications_fts (uri, name, description) VALUES (?, ?, ?)",
+         &.{ uri, name, description orelse "" },
      ) catch {};
  }
```
backend/src/main.zig (+1 -2)

```diff
···
      var listener = try address.listen(.{ .reuse_address = true });
      defer listener.deinit();

-     const app_name = posix.getenv("APP_NAME") orelse "leaflet-search";
-     std.debug.print("{s} listening on http://0.0.0.0:{d} (max {} workers)\n", .{ app_name, port, MAX_HTTP_WORKERS });
+     std.debug.print("leaflet-search listening on http://0.0.0.0:{d} (max {} workers)\n", .{ port, MAX_HTTP_WORKERS });

      while (true) {
          const conn = listener.accept() catch |err| {
```
backend/src/search.zig (+22 -161)

```diff
···
      rkey: []const u8,
      basePath: []const u8,
      platform: []const u8,
-     path: []const u8 = "", // URL path from record (e.g., "/001")
  };

  /// Document search result (internal)
···
      basePath: []const u8,
      hasPublication: bool,
      platform: []const u8,
-     path: []const u8,

      fn fromRow(row: db.Row) Doc {
          return .{
···
              .basePath = row.text(6),
              .hasPublication = row.int(7) != 0,
              .platform = row.text(8),
-             .path = row.text(9),
          };
      }
···
              .rkey = self.rkey,
              .basePath = self.basePath,
              .platform = self.platform,
-             .path = self.path,
          };
      }
  };

  const DocsByTag = zql.Query(
      \\SELECT d.uri, d.did, d.title, '' as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+     \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
      \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
+     \\  d.platform
      \\FROM documents d
      \\LEFT JOIN publications p ON d.publication_uri = p.uri
      \\JOIN document_tags dt ON d.uri = dt.document_uri
···
  const DocsByFtsAndTag = zql.Query(
      \\SELECT f.uri, d.did, d.title,
      \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+     \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
      \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
+     \\  d.platform
      \\FROM documents_fts f
      \\JOIN documents d ON f.uri = d.uri
      \\LEFT JOIN publications p ON d.publication_uri = p.uri
···
  const DocsByFts = zql.Query(
      \\SELECT f.uri, d.did, d.title,
      \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+     \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
      \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
+     \\  d.platform
      \\FROM documents_fts f
      \\JOIN documents d ON f.uri = d.uri
      \\LEFT JOIN publications p ON d.publication_uri = p.uri
···
      \\ORDER BY rank LIMIT 40
  );

- const DocsByFtsAndPlatform = zql.Query(
-     \\SELECT f.uri, d.did, d.title,
-     \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents_fts f
-     \\JOIN documents d ON f.uri = d.uri
-     \\LEFT JOIN publications p ON d.publication_uri = p.uri
-     \\WHERE documents_fts MATCH :query AND d.platform = :platform
-     \\ORDER BY rank LIMIT 40
- );
-
- const DocsByTagAndPlatform = zql.Query(
-     \\SELECT d.uri, d.did, d.title, '' as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents d
-     \\LEFT JOIN publications p ON d.publication_uri = p.uri
-     \\JOIN document_tags dt ON d.uri = dt.document_uri
-     \\WHERE dt.tag = :tag AND d.platform = :platform
-     \\ORDER BY d.created_at DESC LIMIT 40
- );
-
- const DocsByFtsAndTagAndPlatform = zql.Query(
-     \\SELECT f.uri, d.did, d.title,
-     \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents_fts f
-     \\JOIN documents d ON f.uri = d.uri
-     \\LEFT JOIN publications p ON d.publication_uri = p.uri
-     \\JOIN document_tags dt ON d.uri = dt.document_uri
-     \\WHERE documents_fts MATCH :query AND dt.tag = :tag AND d.platform = :platform
-     \\ORDER BY rank LIMIT 40
- );
-
- const DocsByPlatform = zql.Query(
-     \\SELECT d.uri, d.did, d.title, '' as snippet,
-     \\  d.created_at, d.rkey,
-     \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents d
-     \\LEFT JOIN publications p ON d.publication_uri = p.uri
-     \\WHERE d.platform = :platform
-     \\ORDER BY d.created_at DESC LIMIT 40
- );
-
- // Find documents by their publication's base_path (subdomain search)
- // e.g., searching "gyst" finds all docs on gyst.leaflet.pub
- const DocsByPubBasePath = zql.Query(
-     \\SELECT d.uri, d.did, d.title, '' as snippet,
-     \\  d.created_at, d.rkey,
-     \\  p.base_path,
-     \\  1 as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents d
-     \\JOIN publications p ON d.publication_uri = p.uri
-     \\JOIN publications_fts pf ON p.uri = pf.uri
-     \\WHERE publications_fts MATCH :query
-     \\ORDER BY d.created_at DESC LIMIT 40
- );
-
- const DocsByPubBasePathAndPlatform = zql.Query(
-     \\SELECT d.uri, d.did, d.title, '' as snippet,
-     \\  d.created_at, d.rkey,
-     \\  p.base_path,
-     \\  1 as has_publication,
-     \\  d.platform, COALESCE(d.path, '') as path
-     \\FROM documents d
-     \\JOIN publications p ON d.publication_uri = p.uri
-     \\JOIN publications_fts pf ON p.uri = pf.uri
-     \\WHERE publications_fts MATCH :query AND d.platform = :platform
-     \\ORDER BY d.created_at DESC LIMIT 40
- );
-
  /// Publication search result (internal)
  const Pub = struct {
      uri: []const u8,
···
      snippet: []const u8,
      rkey: []const u8,
      basePath: []const u8,
-     platform: []const u8,

      fn fromRow(row: db.Row) Pub {
          return .{
···
              .snippet = row.text(3),
              .rkey = row.text(4),
              .basePath = row.text(5),
-             .platform = row.text(6),
          };
      }
···
              .snippet = self.snippet,
              .rkey = self.rkey,
              .basePath = self.basePath,
-             .platform = self.platform,
+             .platform = "leaflet", // publications are leaflet-only for now
          };
      }
  };

  const PubSearch = zql.Query(
      \\SELECT f.uri, p.did, p.name,
      \\  snippet(publications_fts, 2, '', '', '...', 32) as snippet,
-     \\  p.rkey, p.base_path, p.platform
+     \\  p.rkey, p.base_path
      \\FROM publications_fts f
      \\JOIN publications p ON f.uri = p.uri
      \\WHERE publications_fts MATCH :query
···
      try jw.beginArray();

      const fts_query = try buildFtsQuery(alloc, query);
-     const has_query = query.len > 0;
-     const has_tag = tag_filter != null;
-     const has_platform = platform_filter != null;
-
-     // track seen URIs for deduplication (content match + base_path match)
-     var seen_uris = std.StringHashMap(void).init(alloc);
-     defer seen_uris.deinit();
-
-     // search documents by content (title, content) - handle all filter combinations
-     var doc_result = if (has_query and has_tag and has_platform)
-         c.query(DocsByFtsAndTagAndPlatform.positional, DocsByFtsAndTagAndPlatform.bind(.{
-             .query = fts_query,
-             .tag = tag_filter.?,
-             .platform = platform_filter.?,
-         })) catch null
-     else if (has_query and has_tag)
-         c.query(DocsByFtsAndTag.positional, DocsByFtsAndTag.bind(.{ .query = fts_query, .tag = tag_filter.? })) catch null
-     else if (has_query and has_platform)
-         c.query(DocsByFtsAndPlatform.positional, DocsByFtsAndPlatform.bind(.{ .query = fts_query, .platform = platform_filter.? })) catch null
-     else if (has_query)
-         c.query(DocsByFts.positional, DocsByFts.bind(.{ .query = fts_query })) catch null
-     else if (has_tag and has_platform)
-         c.query(DocsByTagAndPlatform.positional, DocsByTagAndPlatform.bind(.{ .tag = tag_filter.?, .platform = platform_filter.? })) catch null
-     else if (has_tag)
-         c.query(DocsByTag.positional, DocsByTag.bind(.{ .tag = tag_filter.? })) catch null
-     else if (has_platform)
-         c.query(DocsByPlatform.positional, DocsByPlatform.bind(.{ .platform = platform_filter.? })) catch null
-     else
-         null; // no filters at all - return empty
+     // search documents
+     var doc_result = if (query.len == 0 and tag_filter != null)
+         c.query(DocsByTag.positional, DocsByTag.bind(.{ .tag = tag_filter.? })) catch null
+     else if (tag_filter) |tag|
+         c.query(DocsByFtsAndTag.positional, DocsByFtsAndTag.bind(.{ .query = fts_query, .tag = tag })) catch null
+     else
+         c.query(DocsByFts.positional, DocsByFts.bind(.{ .query = fts_query })) catch null;

      if (doc_result) |*res| {
          defer res.deinit();
          for (res.rows) |row| {
              const doc = Doc.fromRow(row);
-             // dupe URI for hash map (outlives result)
-             const uri_dupe = try alloc.dupe(u8, doc.uri);
-             try seen_uris.put(uri_dupe, {});
+             // filter by platform if specified
+             if (platform_filter) |pf| {
+                 if (!std.mem.eql(u8, doc.platform, pf)) continue;
+             }
              try jw.write(doc.toJson());
          }
      }

-     // also search documents by publication base_path (subdomain search)
-     // e.g., "gyst" finds all docs on gyst.leaflet.pub even if content doesn't contain "gyst"
-     // skip if tag filter is set (tag filter is content-specific)
-     if (has_query and !has_tag) {
-         var basepath_result = if (has_platform)
-             c.query(DocsByPubBasePathAndPlatform.positional, DocsByPubBasePathAndPlatform.bind(.{
-                 .query = fts_query,
-                 .platform = platform_filter.?,
-             })) catch null
-         else
-             c.query(DocsByPubBasePath.positional, DocsByPubBasePath.bind(.{ .query = fts_query })) catch null;
-
-         if (basepath_result) |*res| {
-             defer res.deinit();
-             for (res.rows) |row| {
-                 const doc = Doc.fromRow(row);
-                 // deduplicate: skip if already found by content search
-                 if (!seen_uris.contains(doc.uri)) {
-                     try jw.write(doc.toJson());
-                 }
-             }
-         }
-     }
-
-     // publications are excluded when filtering by tag or platform
-     // (platform filter is for documents only - publications don't have meaningful platform distinction)
-     if (tag_filter == null and platform_filter == null) {
+     // publications are excluded when filtering by tag or platform (only leaflet has publications)
+     if (tag_filter == null and (platform_filter == null or std.mem.eql(u8, platform_filter.?, "leaflet"))) {
          var pub_result = c.query(
              PubSearch.positional,
              PubSearch.bind(.{ .query = fts_query }),
···
          if (pub_result) |*res| {
              defer res.deinit();
-             for (res.rows) |row| {
-                 try jw.write(Pub.fromRow(row).toJson());
-             }
+             for (res.rows) |row| try jw.write(Pub.fromRow(row).toJson());
          }
      }
···
      // brute-force cosine similarity search (no vector index needed)
      var res = c.query(
          \\SELECT d2.uri, d2.did, d2.title, '' as snippet,
-         \\  d2.created_at, d2.rkey,
-         \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d2.did LIMIT 1), '') as base_path,
+         \\  d2.created_at, d2.rkey, COALESCE(p.base_path, '') as base_path,
          \\  CASE WHEN d2.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-         \\  d2.platform, COALESCE(d2.path, '') as path
+         \\  d2.platform
          \\FROM documents d1, documents d2
          \\LEFT JOIN publications p ON d2.publication_uri = p.uri
          \\WHERE d1.uri = ?
```
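
the brute-force `/similar` query at the bottom of this file ranks documents by cosine similarity over the stored Voyage AI embeddings. that work happens inside the SQL query, not in Zig; the sketch below is just the underlying math, dot(a, b) / (|a| * |b|), for reference:

```zig
const std = @import("std");

// cosine similarity between two embedding vectors
fn cosineSimilarity(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    var dot: f32 = 0;
    var norm_a: f32 = 0;
    var norm_b: f32 = 0;
    for (a, b) |x, y| {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    return dot / (@sqrt(norm_a) * @sqrt(norm_b));
}

test "identical vectors have similarity 1" {
    const v = [_]f32{ 0.5, -1.0, 2.0 };
    try std.testing.expectApproxEqAbs(@as(f32, 1.0), cosineSimilarity(&v, &v), 1e-6);
}
```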
backend/src/server.zig (+2 -18)

```diff
···
          try sendJson(request, "{\"status\":\"ok\"}");
      } else if (mem.eql(u8, target, "/popular")) {
          try handlePopular(request);
-     } else if (mem.eql(u8, target, "/platforms")) {
-         try handlePlatforms(request);
      } else if (mem.eql(u8, target, "/dashboard")) {
          try handleDashboard(request);
      } else if (mem.eql(u8, target, "/api/dashboard")) {
···
      try sendJson(request, popular);
  }

- fn handlePlatforms(request: *http.Server.Request) !void {
-     var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
-     defer arena.deinit();
-     const alloc = arena.allocator();
-
-     const data = try stats.getPlatformCounts(alloc);
-     try sendJson(request, data);
- }
-
  fn parseQueryParam(alloc: std.mem.Allocator, target: []const u8, param: []const u8) ![]const u8 {
      // look for ?param= or &param=
      const patterns = [_][]const u8{ "?", "&" };
···
      var response: std.ArrayList(u8) = .{};
      defer response.deinit(alloc);

-     try response.print(alloc, "{{\"documents\":{d},\"publications\":{d},\"embeddings\":{d},\"cache_hits\":{d},\"cache_misses\":{d}}}", .{ db_stats.documents, db_stats.publications, db_stats.embeddings, db_stats.cache_hits, db_stats.cache_misses });
+     try response.print(alloc, "{{\"documents\":{d},\"publications\":{d},\"cache_hits\":{d},\"cache_misses\":{d}}}", .{ db_stats.documents, db_stats.publications, db_stats.cache_hits, db_stats.cache_misses });

      try sendJson(request, response.items);
  }
···
      try sendJson(request, json_response);
  }

- fn getDashboardUrl() []const u8 {
-     return std.posix.getenv("DASHBOARD_URL") orelse "https://leaflet-search.pages.dev/dashboard.html";
- }
-
  fn handleDashboard(request: *http.Server.Request) !void {
-     const dashboard_url = getDashboardUrl();
      try request.respond("", .{
          .status = .moved_permanently,
          .extra_headers = &.{
-             .{ .name = "location", .value = dashboard_url },
+             .{ .name = "location", .value = "https://leaflet-search.pages.dev/dashboard.html" },
          },
      });
  }
```
backend/src/stats.zig (+8 -64)

```diff
···
  pub const Stats = struct {
      documents: i64,
      publications: i64,
-     embeddings: i64,
      searches: i64,
      errors: i64,
      started_at: i64,
···
      cache_misses: i64,
  };

- const default_stats: Stats = .{ .documents = 0, .publications = 0, .embeddings = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
-
  pub fn getStats() Stats {
-     const c = db.getClient() orelse return default_stats;
+     const c = db.getClient() orelse return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };

      var res = c.query(
          \\SELECT
          \\  (SELECT COUNT(*) FROM documents) as docs,
          \\  (SELECT COUNT(*) FROM publications) as pubs,
-         \\  (SELECT COUNT(*) FROM documents WHERE embedding IS NOT NULL) as embeddings,
          \\  (SELECT total_searches FROM stats WHERE id = 1) as searches,
          \\  (SELECT total_errors FROM stats WHERE id = 1) as errors,
          \\  (SELECT service_started_at FROM stats WHERE id = 1) as started_at,
          \\  (SELECT COALESCE(cache_hits, 0) FROM stats WHERE id = 1) as cache_hits,
          \\  (SELECT COALESCE(cache_misses, 0) FROM stats WHERE id = 1) as cache_misses
-     , &.{}) catch return default_stats;
+     , &.{}) catch return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
      defer res.deinit();

-     const row = res.first() orelse return default_stats;
+     const row = res.first() orelse return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
      return .{
          .documents = row.int(0),
          .publications = row.int(1),
-         .embeddings = row.int(2),
-         .searches = row.int(3),
-         .errors = row.int(4),
-         .started_at = row.int(5),
-         .cache_hits = row.int(6),
-         .cache_misses = row.int(7),
+         .searches = row.int(2),
+         .errors = row.int(3),
+         .started_at = row.int(4),
+         .cache_hits = row.int(5),
+         .cache_misses = row.int(6),
      };
  }
···
  pub fn recordCacheMiss() void {
      const c = db.getClient() orelse return;
      c.exec("UPDATE stats SET cache_misses = COALESCE(cache_misses, 0) + 1 WHERE id = 1", &.{}) catch {};
- }
-
- const PlatformCount = struct { platform: []const u8, count: i64 };
-
- pub fn getPlatformCounts(alloc: Allocator) ![]const u8 {
-     const c = db.getClient() orelse return error.NotInitialized;
-
-     var output: std.Io.Writer.Allocating = .init(alloc);
-     errdefer output.deinit();
-
-     var jw: json.Stringify = .{ .writer = &output.writer };
-     try jw.beginObject();
-
-     // documents by platform
-     try jw.objectField("documents");
-     if (c.query("SELECT platform, COUNT(*) as count FROM documents GROUP BY platform ORDER BY count DESC", &.{})) |res_val| {
-         var res = res_val;
-         defer res.deinit();
-         try jw.beginArray();
-         for (res.rows) |row| try jw.write(PlatformCount{ .platform = row.text(0), .count = row.int(1) });
-         try jw.endArray();
-     } else |_| {
-         try jw.beginArray();
-         try jw.endArray();
-     }
-
-     // FTS document count
-     try jw.objectField("fts_count");
-     if (c.query("SELECT COUNT(*) FROM documents_fts", &.{})) |res_val| {
-         var res = res_val;
-         defer res.deinit();
-         if (res.first()) |row| {
-             try jw.write(row.int(0));
-         } else try jw.write(0);
-     } else |_| try jw.write(0);
-
-     // sample URIs from each platform (for debugging)
-     try jw.objectField("sample_standardsite");
-     if (c.query("SELECT uri FROM documents WHERE platform = 'standardsite' LIMIT 3", &.{})) |res_val| {
-         var res = res_val;
-         defer res.deinit();
-         try jw.beginArray();
-         for (res.rows) |row| try jw.write(row.text(0));
-         try jw.endArray();
-     } else |_| {
-         try jw.beginArray();
-         try jw.endArray();
-     }
-
-     try jw.endObject();
-     return try output.toOwnedSlice();
  }

  pub fn getPopular(alloc: Allocator, limit: usize) ![]const u8 {
```
backend/src/tap.zig (+35 -98)

```diff
···
  const Handler = struct {
      allocator: Allocator,
-     client: *websocket.Client,
      msg_count: usize = 0,
-     ack_buf: [64]u8 = undefined,

      pub fn serverMessage(self: *Handler, data: []const u8) !void {
          self.msg_count += 1;
          if (self.msg_count % 100 == 1) {
              std.debug.print("tap: received {} messages\n", .{self.msg_count});
          }
-
-         // extract message ID for ACK
-         const msg_id = extractMessageId(self.allocator, data);
-
-         // process the message
          processMessage(self.allocator, data) catch |err| {
              std.debug.print("message processing error: {}\n", .{err});
-             // still ACK even on error to avoid infinite retries
          };
-
-         // send ACK if we have a message ID
-         if (msg_id) |id| {
-             self.sendAck(id);
-         }
-     }
-
-     fn sendAck(self: *Handler, msg_id: i64) void {
-         const ack_json = std.fmt.bufPrint(&self.ack_buf, "{{\"type\":\"ack\",\"id\":{d}}}", .{msg_id}) catch |err| {
-             std.debug.print("tap: ACK format error: {}\n", .{err});
-             return;
-         };
-         std.debug.print("tap: sending ACK for id={d}\n", .{msg_id});
-         self.client.write(@constCast(ack_json)) catch |err| {
-             std.debug.print("tap: failed to send ACK: {}\n", .{err});
-         };
      }
···
          std.debug.print("tap connection closed\n", .{});
      }
  };
-
- fn extractMessageId(allocator: Allocator, payload: []const u8) ?i64 {
-     const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch return null;
-     defer parsed.deinit();
-     return zat.json.getInt(parsed.value, "id");
- }

  fn connect(allocator: Allocator) !void {
      const host = getTapHost();
···
      std.debug.print("tap connected!\n", .{});

-     var handler = Handler{ .allocator = allocator, .client = &client };
+     var handler = Handler{ .allocator = allocator };
      client.readLoop(&handler) catch |err| {
          std.debug.print("websocket read loop error: {}\n", .{err});
          return err;
···
  /// TAP record envelope - extracted via zat.json.extractAt
  const TapRecord = struct {
      collection: []const u8,
-     action: []const u8, // "create", "update", "delete"
+     action: zat.CommitAction,
      did: []const u8,
      rkey: []const u8,
-
-     pub fn isCreate(self: TapRecord) bool {
-         return mem.eql(u8, self.action, "create");
-     }
-     pub fn isUpdate(self: TapRecord) bool {
-         return mem.eql(u8, self.action, "update");
-     }
-     pub fn isDelete(self: TapRecord) bool {
-         return mem.eql(u8, self.action, "delete");
-     }
  };

  /// Leaflet publication fields
···
  };

  fn processMessage(allocator: Allocator, payload: []const u8) !void {
-     const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch {
-         std.debug.print("tap: JSON parse failed, first 100 bytes: {s}\n", .{payload[0..@min(payload.len, 100)]});
-         return;
-     };
+     const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch return;
      defer parsed.deinit();

      // check message type
-     const msg_type = zat.json.getString(parsed.value, "type") orelse {
-         std.debug.print("tap: no type field in message\n", .{});
-         return;
-     };
-
+     const msg_type = zat.json.getString(parsed.value, "type") orelse return;
      if (!mem.eql(u8, msg_type, "record")) return;

-     // extract record envelope (extractAt ignores extra fields like live, rev, cid)
-     const rec = zat.json.extractAt(TapRecord, allocator, parsed.value, .{"record"}) catch |err| {
-         std.debug.print("tap: failed to extract record: {}\n", .{err});
-         return;
-     };
+     // extract record envelope
+     const rec = zat.json.extractAt(TapRecord, allocator, parsed.value, .{"record"}) catch return;

      // validate DID
      const did = zat.Did.parse(rec.did) orelse return;

-     // build AT-URI string (no allocation - uses stack buffer)
-     var uri_buf: [256]u8 = undefined;
-     const uri = zat.AtUri.format(&uri_buf, did.raw, rec.collection, rec.rkey) orelse return;
+     // build AT-URI string
+     const uri = try std.fmt.allocPrint(allocator, "at://{s}/{s}/{s}", .{ did.raw, rec.collection, rec.rkey });
+     defer allocator.free(uri);

-     if (rec.isCreate() or rec.isUpdate()) {
-         const inner_record = zat.json.getObject(parsed.value, "record.record") orelse return;
-
-         if (isDocumentCollection(rec.collection)) {
-             processDocument(allocator, uri, did.raw, rec.rkey, inner_record, rec.collection) catch |err| {
-                 std.debug.print("document processing error: {}\n", .{err});
-             };
-         } else if (isPublicationCollection(rec.collection)) {
-             processPublication(allocator, uri, did.raw, rec.rkey, inner_record) catch |err| {
-                 std.debug.print("publication processing error: {}\n", .{err});
-             };
-         }
-     } else if (rec.isDelete()) {
-         if (isDocumentCollection(rec.collection)) {
-             indexer.deleteDocument(uri);
-             std.debug.print("deleted document: {s}\n", .{uri});
-         } else if (isPublicationCollection(rec.collection)) {
-             indexer.deletePublication(uri);
-             std.debug.print("deleted publication: {s}\n", .{uri});
-         }
+     switch (rec.action) {
+         .create, .update => {
+             const record_obj = zat.json.getObject(parsed.value, "record.record") orelse return;
+
+             if (isDocumentCollection(rec.collection)) {
+                 processDocument(allocator, uri, did.raw, rec.rkey, record_obj, rec.collection) catch |err| {
+                     std.debug.print("document processing error: {}\n", .{err});
+                 };
+             } else if (isPublicationCollection(rec.collection)) {
+                 processPublication(allocator, uri, did.raw, rec.rkey, record_obj) catch |err| {
+                     std.debug.print("publication processing error: {}\n", .{err});
+                 };
+             }
+         },
+         .delete => {
+             if (isDocumentCollection(rec.collection)) {
+                 indexer.deleteDocument(uri);
+                 std.debug.print("deleted document: {s}\n", .{uri});
+             } else if (isPublicationCollection(rec.collection)) {
+                 indexer.deletePublication(uri);
+                 std.debug.print("deleted publication: {s}\n", .{uri});
+             }
+         },
      }
  }
···
          doc.tags,
          doc.platformName(),
          doc.source_collection,
-         doc.path,
      );
      std.debug.print("indexed document: {s} [{s}] ({} chars, {} tags)\n", .{ uri, doc.platformName(), doc.content.len, doc.tags.len });
  }

- fn processPublication(_: Allocator, uri: []const u8, did: []const u8, rkey: []const u8, record: json.ObjectMap) !void {
+ fn processPublication(allocator: Allocator, uri: []const u8, did: []const u8, rkey: []const u8, record: json.ObjectMap) !void {
      const record_val: json.Value = .{ .object = record };
+     const pub_data = zat.json.extractAt(LeafletPublication, allocator, record_val, .{}) catch return;

-     // extract required field
-     const name = zat.json.getString(record_val, "name") orelse return;
-     const description = zat.json.getString(record_val, "description");
-
-     // base_path: try leaflet's "base_path", then site.standard's "url"
-     // url is full URL like "https://devlog.pckt.blog", we need just the host
-     const base_path = zat.json.getString(record_val, "base_path") orelse
-         stripUrlScheme(zat.json.getString(record_val, "url"));
-
-     try indexer.insertPublication(uri, did, rkey, name, description, base_path);
-     std.debug.print("indexed publication: {s} (base_path: {s})\n", .{ uri, base_path orelse "none" });
- }
-
- fn stripUrlScheme(url: ?[]const u8) ?[]const u8 {
-     const u = url orelse return null;
-     if (mem.startsWith(u8, u, "https://")) return u["https://".len..];
-     if (mem.startsWith(u8, u, "http://")) return u["http://".len..];
-     return u;
+     try indexer.insertPublication(uri, did, rkey, pub_data.name, pub_data.description, pub_data.base_path);
+     std.debug.print("indexed publication: {s} (base_path: {s})\n", .{ uri, pub_data.base_path orelse "none" });
  }
```
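
the envelope change above drops the hand-rolled `isCreate`/`isUpdate`/`isDelete` string checks in favor of a `zat.CommitAction` enum that can be switched on. without zat, the same string-to-enum mapping is one call to `std.meta.stringToEnum` (local `Action` enum for illustration; the exact shape of `zat.CommitAction` is an assumption):

```zig
const std = @import("std");

// TAP sends "action" as a JSON string: "create" | "update" | "delete".
// a local stand-in for zat.CommitAction, mapped via std.meta.stringToEnum.
const Action = enum { create, update, delete };

fn parseAction(s: []const u8) ?Action {
    return std.meta.stringToEnum(Action, s);
}

test "TAP action strings map onto the enum" {
    try std.testing.expectEqual(Action.create, parseAction("create").?);
    try std.testing.expectEqual(Action.delete, parseAction("delete").?);
    try std.testing.expectEqual(@as(?Action, null), parseAction("garbage"));
}
```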
docs/leaflet-publishing-plan.md (-226)

````diff
- # publishing to leaflet.pub
-
- ## goal
-
- publish markdown docs to both:
- 1. `site.standard.document` (for search/interop) - already working
- 2. `pub.leaflet.document` (for leaflet.pub display) - this plan
-
- ## the mapping
-
- ### block types
-
- | markdown | leaflet block |
- |----------|---------------|
- | `# heading` | `pub.leaflet.blocks.header` (level 1-6) |
- | paragraph | `pub.leaflet.blocks.text` |
- | ``` code ``` | `pub.leaflet.blocks.code` |
- | `> quote` | `pub.leaflet.blocks.blockquote` |
- | `---` | `pub.leaflet.blocks.horizontalRule` |
- | `- item` | `pub.leaflet.blocks.unorderedList` |
- | `![alt](src)` | `pub.leaflet.blocks.image` (requires blob upload) |
- | `[text](url)` (standalone) | `pub.leaflet.blocks.website` |
-
- ### inline formatting (facets)
-
- leaflet uses byte-indexed facets for inline formatting within text blocks:
-
- ```json
- {
-   "$type": "pub.leaflet.blocks.text",
-   "plaintext": "hello world with bold text",
-   "facets": [{
-     "index": { "byteStart": 17, "byteEnd": 21 },
-     "features": [{ "$type": "pub.leaflet.richtext.facet#bold" }]
-   }]
- }
- ```
-
- | markdown | facet type |
- |----------|------------|
- | `**bold**` | `pub.leaflet.richtext.facet#bold` |
- | `*italic*` | `pub.leaflet.richtext.facet#italic` |
- | `` `code` `` | `pub.leaflet.richtext.facet#code` |
- | `[text](url)` | `pub.leaflet.richtext.facet#link` |
- | `~~strike~~` | `pub.leaflet.richtext.facet#strikethrough` |
-
- ## record structure
-
- ```json
- {
-   "$type": "pub.leaflet.document",
-   "author": "did:plc:...",
-   "title": "document title",
-   "description": "optional description",
-   "publishedAt": "2026-01-06T00:00:00Z",
-   "publication": "at://did:plc:.../pub.leaflet.publication/rkey",
-   "tags": ["tag1", "tag2"],
-   "pages": [{
-     "$type": "pub.leaflet.pages.linearDocument",
-     "id": "page-uuid",
-     "blocks": [
-       {
-         "$type": "pub.leaflet.pages.linearDocument#block",
-         "block": { /* one of the block types above */ }
-       }
-     ]
-   }]
- }
- ```
-
- ## implementation plan
-
- ### phase 1: markdown parser
-
- add a simple markdown block parser to zat or the publish script:
-
- ```zig
- const BlockType = enum {
-     heading,
-     paragraph,
-     code,
-     blockquote,
-     horizontal_rule,
-     unordered_list,
-     image,
- };
-
- const Block = struct {
-     type: BlockType,
-     content: []const u8,
-     level: ?u8 = null, // for headings
-     language: ?[]const u8 = null, // for code blocks
-     alt: ?[]const u8 = null, // for images
-     src: ?[]const u8 = null, // for images
- };
-
- fn parseMarkdownBlocks(allocator: Allocator, markdown: []const u8) ![]Block
- ```
-
- parsing approach:
- - split on blank lines to get blocks
- - identify block type by first characters:
-   - `#` → heading (count `#` for level)
-   - ``` → code block (capture until closing ```)
-   - `>` → blockquote
-   - `---` → horizontal rule
-   - `-` or `*` at start → list item
-   - `![` → image
-   - else → paragraph
-
- ### phase 2: inline facet extraction
-
- for text blocks, extract inline formatting:
-
- ```zig
- const Facet = struct {
-     byte_start: usize,
-     byte_end: usize,
-     feature: FacetFeature,
- };
-
- const FacetFeature = union(enum) {
-     bold,
-     italic,
-     code,
-     link: []const u8, // url
-     strikethrough,
- };
-
- fn extractFacets(allocator: Allocator, text: []const u8) !struct {
-     plaintext: []const u8,
-     facets: []Facet,
- }
- ```
-
- approach:
- - scan for `**`, `*`, `` ` ``, `[`, `~~`
- - track byte positions as we strip markers
- - build facet list with adjusted indices
-
- ### phase 3: image blob upload
-
- images need to be uploaded as blobs before referencing:
-
- ```zig
- fn uploadImageBlob(client: *XrpcClient, allocator: Allocator, image_path: []const u8) !BlobRef
- ```
-
- for now, could skip images or require them to already be uploaded.
-
- ### phase 4: json serialization
-
- build the full `pub.leaflet.document` record:
-
- ```zig
- const LeafletDocument = struct {
-     @"$type": []const u8 = "pub.leaflet.document",
-     author: []const u8,
-     title: []const u8,
-     description: ?[]const u8 = null,
-     publishedAt: []const u8,
-     publication: ?[]const u8 = null,
-     tags: ?[][]const u8 = null,
-     pages: []Page,
- };
-
- const Page = struct {
-     @"$type": []const u8 = "pub.leaflet.pages.linearDocument",
-     id: []const u8,
-     blocks: []BlockWrapper,
- };
- ```
-
- ### phase 5: integrate into publish-docs.zig
-
- update the publish script to:
- 1. parse markdown into blocks
- 2. convert to leaflet structure
- 3. publish `pub.leaflet.document` alongside `site.standard.document`
-
- ```zig
- // existing: publish site.standard.document
- try putRecord(&client, allocator, session.did, "site.standard.document", tid.str(), doc_record);
-
- // new: also publish pub.leaflet.document
- const leaflet_record = try markdownToLeaflet(allocator, content, title, session.did, pub_uri);
- try putRecord(&client, allocator, session.did, "pub.leaflet.document", tid.str(), leaflet_record);
- ```
-
- ## complexity estimate
-
- | component | complexity | notes |
- |-----------|------------|-------|
- | block parsing | medium | regex-free, line-by-line |
- | facet extraction | medium | byte index tracking is fiddly |
- | image upload | low | already have blob upload in xrpc |
- | json serialization | low | std.json handles it |
- | integration | low | add to existing publish flow |
-
- total: ~300-500 lines of zig
-
- ## open questions
-
- 1. **publication record**: do we need a `pub.leaflet.publication` too, or just documents?
-    - leaflet allows standalone documents without publications
-    - could skip publication for now
-
- 2. **image handling**:
-    - option A: skip images initially (just text content)
-    - option B: require images to be URLs (no blob upload)
-    - option C: full blob upload support
-
- 3. **deduplication**: same rkey for both record types?
-    - pro: easy to correlate
-    - con: different collections, might not matter
-
- 4. **validation**: leaflet has a validate endpoint
-    - could call `/api/unstable_validate` to check records before publish
-    - probably skip for v1
-
- ## references
-
- - [pub.leaflet.document schema](/tmp/leaflet/lexicons/pub/leaflet/document.json)
- - [leaflet publishToPublication.ts](/tmp/leaflet/actions/publishToPublication.ts) - how leaflet creates records
- - [site.standard.document schema](/tmp/standard.site/app/data/lexicons/document.json)
- - paul's site: fetches records, doesn't publish them
````
-142
docs/tap.md
··· 1 - # tap (firehose sync) 2 - 3 - leaflet-search uses [TAP](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) from bluesky-social/indigo to receive real-time events from the ATProto firehose. 4 - 5 - ## what is tap? 6 - 7 - tap subscribes to the ATProto firehose, filters for specific collections (e.g., `pub.leaflet.document`), and broadcasts matching events to websocket clients. it also does initial crawling/backfilling of existing records. 8 - 9 - key behavior: **TAP backfills historical data when repos are added**. when a repo is added to tracking: 10 - 1. TAP fetches the full repo from the account's PDS using `com.atproto.sync.getRepo` 11 - 2. live firehose events during backfill are buffered in memory 12 - 3. historical events (marked `live: false`) are delivered first 13 - 4. after historical events complete, buffered live events are released 14 - 5. subsequent firehose events arrive immediately marked as `live: true` 15 - 16 - TAP enforces strict per-repo ordering - live events are synchronization barriers that require all prior events to complete first. 17 - 18 - ## message format 19 - 20 - TAP sends JSON messages over websocket. record events look like: 21 - 22 - ```json 23 - { 24 - "type": "record", 25 - "record": { 26 - "live": true, 27 - "did": "did:plc:abc123...", 28 - "rev": "3mbspmpaidl2a", 29 - "collection": "pub.leaflet.document", 30 - "rkey": "3lzyrj6q6gs27", 31 - "action": "create", 32 - "record": { ... }, 33 - "cid": "bafyrei..." 34 - } 35 - } 36 - ``` 37 - 38 - ### field types (important!) 39 - 40 - | field | type | values | notes | 41 - |-------|------|--------|-------| 42 - | type | string | "record", "identity", "account" | message type | 43 - | action | **string** | "create", "update", "delete" | NOT an enum! | 44 - | live | bool | true/false | true = firehose, false = resync | 45 - | collection | string | e.g., "pub.leaflet.document" | lexicon collection | 46 - 47 - ## gotchas 48 - 49 - 1. **action is a string, not an enum** - TAP sends `"action": "create"` as a JSON string. if your parser expects an enum type, extraction will silently fail. use string comparison. 50 - 51 - 2. **collection filters apply to output** - `TAP_COLLECTION_FILTERS` controls which records TAP sends to clients. records from other collections are fetched but not forwarded. 52 - 53 - 3. **signal collection vs collection filters** - `TAP_SIGNAL_COLLECTION` controls auto-discovery of repos (which repos to track), while `TAP_COLLECTION_FILTERS` controls which records from those repos to output. a repo must either be auto-discovered via signal collection OR manually added via `/repos/add`. 54 - 55 - 4. 
**silent extraction failures** - if using zat's `extractAt`, enable debug logging to see why parsing fails: 56 - ```zig 57 - pub const std_options = .{ 58 - .log_scope_levels = &.{.{ .scope = .zat, .level = .debug }}, 59 - }; 60 - ``` 61 - this will show messages like: 62 - ``` 63 - debug(zat): extractAt: parse failed for Op at path { "op" }: InvalidEnumTag 64 - ``` 65 - 66 - ## debugging 67 - 68 - ### check tap connection 69 - ```bash 70 - fly logs -a leaflet-search-tap --no-tail | tail -30 71 - ``` 72 - 73 - look for: 74 - - `"connected to firehose"` - successfully connected to bsky relay 75 - - `"websocket connected"` - backend connected to TAP 76 - - `"dialing failed"` / `"i/o timeout"` - network issues 77 - 78 - ### check backend is receiving 79 - ```bash 80 - fly logs -a leaflet-search-backend --no-tail | grep -E "(tap|indexed)" 81 - ``` 82 - 83 - look for: 84 - - `tap connected!` - connected to TAP 85 - - `tap: msg_type=record` - receiving messages 86 - - `indexed document:` - successfully processing 87 - 88 - ### common issues 89 - 90 - | symptom | cause | fix | 91 - |---------|-------|-----| 92 - | `websocket handshake failed: error.Timeout` | TAP not running or network issue | restart TAP, check regions match | 93 - | `dialing failed: lookup ... i/o timeout` | DNS issues reaching bsky relay | restart TAP, transient network issue | 94 - | messages received but not indexed | extraction failing (type mismatch) | enable zat debug logging, check field types | 95 - | repo shows `records: 0` after adding | resync failed or collection not in filters | check TAP logs for resync errors, verify `TAP_COLLECTION_FILTERS` | 96 - | new platform records not appearing | platform's collection not in `TAP_COLLECTION_FILTERS` | add collection to filters, restart TAP | 97 - 98 - ## TAP API endpoints 99 - 100 - TAP exposes HTTP endpoints for monitoring and control: 101 - 102 - | endpoint | description | 103 - |----------|-------------| 104 - | `/health` | health check | 105 - | `/stats/repo-count` | number of tracked repos | 106 - | `/stats/record-count` | total records processed | 107 - | `/stats/outbox-buffer` | events waiting to be sent | 108 - | `/stats/resync-buffer` | DIDs waiting to be resynced | 109 - | `/stats/cursors` | firehose cursor position | 110 - | `/info/:did` | repo status: `{"did":"...","state":"active","records":N}` | 111 - | `/repos/add` | POST with `{"dids":["did:plc:..."]}` to add repos | 112 - | `/repos/remove` | POST with `{"dids":["did:plc:..."]}` to remove repos | 113 - 114 - example: check repo status 115 - ```bash 116 - fly ssh console -a leaflet-search-tap -C "curl -s localhost:2480/info/did:plc:abc123" 117 - ``` 118 - 119 - example: manually add a repo for backfill 120 - ```bash 121 - fly ssh console -a leaflet-search-tap -C 'curl -X POST -H "Content-Type: application/json" -d "{\"dids\":[\"did:plc:abc123\"]}" localhost:2480/repos/add' 122 - ``` 123 - 124 - ## fly.io deployment 125 - 126 - both TAP and backend should be in the same region for internal networking: 127 - 128 - ```bash 129 - # check current regions 130 - fly status -a leaflet-search-tap 131 - fly status -a leaflet-search-backend 132 - 133 - # restart TAP if needed 134 - fly machine restart -a leaflet-search-tap <machine-id> 135 - ``` 136 - 137 - note: changing `primary_region` in fly.toml only affects new machines. to move existing machines, clone to new region and destroy old one. 
138 - 139 - ## references 140 - 141 - - [TAP source (bluesky-social/indigo)](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) 142 - - [ATProto firehose docs](https://atproto.com/specs/sync#firehose)
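the message format section above is enough to sketch a consumer; here's one in python using the `websockets` package (the websocket path is hypothetical - check your TAP deployment for the actual endpoint):

```python
import asyncio
import json

import websockets  # pip install websockets


async def consume(url: str) -> None:
    async with websockets.connect(url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") != "record":
                continue  # skip identity/account messages
            rec = msg["record"]
            # gotcha 1: compare `action` as a string, never an enum
            if rec["action"] in ("create", "update") and rec["live"]:
                print(f"live {rec['action']}: {rec['collection']}/{rec['rkey']}")


if __name__ == "__main__":
    # hypothetical URL - substitute your TAP host and websocket path
    asyncio.run(consume("ws://localhost:2480/channel"))
```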
+5 -5
mcp/README.md
··· 1 - # pub search MCP 1 + # leaflet-mcp 2 2 3 - MCP server for [pub search](https://pub-search.waow.tech) - search ATProto publishing platforms (Leaflet, pckt, standard.site). 3 + MCP server for [Leaflet](https://leaflet.pub) - search decentralized publications on ATProto. 4 4 5 5 ## usage 6 6 7 7 ### hosted (recommended) 8 8 9 9 ```bash 10 - claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}' 10 + claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}' 11 11 ``` 12 12 13 13 ### local ··· 15 15 run the MCP server locally with `uvx`: 16 16 17 17 ```bash 18 - uvx --from git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp pub-search 18 + uvx --from git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp leaflet-mcp 19 19 ``` 20 20 21 21 to add it to claude code as a local stdio server: 22 22 23 23 ```bash 24 - claude mcp add pub-search -- uvx --from 'git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp' pub-search 24 + claude mcp add leaflet -- uvx --from 'git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp' leaflet-mcp 25 25 ``` 26 26 27 27 ## workflow
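outside claude code, the hosted endpoint can also be exercised directly with the fastmcp client - a sketch, assuming fastmcp 2.x (which infers an HTTP transport from a URL):

```python
import asyncio

from fastmcp import Client


async def main() -> None:
    async with Client("https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp") as client:
        results = await client.call_tool("search", {"query": "atproto"})
        print(results)


asyncio.run(main())
```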
+5 -5
mcp/pyproject.toml
··· 1 1 [project] 2 - name = "pub-search" 2 + name = "leaflet-mcp" 3 3 dynamic = ["version"] 4 - description = "MCP server for searching ATProto publishing platforms (Leaflet, pckt, and more)" 4 + description = "MCP server for Leaflet - search decentralized publications on ATProto" 5 5 readme = "README.md" 6 6 authors = [{ name = "zzstoatzz", email = "thrast36@gmail.com" }] 7 7 requires-python = ">=3.10" 8 8 license = "MIT" 9 9 10 - keywords = ["pub-search", "mcp", "atproto", "publications", "search", "fastmcp", "leaflet", "pckt"] 10 + keywords = ["leaflet", "mcp", "atproto", "publications", "search", "fastmcp"] 11 11 12 12 classifiers = [ 13 13 "Development Status :: 3 - Alpha", ··· 27 27 ] 28 28 29 29 [project.scripts] 30 - pub-search = "pub_search.server:main" 30 + leaflet-mcp = "leaflet_mcp.server:main" 31 31 32 32 [build-system] 33 33 requires = ["hatchling", "uv-dynamic-versioning>=0.7.0"] 34 34 build-backend = "hatchling.build" 35 35 36 36 [tool.hatch.build.targets.wheel] 37 - packages = ["src/pub_search"] 37 + packages = ["src/leaflet_mcp"] 38 38 39 39 [tool.hatch.version] 40 40 source = "uv-dynamic-versioning"
+5
mcp/src/leaflet_mcp/__init__.py
··· 1 + """Leaflet MCP server - search decentralized publications on ATProto.""" 2 + 3 + from leaflet_mcp.server import main, mcp 4 + 5 + __all__ = ["main", "mcp"]
+58
mcp/src/leaflet_mcp/_types.py
··· 1 + """Type definitions for Leaflet MCP responses.""" 2 + 3 + from typing import Literal 4 + 5 + from pydantic import BaseModel, computed_field 6 + 7 + 8 + class SearchResult(BaseModel): 9 + """A search result from the Leaflet API.""" 10 + 11 + type: Literal["article", "looseleaf", "publication"] 12 + uri: str 13 + did: str 14 + title: str 15 + snippet: str 16 + createdAt: str = "" 17 + rkey: str 18 + basePath: str = "" 19 + 20 + @computed_field 21 + @property 22 + def url(self) -> str: 23 + """web URL for this document.""" 24 + if self.basePath: 25 + return f"https://{self.basePath}/{self.rkey}" 26 + return "" 27 + 28 + 29 + class Tag(BaseModel): 30 + """A tag with document count.""" 31 + 32 + tag: str 33 + count: int 34 + 35 + 36 + class PopularSearch(BaseModel): 37 + """A popular search query with count.""" 38 + 39 + query: str 40 + count: int 41 + 42 + 43 + class Stats(BaseModel): 44 + """Leaflet index statistics.""" 45 + 46 + documents: int 47 + publications: int 48 + 49 + 50 + class Document(BaseModel): 51 + """Full document content from ATProto.""" 52 + 53 + uri: str 54 + title: str 55 + content: str 56 + createdAt: str = "" 57 + tags: list[str] = [] 58 + publicationUri: str = ""
+21
mcp/src/leaflet_mcp/client.py
··· 1 + """HTTP client for Leaflet search API.""" 2 + 3 + import os 4 + from contextlib import asynccontextmanager 5 + from typing import AsyncIterator 6 + 7 + import httpx 8 + 9 + # configurable via env var, defaults to production 10 + LEAFLET_API_URL = os.getenv("LEAFLET_API_URL", "https://leaflet-search-backend.fly.dev") 11 + 12 + 13 + @asynccontextmanager 14 + async def get_http_client() -> AsyncIterator[httpx.AsyncClient]: 15 + """Get an async HTTP client for Leaflet API requests.""" 16 + async with httpx.AsyncClient( 17 + base_url=LEAFLET_API_URL, 18 + timeout=30.0, 19 + headers={"Accept": "application/json"}, 20 + ) as client: 21 + yield client
+289
mcp/src/leaflet_mcp/server.py
··· 1 + """Leaflet MCP server implementation using fastmcp.""" 2 + 3 + from __future__ import annotations 4 + 5 + from typing import Any 6 + 7 + from fastmcp import FastMCP 8 + 9 + from leaflet_mcp._types import Document, PopularSearch, SearchResult, Stats, Tag 10 + from leaflet_mcp.client import get_http_client 11 + 12 + mcp = FastMCP("leaflet") 13 + 14 + 15 + # ----------------------------------------------------------------------------- 16 + # prompts 17 + # ----------------------------------------------------------------------------- 18 + 19 + 20 + @mcp.prompt("usage_guide") 21 + def usage_guide() -> str: 22 + """instructions for using leaflet MCP tools.""" 23 + return """\ 24 + # Leaflet MCP server usage guide 25 + 26 + Leaflet is a decentralized publishing platform on ATProto (the protocol behind Bluesky). 27 + This MCP server provides search and discovery tools for Leaflet publications. 28 + 29 + ## core tools 30 + 31 + - `search(query, tag)` - search documents and publications by text or tag 32 + - `get_document(uri)` - get the full content of a document by its AT-URI 33 + - `find_similar(uri)` - find documents similar to a given document 34 + - `get_tags()` - list all available tags with document counts 35 + - `get_stats()` - get index statistics (document/publication counts) 36 + - `get_popular()` - see popular search queries 37 + 38 + ## workflow for research 39 + 40 + 1. use `search("your topic")` to find relevant documents 41 + 2. use `get_document(uri)` to retrieve full content of interesting results 42 + 3. use `find_similar(uri)` to discover related content 43 + 44 + ## result types 45 + 46 + search returns three types of results: 47 + - **publication**: a collection of articles (like a blog or magazine) 48 + - **article**: a document that belongs to a publication 49 + - **looseleaf**: a standalone document not part of a publication 50 + 51 + ## AT-URIs 52 + 53 + documents are identified by AT-URIs like: 54 + `at://did:plc:abc123/pub.leaflet.document/xyz789` 55 + 56 + you can also browse documents on the web at leaflet.pub 57 + """ 58 + 59 + 60 + @mcp.prompt("search_tips") 61 + def search_tips() -> str: 62 + """tips for effective searching.""" 63 + return """\ 64 + # Leaflet search tips 65 + 66 + ## text search 67 + - searches both document titles and content 68 + - uses FTS5 full-text search with prefix matching 69 + - the last word gets prefix matching: "cat dog" matches "cat dogs" 70 + 71 + ## tag filtering 72 + - combine text search with tag filter: `search("python", tag="programming")` 73 + - use `get_tags()` to discover available tags 74 + - tags are only applied to documents, not publications 75 + 76 + ## finding related content 77 + - after finding an interesting document, use `find_similar(uri)` 78 + - similarity is based on semantic embeddings (voyage-3-lite) 79 + - great for exploring related topics 80 + 81 + ## browsing by popularity 82 + - use `get_popular()` to see what others are searching for 83 + - can inspire new research directions 84 + """ 85 + 86 + 87 + # ----------------------------------------------------------------------------- 88 + # tools 89 + # ----------------------------------------------------------------------------- 90 + 91 + 92 + @mcp.tool 93 + async def search( 94 + query: str = "", 95 + tag: str | None = None, 96 + limit: int = 5, 97 + ) -> list[SearchResult]: 98 + """search leaflet documents and publications. 99 + 100 + searches the full text of documents (titles and content) and publications. 
101 + results include a snippet showing where the match was found. 102 + 103 + args: 104 + query: search query (searches titles and content) 105 + tag: optional tag to filter by (only applies to documents) 106 + limit: max results to return (default 5, max 40) 107 + 108 + returns: 109 + list of search results with uri, title, snippet, and metadata 110 + """ 111 + if not query and not tag: 112 + return [] 113 + 114 + params: dict[str, Any] = {} 115 + if query: 116 + params["q"] = query 117 + if tag: 118 + params["tag"] = tag 119 + 120 + async with get_http_client() as client: 121 + response = await client.get("/search", params=params) 122 + response.raise_for_status() 123 + results = response.json() 124 + 125 + # apply client-side limit since API returns up to 40 126 + return [SearchResult(**r) for r in results[:limit]] 127 + 128 + 129 + @mcp.tool 130 + async def get_document(uri: str) -> Document: 131 + """get the full content of a document by its AT-URI. 132 + 133 + fetches the complete document from ATProto, including full text content. 134 + use this after finding documents via search to get the complete text. 135 + 136 + args: 137 + uri: the AT-URI of the document (e.g., at://did:plc:.../pub.leaflet.document/...) 138 + 139 + returns: 140 + document with full content, title, tags, and metadata 141 + """ 142 + # use pdsx to fetch the actual record from ATProto 143 + try: 144 + from pdsx._internal.operations import get_record 145 + from pdsx.mcp.client import get_atproto_client 146 + except ImportError as e: 147 + raise RuntimeError( 148 + "pdsx is required for fetching full documents. install with: uv add pdsx" 149 + ) from e 150 + 151 + # extract repo from URI for PDS discovery 152 + # at://did:plc:xxx/collection/rkey 153 + parts = uri.replace("at://", "").split("/") 154 + if len(parts) < 3: 155 + raise ValueError(f"invalid AT-URI: {uri}") 156 + 157 + repo = parts[0] 158 + 159 + async with get_atproto_client(target_repo=repo) as client: 160 + record = await get_record(client, uri) 161 + 162 + value = record.value 163 + # DotDict doesn't have a working .get(), convert to dict first 164 + if hasattr(value, "to_dict") and callable(value.to_dict): 165 + value = value.to_dict() 166 + elif not isinstance(value, dict): 167 + value = dict(value) 168 + 169 + # extract content from leaflet's block structure 170 + # pages[].blocks[].block.plaintext 171 + content_parts = [] 172 + for page in value.get("pages", []): 173 + for block_wrapper in page.get("blocks", []): 174 + block = block_wrapper.get("block", {}) 175 + plaintext = block.get("plaintext", "") 176 + if plaintext: 177 + content_parts.append(plaintext) 178 + 179 + content = "\n\n".join(content_parts) 180 + 181 + return Document( 182 + uri=record.uri, 183 + title=value.get("title", ""), 184 + content=content, 185 + createdAt=value.get("publishedAt", "") or value.get("createdAt", ""), 186 + tags=value.get("tags", []), 187 + publicationUri=value.get("publication", ""), 188 + ) 189 + 190 + 191 + @mcp.tool 192 + async def find_similar(uri: str, limit: int = 5) -> list[SearchResult]: 193 + """find documents similar to a given document. 194 + 195 + uses vector similarity (voyage-3-lite embeddings) to find semantically 196 + related documents. great for discovering related content after finding 197 + an interesting document. 
198 + 199 + args: 200 + uri: the AT-URI of the document to find similar content for 201 + limit: max similar documents to return (default 5) 202 + 203 + returns: 204 + list of similar documents with uri, title, and metadata 205 + """ 206 + async with get_http_client() as client: 207 + response = await client.get("/similar", params={"uri": uri}) 208 + response.raise_for_status() 209 + results = response.json() 210 + 211 + return [SearchResult(**r) for r in results[:limit]] 212 + 213 + 214 + @mcp.tool 215 + async def get_tags() -> list[Tag]: 216 + """list all available tags with document counts. 217 + 218 + returns tags sorted by document count (most popular first). 219 + useful for discovering topics and filtering searches. 220 + 221 + returns: 222 + list of tags with their document counts 223 + """ 224 + async with get_http_client() as client: 225 + response = await client.get("/tags") 226 + response.raise_for_status() 227 + results = response.json() 228 + 229 + return [Tag(**t) for t in results] 230 + 231 + 232 + @mcp.tool 233 + async def get_stats() -> Stats: 234 + """get leaflet index statistics. 235 + 236 + returns: 237 + document and publication counts 238 + """ 239 + async with get_http_client() as client: 240 + response = await client.get("/stats") 241 + response.raise_for_status() 242 + return Stats(**response.json()) 243 + 244 + 245 + @mcp.tool 246 + async def get_popular(limit: int = 5) -> list[PopularSearch]: 247 + """get popular search queries. 248 + 249 + see what others are searching for on leaflet. 250 + can inspire new research directions. 251 + 252 + args: 253 + limit: max queries to return (default 5) 254 + 255 + returns: 256 + list of popular queries with search counts 257 + """ 258 + async with get_http_client() as client: 259 + response = await client.get("/popular") 260 + response.raise_for_status() 261 + results = response.json() 262 + 263 + return [PopularSearch(**p) for p in results[:limit]] 264 + 265 + 266 + # ----------------------------------------------------------------------------- 267 + # resources 268 + # ----------------------------------------------------------------------------- 269 + 270 + 271 + @mcp.resource("leaflet://stats") 272 + async def stats_resource() -> str: 273 + """current leaflet index statistics.""" 274 + stats = await get_stats() 275 + return f"Leaflet index: {stats.documents} documents, {stats.publications} publications" 276 + 277 + 278 + # ----------------------------------------------------------------------------- 279 + # entrypoint 280 + # ----------------------------------------------------------------------------- 281 + 282 + 283 + def main() -> None: 284 + """run the MCP server.""" 285 + mcp.run() 286 + 287 + 288 + if __name__ == "__main__": 289 + main()
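the in-memory transport used in mcp/tests also works for ad-hoc poking at the server - no network or deployment needed (a sketch mirroring the test fixtures):

```python
import asyncio

from fastmcp.client import Client
from fastmcp.client.transports import FastMCPTransport

from leaflet_mcp.server import mcp


async def main() -> None:
    async with Client(FastMCPTransport(mcp)) as client:
        tools = await client.list_tools()
        print(sorted(t.name for t in tools))
        stats = await client.read_resource("leaflet://stats")
        print(stats)


asyncio.run(main())
```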
-5
mcp/src/pub_search/__init__.py
··· 1 - """MCP server for searching ATProto publishing platforms.""" 2 - 3 - from pub_search.server import main, mcp 4 - 5 - __all__ = ["main", "mcp"]
-58
mcp/src/pub_search/_types.py
··· 1 - """Type definitions for Leaflet MCP responses.""" 2 - 3 - from typing import Literal 4 - 5 - from pydantic import BaseModel, computed_field 6 - 7 - 8 - class SearchResult(BaseModel): 9 - """A search result from the Leaflet API.""" 10 - 11 - type: Literal["article", "looseleaf", "publication"] 12 - uri: str 13 - did: str 14 - title: str 15 - snippet: str 16 - createdAt: str = "" 17 - rkey: str 18 - basePath: str = "" 19 - 20 - @computed_field 21 - @property 22 - def url(self) -> str: 23 - """web URL for this document.""" 24 - if self.basePath: 25 - return f"https://{self.basePath}/{self.rkey}" 26 - return "" 27 - 28 - 29 - class Tag(BaseModel): 30 - """A tag with document count.""" 31 - 32 - tag: str 33 - count: int 34 - 35 - 36 - class PopularSearch(BaseModel): 37 - """A popular search query with count.""" 38 - 39 - query: str 40 - count: int 41 - 42 - 43 - class Stats(BaseModel): 44 - """Leaflet index statistics.""" 45 - 46 - documents: int 47 - publications: int 48 - 49 - 50 - class Document(BaseModel): 51 - """Full document content from ATProto.""" 52 - 53 - uri: str 54 - title: str 55 - content: str 56 - createdAt: str = "" 57 - tags: list[str] = [] 58 - publicationUri: str = ""
-21
mcp/src/pub_search/client.py
··· 1 - """HTTP client for leaflet-search API.""" 2 - 3 - import os 4 - from contextlib import asynccontextmanager 5 - from typing import AsyncIterator 6 - 7 - import httpx 8 - 9 - # configurable via env var, defaults to production 10 - API_URL = os.getenv("LEAFLET_SEARCH_API_URL", "https://leaflet-search-backend.fly.dev") 11 - 12 - 13 - @asynccontextmanager 14 - async def get_http_client() -> AsyncIterator[httpx.AsyncClient]: 15 - """Get an async HTTP client for API requests.""" 16 - async with httpx.AsyncClient( 17 - base_url=API_URL, 18 - timeout=30.0, 19 - headers={"Accept": "application/json"}, 20 - ) as client: 21 - yield client
-288
mcp/src/pub_search/server.py
··· 1 - """MCP server for searching ATProto publishing platforms.""" 2 - 3 - from __future__ import annotations 4 - 5 - from typing import Any 6 - 7 - from fastmcp import FastMCP 8 - 9 - from pub_search._types import Document, PopularSearch, SearchResult, Stats, Tag 10 - from pub_search.client import get_http_client 11 - 12 - mcp = FastMCP("pub-search") 13 - 14 - 15 - # ----------------------------------------------------------------------------- 16 - # prompts 17 - # ----------------------------------------------------------------------------- 18 - 19 - 20 - @mcp.prompt("usage_guide") 21 - def usage_guide() -> str: 22 - """instructions for using pub-search MCP tools.""" 23 - return """\ 24 - # pub-search MCP usage guide 25 - 26 - search documents across ATProto publishing platforms including Leaflet, pckt, and others. 27 - 28 - ## core tools 29 - 30 - - `search(query, tag)` - search documents and publications by text or tag 31 - - `get_document(uri)` - get the full content of a document by its AT-URI 32 - - `find_similar(uri)` - find documents similar to a given document 33 - - `get_tags()` - list all available tags with document counts 34 - - `get_stats()` - get index statistics (document/publication counts) 35 - - `get_popular()` - see popular search queries 36 - 37 - ## workflow for research 38 - 39 - 1. use `search("your topic")` to find relevant documents 40 - 2. use `get_document(uri)` to retrieve full content of interesting results 41 - 3. use `find_similar(uri)` to discover related content 42 - 43 - ## result types 44 - 45 - search returns three types of results: 46 - - **publication**: a collection of articles (like a blog or magazine) 47 - - **article**: a document that belongs to a publication 48 - - **looseleaf**: a standalone document not part of a publication 49 - 50 - ## AT-URIs 51 - 52 - documents are identified by AT-URIs like: 53 - `at://did:plc:abc123/pub.leaflet.document/xyz789` 54 - 55 - browse the web UI at pub-search.waow.tech 56 - """ 57 - 58 - 59 - @mcp.prompt("search_tips") 60 - def search_tips() -> str: 61 - """tips for effective searching.""" 62 - return """\ 63 - # search tips 64 - 65 - ## text search 66 - - searches both document titles and content 67 - - uses FTS5 full-text search with prefix matching 68 - - the last word gets prefix matching: "cat dog" matches "cat dogs" 69 - 70 - ## tag filtering 71 - - combine text search with tag filter: `search("python", tag="programming")` 72 - - use `get_tags()` to discover available tags 73 - - tags are only applied to documents, not publications 74 - 75 - ## finding related content 76 - - after finding an interesting document, use `find_similar(uri)` 77 - - similarity is based on semantic embeddings (voyage-3-lite) 78 - - great for exploring related topics 79 - 80 - ## browsing by popularity 81 - - use `get_popular()` to see what others are searching for 82 - - can inspire new research directions 83 - """ 84 - 85 - 86 - # ----------------------------------------------------------------------------- 87 - # tools 88 - # ----------------------------------------------------------------------------- 89 - 90 - 91 - @mcp.tool 92 - async def search( 93 - query: str = "", 94 - tag: str | None = None, 95 - limit: int = 5, 96 - ) -> list[SearchResult]: 97 - """search documents and publications. 98 - 99 - searches the full text of documents (titles and content) and publications. 100 - results include a snippet showing where the match was found. 
101 - 102 - args: 103 - query: search query (searches titles and content) 104 - tag: optional tag to filter by (only applies to documents) 105 - limit: max results to return (default 5, max 40) 106 - 107 - returns: 108 - list of search results with uri, title, snippet, and metadata 109 - """ 110 - if not query and not tag: 111 - return [] 112 - 113 - params: dict[str, Any] = {} 114 - if query: 115 - params["q"] = query 116 - if tag: 117 - params["tag"] = tag 118 - 119 - async with get_http_client() as client: 120 - response = await client.get("/search", params=params) 121 - response.raise_for_status() 122 - results = response.json() 123 - 124 - # apply client-side limit since API returns up to 40 125 - return [SearchResult(**r) for r in results[:limit]] 126 - 127 - 128 - @mcp.tool 129 - async def get_document(uri: str) -> Document: 130 - """get the full content of a document by its AT-URI. 131 - 132 - fetches the complete document from ATProto, including full text content. 133 - use this after finding documents via search to get the complete text. 134 - 135 - args: 136 - uri: the AT-URI of the document (e.g., at://did:plc:.../pub.leaflet.document/...) 137 - 138 - returns: 139 - document with full content, title, tags, and metadata 140 - """ 141 - # use pdsx to fetch the actual record from ATProto 142 - try: 143 - from pdsx._internal.operations import get_record 144 - from pdsx.mcp.client import get_atproto_client 145 - except ImportError as e: 146 - raise RuntimeError( 147 - "pdsx is required for fetching full documents. install with: uv add pdsx" 148 - ) from e 149 - 150 - # extract repo from URI for PDS discovery 151 - # at://did:plc:xxx/collection/rkey 152 - parts = uri.replace("at://", "").split("/") 153 - if len(parts) < 3: 154 - raise ValueError(f"invalid AT-URI: {uri}") 155 - 156 - repo = parts[0] 157 - 158 - async with get_atproto_client(target_repo=repo) as client: 159 - record = await get_record(client, uri) 160 - 161 - value = record.value 162 - # DotDict doesn't have a working .get(), convert to dict first 163 - if hasattr(value, "to_dict") and callable(value.to_dict): 164 - value = value.to_dict() 165 - elif not isinstance(value, dict): 166 - value = dict(value) 167 - 168 - # extract content from leaflet's block structure 169 - # pages[].blocks[].block.plaintext 170 - content_parts = [] 171 - for page in value.get("pages", []): 172 - for block_wrapper in page.get("blocks", []): 173 - block = block_wrapper.get("block", {}) 174 - plaintext = block.get("plaintext", "") 175 - if plaintext: 176 - content_parts.append(plaintext) 177 - 178 - content = "\n\n".join(content_parts) 179 - 180 - return Document( 181 - uri=record.uri, 182 - title=value.get("title", ""), 183 - content=content, 184 - createdAt=value.get("publishedAt", "") or value.get("createdAt", ""), 185 - tags=value.get("tags", []), 186 - publicationUri=value.get("publication", ""), 187 - ) 188 - 189 - 190 - @mcp.tool 191 - async def find_similar(uri: str, limit: int = 5) -> list[SearchResult]: 192 - """find documents similar to a given document. 193 - 194 - uses vector similarity (voyage-3-lite embeddings) to find semantically 195 - related documents. great for discovering related content after finding 196 - an interesting document. 
197 - 198 - args: 199 - uri: the AT-URI of the document to find similar content for 200 - limit: max similar documents to return (default 5) 201 - 202 - returns: 203 - list of similar documents with uri, title, and metadata 204 - """ 205 - async with get_http_client() as client: 206 - response = await client.get("/similar", params={"uri": uri}) 207 - response.raise_for_status() 208 - results = response.json() 209 - 210 - return [SearchResult(**r) for r in results[:limit]] 211 - 212 - 213 - @mcp.tool 214 - async def get_tags() -> list[Tag]: 215 - """list all available tags with document counts. 216 - 217 - returns tags sorted by document count (most popular first). 218 - useful for discovering topics and filtering searches. 219 - 220 - returns: 221 - list of tags with their document counts 222 - """ 223 - async with get_http_client() as client: 224 - response = await client.get("/tags") 225 - response.raise_for_status() 226 - results = response.json() 227 - 228 - return [Tag(**t) for t in results] 229 - 230 - 231 - @mcp.tool 232 - async def get_stats() -> Stats: 233 - """get index statistics. 234 - 235 - returns: 236 - document and publication counts 237 - """ 238 - async with get_http_client() as client: 239 - response = await client.get("/stats") 240 - response.raise_for_status() 241 - return Stats(**response.json()) 242 - 243 - 244 - @mcp.tool 245 - async def get_popular(limit: int = 5) -> list[PopularSearch]: 246 - """get popular search queries. 247 - 248 - see what others are searching for. 249 - can inspire new research directions. 250 - 251 - args: 252 - limit: max queries to return (default 5) 253 - 254 - returns: 255 - list of popular queries with search counts 256 - """ 257 - async with get_http_client() as client: 258 - response = await client.get("/popular") 259 - response.raise_for_status() 260 - results = response.json() 261 - 262 - return [PopularSearch(**p) for p in results[:limit]] 263 - 264 - 265 - # ----------------------------------------------------------------------------- 266 - # resources 267 - # ----------------------------------------------------------------------------- 268 - 269 - 270 - @mcp.resource("pub-search://stats") 271 - async def stats_resource() -> str: 272 - """current index statistics.""" 273 - stats = await get_stats() 274 - return f"pub search index: {stats.documents} documents, {stats.publications} publications" 275 - 276 - 277 - # ----------------------------------------------------------------------------- 278 - # entrypoint 279 - # ----------------------------------------------------------------------------- 280 - 281 - 282 - def main() -> None: 283 - """run the MCP server.""" 284 - mcp.run() 285 - 286 - 287 - if __name__ == "__main__": 288 - main()
+8 -8
mcp/tests/test_mcp.py
··· 1 - """tests for pub-search MCP server.""" 1 + """tests for leaflet MCP server.""" 2 2 3 3 import pytest 4 4 from mcp.types import TextContent ··· 6 6 from fastmcp.client import Client 7 7 from fastmcp.client.transports import FastMCPTransport 8 8 9 - from pub_search._types import Document, PopularSearch, SearchResult, Stats, Tag 10 - from pub_search.server import mcp 9 + from leaflet_mcp._types import Document, PopularSearch, SearchResult, Stats, Tag 10 + from leaflet_mcp.server import mcp 11 11 12 12 13 13 class TestTypes: ··· 93 93 94 94 def test_mcp_server_imports(self): 95 95 """mcp server can be imported without errors.""" 96 - from pub_search import mcp 96 + from leaflet_mcp import mcp 97 97 98 - assert mcp.name == "pub-search" 98 + assert mcp.name == "leaflet" 99 99 100 100 def test_exports(self): 101 101 """all expected exports are available.""" 102 - from pub_search import main, mcp 102 + from leaflet_mcp import main, mcp 103 103 104 104 assert mcp is not None 105 105 assert main is not None ··· 138 138 resources = await client.list_resources() 139 139 140 140 resource_uris = {str(r.uri) for r in resources} 141 - assert "pub-search://stats" in resource_uris 141 + assert "leaflet://stats" in resource_uris 142 142 143 143 async def test_usage_guide_prompt_content(self, client): 144 144 """usage_guide prompt returns helpful content.""" ··· 148 148 assert len(result.messages) > 0 149 149 content = result.messages[0].content 150 150 assert isinstance(content, TextContent) 151 - assert "pub-search" in content.text 151 + assert "Leaflet" in content.text 152 152 assert "search" in content.text 153 153 154 154 async def test_search_tips_prompt_content(self, client):
+32 -32
mcp/uv.lock
··· 691 691 ] 692 692 693 693 [[package]] 694 + name = "leaflet-mcp" 695 + source = { editable = "." } 696 + dependencies = [ 697 + { name = "fastmcp" }, 698 + { name = "httpx" }, 699 + { name = "pdsx" }, 700 + ] 701 + 702 + [package.dev-dependencies] 703 + dev = [ 704 + { name = "pytest" }, 705 + { name = "pytest-asyncio" }, 706 + { name = "pytest-sugar" }, 707 + { name = "ruff" }, 708 + ] 709 + 710 + [package.metadata] 711 + requires-dist = [ 712 + { name = "fastmcp", specifier = ">=2.0" }, 713 + { name = "httpx", specifier = ">=0.28" }, 714 + { name = "pdsx", git = "https://github.com/zzstoatzz/pdsx.git" }, 715 + ] 716 + 717 + [package.metadata.requires-dev] 718 + dev = [ 719 + { name = "pytest", specifier = ">=8.3.0" }, 720 + { name = "pytest-asyncio", specifier = ">=0.25.0" }, 721 + { name = "pytest-sugar" }, 722 + { name = "ruff", specifier = ">=0.12.0" }, 723 + ] 724 + 725 + [[package]] 694 726 name = "libipld" 695 727 version = "3.3.2" 696 728 source = { registry = "https://pypi.org/simple" } ··· 1043 1075 sdist = { url = "https://files.pythonhosted.org/packages/23/53/3edb5d68ecf6b38fcbcc1ad28391117d2a322d9a1a3eff04bfdb184d8c3b/prometheus_client-0.23.1.tar.gz", hash = "sha256:6ae8f9081eaaaf153a2e959d2e6c4f4fb57b12ef76c8c7980202f1e57b48b2ce", size = 80481, upload-time = "2025-09-18T20:47:25.043Z" } 1044 1076 wheels = [ 1045 1077 { url = "https://files.pythonhosted.org/packages/b8/db/14bafcb4af2139e046d03fd00dea7873e48eafe18b7d2797e73d6681f210/prometheus_client-0.23.1-py3-none-any.whl", hash = "sha256:dd1913e6e76b59cfe44e7a4b83e01afc9873c1bdfd2ed8739f1e76aeca115f99", size = 61145, upload-time = "2025-09-18T20:47:23.875Z" }, 1046 - ] 1047 - 1048 - [[package]] 1049 - name = "pub-search" 1050 - source = { editable = "." } 1051 - dependencies = [ 1052 - { name = "fastmcp" }, 1053 - { name = "httpx" }, 1054 - { name = "pdsx" }, 1055 - ] 1056 - 1057 - [package.dev-dependencies] 1058 - dev = [ 1059 - { name = "pytest" }, 1060 - { name = "pytest-asyncio" }, 1061 - { name = "pytest-sugar" }, 1062 - { name = "ruff" }, 1063 - ] 1064 - 1065 - [package.metadata] 1066 - requires-dist = [ 1067 - { name = "fastmcp", specifier = ">=2.0" }, 1068 - { name = "httpx", specifier = ">=0.28" }, 1069 - { name = "pdsx", git = "https://github.com/zzstoatzz/pdsx.git" }, 1070 - ] 1071 - 1072 - [package.metadata.requires-dev] 1073 - dev = [ 1074 - { name = "pytest", specifier = ">=8.3.0" }, 1075 - { name = "pytest-asyncio", specifier = ">=0.25.0" }, 1076 - { name = "pytest-sugar" }, 1077 - { name = "ruff", specifier = ">=0.12.0" }, 1078 1078 ] 1079 1079 1080 1080 [[package]]
-383
scripts/backfill-pds
··· 1 - #!/usr/bin/env -S uv run --script --quiet 2 - # /// script 3 - # requires-python = ">=3.12" 4 - # dependencies = ["httpx", "pydantic-settings"] 5 - # /// 6 - """ 7 - Backfill records directly from a PDS. 8 - 9 - Usage: 10 - ./scripts/backfill-pds did:plc:mkqt76xvfgxuemlwlx6ruc3w 11 - ./scripts/backfill-pds zat.dev 12 - """ 13 - 14 - import argparse 15 - import json 16 - import os 17 - import sys 18 - 19 - import httpx 20 - from pydantic_settings import BaseSettings, SettingsConfigDict 21 - 22 - 23 - class Settings(BaseSettings): 24 - model_config = SettingsConfigDict( 25 - env_file=os.environ.get("ENV_FILE", ".env"), extra="ignore" 26 - ) 27 - 28 - turso_url: str 29 - turso_token: str 30 - 31 - @property 32 - def turso_host(self) -> str: 33 - url = self.turso_url 34 - if url.startswith("libsql://"): 35 - url = url[len("libsql://") :] 36 - return url 37 - 38 - 39 - def resolve_handle(handle: str) -> str: 40 - """Resolve a handle to a DID.""" 41 - resp = httpx.get( 42 - f"https://bsky.social/xrpc/com.atproto.identity.resolveHandle", 43 - params={"handle": handle}, 44 - timeout=30, 45 - ) 46 - resp.raise_for_status() 47 - return resp.json()["did"] 48 - 49 - 50 - def get_pds_endpoint(did: str) -> str: 51 - """Get PDS endpoint from PLC directory.""" 52 - resp = httpx.get(f"https://plc.directory/{did}", timeout=30) 53 - resp.raise_for_status() 54 - data = resp.json() 55 - for service in data.get("service", []): 56 - if service.get("type") == "AtprotoPersonalDataServer": 57 - return service["serviceEndpoint"] 58 - raise ValueError(f"No PDS endpoint found for {did}") 59 - 60 - 61 - def list_records(pds: str, did: str, collection: str) -> list[dict]: 62 - """List all records from a collection.""" 63 - records = [] 64 - cursor = None 65 - while True: 66 - params = {"repo": did, "collection": collection, "limit": 100} 67 - if cursor: 68 - params["cursor"] = cursor 69 - resp = httpx.get( 70 - f"{pds}/xrpc/com.atproto.repo.listRecords", params=params, timeout=30 71 - ) 72 - resp.raise_for_status() 73 - data = resp.json() 74 - records.extend(data.get("records", [])) 75 - cursor = data.get("cursor") 76 - if not cursor: 77 - break 78 - return records 79 - 80 - 81 - def turso_exec(settings: Settings, sql: str, args: list | None = None) -> None: 82 - """Execute a statement against Turso.""" 83 - stmt = {"sql": sql} 84 - if args: 85 - # Handle None values properly - use null type 86 - stmt["args"] = [] 87 - for a in args: 88 - if a is None: 89 - stmt["args"].append({"type": "null"}) 90 - else: 91 - stmt["args"].append({"type": "text", "value": str(a)}) 92 - 93 - response = httpx.post( 94 - f"https://{settings.turso_host}/v2/pipeline", 95 - headers={ 96 - "Authorization": f"Bearer {settings.turso_token}", 97 - "Content-Type": "application/json", 98 - }, 99 - json={"requests": [{"type": "execute", "stmt": stmt}, {"type": "close"}]}, 100 - timeout=30, 101 - ) 102 - if response.status_code != 200: 103 - print(f"Turso error: {response.text}", file=sys.stderr) 104 - response.raise_for_status() 105 - 106 - 107 - def extract_leaflet_blocks(pages: list) -> str: 108 - """Extract text from leaflet pages/blocks structure.""" 109 - texts = [] 110 - for page in pages: 111 - if not isinstance(page, dict): 112 - continue 113 - blocks = page.get("blocks", []) 114 - for wrapper in blocks: 115 - if not isinstance(wrapper, dict): 116 - continue 117 - block = wrapper.get("block", {}) 118 - if not isinstance(block, dict): 119 - continue 120 - # Extract plaintext from text, header, blockquote, code blocks 121 - block_type 
= block.get("$type", "") 122 - if block_type in ( 123 - "pub.leaflet.blocks.text", 124 - "pub.leaflet.blocks.header", 125 - "pub.leaflet.blocks.blockquote", 126 - "pub.leaflet.blocks.code", 127 - ): 128 - plaintext = block.get("plaintext", "") 129 - if plaintext: 130 - texts.append(plaintext) 131 - # Handle lists 132 - elif block_type == "pub.leaflet.blocks.unorderedList": 133 - texts.extend(extract_list_items(block.get("children", []))) 134 - return " ".join(texts) 135 - 136 - 137 - def extract_list_items(children: list) -> list[str]: 138 - """Recursively extract text from list items.""" 139 - texts = [] 140 - for child in children: 141 - if not isinstance(child, dict): 142 - continue 143 - content = child.get("content", {}) 144 - if isinstance(content, dict): 145 - plaintext = content.get("plaintext", "") 146 - if plaintext: 147 - texts.append(plaintext) 148 - # Recurse into nested children 149 - nested = child.get("children", []) 150 - if nested: 151 - texts.extend(extract_list_items(nested)) 152 - return texts 153 - 154 - 155 - def extract_document(record: dict, collection: str) -> dict | None: 156 - """Extract document fields from a record.""" 157 - value = record.get("value", {}) 158 - 159 - # Get title 160 - title = value.get("title") 161 - if not title: 162 - return None 163 - 164 - # Get content - try textContent (site.standard), then leaflet blocks, then content/text 165 - content = value.get("textContent") or "" 166 - if not content: 167 - # Try leaflet-style pages/blocks 168 - pages = value.get("pages", []) 169 - if pages: 170 - content = extract_leaflet_blocks(pages) 171 - if not content: 172 - # Fall back to simple content/text fields 173 - content = value.get("content") or value.get("text") or "" 174 - if isinstance(content, dict): 175 - # Handle richtext format 176 - content = content.get("text", "") 177 - 178 - # Get created_at 179 - created_at = value.get("createdAt", "") 180 - 181 - # Get publication reference - try "publication" (leaflet) then "site" (site.standard) 182 - publication = value.get("publication") or value.get("site") 183 - publication_uri = None 184 - if publication: 185 - if isinstance(publication, dict): 186 - publication_uri = publication.get("uri") 187 - elif isinstance(publication, str): 188 - publication_uri = publication 189 - 190 - # Get URL path (site.standard.document uses "path" field like "/001") 191 - path = value.get("path") 192 - 193 - # Get tags 194 - tags = value.get("tags", []) 195 - if not isinstance(tags, list): 196 - tags = [] 197 - 198 - # Determine platform from collection (site.standard is a lexicon, not a platform) 199 - if collection.startswith("pub.leaflet"): 200 - platform = "leaflet" 201 - elif collection.startswith("blog.pckt"): 202 - platform = "pckt" 203 - else: 204 - # site.standard.* and others - platform will be detected from publication basePath 205 - platform = "unknown" 206 - 207 - return { 208 - "title": title, 209 - "content": content, 210 - "created_at": created_at, 211 - "publication_uri": publication_uri, 212 - "tags": tags, 213 - "platform": platform, 214 - "collection": collection, 215 - "path": path, 216 - } 217 - 218 - 219 - def main(): 220 - parser = argparse.ArgumentParser(description="Backfill records from a PDS") 221 - parser.add_argument("identifier", help="DID or handle to backfill") 222 - parser.add_argument("--dry-run", action="store_true", help="Show what would be done") 223 - args = parser.parse_args() 224 - 225 - try: 226 - settings = Settings() # type: ignore 227 - except Exception as e: 228 - 
print(f"error loading settings: {e}", file=sys.stderr) 229 - print("required env vars: TURSO_URL, TURSO_TOKEN", file=sys.stderr) 230 - sys.exit(1) 231 - 232 - # Resolve identifier to DID 233 - identifier = args.identifier 234 - if identifier.startswith("did:"): 235 - did = identifier 236 - else: 237 - print(f"resolving handle {identifier}...") 238 - did = resolve_handle(identifier) 239 - print(f" -> {did}") 240 - 241 - # Get PDS endpoint 242 - print(f"looking up PDS for {did}...") 243 - pds = get_pds_endpoint(did) 244 - print(f" -> {pds}") 245 - 246 - # Collections to fetch 247 - collections = [ 248 - "pub.leaflet.document", 249 - "pub.leaflet.publication", 250 - "site.standard.document", 251 - "site.standard.publication", 252 - ] 253 - 254 - total_docs = 0 255 - total_pubs = 0 256 - 257 - for collection in collections: 258 - print(f"fetching {collection}...") 259 - try: 260 - records = list_records(pds, did, collection) 261 - except httpx.HTTPStatusError as e: 262 - if e.response.status_code == 400: 263 - print(f" (no records)") 264 - continue 265 - raise 266 - 267 - if not records: 268 - print(f" (no records)") 269 - continue 270 - 271 - print(f" found {len(records)} records") 272 - 273 - for record in records: 274 - uri = record["uri"] 275 - # Parse rkey from URI: at://did/collection/rkey 276 - parts = uri.split("/") 277 - rkey = parts[-1] 278 - 279 - if collection.endswith(".document"): 280 - doc = extract_document(record, collection) 281 - if not doc: 282 - print(f" skip {uri} (no title)") 283 - continue 284 - 285 - if args.dry_run: 286 - print(f" would insert: {doc['title'][:50]}...") 287 - else: 288 - # Insert document 289 - turso_exec( 290 - settings, 291 - """ 292 - INSERT INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection, path) 293 - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
294 - ON CONFLICT(did, rkey) DO UPDATE SET 295 - uri = excluded.uri, 296 - title = excluded.title, 297 - content = excluded.content, 298 - created_at = excluded.created_at, 299 - publication_uri = excluded.publication_uri, 300 - platform = excluded.platform, 301 - source_collection = excluded.source_collection, 302 - path = excluded.path 303 - """, 304 - [uri, did, rkey, doc["title"], doc["content"], doc["created_at"], doc["publication_uri"], doc["platform"], doc["collection"], doc["path"]], 305 - ) 306 - # Insert tags 307 - for tag in doc["tags"]: 308 - turso_exec( 309 - settings, 310 - "INSERT OR IGNORE INTO document_tags (document_uri, tag) VALUES (?, ?)", 311 - [uri, tag], 312 - ) 313 - # Update FTS index (delete then insert, FTS5 doesn't support ON CONFLICT) 314 - turso_exec(settings, "DELETE FROM documents_fts WHERE uri = ?", [uri]) 315 - turso_exec( 316 - settings, 317 - "INSERT INTO documents_fts (uri, title, content) VALUES (?, ?, ?)", 318 - [uri, doc["title"], doc["content"]], 319 - ) 320 - print(f" indexed: {doc['title'][:50]}...") 321 - total_docs += 1 322 - 323 - elif collection.endswith(".publication"): 324 - value = record["value"] 325 - name = value.get("name", "") 326 - description = value.get("description") 327 - # base_path: try leaflet's "base_path", then strip scheme from site.standard's "url" 328 - base_path = value.get("base_path") 329 - if not base_path: 330 - url = value.get("url") 331 - if url: 332 - # Strip https:// or http:// prefix 333 - if url.startswith("https://"): 334 - base_path = url[len("https://"):] 335 - elif url.startswith("http://"): 336 - base_path = url[len("http://"):] 337 - else: 338 - base_path = url 339 - 340 - if args.dry_run: 341 - print(f" would insert pub: {name}") 342 - else: 343 - turso_exec( 344 - settings, 345 - """ 346 - INSERT INTO publications (uri, did, rkey, name, description, base_path) 347 - VALUES (?, ?, ?, ?, ?, ?) 348 - ON CONFLICT(uri) DO UPDATE SET 349 - name = excluded.name, 350 - description = excluded.description, 351 - base_path = excluded.base_path 352 - """, 353 - [uri, did, rkey, name, description, base_path], 354 - ) 355 - print(f" indexed pub: {name}") 356 - total_pubs += 1 357 - 358 - # post-process: detect platform from publication basePath 359 - if not args.dry_run and (total_docs > 0 or total_pubs > 0): 360 - print("detecting platforms from publication basePath...") 361 - turso_exec( 362 - settings, 363 - """ 364 - UPDATE documents SET platform = 'pckt' 365 - WHERE platform IN ('standardsite', 'unknown') 366 - AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%pckt.blog%') 367 - """, 368 - ) 369 - turso_exec( 370 - settings, 371 - """ 372 - UPDATE documents SET platform = 'leaflet' 373 - WHERE platform IN ('standardsite', 'unknown') 374 - AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%leaflet.pub%') 375 - """, 376 - ) 377 - print(" done") 378 - 379 - print(f"\ndone! {total_docs} documents, {total_pubs} publications") 380 - 381 - 382 - if __name__ == "__main__": 383 - main()
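for reference, the same turso `/v2/pipeline` endpoint the script writes through also serves reads; a stripped-down count query, with cell decoding mirroring the defensive parsing in scripts/rebuild-pub-fts:

```python
import os

import httpx

host = os.environ["TURSO_URL"].removeprefix("libsql://")
resp = httpx.post(
    f"https://{host}/v2/pipeline",
    headers={"Authorization": f"Bearer {os.environ['TURSO_TOKEN']}"},
    json={"requests": [
        {"type": "execute", "stmt": {"sql": "SELECT COUNT(*) FROM documents"}},
        {"type": "close"},
    ]},
    timeout=30,
)
resp.raise_for_status()
rows = resp.json()["results"][0]["response"]["result"]["rows"]
cell = rows[0][0]
print(cell.get("value") if isinstance(cell, dict) else cell)  # cells may be typed dicts
```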
-109
scripts/enumerate-standard-repos
··· 1 - #!/usr/bin/env -S uv run --script --quiet 2 - # /// script 3 - # requires-python = ">=3.12" 4 - # dependencies = ["httpx"] 5 - # /// 6 - """ 7 - Enumerate repos with site.standard.* records and add them to TAP. 8 - 9 - TAP only signals on one collection, so we use this to discover repos 10 - that use site.standard.publication (pckt, etc) and add them to TAP. 11 - 12 - Usage: 13 - ./scripts/enumerate-standard-repos 14 - ./scripts/enumerate-standard-repos --dry-run 15 - """ 16 - 17 - import argparse 18 - import sys 19 - 20 - import httpx 21 - 22 - RELAY_URL = "https://relay1.us-east.bsky.network" 23 - TAP_URL = "http://leaflet-search-tap.internal:2480" # fly internal network 24 - COLLECTION = "site.standard.publication" 25 - 26 - 27 - def enumerate_repos(relay_url: str, collection: str) -> list[str]: 28 - """Enumerate all repos with records in the given collection.""" 29 - dids = [] 30 - cursor = None 31 - 32 - print(f"enumerating repos with {collection}...") 33 - 34 - while True: 35 - params = {"collection": collection, "limit": 1000} 36 - if cursor: 37 - params["cursor"] = cursor 38 - 39 - resp = httpx.get( 40 - f"{relay_url}/xrpc/com.atproto.sync.listReposByCollection", 41 - params=params, 42 - timeout=60, 43 - ) 44 - resp.raise_for_status() 45 - data = resp.json() 46 - 47 - repos = data.get("repos", []) 48 - for repo in repos: 49 - dids.append(repo["did"]) 50 - 51 - if not repos: 52 - break 53 - 54 - cursor = data.get("cursor") 55 - if not cursor: 56 - break 57 - 58 - print(f" found {len(dids)} repos so far...") 59 - 60 - return dids 61 - 62 - 63 - def add_repos_to_tap(tap_url: str, dids: list[str]) -> None: 64 - """Add repos to TAP for syncing.""" 65 - if not dids: 66 - return 67 - 68 - # batch in chunks of 100 69 - batch_size = 100 70 - for i in range(0, len(dids), batch_size): 71 - batch = dids[i:i + batch_size] 72 - resp = httpx.post( 73 - f"{tap_url}/repos/add", 74 - json={"dids": batch}, 75 - timeout=30, 76 - ) 77 - resp.raise_for_status() 78 - print(f" added batch {i // batch_size + 1}: {len(batch)} repos") 79 - 80 - 81 - def main(): 82 - parser = argparse.ArgumentParser(description="Enumerate and add standard.site repos to TAP") 83 - parser.add_argument("--dry-run", action="store_true", help="Show what would be done") 84 - parser.add_argument("--relay-url", default=RELAY_URL, help="Relay URL") 85 - parser.add_argument("--tap-url", default=TAP_URL, help="TAP URL") 86 - args = parser.parse_args() 87 - 88 - dids = enumerate_repos(args.relay_url, COLLECTION) 89 - print(f"found {len(dids)} repos with {COLLECTION}") 90 - 91 - if not dids: 92 - print("no repos to add") 93 - return 94 - 95 - if args.dry_run: 96 - print("dry run - would add these repos to TAP:") 97 - for did in dids[:10]: 98 - print(f" {did}") 99 - if len(dids) > 10: 100 - print(f" ... and {len(dids) - 10} more") 101 - return 102 - 103 - print(f"adding {len(dids)} repos to TAP...") 104 - add_repos_to_tap(args.tap_url, dids) 105 - print("done!") 106 - 107 - 108 - if __name__ == "__main__": 109 - main()
-86
scripts/rebuild-pub-fts
··· 1 - #!/usr/bin/env -S uv run --script --quiet 2 - # /// script 3 - # requires-python = ">=3.12" 4 - # dependencies = ["httpx", "pydantic-settings"] 5 - # /// 6 - """Rebuild publications_fts with base_path column for subdomain search.""" 7 - import os 8 - import httpx 9 - from pydantic_settings import BaseSettings, SettingsConfigDict 10 - 11 - 12 - class Settings(BaseSettings): 13 - model_config = SettingsConfigDict( 14 - env_file=os.environ.get("ENV_FILE", ".env"), extra="ignore" 15 - ) 16 - turso_url: str 17 - turso_token: str 18 - 19 - @property 20 - def turso_host(self) -> str: 21 - url = self.turso_url 22 - if url.startswith("libsql://"): 23 - url = url[len("libsql://") :] 24 - return url 25 - 26 - 27 - settings = Settings() # type: ignore 28 - 29 - print("Rebuilding publications_fts with base_path column...") 30 - 31 - response = httpx.post( 32 - f"https://{settings.turso_host}/v2/pipeline", 33 - headers={ 34 - "Authorization": f"Bearer {settings.turso_token}", 35 - "Content-Type": "application/json", 36 - }, 37 - json={ 38 - "requests": [ 39 - {"type": "execute", "stmt": {"sql": "DROP TABLE IF EXISTS publications_fts"}}, 40 - { 41 - "type": "execute", 42 - "stmt": { 43 - "sql": """ 44 - CREATE VIRTUAL TABLE publications_fts USING fts5( 45 - uri UNINDEXED, 46 - name, 47 - description, 48 - base_path 49 - ) 50 - """ 51 - }, 52 - }, 53 - { 54 - "type": "execute", 55 - "stmt": { 56 - "sql": """ 57 - INSERT INTO publications_fts (uri, name, description, base_path) 58 - SELECT uri, name, COALESCE(description, ''), COALESCE(base_path, '') 59 - FROM publications 60 - """ 61 - }, 62 - }, 63 - {"type": "execute", "stmt": {"sql": "SELECT COUNT(*) FROM publications_fts"}}, 64 - {"type": "close"}, 65 - ] 66 - }, 67 - timeout=60, 68 - ) 69 - response.raise_for_status() 70 - data = response.json() 71 - 72 - for i, result in enumerate(data["results"][:-1]): # skip close 73 - if result["type"] == "error": 74 - print(f"Step {i} error: {result['error']}") 75 - elif result["type"] == "ok": 76 - if i == 3: # count query 77 - rows = result["response"]["result"].get("rows", []) 78 - if rows: 79 - count = ( 80 - rows[0][0].get("value", rows[0][0]) 81 - if isinstance(rows[0][0], dict) 82 - else rows[0][0] 83 - ) 84 - print(f"Rebuilt with {count} publications") 85 - 86 - print("Done!")
+12 -5
site/dashboard.html
··· 3 3 <head>
4 4 <meta charset="UTF-8">
5 5 <meta name="viewport" content="width=device-width, initial-scale=1.0">
6 - <title>pub search / stats</title> 6 + <title>leaflet search / stats</title>
7 7 <link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 32 32'><rect x='4' y='18' width='6' height='10' fill='%231B7340'/><rect x='13' y='12' width='6' height='16' fill='%231B7340'/><rect x='22' y='6' width='6' height='22' fill='%231B7340'/></svg>">
8 8 <link rel="stylesheet" href="dashboard.css">
9 9 </head>
10 10 <body>
11 11 <div class="container">
12 - <h1><a href="https://pub-search.waow.tech" class="title">pub search</a> <span class="dim">/ stats</span></h1> 12 + <h1><a href="https://leaflet-search.pages.dev" class="title">leaflet search</a> <span class="dim">/ stats</span></h1>
13 13
14 14 <section>
15 15 <div class="metrics">
··· 30 30 </section>
31 31
32 32 <section>
33 - <div class="section-title">documents by platform</div> 33 + <div class="section-title">documents</div>
34 34 <div class="chart-box">
35 - <div id="platforms"></div> 35 + <div class="doc-row">
36 + <span class="doc-type">articles</span>
37 + <span class="doc-count" id="articles">--</span>
38 + </div>
39 + <div class="doc-row">
40 + <span class="doc-type">looseleafs</span>
41 + <span class="doc-count" id="looseleafs">--</span>
42 + </div>
36 43 </div>
37 44 </section>
38 45
··· 56 63 </section>
57 64
58 65 <footer>
59 - <a href="https://pub-search.waow.tech">back</a> · source on <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search">tangled</a> 66 + <a href="https://leaflet-search.pages.dev">back</a> · source on <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search">tangled</a>
60 67 </footer>
61 68 </div>
62 69
+3 -14
site/dashboard.js
··· 57 57 if (!tags) return; 58 58 59 59 el.innerHTML = tags.slice(0, 20).map(t => 60 - '<a class="tag" href="https://pub-search.waow.tech/?tag=' + encodeURIComponent(t.tag) + '">' + 60 + '<a class="tag" href="https://leaflet-search.pages.dev/?tag=' + encodeURIComponent(t.tag) + '">' + 61 61 escapeHtml(t.tag) + '<span class="n">' + t.count + '</span></a>' 62 62 ).join(''); 63 - } 64 - 65 - function renderPlatforms(platforms) { 66 - const el = document.getElementById('platforms'); 67 - if (!platforms) return; 68 - 69 - platforms.forEach(p => { 70 - const row = document.createElement('div'); 71 - row.className = 'doc-row'; 72 - row.innerHTML = '<span class="doc-type">' + escapeHtml(p.platform) + '</span><span class="doc-count">' + p.count + '</span>'; 73 - el.appendChild(row); 74 - }); 75 63 } 76 64 77 65 function escapeHtml(str) { ··· 95 83 96 84 document.getElementById('searches').textContent = data.searches; 97 85 document.getElementById('publications').textContent = data.publications; 86 + document.getElementById('articles').textContent = data.articles; 87 + document.getElementById('looseleafs').textContent = data.looseleafs; 98 88 99 - renderPlatforms(data.platforms); 100 89 renderTimeline(data.timeline); 101 90 renderPubs(data.topPubs); 102 91 renderTags(data.tags);
+44 -316
site/index.html
··· 4 4 <meta charset="UTF-8"> 5 5 <meta name="viewport" content="width=device-width, initial-scale=1.0"> 6 6 <link rel="icon" type="image/svg+xml" href="/favicon.svg"> 7 - <title>pub search</title> 8 - <meta name="description" content="search atproto publishing platforms"> 9 - <meta property="og:title" content="pub search"> 10 - <meta property="og:description" content="search atproto publishing platforms"> 7 + <title>leaflet search</title> 8 + <meta name="description" content="search for leaflet"> 9 + <meta property="og:title" content="leaflet search"> 10 + <meta property="og:description" content="search for leaflet"> 11 11 <meta property="og:type" content="website"> 12 12 <meta name="twitter:card" content="summary"> 13 - <meta name="twitter:title" content="pub search"> 14 - <meta name="twitter:description" content="search atproto publishing platforms"> 13 + <meta name="twitter:title" content="leaflet search"> 14 + <meta name="twitter:description" content="search for leaflet"> 15 15 <style> 16 16 * { box-sizing: border-box; margin: 0; padding: 0; } 17 17 ··· 75 75 flex: 1; 76 76 padding: 0.5rem; 77 77 font-family: monospace; 78 - font-size: 16px; /* prevents iOS auto-zoom on focus */ 78 + font-size: 14px; 79 79 background: #111; 80 80 border: 1px solid #333; 81 81 color: #ccc; ··· 111 111 .result-title { 112 112 color: #fff; 113 113 margin-bottom: 0.5rem; 114 - /* prevent long titles from breaking layout */ 115 - display: -webkit-box; 116 - -webkit-line-clamp: 2; 117 - -webkit-box-orient: vertical; 118 - overflow: hidden; 119 - word-break: break-word; 120 114 } 121 115 122 116 .result-title a { color: inherit; } ··· 331 325 margin-left: 4px; 332 326 } 333 327 334 - .platform-filter { 335 - margin-bottom: 1rem; 336 - } 337 - 338 - .platform-filter-label { 339 - font-size: 11px; 340 - color: #444; 341 - margin-bottom: 0.5rem; 342 - } 343 - 344 - .platform-filter-list { 345 - display: flex; 346 - gap: 0.5rem; 347 - } 348 - 349 - .platform-option { 350 - font-size: 11px; 351 - padding: 3px 8px; 352 - background: #151515; 353 - border: 1px solid #252525; 354 - border-radius: 3px; 355 - cursor: pointer; 356 - color: #777; 357 - } 358 - 359 - .platform-option:hover { 360 - background: #1a1a1a; 361 - border-color: #333; 362 - color: #aaa; 363 - } 364 - 365 - .platform-option.active { 366 - background: rgba(180, 100, 64, 0.2); 367 - border-color: #d4956a; 368 - color: #d4956a; 369 - } 370 - 371 328 .active-filter { 372 329 display: flex; 373 330 align-items: center; ··· 389 346 .active-filter .clear:hover { 390 347 color: #c44; 391 348 } 392 - 393 - /* mobile improvements */ 394 - @media (max-width: 600px) { 395 - body { 396 - padding: 0.75rem; 397 - font-size: 13px; 398 - } 399 - 400 - .container { 401 - max-width: 100%; 402 - } 403 - 404 - /* ensure minimum 44px touch targets */ 405 - .tag, .platform-option, .suggestion { 406 - min-height: 44px; 407 - display: inline-flex; 408 - align-items: center; 409 - padding: 0.5rem 0.75rem; 410 - } 411 - 412 - button { 413 - min-height: 44px; 414 - padding: 0.5rem 0.75rem; 415 - } 416 - 417 - /* stack search box on very small screens */ 418 - .search-box { 419 - flex-direction: column; 420 - gap: 0.5rem; 421 - } 422 - 423 - .search-box input[type="text"] { 424 - width: 100%; 425 - } 426 - 427 - .search-box button { 428 - width: 100%; 429 - } 430 - 431 - /* result card mobile tweaks */ 432 - .result { 433 - padding: 0.75rem 0; 434 - } 435 - 436 - .result:hover { 437 - margin: 0 -0.75rem; 438 - padding: 0.75rem; 439 - } 440 - 441 - .result-title { 442 - 
font-size: 14px;
 443 - line-height: 1.4;
 444 - }
 445 -
 446 - .result-snippet {
 447 - font-size: 12px;
 448 - line-height: 1.5;
 449 - }
 450 -
 451 - /* badges inline on mobile */
 452 - .entity-type, .platform-badge {
 453 - font-size: 9px;
 454 - padding: 2px 5px;
 455 - margin-right: 6px;
 456 - vertical-align: middle;
 457 - }
 458 -
 459 - /* tags wrap better on mobile */
 460 - .tags-list, .platform-filter-list {
 461 - gap: 0.5rem;
 462 - }
 463 -
 464 - /* suggestions responsive */
 465 - .suggestions {
 466 - line-height: 2;
 467 - }
 468 -
 469 - /* related items more compact */
 470 - .related-item {
 471 - max-width: 150px;
 472 - font-size: 11px;
 473 - padding: 0.5rem;
 474 - }
 475 - }
 476 -
 477 - /* ensure touch targets on tablets too */
 478 - @media (hover: none) and (pointer: coarse) {
 479 - .tag, .platform-option, .suggestion, .related-item {
 480 - min-height: 44px;
 481 - display: inline-flex;
 482 - align-items: center;
 483 - }
 484 - }
 485 349 </style>
 486 350 </head>
 487 351 <body>
 488 352 <div class="container">
 489 - <h1><a href="/" class="title">pub search</a> <span class="by">by <a href="https://bsky.app/profile/zzstoatzz.io" target="_blank">@zzstoatzz.io</a></span> <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search" target="_blank" class="src">[src]</a></h1> 353 + <h1><a href="/" class="title">leaflet search</a> <span class="by">by <a href="https://bsky.app/profile/zzstoatzz.io" target="_blank">@zzstoatzz.io</a></span> <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search" target="_blank" class="src">[src]</a></h1> 490 354
 491 355 <div class="search-box">
 492 356 <input type="text" id="query" placeholder="search content..." autofocus> ··· 499 363
 500 364 <div id="tags" class="tags"></div> 501 365
 502 - <div id="platform-filter" class="platform-filter"></div> 503 -
 504 366 <div id="results" class="results">
 505 367 <div class="empty-state">
 506 - <p>search atproto publishing platforms</p>
 507 - <p style="font-size:11px;margin-top:0.5rem"><a href="https://leaflet.pub" target="_blank">leaflet</a> · <a href="https://pckt.blog" target="_blank">pckt</a> · <a href="https://standard.site" target="_blank">standard.site</a></p> 368 + <p>search for <a href="https://leaflet.pub" target="_blank">leaflet.pub</a></p>
 508 369 </div>
 509 370 </div> 510 371
 ··· 523 384 const tagsDiv = document.getElementById('tags');
 524 385 const activeFilterDiv = document.getElementById('active-filter');
 525 386 const suggestionsDiv = document.getElementById('suggestions');
 526 - const platformFilterDiv = document.getElementById('platform-filter'); 527 387
 528 388 let currentTag = null;
 529 - let currentPlatform = null;
 530 389 let allTags = [];
 531 390 let popularSearches = []; 532 391
 533 - async function search(query, tag = null, platform = null) {
 534 - if (!query.trim() && !tag && !platform) return; 392 + async function search(query, tag = null) { 393 + if (!query.trim() && !tag) return; 535 394
 536 395 searchBtn.disabled = true;
 537 396 let searchUrl = `${API_URL}/search?q=${encodeURIComponent(query || '')}`;
 538 397 if (tag) searchUrl += `&tag=${encodeURIComponent(tag)}`;
 539 - if (platform) searchUrl += `&platform=${encodeURIComponent(platform)}`;
 540 398 resultsDiv.innerHTML = `<div class="status">searching...</div>`; 541 399
 542 400 try { ··· 559 417 if (results.length === 0) {
 560 418 resultsDiv.innerHTML = `
 561 419 <div class="empty-state">
 562 - <p>no results${query ? ` for ${formatQueryForDisplay(query)}` : ''}${tag ? ` in #${escapeHtml(tag)}` : ''}${platform ?
` on ${escapeHtml(platform)}` : ''}</p> 420 + <p>no results${query ? ` for "${escapeHtml(query)}"` : ''}${tag ? ` in #${escapeHtml(tag)}` : ''}</p> 563 421 <p>try different keywords</p> 564 422 </div> 565 423 `; ··· 571 429 572 430 for (const doc of results) { 573 431 const entityType = doc.type || 'article'; 574 - const platform = doc.platform || 'leaflet'; 575 432 576 - // build URL based on entity type and platform 577 - const docUrl = buildDocUrl(doc, entityType, platform); 578 - // only show platform badge for actual platforms, not for lexicon-only records 579 - const platformConfig = PLATFORM_CONFIG[platform]; 580 - const platformBadge = platformConfig 581 - ? `<span class="platform-badge">${escapeHtml(platformConfig.label)}</span>` 582 - : ''; 583 - const date = doc.createdAt ? new Date(doc.createdAt).toLocaleDateString() : ''; 584 - 585 - // platform home URL for meta link 586 - const platformHome = getPlatformHome(platform, doc.basePath); 433 + // build URL based on entity type 434 + let leafletUrl = null; 435 + if (entityType === 'publication') { 436 + // publications link to their base path 437 + leafletUrl = doc.basePath ? `https://${doc.basePath}` : null; 438 + } else { 439 + // articles and looseleafs link to specific document 440 + leafletUrl = doc.basePath && doc.rkey 441 + ? `https://${doc.basePath}/${doc.rkey}` 442 + : (doc.did && doc.rkey ? `https://leaflet.pub/p/${doc.did}/${doc.rkey}` : null); 443 + } 587 444 445 + const date = doc.createdAt ? new Date(doc.createdAt).toLocaleDateString() : ''; 446 + const platform = doc.platform || 'leaflet'; 447 + const platformBadge = platform !== 'leaflet' ? `<span class="platform-badge">${escapeHtml(platform)}</span>` : ''; 588 448 html += ` 589 449 <div class="result"> 590 450 <div class="result-title"> 591 451 <span class="entity-type ${entityType}">${entityType}</span>${platformBadge} 592 - ${docUrl 593 - ? `<a href="${docUrl}" target="_blank">${escapeHtml(doc.title || 'Untitled')}</a>` 452 + ${leafletUrl 453 + ? `<a href="${leafletUrl}" target="_blank">${escapeHtml(doc.title || 'Untitled')}</a>` 594 454 : escapeHtml(doc.title || 'Untitled')} 595 455 </div> 596 456 <div class="result-snippet">${highlightTerms(doc.snippet, query)}</div> 597 457 <div class="result-meta"> 598 - ${date ? `${date} | ` : ''}${platformHome.url 599 - ? `<a href="${platformHome.url}" target="_blank">${platformHome.label}</a>` 600 - : platformHome.label} 458 + ${date ? `${date} | ` : ''}${doc.basePath 459 + ? 
`<a href="https://${doc.basePath}" target="_blank">${doc.basePath}</a>` 460 + : `<a href="https://leaflet.pub" target="_blank">leaflet.pub</a>`} 601 461 </div> 602 462 </div> 603 463 `; ··· 625 485 })[c]); 626 486 } 627 487 628 - // display query without adding redundant quotes 629 - function formatQueryForDisplay(query) { 630 - if (!query) return ''; 631 - const escaped = escapeHtml(query); 632 - // if query is already fully quoted, don't add more quotes 633 - if (query.startsWith('"') && query.endsWith('"')) { 634 - return escaped; 635 - } 636 - return `"${escaped}"`; 637 - } 638 - 639 - // platform-specific URL patterns 640 - // note: some platforms use basePath from publication, which we prefer 641 - // fallback docUrl() is used when basePath is missing 642 - const PLATFORM_CONFIG = { 643 - leaflet: { 644 - home: 'https://leaflet.pub', 645 - label: 'leaflet.pub', 646 - // leaflet uses did/rkey pattern for fallback URLs 647 - docUrl: (did, rkey) => `https://leaflet.pub/p/${did}/${rkey}` 648 - }, 649 - pckt: { 650 - home: 'https://pckt.blog', 651 - label: 'pckt.blog', 652 - // pckt uses blog slugs + path, not did/rkey - needs basePath from publication 653 - docUrl: null 654 - }, 655 - offprint: { 656 - home: 'https://offprint.app', 657 - label: 'offprint.app', 658 - // offprint is in early beta, URL pattern unknown 659 - docUrl: null 660 - }, 661 - }; 662 - 663 - function buildDocUrl(doc, entityType, platform) { 664 - if (entityType === 'publication') { 665 - return doc.basePath ? `https://${doc.basePath}` : null; 666 - } 667 - 668 - // Platform-specific URL patterns: 669 - // 1. Leaflet: basePath + rkey (e.g., https://dad.leaflet.pub/3mburumcnbs2m) 670 - if (platform === 'leaflet' && doc.basePath && doc.rkey) { 671 - return `https://${doc.basePath}/${doc.rkey}`; 672 - } 673 - 674 - // 2. pckt: basePath + path (e.g., https://devlog.pckt.blog/some-slug-abc123) 675 - if (platform === 'pckt' && doc.basePath && doc.path) { 676 - return `https://${doc.basePath}${doc.path}`; 677 - } 678 - 679 - // 3. Other platforms with path: basePath + path 680 - if (doc.basePath && doc.path) { 681 - return `https://${doc.basePath}${doc.path}`; 682 - } 683 - 684 - // 4. Platform-specific fallback URL (e.g., leaflet.pub/p/did/rkey) 685 - const config = PLATFORM_CONFIG[platform]; 686 - if (config?.docUrl && doc.did && doc.rkey) { 687 - return config.docUrl(doc.did, doc.rkey); 688 - } 689 - 690 - // 5. Fallback: pdsls.dev universal viewer (always works for any AT Protocol record) 691 - if (doc.uri) { 692 - return `https://pdsls.dev/${doc.uri}`; 693 - } 694 - 695 - return null; 696 - } 697 - 698 - function getPlatformHome(platform, basePath) { 699 - if (basePath) { 700 - return { url: `https://${basePath}`, label: basePath }; 701 - } 702 - const config = PLATFORM_CONFIG[platform]; 703 - if (config) { 704 - return { url: config.home, label: config.label }; 705 - } 706 - // unknown platform using standard.site lexicon - link to standard.site 707 - return { url: 'https://standard.site', label: 'standard.site' }; 708 - } 709 - 710 488 function highlightTerms(text, query) { 711 489 if (!text || !query) return escapeHtml(text); 712 490 const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 0); ··· 725 503 const q = queryInput.value.trim(); 726 504 if (q) params.set('q', q); 727 505 if (currentTag) params.set('tag', currentTag); 728 - if (currentPlatform) params.set('platform', currentPlatform); 729 506 const url = params.toString() ? 
`?${params}` : '/';
 730 507 history.pushState(null, '', url);
 731 508 } 732 509
 733 510 function doSearch() {
 734 511 updateUrl();
 735 - search(queryInput.value, currentTag, currentPlatform); 512 + search(queryInput.value, currentTag);
 736 513 } 737 514
 738 515 function setTag(tag) {
 739 - if (currentTag === tag) {
 740 - clearTag();
 741 - return;
 742 - }
 743 516 currentTag = tag;
 744 517 renderActiveFilter();
 745 518 renderTags(); ··· 751 524 renderActiveFilter();
 752 525 renderTags();
 753 526 updateUrl();
 754 - if (queryInput.value.trim() || currentPlatform) {
 755 - search(queryInput.value, null, currentPlatform);
 756 - } else {
 757 - renderEmptyState();
 758 - }
 759 - }
 760 -
 761 - function setPlatform(platform) {
 762 - if (currentPlatform === platform) {
 763 - clearPlatform();
 764 - return;
 765 - }
 766 - currentPlatform = platform;
 767 - renderActiveFilter();
 768 - renderPlatformFilter();
 769 - doSearch();
 770 - }
 771 -
 772 - function clearPlatform() {
 773 - currentPlatform = null;
 774 - renderActiveFilter();
 775 - renderPlatformFilter();
 776 - updateUrl();
 777 - if (queryInput.value.trim() || currentTag) {
 778 - search(queryInput.value, currentTag, null); 527 + if (queryInput.value.trim()) { 528 + search(queryInput.value, null);
 779 529 } else {
 780 530 renderEmptyState();
 781 531 }
 782 532 } 783 533
 784 - function renderPlatformFilter() {
 785 - const platforms = [
 786 - { id: 'leaflet', label: 'leaflet' },
 787 - { id: 'pckt', label: 'pckt' },
 788 - ];
 789 - const html = platforms.map(p => `
 790 - <span class="platform-option${currentPlatform === p.id ? ' active' : ''}" onclick="setPlatform('${p.id}')">${p.label}</span>
 791 - `).join('');
 792 - platformFilterDiv.innerHTML = `<div class="platform-filter-label">filter by platform:</div><div class="platform-filter-list">${html}</div>`;
 793 - }
 794 -
 795 534 function renderActiveFilter() {
 796 - if (!currentTag && !currentPlatform) { 535 + if (!currentTag) {
 797 536 activeFilterDiv.innerHTML = '';
 798 537 return;
 799 538 }
 800 - let parts = [];
 801 - if (currentTag) parts.push(`tag: <strong>#${escapeHtml(currentTag)}</strong>`);
 802 - if (currentPlatform) parts.push(`platform: <strong>${escapeHtml(currentPlatform)}</strong>`);
 803 - const clearActions = [];
 804 - if (currentTag) clearActions.push(`<span class="clear" onclick="clearTag()">× tag</span>`);
 805 - if (currentPlatform) clearActions.push(`<span class="clear" onclick="clearPlatform()">× platform</span>`);
 806 539 activeFilterDiv.innerHTML = `
 807 540 <div class="active-filter">
 808 - <span>filtering by ${parts.join(', ')} <span style="color:#666;font-size:10px">(documents only)</span></span>
 809 - ${clearActions.join(' ')} 541 + <span>filtering by tag: <strong>#${escapeHtml(currentTag)}</strong> <span style="color:#666;font-size:10px">(documents only)</span></span> 542 + <span class="clear" onclick="clearTag()">× clear</span>
 810 543 </div>
 811 544 `;
 812 545 } ··· 868 601 function renderEmptyState() {
 869 602 resultsDiv.innerHTML = `
 870 603 <div class="empty-state">
 871 - <p>search atproto publishing platforms</p>
 872 - <p style="font-size:11px;margin-top:0.5rem"><a href="https://leaflet.pub" target="_blank">leaflet</a> · <a href="https://pckt.blog" target="_blank">pckt</a> · <a href="https://standard.site" target="_blank">standard.site</a></p> 604 + <p>search for <a href="https://leaflet.pub" target="_blank">leaflet.pub</a></p>
 873 605 </div>
 874 606 `;
 875 607 } ··· 888 620 const params = new URLSearchParams(location.search);
 889 621 queryInput.value = params.get('q') ||
''; 890 622 currentTag = params.get('tag') || null; 891 - currentPlatform = params.get('platform') || null; 892 623 renderActiveFilter(); 893 624 renderTags(); 894 - renderPlatformFilter(); 895 - if (queryInput.value || currentTag || currentPlatform) search(queryInput.value, currentTag, currentPlatform); 625 + if (queryInput.value || currentTag) search(queryInput.value, currentTag); 896 626 }); 897 627 898 628 // init 899 629 const initialParams = new URLSearchParams(location.search); 900 630 const initialQuery = initialParams.get('q'); 901 631 const initialTag = initialParams.get('tag'); 902 - const initialPlatform = initialParams.get('platform'); 903 632 if (initialQuery) queryInput.value = initialQuery; 904 633 if (initialTag) currentTag = initialTag; 905 - if (initialPlatform) currentPlatform = initialPlatform; 906 634 renderActiveFilter(); 907 - renderPlatformFilter(); 908 635 909 - if (initialQuery || initialTag || initialPlatform) { 910 - search(initialQuery || '', initialTag, initialPlatform); 636 + if (initialQuery || initialTag) { 637 + search(initialQuery || '', initialTag); 911 638 } 912 639 913 640 async function loadRelated(topResult) { ··· 933 660 if (filtered.length === 0) return; 934 661 935 662 const items = filtered.map(doc => { 936 - const platform = doc.platform || 'leaflet'; 937 - const url = buildDocUrl(doc, doc.type || 'article', platform); 663 + const url = doc.basePath && doc.rkey 664 + ? `https://${doc.basePath}/${doc.rkey}` 665 + : (doc.did && doc.rkey ? `https://leaflet.pub/p/${doc.did}/${doc.rkey}` : null); 938 666 return url 939 667 ? `<a href="${url}" target="_blank" class="related-item">${escapeHtml(doc.title || 'Untitled')}</a>` 940 668 : `<span class="related-item">${escapeHtml(doc.title || 'Untitled')}</span>`;
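note: the leaflet-only URL expression now appears twice in index.html, once in the result renderer and once in `loadRelated`. a small shared helper would keep the two copies in sync; a minimal sketch (`buildLeafletUrl` is a hypothetical name, not a function in this diff):

```js
// mirrors the inlined logic above: prefer the publication's own domain,
// fall back to leaflet.pub/p/<did>/<rkey>, otherwise render without a link
function buildLeafletUrl(doc, entityType = 'article') {
  if (entityType === 'publication') {
    // publications link to their base path
    return doc.basePath ? `https://${doc.basePath}` : null;
  }
  if (doc.basePath && doc.rkey) return `https://${doc.basePath}/${doc.rkey}`;
  if (doc.did && doc.rkey) return `https://leaflet.pub/p/${doc.did}/${doc.rkey}`;
  return null;
}
```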
+40 -32
site/loading.js
··· 82 82 const style = document.createElement('style'); 83 83 style.id = 'loader-styles'; 84 84 style.textContent = ` 85 - /* skeleton shimmer - subtle pulse */ 85 + /* skeleton shimmer for loading values */ 86 86 .loading .metric-value, 87 87 .loading .doc-count, 88 88 .loading .pub-count { 89 - color: #333 !important; 90 - animation: dim-pulse 2s ease-in-out infinite; 89 + background: linear-gradient(90deg, #1a1a1a 25%, #252525 50%, #1a1a1a 75%); 90 + background-size: 200% 100%; 91 + animation: shimmer 1.5s infinite; 92 + border-radius: 3px; 93 + color: transparent !important; 94 + min-width: 3ch; 95 + display: inline-block; 91 96 } 92 97 93 - @keyframes dim-pulse { 94 - 0%, 100% { opacity: 0.3; } 95 - 50% { opacity: 0.6; } 98 + @keyframes shimmer { 99 + 0% { background-position: 200% 0; } 100 + 100% { background-position: -200% 0; } 96 101 } 97 102 98 - /* wake message - terminal style, ephemeral */ 103 + /* wake message */ 99 104 .wake-message { 100 105 position: fixed; 101 - bottom: 1rem; 102 - left: 1rem; 103 - font-family: monospace; 106 + top: 1rem; 107 + right: 1rem; 104 108 font-size: 11px; 105 - color: #444; 109 + color: #666; 110 + background: #111; 111 + border: 1px solid #222; 112 + padding: 6px 12px; 113 + border-radius: 4px; 114 + display: flex; 115 + align-items: center; 116 + gap: 8px; 106 117 z-index: 1000; 107 - animation: fade-in 0.5s ease; 108 - } 109 - 110 - .wake-message::before { 111 - content: '>'; 112 - margin-right: 6px; 113 - opacity: 0.5; 118 + animation: fade-in 0.2s ease; 114 119 } 115 120 116 121 .wake-dot { 117 - display: inline-block; 118 - width: 4px; 119 - height: 4px; 120 - background: #555; 122 + width: 6px; 123 + height: 6px; 124 + background: #4ade80; 121 125 border-radius: 50%; 122 - margin-left: 4px; 123 - animation: blink 1s step-end infinite; 126 + animation: pulse-dot 1s infinite; 124 127 } 125 128 126 - @keyframes blink { 127 - 0%, 100% { opacity: 1; } 128 - 50% { opacity: 0; } 129 + @keyframes pulse-dot { 130 + 0%, 100% { opacity: 0.3; } 131 + 50% { opacity: 1; } 129 132 } 130 133 131 134 @keyframes fade-in { 132 - from { opacity: 0; } 133 - to { opacity: 1; } 135 + from { opacity: 0; transform: translateY(-4px); } 136 + to { opacity: 1; transform: translateY(0); } 134 137 } 135 138 136 139 .wake-message.fade-out { 137 - animation: fade-out 0.5s ease forwards; 140 + animation: fade-out 0.3s ease forwards; 138 141 } 139 142 140 143 @keyframes fade-out { 141 - to { opacity: 0; } 144 + to { opacity: 0; transform: translateY(-4px); } 142 145 } 143 146 144 147 /* loaded transition */ 145 148 .loaded .metric-value, 146 149 .loaded .doc-count, 147 150 .loaded .pub-count { 148 - animation: none; 151 + animation: reveal 0.3s ease; 152 + } 153 + 154 + @keyframes reveal { 155 + from { opacity: 0; } 156 + to { opacity: 1; } 149 157 } 150 158 `; 151 159 document.head.appendChild(style);
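note: the shimmer skeleton and wake message above are pure CSS; the driver code lives elsewhere in loading.js and is not part of this hunk. a sketch of how the classes would plausibly be wired, with illustrative function names and assuming the backend's `/stats` endpoint:

```js
// show the wake pill; returns a function that fades it out and removes it
function showWakeMessage(text = 'waking backend...') {
  const msg = document.createElement('div');
  msg.className = 'wake-message';
  msg.innerHTML = `<span class="wake-dot"></span>${text}`;
  document.body.appendChild(msg);
  return () => {
    msg.classList.add('fade-out'); // plays the fade-out animation
    msg.addEventListener('animationend', () => msg.remove(), { once: true });
  };
}

// shimmer .metric-value/.doc-count/.pub-count while stats load, then reveal
async function loadStats(container) {
  container.classList.add('loading');
  const dismiss = showWakeMessage();
  try {
    const stats = await fetch('/stats').then(r => r.json());
    // ...write stats into the metric elements here
  } finally {
    dismiss();
    container.classList.remove('loading');
    container.classList.add('loaded'); // triggers the reveal animation
  }
}
```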
+4 -5
tap/fly.toml
··· 1 1 app = 'leaflet-search-tap' 2 - primary_region = 'ewr' 2 + primary_region = 'iad' 3 3 4 4 [build] 5 5 image = 'ghcr.io/bluesky-social/indigo/tap:latest' ··· 9 9 TAP_BIND = ':2480' 10 10 TAP_RELAY_URL = 'https://relay1.us-east.bsky.network' 11 11 TAP_SIGNAL_COLLECTION = 'pub.leaflet.document' 12 - TAP_COLLECTION_FILTERS = 'pub.leaflet.document,pub.leaflet.publication,site.standard.document,site.standard.publication' 12 + TAP_COLLECTION_FILTERS = 'pub.leaflet.document,pub.leaflet.publication' 13 + TAP_DISABLE_ACKS = 'true' 13 14 TAP_LOG_LEVEL = 'info' 14 - TAP_RESYNC_PARALLELISM = '2' 15 - TAP_IDENT_CACHE_SIZE = '10000' 16 15 TAP_CURSOR_SAVE_INTERVAL = '5s' 17 16 TAP_REPO_FETCH_TIMEOUT = '600s' 18 17 ··· 24 23 min_machines_running = 1 25 24 26 25 [[vm]] 27 - memory = '1gb' 26 + memory = '2gb' 28 27 cpu_kind = 'shared' 29 28 cpus = 1 30 29
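note: with `TAP_COLLECTION_FILTERS` narrowed back to the two `pub.leaflet.*` collections, the backend should only ever receive leaflet records. a defensive guard on the indexing side keeps the two configs from drifting; a sketch, assuming tap events expose a `collection` field (the actual event shape comes from the indigo tap image and is not shown in this diff):

```js
// keep in sync with TAP_COLLECTION_FILTERS in tap/fly.toml
const INDEXED_COLLECTIONS = new Set([
  'pub.leaflet.document',
  'pub.leaflet.publication',
]);

function shouldIndex(evt) {
  // drop anything outside the configured collections
  return INDEXED_COLLECTIONS.has(evt.collection);
}
```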