README.md +13 -32
···
-# pub search
+# leaflet-search
 
 by [@zzstoatzz.io](https://bsky.app/profile/zzstoatzz.io)
 
-search ATProto publishing platforms ([leaflet](https://leaflet.pub), [pckt](https://pckt.blog), and others using [standard.site](https://standard.site)).
-
-**live:** [pub-search.waow.tech](https://pub-search.waow.tech)
+search for [leaflet](https://leaflet.pub).
 
-> formerly "leaflet-search" - generalized to support multiple publishing platforms
+**live:** [leaflet-search.pages.dev](https://leaflet-search.pages.dev)
 
 ## how it works
 
-1. **tap** syncs content from ATProto firehose (signals on `pub.leaflet.document`, filters `pub.leaflet.*` + `site.standard.*`)
+1. **tap** syncs leaflet content from the network
 2. **backend** indexes content into SQLite FTS5 via [Turso](https://turso.tech), serves search API
 3. **site** static frontend on Cloudflare Pages
 
···
 search is also exposed as an MCP server for AI agents like Claude Code:
 
 ```bash
-claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
+claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}'
 ```
 
 see [mcp/README.md](mcp/README.md) for local setup and usage details.
···
 ## api
 
 ```
-GET /search?q=<query>&tag=<tag>&platform=<platform>  # full-text search
-GET /similar?uri=<at-uri>                            # find similar documents
-GET /tags                                            # list all tags with counts
-GET /popular                                         # popular search queries
-GET /stats                                           # document/publication counts
-GET /health                                          # health check
+GET /search?q=<query>&tag=<tag>  # full-text search with query, tag, or both
+GET /similar?uri=<at-uri>        # find similar documents via vector embeddings
+GET /tags                        # list all tags with counts
+GET /popular                     # popular search queries
+GET /stats                       # document/publication counts
+GET /health                      # health check
 ```
 
-search returns three entity types: `article` (document in a publication), `looseleaf` (standalone document), `publication` (newsletter itself). each result includes a `platform` field (leaflet, pckt, etc). tag and platform filtering apply to documents only.
+search returns three entity types: `article` (document in a publication), `looseleaf` (standalone document), `publication` (newsletter itself). tag filtering applies to documents only.
 
 `/similar` uses [Voyage AI](https://voyageai.com) embeddings with brute-force cosine similarity (~0.15s for 3500 docs).
 
-## configuration
-
-the backend is fully configurable via environment variables:
-
-| variable | default | description |
-|----------|---------|-------------|
-| `APP_NAME` | `leaflet-search` | name shown in startup logs |
-| `DASHBOARD_URL` | `https://pub-search.waow.tech/dashboard.html` | redirect target for `/dashboard` |
-| `TAP_HOST` | `leaflet-search-tap.fly.dev` | TAP websocket host |
-| `TAP_PORT` | `443` | TAP websocket port |
-| `PORT` | `3000` | HTTP server port |
-| `TURSO_URL` | - | Turso database URL (required) |
-| `TURSO_TOKEN` | - | Turso auth token (required) |
-| `VOYAGE_API_KEY` | - | Voyage AI API key (for embeddings) |
-
-the backend indexes multiple ATProto platforms - currently `pub.leaflet.*` and `site.standard.*` collections. platform is stored per-document and returned in search results.
-
 ## [stack](https://bsky.app/profile/zzstoatzz.io/post/3mbij5ip4ws2a)
 
 - [Fly.io](https://fly.io) hosts backend + tap
 - [Turso](https://turso.tech) cloud SQLite with vector support
 - [Voyage AI](https://voyageai.com) embeddings (voyage-3-lite)
-- [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) syncs content from ATProto firehose
+- [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) syncs leaflet content from ATProto firehose
 - [Zig](https://ziglang.org) HTTP server, search API, content indexing
 - [Cloudflare Pages](https://pages.cloudflare.com) static frontend
 
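a few concrete calls against the api shape above (hypothetical query values; the at-uri form matches what `tap.zig` constructs later in this diff):

```
GET /search?q=gyst              # full-text match on title/content
GET /search?tag=recipes         # newest documents with that tag
GET /search?q=gyst&tag=recipes  # both filters combined
GET /similar?uri=at://did:plc:abc/pub.leaflet.document/3lzyrj6q6gs27
```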
backend/build.zig.zon +2 -2
···
             .hash = "zql-0.0.1-alpha-xNRI4IRNAABUb9gLat5FWUaZDD5HvxAxet_-elgR_A_y",
         },
         .zat = .{
-            .url = "https://tangled.sh/zat.dev/zat/archive/main",
-            .hash = "zat-0.1.0-5PuC7heIAQA4j2UVmJT-oivQh5AwZTrFQ-NC4CJi2-_R",
+            .url = "https://tangled.sh/zzstoatzz.io/zat/archive/main",
+            .hash = "zat-0.1.0-5PuC7ntmAQA9_8rALQwWad2riXWTY9p_ohVOD54_Y-2c",
         },
     },
     .paths = .{
backend/src/dashboard.zig +18 -27
···
 const TagJson = struct { tag: []const u8, count: i64 };
 const TimelineJson = struct { date: []const u8, count: i64 };
 const PubJson = struct { name: []const u8, basePath: []const u8, count: i64 };
-const PlatformJson = struct { platform: []const u8, count: i64 };
 
 /// All data needed to render the dashboard
 pub const Data = struct {
     started_at: i64,
     searches: i64,
     publications: i64,
-    documents: i64,
+    articles: i64,
+    looseleafs: i64,
     tags_json: []const u8,
     timeline_json: []const u8,
     top_pubs_json: []const u8,
-    platforms_json: []const u8,
 };
 
 // all dashboard queries batched into one request
···
     \\  (SELECT service_started_at FROM stats WHERE id = 1) as started_at
 ;
 
-const PLATFORMS_SQL =
-    \\SELECT platform, COUNT(*) as count
+const DOC_TYPES_SQL =
+    \\SELECT
+    \\  SUM(CASE WHEN publication_uri != '' THEN 1 ELSE 0 END) as articles,
+    \\  SUM(CASE WHEN publication_uri = '' OR publication_uri IS NULL THEN 1 ELSE 0 END) as looseleafs
     \\FROM documents
-    \\GROUP BY platform
-    \\ORDER BY count DESC
 ;
 
 const TAGS_SQL =
···
     // batch all 5 queries into one HTTP request
     var batch = client.queryBatch(&.{
         .{ .sql = STATS_SQL },
-        .{ .sql = PLATFORMS_SQL },
+        .{ .sql = DOC_TYPES_SQL },
         .{ .sql = TAGS_SQL },
         .{ .sql = TIMELINE_SQL },
         .{ .sql = TOP_PUBS_SQL },
···
     const started_at = if (stats_row) |r| r.int(4) else 0;
     const searches = if (stats_row) |r| r.int(2) else 0;
     const publications = if (stats_row) |r| r.int(1) else 0;
-    const documents = if (stats_row) |r| r.int(0) else 0;
+
+    // extract doc types (query 1)
+    const doc_row = batch.getFirst(1);
+    const articles = if (doc_row) |r| r.int(0) else 0;
+    const looseleafs = if (doc_row) |r| r.int(1) else 0;
 
     return .{
         .started_at = started_at,
         .searches = searches,
         .publications = publications,
-        .documents = documents,
+        .articles = articles,
+        .looseleafs = looseleafs,
         .tags_json = try formatTagsJson(alloc, batch.get(2)),
         .timeline_json = try formatTimelineJson(alloc, batch.get(3)),
         .top_pubs_json = try formatPubsJson(alloc, batch.get(4)),
-        .platforms_json = try formatPlatformsJson(alloc, batch.get(1)),
     };
 }
 
···
     return try output.toOwnedSlice();
 }
 
-fn formatPlatformsJson(alloc: Allocator, rows: []const db.Row) ![]const u8 {
-    var output: std.Io.Writer.Allocating = .init(alloc);
-    errdefer output.deinit();
-    var jw: json.Stringify = .{ .writer = &output.writer };
-    try jw.beginArray();
-    for (rows) |row| try jw.write(PlatformJson{ .platform = row.text(0), .count = row.int(1) });
-    try jw.endArray();
-    return try output.toOwnedSlice();
-}
-
 /// Generate dashboard data as JSON for API endpoint
 pub fn toJson(alloc: Allocator, data: Data) ![]const u8 {
     var output: std.Io.Writer.Allocating = .init(alloc);
···
     try jw.objectField("publications");
     try jw.write(data.publications);
 
-    try jw.objectField("documents");
-    try jw.write(data.documents);
+    try jw.objectField("articles");
+    try jw.write(data.articles);
 
-    try jw.objectField("platforms");
-    try jw.beginWriteRaw();
-    try jw.writer.writeAll(data.platforms_json);
-    jw.endWriteRaw();
+    try jw.objectField("looseleafs");
+    try jw.write(data.looseleafs);
 
     // use beginWriteRaw/endWriteRaw for pre-formatted JSON arrays
     try jw.objectField("tags");
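the `/api/dashboard` payload now carries the article/looseleaf split instead of per-platform counts. an illustrative shape — keys are the ones `toJson` writes above, counts are invented, and other fields are elided:

```
{
  …
  "publications": 120,
  "articles": 2900,
  "looseleafs": 600,
  "tags": [{ "tag": "zig", "count": 42 }, …],
  …
}
```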
backend/src/db/schema.zig +1 -39
···
     \\CREATE VIRTUAL TABLE IF NOT EXISTS publications_fts USING fts5(
     \\  uri UNINDEXED,
     \\  name,
-    \\  description,
-    \\  base_path
+    \\  description
     \\)
     , &.{});
 
···
     client.exec("UPDATE documents SET platform = 'leaflet' WHERE platform IS NULL", &.{}) catch {};
     client.exec("UPDATE documents SET source_collection = 'pub.leaflet.document' WHERE source_collection IS NULL", &.{}) catch {};
 
-    // multi-platform support for publications
-    client.exec("ALTER TABLE publications ADD COLUMN platform TEXT DEFAULT 'leaflet'", &.{}) catch {};
-    client.exec("ALTER TABLE publications ADD COLUMN source_collection TEXT DEFAULT 'pub.leaflet.publication'", &.{}) catch {};
-    client.exec("UPDATE publications SET platform = 'leaflet' WHERE platform IS NULL", &.{}) catch {};
-    client.exec("UPDATE publications SET source_collection = 'pub.leaflet.publication' WHERE source_collection IS NULL", &.{}) catch {};
-
     // vector embeddings column already added by backfill script
-
-    // dedupe index: same (did, rkey) across collections = same document
-    // e.g., pub.leaflet.document/abc and site.standard.document/abc are the same content
-    client.exec("CREATE UNIQUE INDEX IF NOT EXISTS idx_documents_did_rkey ON documents(did, rkey)", &.{}) catch {};
-    client.exec("CREATE UNIQUE INDEX IF NOT EXISTS idx_publications_did_rkey ON publications(did, rkey)", &.{}) catch {};
-
-    // backfill platform from source_collection for records indexed before platform detection fix
-    client.exec("UPDATE documents SET platform = 'leaflet' WHERE platform = 'unknown' AND source_collection LIKE 'pub.leaflet.%'", &.{}) catch {};
-    client.exec("UPDATE documents SET platform = 'pckt' WHERE platform = 'unknown' AND source_collection LIKE 'blog.pckt.%'", &.{}) catch {};
-
-    // detect platform from publication basePath (site.standard.* is a lexicon, not a platform)
-    // pckt uses site.standard.* lexicon but basePath contains pckt.blog
-    client.exec(
-        \\UPDATE documents SET platform = 'pckt'
-        \\WHERE platform IN ('standardsite', 'unknown')
-        \\AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%pckt.blog%')
-    , &.{}) catch {};
-
-    // leaflet also uses site.standard.* lexicon, detect by basePath
-    client.exec(
-        \\UPDATE documents SET platform = 'leaflet'
-        \\WHERE platform IN ('standardsite', 'unknown')
-        \\AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%leaflet.pub%')
-    , &.{}) catch {};
-
-    // URL path field for documents (e.g., "/001" for zat.dev)
-    // used to build full URL: publication.url + document.path
-    client.exec("ALTER TABLE documents ADD COLUMN path TEXT", &.{}) catch {};
-
-    // note: publications_fts was rebuilt with base_path column via scripts/rebuild-pub-fts
-    // new publications will include base_path via insertPublication in indexer.zig
 }
backend/src/extractor.zig +33 -46
···
 const Allocator = mem.Allocator;
 const zat = @import("zat");
 
-/// Detected platform from collection name
-/// Note: pckt and other platforms use site.standard.* collections.
-/// Platform detection from collection only distinguishes leaflet (custom lexicon)
-/// from site.standard users. Actual platform (pckt vs others) is detected later
-/// from publication basePath.
+/// Detected platform from content.$type
 pub const Platform = enum {
     leaflet,
-    standardsite, // pckt and others using site.standard.* lexicon
+    pckt,
+    offprint,
     unknown,
 
-    pub fn fromCollection(collection: []const u8) Platform {
-        if (mem.startsWith(u8, collection, "pub.leaflet.")) return .leaflet;
-        if (mem.startsWith(u8, collection, "site.standard.")) return .standardsite;
+    pub fn fromContentType(content_type: []const u8) Platform {
+        if (mem.startsWith(u8, content_type, "pub.leaflet.")) return .leaflet;
+        if (mem.startsWith(u8, content_type, "blog.pckt.")) return .pckt;
+        if (mem.startsWith(u8, content_type, "app.offprint.")) return .offprint;
         return .unknown;
     }
 
-    /// Internal name (for DB storage)
     pub fn name(self: Platform) []const u8 {
         return @tagName(self);
     }
-
-    /// Display name (for UI)
-    pub fn displayName(self: Platform) []const u8 {
-        return @tagName(self);
-    }
 };
 
 /// Extracted document data ready for indexing.
···
     tags: [][]const u8,
     platform: Platform,
     source_collection: []const u8,
-    path: ?[]const u8, // URL path from record (e.g., "/001" for zat.dev)
 
     pub fn deinit(self: *ExtractedDocument) void {
         self.allocator.free(self.content);
···
     .{ "pub.leaflet.blocks.code", {} },
 });
 
-/// Detect platform from collection name
-pub fn detectPlatform(collection: []const u8) Platform {
-    return Platform.fromCollection(collection);
+/// Detect platform from record's content.$type field
+pub fn detectPlatform(record: json.ObjectMap) Platform {
+    const content = record.get("content") orelse return .unknown;
+    if (content != .object) return .unknown;
+
+    const type_val = content.object.get("$type") orelse return .unknown;
+    if (type_val != .string) return .unknown;
+
+    return Platform.fromContentType(type_val.string);
 }
 
 /// Extract document content from a record.
···
     collection: []const u8,
 ) !ExtractedDocument {
     const record_val: json.Value = .{ .object = record };
-    const platform = detectPlatform(collection);
+    const platform = detectPlatform(record);
 
     // extract required fields
     const title = zat.json.getString(record_val, "title") orelse return error.MissingTitle;
···
     // extract optional fields
     const created_at = zat.json.getString(record_val, "publishedAt") orelse
         zat.json.getString(record_val, "createdAt");
-
-    // publication/site can be a string (direct URI) or strongRef object ({uri, cid})
-    // zat.json.getString supports paths like "publication.uri"
     const publication_uri = zat.json.getString(record_val, "publication") orelse
-        zat.json.getString(record_val, "publication.uri") orelse
-        zat.json.getString(record_val, "site") orelse
-        zat.json.getString(record_val, "site.uri");
-
-    // extract URL path (site.standard.document uses "path" field like "/001")
-    const path = zat.json.getString(record_val, "path");
+        zat.json.getString(record_val, "site"); // site.standard uses "site"
 
     // extract tags - allocate owned slice
     const tags = try extractTags(allocator, record_val);
···
         .tags = tags,
         .platform = platform,
         .source_collection = collection,
-        .path = path,
     };
 }
 
···
 
 // --- tests ---
 
-test "Platform.fromCollection: leaflet" {
-    try std.testing.expectEqual(Platform.leaflet, Platform.fromCollection("pub.leaflet.document"));
-    try std.testing.expectEqual(Platform.leaflet, Platform.fromCollection("pub.leaflet.publication"));
+test "Platform.fromContentType: leaflet" {
+    try std.testing.expectEqual(Platform.leaflet, Platform.fromContentType("pub.leaflet.content"));
+    try std.testing.expectEqual(Platform.leaflet, Platform.fromContentType("pub.leaflet.blocks.text"));
 }
 
-test "Platform.fromCollection: standardsite" {
-    // pckt and others use site.standard.* collections
-    try std.testing.expectEqual(Platform.standardsite, Platform.fromCollection("site.standard.document"));
-    try std.testing.expectEqual(Platform.standardsite, Platform.fromCollection("site.standard.publication"));
+test "Platform.fromContentType: pckt" {
+    try std.testing.expectEqual(Platform.pckt, Platform.fromContentType("blog.pckt.content"));
+    try std.testing.expectEqual(Platform.pckt, Platform.fromContentType("blog.pckt.blocks.whatever"));
+}
+
+test "Platform.fromContentType: offprint" {
+    try std.testing.expectEqual(Platform.offprint, Platform.fromContentType("app.offprint.content"));
 }
 
-test "Platform.fromCollection: unknown" {
-    try std.testing.expectEqual(Platform.unknown, Platform.fromCollection("something.else"));
-    try std.testing.expectEqual(Platform.unknown, Platform.fromCollection(""));
+test "Platform.fromContentType: unknown" {
+    try std.testing.expectEqual(Platform.unknown, Platform.fromContentType("something.else"));
+    try std.testing.expectEqual(Platform.unknown, Platform.fromContentType(""));
 }
 
 test "Platform.name" {
     try std.testing.expectEqualStrings("leaflet", Platform.leaflet.name());
-    try std.testing.expectEqualStrings("standardsite", Platform.standardsite.name());
+    try std.testing.expectEqualStrings("pckt", Platform.pckt.name());
+    try std.testing.expectEqualStrings("offprint", Platform.offprint.name());
     try std.testing.expectEqualStrings("unknown", Platform.unknown.name());
 }
-
-test "Platform.displayName" {
-    try std.testing.expectEqualStrings("leaflet", Platform.leaflet.displayName());
-    try std.testing.expectEqualStrings("standardsite", Platform.standardsite.displayName());
-}
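a hypothetical usage sketch (not part of the diff) showing how the new `detectPlatform` reads `content.$type` from a record built by hand with std.json — `Platform` and `detectPlatform` are the declarations above:

```zig
const std = @import("std");

test "detectPlatform reads content.$type" {
    const alloc = std.testing.allocator;

    // build {"content": {"$type": "blog.pckt.content"}} manually
    var content = std.json.ObjectMap.init(alloc);
    defer content.deinit();
    try content.put("$type", .{ .string = "blog.pckt.content" });

    var record = std.json.ObjectMap.init(alloc);
    defer record.deinit();
    try record.put("content", .{ .object = content });

    try std.testing.expectEqual(Platform.pckt, detectPlatform(record));
}
```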
backend/src/indexer.zig +5 -34
···
     tags: []const []const u8,
     platform: []const u8,
     source_collection: []const u8,
-    path: ?[]const u8,
 ) !void {
     const c = db.getClient() orelse return error.NotInitialized;
 
-    // dedupe: if (did, rkey) exists with different uri, clean up old record first
-    // this handles cross-collection duplicates (e.g., pub.leaflet.document + site.standard.document)
-    if (c.query("SELECT uri FROM documents WHERE did = ? AND rkey = ?", &.{ did, rkey })) |result_val| {
-        var result = result_val;
-        defer result.deinit();
-        if (result.first()) |row| {
-            const old_uri = row.text(0);
-            if (!std.mem.eql(u8, old_uri, uri)) {
-                c.exec("DELETE FROM documents_fts WHERE uri = ?", &.{old_uri}) catch {};
-                c.exec("DELETE FROM document_tags WHERE document_uri = ?", &.{old_uri}) catch {};
-                c.exec("DELETE FROM documents WHERE uri = ?", &.{old_uri}) catch {};
-            }
-        }
-    } else |_| {}
-
     try c.exec(
-        "INSERT OR REPLACE INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection, path) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
-        &.{ uri, did, rkey, title, content, created_at orelse "", publication_uri orelse "", platform, source_collection, path orelse "" },
+        "INSERT OR REPLACE INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
+        &.{ uri, did, rkey, title, content, created_at orelse "", publication_uri orelse "", platform, source_collection },
     );
 
     // update FTS index
···
 ) !void {
     const c = db.getClient() orelse return error.NotInitialized;
 
-    // dedupe: if (did, rkey) exists with different uri, clean up old record first
-    if (c.query("SELECT uri FROM publications WHERE did = ? AND rkey = ?", &.{ did, rkey })) |result_val| {
-        var result = result_val;
-        defer result.deinit();
-        if (result.first()) |row| {
-            const old_uri = row.text(0);
-            if (!std.mem.eql(u8, old_uri, uri)) {
-                c.exec("DELETE FROM publications_fts WHERE uri = ?", &.{old_uri}) catch {};
-                c.exec("DELETE FROM publications WHERE uri = ?", &.{old_uri}) catch {};
-            }
-        }
-    } else |_| {}
-
     try c.exec(
         "INSERT OR REPLACE INTO publications (uri, did, rkey, name, description, base_path) VALUES (?, ?, ?, ?, ?, ?)",
         &.{ uri, did, rkey, name, description orelse "", base_path orelse "" },
     );
 
-    // update FTS index (includes base_path for subdomain search)
+    // update FTS index
     c.exec("DELETE FROM publications_fts WHERE uri = ?", &.{uri}) catch {};
     c.exec(
-        "INSERT INTO publications_fts (uri, name, description, base_path) VALUES (?, ?, ?, ?)",
-        &.{ uri, name, description orelse "", base_path orelse "" },
+        "INSERT INTO publications_fts (uri, name, description) VALUES (?, ?, ?)",
+        &.{ uri, name, description orelse "" },
     ) catch {};
 }
 
backend/src/main.zig +1 -2
···
     var listener = try address.listen(.{ .reuse_address = true });
     defer listener.deinit();
 
-    const app_name = posix.getenv("APP_NAME") orelse "leaflet-search";
-    std.debug.print("{s} listening on http://0.0.0.0:{d} (max {} workers)\n", .{ app_name, port, MAX_HTTP_WORKERS });
+    std.debug.print("leaflet-search listening on http://0.0.0.0:{d} (max {} workers)\n", .{ port, MAX_HTTP_WORKERS });
 
     while (true) {
         const conn = listener.accept() catch |err| {
backend/src/search.zig +22 -161
···
     rkey: []const u8,
     basePath: []const u8,
     platform: []const u8,
-    path: []const u8 = "", // URL path from record (e.g., "/001")
 };
 
 /// Document search result (internal)
···
     basePath: []const u8,
     hasPublication: bool,
     platform: []const u8,
-    path: []const u8,
 
     fn fromRow(row: db.Row) Doc {
         return .{
···
             .basePath = row.text(6),
             .hasPublication = row.int(7) != 0,
             .platform = row.text(8),
-            .path = row.text(9),
         };
     }
 
···
             .rkey = self.rkey,
             .basePath = self.basePath,
             .platform = self.platform,
-            .path = self.path,
         };
     }
 };
 
 const DocsByTag = zql.Query(
     \\SELECT d.uri, d.did, d.title, '' as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+    \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
+    \\  d.platform
     \\FROM documents d
     \\LEFT JOIN publications p ON d.publication_uri = p.uri
     \\JOIN document_tags dt ON d.uri = dt.document_uri
···
 const DocsByFtsAndTag = zql.Query(
     \\SELECT f.uri, d.did, d.title,
     \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+    \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
+    \\  d.platform
     \\FROM documents_fts f
     \\JOIN documents d ON f.uri = d.uri
     \\LEFT JOIN publications p ON d.publication_uri = p.uri
···
 const DocsByFts = zql.Query(
     \\SELECT f.uri, d.did, d.title,
     \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
+    \\  d.created_at, d.rkey, COALESCE(p.base_path, '') as base_path,
     \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
+    \\  d.platform
     \\FROM documents_fts f
     \\JOIN documents d ON f.uri = d.uri
     \\LEFT JOIN publications p ON d.publication_uri = p.uri
···
     \\ORDER BY rank LIMIT 40
 );
 
-const DocsByFtsAndPlatform = zql.Query(
-    \\SELECT f.uri, d.did, d.title,
-    \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-    \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents_fts f
-    \\JOIN documents d ON f.uri = d.uri
-    \\LEFT JOIN publications p ON d.publication_uri = p.uri
-    \\WHERE documents_fts MATCH :query AND d.platform = :platform
-    \\ORDER BY rank LIMIT 40
-);
-
-const DocsByTagAndPlatform = zql.Query(
-    \\SELECT d.uri, d.did, d.title, '' as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-    \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents d
-    \\LEFT JOIN publications p ON d.publication_uri = p.uri
-    \\JOIN document_tags dt ON d.uri = dt.document_uri
-    \\WHERE dt.tag = :tag AND d.platform = :platform
-    \\ORDER BY d.created_at DESC LIMIT 40
-);
-
-const DocsByFtsAndTagAndPlatform = zql.Query(
-    \\SELECT f.uri, d.did, d.title,
-    \\  snippet(documents_fts, 2, '', '', '...', 32) as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-    \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents_fts f
-    \\JOIN documents d ON f.uri = d.uri
-    \\LEFT JOIN publications p ON d.publication_uri = p.uri
-    \\JOIN document_tags dt ON d.uri = dt.document_uri
-    \\WHERE documents_fts MATCH :query AND dt.tag = :tag AND d.platform = :platform
-    \\ORDER BY rank LIMIT 40
-);
-
-const DocsByPlatform = zql.Query(
-    \\SELECT d.uri, d.did, d.title, '' as snippet,
-    \\  d.created_at, d.rkey,
-    \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d.did LIMIT 1), '') as base_path,
-    \\  CASE WHEN d.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents d
-    \\LEFT JOIN publications p ON d.publication_uri = p.uri
-    \\WHERE d.platform = :platform
-    \\ORDER BY d.created_at DESC LIMIT 40
-);
-
-// Find documents by their publication's base_path (subdomain search)
-// e.g., searching "gyst" finds all docs on gyst.leaflet.pub
-const DocsByPubBasePath = zql.Query(
-    \\SELECT d.uri, d.did, d.title, '' as snippet,
-    \\  d.created_at, d.rkey,
-    \\  p.base_path,
-    \\  1 as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents d
-    \\JOIN publications p ON d.publication_uri = p.uri
-    \\JOIN publications_fts pf ON p.uri = pf.uri
-    \\WHERE publications_fts MATCH :query
-    \\ORDER BY d.created_at DESC LIMIT 40
-);
-
-const DocsByPubBasePathAndPlatform = zql.Query(
-    \\SELECT d.uri, d.did, d.title, '' as snippet,
-    \\  d.created_at, d.rkey,
-    \\  p.base_path,
-    \\  1 as has_publication,
-    \\  d.platform, COALESCE(d.path, '') as path
-    \\FROM documents d
-    \\JOIN publications p ON d.publication_uri = p.uri
-    \\JOIN publications_fts pf ON p.uri = pf.uri
-    \\WHERE publications_fts MATCH :query AND d.platform = :platform
-    \\ORDER BY d.created_at DESC LIMIT 40
-);
-
 /// Publication search result (internal)
 const Pub = struct {
     uri: []const u8,
···
     snippet: []const u8,
     rkey: []const u8,
     basePath: []const u8,
-    platform: []const u8,
 
     fn fromRow(row: db.Row) Pub {
         return .{
···
             .snippet = row.text(3),
             .rkey = row.text(4),
             .basePath = row.text(5),
-            .platform = row.text(6),
         };
     }
 
···
             .snippet = self.snippet,
             .rkey = self.rkey,
             .basePath = self.basePath,
-            .platform = self.platform,
+            .platform = "leaflet", // publications are leaflet-only for now
         };
     }
 };
···
 const PubSearch = zql.Query(
     \\SELECT f.uri, p.did, p.name,
     \\  snippet(publications_fts, 2, '', '', '...', 32) as snippet,
-    \\  p.rkey, p.base_path, p.platform
+    \\  p.rkey, p.base_path
     \\FROM publications_fts f
     \\JOIN publications p ON f.uri = p.uri
     \\WHERE publications_fts MATCH :query
···
     try jw.beginArray();
 
     const fts_query = try buildFtsQuery(alloc, query);
-    const has_query = query.len > 0;
-    const has_tag = tag_filter != null;
-    const has_platform = platform_filter != null;
 
-    // track seen URIs for deduplication (content match + base_path match)
-    var seen_uris = std.StringHashMap(void).init(alloc);
-    defer seen_uris.deinit();
-
-    // search documents by content (title, content) - handle all filter combinations
-    var doc_result = if (has_query and has_tag and has_platform)
-        c.query(DocsByFtsAndTagAndPlatform.positional, DocsByFtsAndTagAndPlatform.bind(.{
-            .query = fts_query,
-            .tag = tag_filter.?,
-            .platform = platform_filter.?,
-        })) catch null
-    else if (has_query and has_tag)
-        c.query(DocsByFtsAndTag.positional, DocsByFtsAndTag.bind(.{ .query = fts_query, .tag = tag_filter.? })) catch null
-    else if (has_query and has_platform)
-        c.query(DocsByFtsAndPlatform.positional, DocsByFtsAndPlatform.bind(.{ .query = fts_query, .platform = platform_filter.? })) catch null
-    else if (has_query)
-        c.query(DocsByFts.positional, DocsByFts.bind(.{ .query = fts_query })) catch null
-    else if (has_tag and has_platform)
-        c.query(DocsByTagAndPlatform.positional, DocsByTagAndPlatform.bind(.{ .tag = tag_filter.?, .platform = platform_filter.? })) catch null
-    else if (has_tag)
+    // search documents
+    var doc_result = if (query.len == 0 and tag_filter != null)
         c.query(DocsByTag.positional, DocsByTag.bind(.{ .tag = tag_filter.? })) catch null
-    else if (has_platform)
-        c.query(DocsByPlatform.positional, DocsByPlatform.bind(.{ .platform = platform_filter.? })) catch null
+    else if (tag_filter) |tag|
+        c.query(DocsByFtsAndTag.positional, DocsByFtsAndTag.bind(.{ .query = fts_query, .tag = tag })) catch null
     else
-        null; // no filters at all - return empty
+        c.query(DocsByFts.positional, DocsByFts.bind(.{ .query = fts_query })) catch null;
 
     if (doc_result) |*res| {
         defer res.deinit();
         for (res.rows) |row| {
             const doc = Doc.fromRow(row);
-            // dupe URI for hash map (outlives result)
-            const uri_dupe = try alloc.dupe(u8, doc.uri);
-            try seen_uris.put(uri_dupe, {});
+            // filter by platform if specified
+            if (platform_filter) |pf| {
+                if (!std.mem.eql(u8, doc.platform, pf)) continue;
+            }
             try jw.write(doc.toJson());
         }
     }
 
-    // also search documents by publication base_path (subdomain search)
-    // e.g., "gyst" finds all docs on gyst.leaflet.pub even if content doesn't contain "gyst"
-    // skip if tag filter is set (tag filter is content-specific)
-    if (has_query and !has_tag) {
-        var basepath_result = if (has_platform)
-            c.query(DocsByPubBasePathAndPlatform.positional, DocsByPubBasePathAndPlatform.bind(.{
-                .query = fts_query,
-                .platform = platform_filter.?,
-            })) catch null
-        else
-            c.query(DocsByPubBasePath.positional, DocsByPubBasePath.bind(.{ .query = fts_query })) catch null;
-
-        if (basepath_result) |*res| {
-            defer res.deinit();
-            for (res.rows) |row| {
-                const doc = Doc.fromRow(row);
-                // deduplicate: skip if already found by content search
-                if (!seen_uris.contains(doc.uri)) {
-                    try jw.write(doc.toJson());
-                }
-            }
-        }
-    }
-
-    // publications are excluded when filtering by tag or platform
-    // (platform filter is for documents only - publications don't have meaningful platform distinction)
-    if (tag_filter == null and platform_filter == null) {
+    // publications are excluded when filtering by tag, or by a non-leaflet platform (only leaflet has publications)
+    if (tag_filter == null and (platform_filter == null or std.mem.eql(u8, platform_filter.?, "leaflet"))) {
         var pub_result = c.query(
             PubSearch.positional,
             PubSearch.bind(.{ .query = fts_query }),
···
         if (pub_result) |*res| {
             defer res.deinit();
-            for (res.rows) |row| {
-                try jw.write(Pub.fromRow(row).toJson());
-            }
+            for (res.rows) |row| try jw.write(Pub.fromRow(row).toJson());
         }
     }
···
     // brute-force cosine similarity search (no vector index needed)
     var res = c.query(
         \\SELECT d2.uri, d2.did, d2.title, '' as snippet,
-        \\  d2.created_at, d2.rkey,
-        \\  COALESCE(p.base_path, (SELECT base_path FROM publications WHERE did = d2.did LIMIT 1), '') as base_path,
+        \\  d2.created_at, d2.rkey, COALESCE(p.base_path, '') as base_path,
         \\  CASE WHEN d2.publication_uri != '' THEN 1 ELSE 0 END as has_publication,
-        \\  d2.platform, COALESCE(d2.path, '') as path
+        \\  d2.platform
         \\FROM documents d1, documents d2
         \\LEFT JOIN publications p ON d2.publication_uri = p.uri
         \\WHERE d1.uri = ?
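for reference, a minimal sketch of the brute-force cosine similarity that `/similar` relies on — plain Zig, assuming f32 embedding vectors of equal length (the actual scoring code is not part of this diff):

```zig
const std = @import("std");

/// cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|)
fn cosineSimilarity(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    var dot: f32 = 0;
    var norm_a: f32 = 0;
    var norm_b: f32 = 0;
    for (a, b) |x, y| {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    return dot / (@sqrt(norm_a) * @sqrt(norm_b));
}

test "identical vectors have similarity 1" {
    const v = [_]f32{ 0.5, -0.25, 0.75 };
    try std.testing.expectApproxEqAbs(@as(f32, 1.0), cosineSimilarity(&v, &v), 1e-6);
}
```

scanning every stored vector per request is O(n·d), which at ~3500 documents lines up with the ~0.15s figure quoted in the README.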
backend/src/server.zig +2 -18
···
         try sendJson(request, "{\"status\":\"ok\"}");
     } else if (mem.eql(u8, target, "/popular")) {
         try handlePopular(request);
-    } else if (mem.eql(u8, target, "/platforms")) {
-        try handlePlatforms(request);
     } else if (mem.eql(u8, target, "/dashboard")) {
         try handleDashboard(request);
     } else if (mem.eql(u8, target, "/api/dashboard")) {
···
     try sendJson(request, popular);
 }
 
-fn handlePlatforms(request: *http.Server.Request) !void {
-    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
-    defer arena.deinit();
-    const alloc = arena.allocator();
-
-    const data = try stats.getPlatformCounts(alloc);
-    try sendJson(request, data);
-}
-
 fn parseQueryParam(alloc: std.mem.Allocator, target: []const u8, param: []const u8) ![]const u8 {
     // look for ?param= or &param=
     const patterns = [_][]const u8{ "?", "&" };
···
     var response: std.ArrayList(u8) = .{};
     defer response.deinit(alloc);
 
-    try response.print(alloc, "{{\"documents\":{d},\"publications\":{d},\"embeddings\":{d},\"cache_hits\":{d},\"cache_misses\":{d}}}", .{ db_stats.documents, db_stats.publications, db_stats.embeddings, db_stats.cache_hits, db_stats.cache_misses });
+    try response.print(alloc, "{{\"documents\":{d},\"publications\":{d},\"cache_hits\":{d},\"cache_misses\":{d}}}", .{ db_stats.documents, db_stats.publications, db_stats.cache_hits, db_stats.cache_misses });
 
     try sendJson(request, response.items);
 }
···
     try sendJson(request, json_response);
 }
 
-fn getDashboardUrl() []const u8 {
-    return std.posix.getenv("DASHBOARD_URL") orelse "https://leaflet-search.pages.dev/dashboard.html";
-}
-
 fn handleDashboard(request: *http.Server.Request) !void {
-    const dashboard_url = getDashboardUrl();
     try request.respond("", .{
         .status = .moved_permanently,
         .extra_headers = &.{
-            .{ .name = "location", .value = dashboard_url },
+            .{ .name = "location", .value = "https://leaflet-search.pages.dev/dashboard.html" },
         },
     });
 }
backend/src/stats.zig +8 -64
···
 pub const Stats = struct {
     documents: i64,
     publications: i64,
-    embeddings: i64,
     searches: i64,
     errors: i64,
     started_at: i64,
···
     cache_misses: i64,
 };
 
-const default_stats: Stats = .{ .documents = 0, .publications = 0, .embeddings = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
-
 pub fn getStats() Stats {
-    const c = db.getClient() orelse return default_stats;
+    const c = db.getClient() orelse return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
 
     var res = c.query(
         \\SELECT
         \\  (SELECT COUNT(*) FROM documents) as docs,
         \\  (SELECT COUNT(*) FROM publications) as pubs,
-        \\  (SELECT COUNT(*) FROM documents WHERE embedding IS NOT NULL) as embeddings,
         \\  (SELECT total_searches FROM stats WHERE id = 1) as searches,
         \\  (SELECT total_errors FROM stats WHERE id = 1) as errors,
         \\  (SELECT service_started_at FROM stats WHERE id = 1) as started_at,
         \\  (SELECT COALESCE(cache_hits, 0) FROM stats WHERE id = 1) as cache_hits,
         \\  (SELECT COALESCE(cache_misses, 0) FROM stats WHERE id = 1) as cache_misses
-    , &.{}) catch return default_stats;
+    , &.{}) catch return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
     defer res.deinit();
 
-    const row = res.first() orelse return default_stats;
+    const row = res.first() orelse return .{ .documents = 0, .publications = 0, .searches = 0, .errors = 0, .started_at = 0, .cache_hits = 0, .cache_misses = 0 };
     return .{
         .documents = row.int(0),
         .publications = row.int(1),
-        .embeddings = row.int(2),
-        .searches = row.int(3),
-        .errors = row.int(4),
-        .started_at = row.int(5),
-        .cache_hits = row.int(6),
-        .cache_misses = row.int(7),
+        .searches = row.int(2),
+        .errors = row.int(3),
+        .started_at = row.int(4),
+        .cache_hits = row.int(5),
+        .cache_misses = row.int(6),
     };
 }
 
···
 pub fn recordCacheMiss() void {
     const c = db.getClient() orelse return;
     c.exec("UPDATE stats SET cache_misses = COALESCE(cache_misses, 0) + 1 WHERE id = 1", &.{}) catch {};
-}
-
-const PlatformCount = struct { platform: []const u8, count: i64 };
-
-pub fn getPlatformCounts(alloc: Allocator) ![]const u8 {
-    const c = db.getClient() orelse return error.NotInitialized;
-
-    var output: std.Io.Writer.Allocating = .init(alloc);
-    errdefer output.deinit();
-
-    var jw: json.Stringify = .{ .writer = &output.writer };
-    try jw.beginObject();
-
-    // documents by platform
-    try jw.objectField("documents");
-    if (c.query("SELECT platform, COUNT(*) as count FROM documents GROUP BY platform ORDER BY count DESC", &.{})) |res_val| {
-        var res = res_val;
-        defer res.deinit();
-        try jw.beginArray();
-        for (res.rows) |row| try jw.write(PlatformCount{ .platform = row.text(0), .count = row.int(1) });
-        try jw.endArray();
-    } else |_| {
-        try jw.beginArray();
-        try jw.endArray();
-    }
-
-    // FTS document count
-    try jw.objectField("fts_count");
-    if (c.query("SELECT COUNT(*) FROM documents_fts", &.{})) |res_val| {
-        var res = res_val;
-        defer res.deinit();
-        if (res.first()) |row| {
-            try jw.write(row.int(0));
-        } else try jw.write(0);
-    } else |_| try jw.write(0);
-
-    // sample URIs from each platform (for debugging)
-    try jw.objectField("sample_standardsite");
-    if (c.query("SELECT uri FROM documents WHERE platform = 'standardsite' LIMIT 3", &.{})) |res_val| {
-        var res = res_val;
-        defer res.deinit();
-        try jw.beginArray();
-        for (res.rows) |row| try jw.write(row.text(0));
-        try jw.endArray();
-    } else |_| {
-        try jw.beginArray();
-        try jw.endArray();
-    }
-
-    try jw.endObject();
-    return try output.toOwnedSlice();
 }
 
 pub fn getPopular(alloc: Allocator, limit: usize) ![]const u8 {
backend/src/tap.zig +35 -98
···
 
 const Handler = struct {
     allocator: Allocator,
-    client: *websocket.Client,
     msg_count: usize = 0,
-    ack_buf: [64]u8 = undefined,
 
     pub fn serverMessage(self: *Handler, data: []const u8) !void {
         self.msg_count += 1;
         if (self.msg_count % 100 == 1) {
             std.debug.print("tap: received {} messages\n", .{self.msg_count});
         }
-
-        // extract message ID for ACK
-        const msg_id = extractMessageId(self.allocator, data);
-
-        // process the message
         processMessage(self.allocator, data) catch |err| {
             std.debug.print("message processing error: {}\n", .{err});
-            // still ACK even on error to avoid infinite retries
-        };
-
-        // send ACK if we have a message ID
-        if (msg_id) |id| {
-            self.sendAck(id);
-        }
-    }
-
-    fn sendAck(self: *Handler, msg_id: i64) void {
-        const ack_json = std.fmt.bufPrint(&self.ack_buf, "{{\"type\":\"ack\",\"id\":{d}}}", .{msg_id}) catch |err| {
-            std.debug.print("tap: ACK format error: {}\n", .{err});
-            return;
-        };
-        std.debug.print("tap: sending ACK for id={d}\n", .{msg_id});
-        self.client.write(@constCast(ack_json)) catch |err| {
-            std.debug.print("tap: failed to send ACK: {}\n", .{err});
         };
     }
 
···
         std.debug.print("tap connection closed\n", .{});
     }
 };
-
-fn extractMessageId(allocator: Allocator, payload: []const u8) ?i64 {
-    const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch return null;
-    defer parsed.deinit();
-    return zat.json.getInt(parsed.value, "id");
-}
 
 fn connect(allocator: Allocator) !void {
     const host = getTapHost();
···
 
     std.debug.print("tap connected!\n", .{});
 
-    var handler = Handler{ .allocator = allocator, .client = &client };
+    var handler = Handler{ .allocator = allocator };
     client.readLoop(&handler) catch |err| {
         std.debug.print("websocket read loop error: {}\n", .{err});
         return err;
···
 /// TAP record envelope - extracted via zat.json.extractAt
 const TapRecord = struct {
     collection: []const u8,
-    action: []const u8, // "create", "update", "delete"
+    action: zat.CommitAction,
     did: []const u8,
     rkey: []const u8,
-
-    pub fn isCreate(self: TapRecord) bool {
-        return mem.eql(u8, self.action, "create");
-    }
-    pub fn isUpdate(self: TapRecord) bool {
-        return mem.eql(u8, self.action, "update");
-    }
-    pub fn isDelete(self: TapRecord) bool {
-        return mem.eql(u8, self.action, "delete");
-    }
 };
 
 /// Leaflet publication fields
···
 };
 
 fn processMessage(allocator: Allocator, payload: []const u8) !void {
-    const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch {
-        std.debug.print("tap: JSON parse failed, first 100 bytes: {s}\n", .{payload[0..@min(payload.len, 100)]});
-        return;
-    };
+    const parsed = json.parseFromSlice(json.Value, allocator, payload, .{}) catch return;
     defer parsed.deinit();
 
     // check message type
-    const msg_type = zat.json.getString(parsed.value, "type") orelse {
-        std.debug.print("tap: no type field in message\n", .{});
-        return;
-    };
-
+    const msg_type = zat.json.getString(parsed.value, "type") orelse return;
     if (!mem.eql(u8, msg_type, "record")) return;
 
-    // extract record envelope (extractAt ignores extra fields like live, rev, cid)
-    const rec = zat.json.extractAt(TapRecord, allocator, parsed.value, .{"record"}) catch |err| {
-        std.debug.print("tap: failed to extract record: {}\n", .{err});
-        return;
-    };
+    // extract record envelope
+    const rec = zat.json.extractAt(TapRecord, allocator, parsed.value, .{"record"}) catch return;
 
     // validate DID
     const did = zat.Did.parse(rec.did) orelse return;
 
-    // build AT-URI string (no allocation - uses stack buffer)
-    var uri_buf: [256]u8 = undefined;
-    const uri = zat.AtUri.format(&uri_buf, did.raw, rec.collection, rec.rkey) orelse return;
+    // build AT-URI string
+    const uri = try std.fmt.allocPrint(allocator, "at://{s}/{s}/{s}", .{ did.raw, rec.collection, rec.rkey });
+    defer allocator.free(uri);
 
-    if (rec.isCreate() or rec.isUpdate()) {
-        const inner_record = zat.json.getObject(parsed.value, "record.record") orelse return;
-
-        if (isDocumentCollection(rec.collection)) {
-            processDocument(allocator, uri, did.raw, rec.rkey, inner_record, rec.collection) catch |err| {
-                std.debug.print("document processing error: {}\n", .{err});
-            };
-        } else if (isPublicationCollection(rec.collection)) {
-            processPublication(allocator, uri, did.raw, rec.rkey, inner_record) catch |err| {
-                std.debug.print("publication processing error: {}\n", .{err});
-            };
-        }
-    } else if (rec.isDelete()) {
-        if (isDocumentCollection(rec.collection)) {
-            indexer.deleteDocument(uri);
-            std.debug.print("deleted document: {s}\n", .{uri});
-        } else if (isPublicationCollection(rec.collection)) {
-            indexer.deletePublication(uri);
-            std.debug.print("deleted publication: {s}\n", .{uri});
-        }
+    switch (rec.action) {
+        .create, .update => {
+            const record_obj = zat.json.getObject(parsed.value, "record.record") orelse return;
+
+            if (isDocumentCollection(rec.collection)) {
+                processDocument(allocator, uri, did.raw, rec.rkey, record_obj, rec.collection) catch |err| {
+                    std.debug.print("document processing error: {}\n", .{err});
+                };
+            } else if (isPublicationCollection(rec.collection)) {
+                processPublication(allocator, uri, did.raw, rec.rkey, record_obj) catch |err| {
+                    std.debug.print("publication processing error: {}\n", .{err});
+                };
+            }
+        },
+        .delete => {
+            if (isDocumentCollection(rec.collection)) {
+                indexer.deleteDocument(uri);
+                std.debug.print("deleted document: {s}\n", .{uri});
+            } else if (isPublicationCollection(rec.collection)) {
+                indexer.deletePublication(uri);
+                std.debug.print("deleted publication: {s}\n", .{uri});
+            }
+        },
     }
 }
 
···
         doc.tags,
         doc.platformName(),
         doc.source_collection,
-        doc.path,
     );
     std.debug.print("indexed document: {s} [{s}] ({} chars, {} tags)\n", .{ uri, doc.platformName(), doc.content.len, doc.tags.len });
 }
 
-fn processPublication(_: Allocator, uri: []const u8, did: []const u8, rkey: []const u8, record: json.ObjectMap) !void {
+fn processPublication(allocator: Allocator, uri: []const u8, did: []const u8, rkey: []const u8, record: json.ObjectMap) !void {
     const record_val: json.Value = .{ .object = record };
+    const pub_data = zat.json.extractAt(LeafletPublication, allocator, record_val, .{}) catch return;
 
-    // extract required field
-    const name = zat.json.getString(record_val, "name") orelse return;
-    const description = zat.json.getString(record_val, "description");
-
-    // base_path: try leaflet's "base_path", then site.standard's "url"
-    // url is full URL like "https://devlog.pckt.blog", we need just the host
-    const base_path = zat.json.getString(record_val, "base_path") orelse
-        stripUrlScheme(zat.json.getString(record_val, "url"));
-
-    try indexer.insertPublication(uri, did, rkey, name, description, base_path);
-    std.debug.print("indexed publication: {s} (base_path: {s})\n", .{ uri, base_path orelse "none" });
-}
-
-fn stripUrlScheme(url: ?[]const u8) ?[]const u8 {
-    const u = url orelse return null;
-    if (mem.startsWith(u8, u, "https://")) return u["https://".len..];
-    if (mem.startsWith(u8, u, "http://")) return u["http://".len..];
-    return u;
+    try indexer.insertPublication(uri, did, rkey, pub_data.name, pub_data.description, pub_data.base_path);
+    std.debug.print("indexed publication: {s} (base_path: {s})\n", .{ uri, pub_data.base_path orelse "none" });
 }
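the envelope TAP delivers (documented in the `docs/tap.md` file deleted later in this diff) maps cleanly onto the enum-typed `TapRecord` above. a hypothetical sketch of the same idea using plain std.json instead of zat's `extractAt` — TAP sends `action` as a JSON string, and std.json resolves strings onto enum tags by name:

```zig
const std = @import("std");

// hypothetical stand-in for zat.CommitAction
const CommitAction = enum { create, update, delete };

const Envelope = struct {
    collection: []const u8,
    action: CommitAction,
    did: []const u8,
    rkey: []const u8,
};

test "parse a TAP record envelope" {
    const payload =
        \\{"collection":"pub.leaflet.document","action":"create",
        \\ "did":"did:plc:abc","rkey":"3lzyrj6q6gs27"}
    ;
    const parsed = try std.json.parseFromSlice(Envelope, std.testing.allocator, payload, .{
        .ignore_unknown_fields = true, // envelope also carries live, rev, cid, ...
    });
    defer parsed.deinit();
    try std.testing.expectEqual(CommitAction.create, parsed.value.action);
    try std.testing.expectEqualStrings("pub.leaflet.document", parsed.value.collection);
}
```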
docs/leaflet-publishing-plan.md -226
···
-# publishing to leaflet.pub
-
-## goal
-
-publish markdown docs to both:
-1. `site.standard.document` (for search/interop) - already working
-2. `pub.leaflet.document` (for leaflet.pub display) - this plan
-
-## the mapping
-
-### block types
-
-| markdown | leaflet block |
-|----------|---------------|
-| `# heading` | `pub.leaflet.blocks.header` (level 1-6) |
-| paragraph | `pub.leaflet.blocks.text` |
-| ``` code ``` | `pub.leaflet.blocks.code` |
-| `> quote` | `pub.leaflet.blocks.blockquote` |
-| `---` | `pub.leaflet.blocks.horizontalRule` |
-| `- item` | `pub.leaflet.blocks.unorderedList` |
-| `![alt](url)` | `pub.leaflet.blocks.image` (requires blob upload) |
-| `[text](url)` (standalone) | `pub.leaflet.blocks.website` |
-
-### inline formatting (facets)
-
-leaflet uses byte-indexed facets for inline formatting within text blocks:
-
-```json
-{
-  "$type": "pub.leaflet.blocks.text",
-  "plaintext": "hello world with bold text",
-  "facets": [{
-    "index": { "byteStart": 17, "byteEnd": 21 },
-    "features": [{ "$type": "pub.leaflet.richtext.facet#bold" }]
-  }]
-}
-```
-
-| markdown | facet type |
-|----------|------------|
-| `**bold**` | `pub.leaflet.richtext.facet#bold` |
-| `*italic*` | `pub.leaflet.richtext.facet#italic` |
-| `` `code` `` | `pub.leaflet.richtext.facet#code` |
-| `[text](url)` | `pub.leaflet.richtext.facet#link` |
-| `~~strike~~` | `pub.leaflet.richtext.facet#strikethrough` |
-
-## record structure
-
-```json
-{
-  "$type": "pub.leaflet.document",
-  "author": "did:plc:...",
-  "title": "document title",
-  "description": "optional description",
-  "publishedAt": "2026-01-06T00:00:00Z",
-  "publication": "at://did:plc:.../pub.leaflet.publication/rkey",
-  "tags": ["tag1", "tag2"],
-  "pages": [{
-    "$type": "pub.leaflet.pages.linearDocument",
-    "id": "page-uuid",
-    "blocks": [
-      {
-        "$type": "pub.leaflet.pages.linearDocument#block",
-        "block": { /* one of the block types above */ }
-      }
-    ]
-  }]
-}
-```
-
-## implementation plan
-
-### phase 1: markdown parser
-
-add a simple markdown block parser to zat or the publish script:
-
-```zig
-const BlockType = enum {
-    heading,
-    paragraph,
-    code,
-    blockquote,
-    horizontal_rule,
-    unordered_list,
-    image,
-};
-
-const Block = struct {
-    type: BlockType,
-    content: []const u8,
-    level: ?u8 = null, // for headings
-    language: ?[]const u8 = null, // for code blocks
-    alt: ?[]const u8 = null, // for images
-    src: ?[]const u8 = null, // for images
-};
-
-fn parseMarkdownBlocks(allocator: Allocator, markdown: []const u8) ![]Block
-```
-
-parsing approach:
-- split on blank lines to get blocks
-- identify block type by first characters:
-  - `#` → heading (count `#` for level)
-  - ``` → code block (capture until closing ```)
-  - `>` → blockquote
-  - `---` → horizontal rule
-  - `-` or `*` at start → list item
-  - `![` → image
-  - else → paragraph
-
-### phase 2: inline facet extraction
-
-for text blocks, extract inline formatting:
-
-```zig
-const Facet = struct {
-    byte_start: usize,
-    byte_end: usize,
-    feature: FacetFeature,
-};
-
-const FacetFeature = union(enum) {
-    bold,
-    italic,
-    code,
-    link: []const u8, // url
-    strikethrough,
-};
-
-fn extractFacets(allocator: Allocator, text: []const u8) !struct {
-    plaintext: []const u8,
-    facets: []Facet,
-}
-```
-
-approach:
-- scan for `**`, `*`, `` ` ``, `[`, `~~`
-- track byte positions as we strip markers
-- build facet list with adjusted indices
-
-### phase 3: image blob upload
-
-images need to be uploaded as blobs before referencing:
-
-```zig
-fn uploadImageBlob(client: *XrpcClient, allocator: Allocator, image_path: []const u8) !BlobRef
-```
-
-for now, could skip images or require them to already be uploaded.
-
-### phase 4: json serialization
-
-build the full `pub.leaflet.document` record:
-
-```zig
-const LeafletDocument = struct {
-    @"$type": []const u8 = "pub.leaflet.document",
-    author: []const u8,
-    title: []const u8,
-    description: ?[]const u8 = null,
-    publishedAt: []const u8,
-    publication: ?[]const u8 = null,
-    tags: ?[][]const u8 = null,
-    pages: []Page,
-};
-
-const Page = struct {
-    @"$type": []const u8 = "pub.leaflet.pages.linearDocument",
-    id: []const u8,
-    blocks: []BlockWrapper,
-};
-```
-
-### phase 5: integrate into publish-docs.zig
-
-update the publish script to:
-1. parse markdown into blocks
-2. convert to leaflet structure
-3. publish `pub.leaflet.document` alongside `site.standard.document`
-
-```zig
-// existing: publish site.standard.document
-try putRecord(&client, allocator, session.did, "site.standard.document", tid.str(), doc_record);
-
-// new: also publish pub.leaflet.document
-const leaflet_record = try markdownToLeaflet(allocator, content, title, session.did, pub_uri);
-try putRecord(&client, allocator, session.did, "pub.leaflet.document", tid.str(), leaflet_record);
-```
-
-## complexity estimate
-
-| component | complexity | notes |
-|-----------|------------|-------|
-| block parsing | medium | regex-free, line-by-line |
-| facet extraction | medium | byte index tracking is fiddly |
-| image upload | low | already have blob upload in xrpc |
-| json serialization | low | std.json handles it |
-| integration | low | add to existing publish flow |
-
-total: ~300-500 lines of zig
-
-## open questions
-
-1. **publication record**: do we need a `pub.leaflet.publication` too, or just documents?
-   - leaflet allows standalone documents without publications
-   - could skip publication for now
-
-2. **image handling**:
-   - option A: skip images initially (just text content)
-   - option B: require images to be URLs (no blob upload)
-   - option C: full blob upload support
-
-3. **deduplication**: same rkey for both record types?
-   - pro: easy to correlate
-   - con: different collections, might not matter
-
-4. **validation**: leaflet has a validate endpoint
-   - could call `/api/unstable_validate` to check records before publish
-   - probably skip for v1
-
-## references
-
-- [pub.leaflet.document schema](/tmp/leaflet/lexicons/pub/leaflet/document.json)
-- [leaflet publishToPublication.ts](/tmp/leaflet/actions/publishToPublication.ts) - how leaflet creates records
-- [site.standard.document schema](/tmp/standard.site/app/data/lexicons/document.json)
-- paul's site: fetches records, doesn't publish them
-142
docs/tap.md
-142
docs/tap.md
···
1
-
# tap (firehose sync)
2
-
3
-
leaflet-search uses [TAP](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) from bluesky-social/indigo to receive real-time events from the ATProto firehose.
4
-
5
-
## what is tap?
6
-
7
-
tap subscribes to the ATProto firehose, filters for specific collections (e.g., `pub.leaflet.document`), and broadcasts matching events to websocket clients. it also does initial crawling/backfilling of existing records.
8
-
9
-
key behavior: **TAP backfills historical data when repos are added**. when a repo is added to tracking:
10
-
1. TAP fetches the full repo from the account's PDS using `com.atproto.sync.getRepo`
11
-
2. live firehose events during backfill are buffered in memory
12
-
3. historical events (marked `live: false`) are delivered first
13
-
4. after historical events complete, buffered live events are released
14
-
5. subsequent firehose events arrive immediately marked as `live: true`
15
-
16
-
TAP enforces strict per-repo ordering - live events are synchronization barriers that require all prior events to complete first.
17
-
18
-
## message format
19
-
20
-
TAP sends JSON messages over websocket. record events look like:
21
-
22
-
```json
23
-
{
24
-
"type": "record",
25
-
"record": {
26
-
"live": true,
27
-
"did": "did:plc:abc123...",
28
-
"rev": "3mbspmpaidl2a",
29
-
"collection": "pub.leaflet.document",
30
-
"rkey": "3lzyrj6q6gs27",
31
-
"action": "create",
32
-
"record": { ... },
33
-
"cid": "bafyrei..."
34
-
}
35
-
}
36
-
```
37
-
38
-
### field types (important!)
39
-
40
-
| field | type | values | notes |
41
-
|-------|------|--------|-------|
42
-
| type | string | "record", "identity", "account" | message type |
43
-
| action | **string** | "create", "update", "delete" | NOT an enum! |
44
-
| live | bool | true/false | true = firehose, false = resync |
45
-
| collection | string | e.g., "pub.leaflet.document" | lexicon collection |
46
-
47
-
## gotchas
48
-
49
-
1. **action is a string, not an enum** - TAP sends `"action": "create"` as a JSON string. if your parser expects an enum type, extraction will silently fail; use string comparison (see the sketch after this list).
50
-
51
-
2. **collection filters apply to output** - `TAP_COLLECTION_FILTERS` controls which records TAP sends to clients. records from other collections are fetched but not forwarded.
52
-
53
-
3. **signal collection vs collection filters** - `TAP_SIGNAL_COLLECTION` controls auto-discovery of repos (which repos to track), while `TAP_COLLECTION_FILTERS` controls which records from those repos to output. a repo must either be auto-discovered via signal collection OR manually added via `/repos/add`.
54
-
55
-
4. **silent extraction failures** - if using zat's `extractAt`, enable debug logging to see why parsing fails:
56
-
```zig
57
-
pub const std_options = .{
58
-
.log_scope_levels = &.{.{ .scope = .zat, .level = .debug }},
59
-
};
60
-
```
61
-
this will show messages like:
62
-
```
63
-
debug(zat): extractAt: parse failed for Op at path { "op" }: InvalidEnumTag
64
-
```
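a minimal sketch of the string-comparison approach from gotcha 1, shown in Python for illustration (the helpers are hypothetical):

```python
# dispatch on action as a plain string rather than parsing it into an enum
# that can silently fail on unexpected values (gotcha 1)
def handle_record(rec: dict) -> None:
    action = rec["action"]  # "create" | "update" | "delete" - a JSON string
    if action in ("create", "update"):
        index_document(rec)   # hypothetical indexing helper
    elif action == "delete":
        remove_document(rec)  # hypothetical removal helper
    else:
        print(f"unknown action: {action!r}")  # tolerate new action values
```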
65
-
66
-
## debugging
67
-
68
-
### check tap connection
69
-
```bash
70
-
fly logs -a leaflet-search-tap --no-tail | tail -30
71
-
```
72
-
73
-
look for:
74
-
- `"connected to firehose"` - successfully connected to bsky relay
75
-
- `"websocket connected"` - backend connected to TAP
76
-
- `"dialing failed"` / `"i/o timeout"` - network issues
77
-
78
-
### check backend is receiving
79
-
```bash
80
-
fly logs -a leaflet-search-backend --no-tail | grep -E "(tap|indexed)"
81
-
```
82
-
83
-
look for:
84
-
- `tap connected!` - connected to TAP
85
-
- `tap: msg_type=record` - receiving messages
86
-
- `indexed document:` - successfully processing
87
-
88
-
### common issues
89
-
90
-
| symptom | cause | fix |
91
-
|---------|-------|-----|
92
-
| `websocket handshake failed: error.Timeout` | TAP not running or network issue | restart TAP, check regions match |
93
-
| `dialing failed: lookup ... i/o timeout` | DNS issues reaching bsky relay | restart TAP, transient network issue |
94
-
| messages received but not indexed | extraction failing (type mismatch) | enable zat debug logging, check field types |
95
-
| repo shows `records: 0` after adding | resync failed or collection not in filters | check TAP logs for resync errors, verify `TAP_COLLECTION_FILTERS` |
96
-
| new platform records not appearing | platform's collection not in `TAP_COLLECTION_FILTERS` | add collection to filters, restart TAP |
97
-
98
-
## TAP API endpoints
99
-
100
-
TAP exposes HTTP endpoints for monitoring and control:
101
-
102
-
| endpoint | description |
103
-
|----------|-------------|
104
-
| `/health` | health check |
105
-
| `/stats/repo-count` | number of tracked repos |
106
-
| `/stats/record-count` | total records processed |
107
-
| `/stats/outbox-buffer` | events waiting to be sent |
108
-
| `/stats/resync-buffer` | DIDs waiting to be resynced |
109
-
| `/stats/cursors` | firehose cursor position |
110
-
| `/info/:did` | repo status: `{"did":"...","state":"active","records":N}` |
111
-
| `/repos/add` | POST with `{"dids":["did:plc:..."]}` to add repos |
112
-
| `/repos/remove` | POST with `{"dids":["did:plc:..."]}` to remove repos |
113
-
114
-
example: check repo status
115
-
```bash
116
-
fly ssh console -a leaflet-search-tap -C "curl -s localhost:2480/info/did:plc:abc123"
117
-
```
118
-
119
-
example: manually add a repo for backfill
120
-
```bash
121
-
fly ssh console -a leaflet-search-tap -C 'curl -X POST -H "Content-Type: application/json" -d "{\"dids\":[\"did:plc:abc123\"]}" localhost:2480/repos/add'
122
-
```
123
-
124
-
## fly.io deployment
125
-
126
-
both TAP and backend should be in the same region for internal networking:
127
-
128
-
```bash
129
-
# check current regions
130
-
fly status -a leaflet-search-tap
131
-
fly status -a leaflet-search-backend
132
-
133
-
# restart TAP if needed
134
-
fly machine restart -a leaflet-search-tap <machine-id>
135
-
```
136
-
137
-
note: changing `primary_region` in fly.toml only affects new machines. to move an existing machine, clone it to the new region and destroy the old one.
138
-
139
-
## references
140
-
141
-
- [TAP source (bluesky-social/indigo)](https://github.com/bluesky-social/indigo/tree/main/cmd/tap)
142
-
- [ATProto firehose docs](https://atproto.com/specs/sync#firehose)
+5
-5
mcp/README.md
···
1
-
# pub search MCP
1
+
# leaflet-mcp
2
2
3
-
MCP server for [pub search](https://pub-search.waow.tech) - search ATProto publishing platforms (Leaflet, pckt, standard.site).
3
+
MCP server for [Leaflet](https://leaflet.pub) - search decentralized publications on ATProto.
4
4
5
5
## usage
6
6
7
7
### hosted (recommended)
8
8
9
9
```bash
10
-
claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
10
+
claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}'
11
11
```
12
12
13
13
### local
···
15
15
run the MCP server locally with `uvx`:
16
16
17
17
```bash
18
-
uvx --from git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp pub-search
18
+
uvx --from git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp leaflet-mcp
19
19
```
20
20
21
21
to add it to claude code as a local stdio server:
22
22
23
23
```bash
24
-
claude mcp add pub-search -- uvx --from 'git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp' pub-search
24
+
claude mcp add leaflet -- uvx --from 'git+https://github.com/zzstoatzz/leaflet-search#subdirectory=mcp' leaflet-mcp
25
25
```
26
26
27
27
## workflow
+5
-5
mcp/pyproject.toml
···
1
1
[project]
2
-
name = "pub-search"
2
+
name = "leaflet-mcp"
3
3
dynamic = ["version"]
4
-
description = "MCP server for searching ATProto publishing platforms (Leaflet, pckt, and more)"
4
+
description = "MCP server for Leaflet - search decentralized publications on ATProto"
5
5
readme = "README.md"
6
6
authors = [{ name = "zzstoatzz", email = "thrast36@gmail.com" }]
7
7
requires-python = ">=3.10"
8
8
license = "MIT"
9
9
10
-
keywords = ["pub-search", "mcp", "atproto", "publications", "search", "fastmcp", "leaflet", "pckt"]
10
+
keywords = ["leaflet", "mcp", "atproto", "publications", "search", "fastmcp"]
11
11
12
12
classifiers = [
13
13
"Development Status :: 3 - Alpha",
···
27
27
]
28
28
29
29
[project.scripts]
30
-
pub-search = "pub_search.server:main"
30
+
leaflet-mcp = "leaflet_mcp.server:main"
31
31
32
32
[build-system]
33
33
requires = ["hatchling", "uv-dynamic-versioning>=0.7.0"]
34
34
build-backend = "hatchling.build"
35
35
36
36
[tool.hatch.build.targets.wheel]
37
-
packages = ["src/pub_search"]
37
+
packages = ["src/leaflet_mcp"]
38
38
39
39
[tool.hatch.version]
40
40
source = "uv-dynamic-versioning"
+5
mcp/src/leaflet_mcp/__init__.py
+58
mcp/src/leaflet_mcp/_types.py
···
1
+
"""Type definitions for Leaflet MCP responses."""
2
+
3
+
from typing import Literal
4
+
5
+
from pydantic import BaseModel, computed_field
6
+
7
+
8
+
class SearchResult(BaseModel):
9
+
"""A search result from the Leaflet API."""
10
+
11
+
type: Literal["article", "looseleaf", "publication"]
12
+
uri: str
13
+
did: str
14
+
title: str
15
+
snippet: str
16
+
createdAt: str = ""
17
+
rkey: str
18
+
basePath: str = ""
19
+
20
+
@computed_field
21
+
@property
22
+
def url(self) -> str:
23
+
"""web URL for this document."""
24
+
if self.basePath:
25
+
return f"https://{self.basePath}/{self.rkey}"
26
+
return ""
27
+
28
+
29
+
class Tag(BaseModel):
30
+
"""A tag with document count."""
31
+
32
+
tag: str
33
+
count: int
34
+
35
+
36
+
class PopularSearch(BaseModel):
37
+
"""A popular search query with count."""
38
+
39
+
query: str
40
+
count: int
41
+
42
+
43
+
class Stats(BaseModel):
44
+
"""Leaflet index statistics."""
45
+
46
+
documents: int
47
+
publications: int
48
+
49
+
50
+
class Document(BaseModel):
51
+
"""Full document content from ATProto."""
52
+
53
+
uri: str
54
+
title: str
55
+
content: str
56
+
createdAt: str = ""
57
+
tags: list[str] = []
58
+
publicationUri: str = ""
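a quick usage sketch with hypothetical values, showing that the `@computed_field` property serializes alongside the regular fields:

```python
# hypothetical values - demonstrates the computed url field
from leaflet_mcp._types import SearchResult

r = SearchResult(
    type="article",
    uri="at://did:plc:abc123/pub.leaflet.document/3lzyrj6q6gs27",
    did="did:plc:abc123",
    title="hello world",
    snippet="...",
    rkey="3lzyrj6q6gs27",
    basePath="example.leaflet.pub",
)
print(r.url)                    # https://example.leaflet.pub/3lzyrj6q6gs27
print("url" in r.model_dump())  # True - computed fields are serialized too
```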
+21
mcp/src/leaflet_mcp/client.py
···
1
+
"""HTTP client for Leaflet search API."""
2
+
3
+
import os
4
+
from contextlib import asynccontextmanager
5
+
from typing import AsyncIterator
6
+
7
+
import httpx
8
+
9
+
# configurable via env var, defaults to production
10
+
LEAFLET_API_URL = os.getenv("LEAFLET_API_URL", "https://leaflet-search-backend.fly.dev")
11
+
12
+
13
+
@asynccontextmanager
14
+
async def get_http_client() -> AsyncIterator[httpx.AsyncClient]:
15
+
"""Get an async HTTP client for Leaflet API requests."""
16
+
async with httpx.AsyncClient(
17
+
base_url=LEAFLET_API_URL,
18
+
timeout=30.0,
19
+
headers={"Accept": "application/json"},
20
+
) as client:
21
+
yield client
+289
mcp/src/leaflet_mcp/server.py
···
1
+
"""Leaflet MCP server implementation using fastmcp."""
2
+
3
+
from __future__ import annotations
4
+
5
+
from typing import Any
6
+
7
+
from fastmcp import FastMCP
8
+
9
+
from leaflet_mcp._types import Document, PopularSearch, SearchResult, Stats, Tag
10
+
from leaflet_mcp.client import get_http_client
11
+
12
+
mcp = FastMCP("leaflet")
13
+
14
+
15
+
# -----------------------------------------------------------------------------
16
+
# prompts
17
+
# -----------------------------------------------------------------------------
18
+
19
+
20
+
@mcp.prompt("usage_guide")
21
+
def usage_guide() -> str:
22
+
"""instructions for using leaflet MCP tools."""
23
+
return """\
24
+
# Leaflet MCP server usage guide
25
+
26
+
Leaflet is a decentralized publishing platform on ATProto (the protocol behind Bluesky).
27
+
This MCP server provides search and discovery tools for Leaflet publications.
28
+
29
+
## core tools
30
+
31
+
- `search(query, tag)` - search documents and publications by text or tag
32
+
- `get_document(uri)` - get the full content of a document by its AT-URI
33
+
- `find_similar(uri)` - find documents similar to a given document
34
+
- `get_tags()` - list all available tags with document counts
35
+
- `get_stats()` - get index statistics (document/publication counts)
36
+
- `get_popular()` - see popular search queries
37
+
38
+
## workflow for research
39
+
40
+
1. use `search("your topic")` to find relevant documents
41
+
2. use `get_document(uri)` to retrieve full content of interesting results
42
+
3. use `find_similar(uri)` to discover related content
43
+
44
+
## result types
45
+
46
+
search returns three types of results:
47
+
- **publication**: a collection of articles (like a blog or magazine)
48
+
- **article**: a document that belongs to a publication
49
+
- **looseleaf**: a standalone document not part of a publication
50
+
51
+
## AT-URIs
52
+
53
+
documents are identified by AT-URIs like:
54
+
`at://did:plc:abc123/pub.leaflet.document/xyz789`
55
+
56
+
you can also browse documents on the web at leaflet.pub
57
+
"""
58
+
59
+
60
+
@mcp.prompt("search_tips")
61
+
def search_tips() -> str:
62
+
"""tips for effective searching."""
63
+
return """\
64
+
# Leaflet search tips
65
+
66
+
## text search
67
+
- searches both document titles and content
68
+
- uses FTS5 full-text search with prefix matching
69
+
- the last word gets prefix matching: "cat dog" matches "cat dogs"
70
+
71
+
## tag filtering
72
+
- combine text search with tag filter: `search("python", tag="programming")`
73
+
- use `get_tags()` to discover available tags
74
+
- tags are only applied to documents, not publications
75
+
76
+
## finding related content
77
+
- after finding an interesting document, use `find_similar(uri)`
78
+
- similarity is based on semantic embeddings (voyage-3-lite)
79
+
- great for exploring related topics
80
+
81
+
## browsing by popularity
82
+
- use `get_popular()` to see what others are searching for
83
+
- can inspire new research directions
84
+
"""
85
+
86
+
87
+
# -----------------------------------------------------------------------------
88
+
# tools
89
+
# -----------------------------------------------------------------------------
90
+
91
+
92
+
@mcp.tool
93
+
async def search(
94
+
query: str = "",
95
+
tag: str | None = None,
96
+
limit: int = 5,
97
+
) -> list[SearchResult]:
98
+
"""search leaflet documents and publications.
99
+
100
+
searches the full text of documents (titles and content) and publications.
101
+
results include a snippet showing where the match was found.
102
+
103
+
args:
104
+
query: search query (searches titles and content)
105
+
tag: optional tag to filter by (only applies to documents)
106
+
limit: max results to return (default 5, max 40)
107
+
108
+
returns:
109
+
list of search results with uri, title, snippet, and metadata
110
+
"""
111
+
if not query and not tag:
112
+
return []
113
+
114
+
params: dict[str, Any] = {}
115
+
if query:
116
+
params["q"] = query
117
+
if tag:
118
+
params["tag"] = tag
119
+
120
+
async with get_http_client() as client:
121
+
response = await client.get("/search", params=params)
122
+
response.raise_for_status()
123
+
results = response.json()
124
+
125
+
# apply client-side limit since API returns up to 40
126
+
return [SearchResult(**r) for r in results[:limit]]
127
+
128
+
129
+
@mcp.tool
130
+
async def get_document(uri: str) -> Document:
131
+
"""get the full content of a document by its AT-URI.
132
+
133
+
fetches the complete document from ATProto, including full text content.
134
+
use this after finding documents via search to get the complete text.
135
+
136
+
args:
137
+
uri: the AT-URI of the document (e.g., at://did:plc:.../pub.leaflet.document/...)
138
+
139
+
returns:
140
+
document with full content, title, tags, and metadata
141
+
"""
142
+
# use pdsx to fetch the actual record from ATProto
143
+
try:
144
+
from pdsx._internal.operations import get_record
145
+
from pdsx.mcp.client import get_atproto_client
146
+
except ImportError as e:
147
+
raise RuntimeError(
148
+
"pdsx is required for fetching full documents. install with: uv add pdsx"
149
+
) from e
150
+
151
+
# extract repo from URI for PDS discovery
152
+
# at://did:plc:xxx/collection/rkey
153
+
parts = uri.replace("at://", "").split("/")
154
+
if len(parts) < 3:
155
+
raise ValueError(f"invalid AT-URI: {uri}")
156
+
157
+
repo = parts[0]
158
+
159
+
async with get_atproto_client(target_repo=repo) as client:
160
+
record = await get_record(client, uri)
161
+
162
+
value = record.value
163
+
# DotDict doesn't have a working .get(), convert to dict first
164
+
if hasattr(value, "to_dict") and callable(value.to_dict):
165
+
value = value.to_dict()
166
+
elif not isinstance(value, dict):
167
+
value = dict(value)
168
+
169
+
# extract content from leaflet's block structure
170
+
# pages[].blocks[].block.plaintext
171
+
content_parts = []
172
+
for page in value.get("pages", []):
173
+
for block_wrapper in page.get("blocks", []):
174
+
block = block_wrapper.get("block", {})
175
+
plaintext = block.get("plaintext", "")
176
+
if plaintext:
177
+
content_parts.append(plaintext)
178
+
179
+
content = "\n\n".join(content_parts)
180
+
181
+
return Document(
182
+
uri=record.uri,
183
+
title=value.get("title", ""),
184
+
content=content,
185
+
createdAt=value.get("publishedAt", "") or value.get("createdAt", ""),
186
+
tags=value.get("tags", []),
187
+
publicationUri=value.get("publication", ""),
188
+
)
189
+
190
+
191
+
@mcp.tool
192
+
async def find_similar(uri: str, limit: int = 5) -> list[SearchResult]:
193
+
"""find documents similar to a given document.
194
+
195
+
uses vector similarity (voyage-3-lite embeddings) to find semantically
196
+
related documents. great for discovering related content after finding
197
+
an interesting document.
198
+
199
+
args:
200
+
uri: the AT-URI of the document to find similar content for
201
+
limit: max similar documents to return (default 5)
202
+
203
+
returns:
204
+
list of similar documents with uri, title, and metadata
205
+
"""
206
+
async with get_http_client() as client:
207
+
response = await client.get("/similar", params={"uri": uri})
208
+
response.raise_for_status()
209
+
results = response.json()
210
+
211
+
return [SearchResult(**r) for r in results[:limit]]
212
+
213
+
214
+
@mcp.tool
215
+
async def get_tags() -> list[Tag]:
216
+
"""list all available tags with document counts.
217
+
218
+
returns tags sorted by document count (most popular first).
219
+
useful for discovering topics and filtering searches.
220
+
221
+
returns:
222
+
list of tags with their document counts
223
+
"""
224
+
async with get_http_client() as client:
225
+
response = await client.get("/tags")
226
+
response.raise_for_status()
227
+
results = response.json()
228
+
229
+
return [Tag(**t) for t in results]
230
+
231
+
232
+
@mcp.tool
233
+
async def get_stats() -> Stats:
234
+
"""get leaflet index statistics.
235
+
236
+
returns:
237
+
document and publication counts
238
+
"""
239
+
async with get_http_client() as client:
240
+
response = await client.get("/stats")
241
+
response.raise_for_status()
242
+
return Stats(**response.json())
243
+
244
+
245
+
@mcp.tool
246
+
async def get_popular(limit: int = 5) -> list[PopularSearch]:
247
+
"""get popular search queries.
248
+
249
+
see what others are searching for on leaflet.
250
+
can inspire new research directions.
251
+
252
+
args:
253
+
limit: max queries to return (default 5)
254
+
255
+
returns:
256
+
list of popular queries with search counts
257
+
"""
258
+
async with get_http_client() as client:
259
+
response = await client.get("/popular")
260
+
response.raise_for_status()
261
+
results = response.json()
262
+
263
+
return [PopularSearch(**p) for p in results[:limit]]
264
+
265
+
266
+
# -----------------------------------------------------------------------------
267
+
# resources
268
+
# -----------------------------------------------------------------------------
269
+
270
+
271
+
@mcp.resource("leaflet://stats")
272
+
async def stats_resource() -> str:
273
+
"""current leaflet index statistics."""
274
+
stats = await get_stats()
275
+
return f"Leaflet index: {stats.documents} documents, {stats.publications} publications"
276
+
277
+
278
+
# -----------------------------------------------------------------------------
279
+
# entrypoint
280
+
# -----------------------------------------------------------------------------
281
+
282
+
283
+
def main() -> None:
284
+
"""run the MCP server."""
285
+
mcp.run()
286
+
287
+
288
+
if __name__ == "__main__":
289
+
main()
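a smoke-test sketch that exercises the server in-memory, mirroring the `FastMCPTransport` pattern used by the test suite below:

```python
# run a search tool call against the server without any network transport
import asyncio

from fastmcp.client import Client
from fastmcp.client.transports import FastMCPTransport

from leaflet_mcp.server import mcp

async def main() -> None:
    async with Client(FastMCPTransport(mcp)) as client:
        result = await client.call_tool("search", {"query": "atproto"})
        print(result)

asyncio.run(main())
```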
-5
mcp/src/pub_search/__init__.py
-58
mcp/src/pub_search/_types.py
···
1
-
"""Type definitions for Leaflet MCP responses."""
2
-
3
-
from typing import Literal
4
-
5
-
from pydantic import BaseModel, computed_field
6
-
7
-
8
-
class SearchResult(BaseModel):
9
-
"""A search result from the Leaflet API."""
10
-
11
-
type: Literal["article", "looseleaf", "publication"]
12
-
uri: str
13
-
did: str
14
-
title: str
15
-
snippet: str
16
-
createdAt: str = ""
17
-
rkey: str
18
-
basePath: str = ""
19
-
20
-
@computed_field
21
-
@property
22
-
def url(self) -> str:
23
-
"""web URL for this document."""
24
-
if self.basePath:
25
-
return f"https://{self.basePath}/{self.rkey}"
26
-
return ""
27
-
28
-
29
-
class Tag(BaseModel):
30
-
"""A tag with document count."""
31
-
32
-
tag: str
33
-
count: int
34
-
35
-
36
-
class PopularSearch(BaseModel):
37
-
"""A popular search query with count."""
38
-
39
-
query: str
40
-
count: int
41
-
42
-
43
-
class Stats(BaseModel):
44
-
"""Leaflet index statistics."""
45
-
46
-
documents: int
47
-
publications: int
48
-
49
-
50
-
class Document(BaseModel):
51
-
"""Full document content from ATProto."""
52
-
53
-
uri: str
54
-
title: str
55
-
content: str
56
-
createdAt: str = ""
57
-
tags: list[str] = []
58
-
publicationUri: str = ""
-21
mcp/src/pub_search/client.py
···
1
-
"""HTTP client for leaflet-search API."""
2
-
3
-
import os
4
-
from contextlib import asynccontextmanager
5
-
from typing import AsyncIterator
6
-
7
-
import httpx
8
-
9
-
# configurable via env var, defaults to production
10
-
API_URL = os.getenv("LEAFLET_SEARCH_API_URL", "https://leaflet-search-backend.fly.dev")
11
-
12
-
13
-
@asynccontextmanager
14
-
async def get_http_client() -> AsyncIterator[httpx.AsyncClient]:
15
-
"""Get an async HTTP client for API requests."""
16
-
async with httpx.AsyncClient(
17
-
base_url=API_URL,
18
-
timeout=30.0,
19
-
headers={"Accept": "application/json"},
20
-
) as client:
21
-
yield client
-288
mcp/src/pub_search/server.py
···
1
-
"""MCP server for searching ATProto publishing platforms."""
2
-
3
-
from __future__ import annotations
4
-
5
-
from typing import Any
6
-
7
-
from fastmcp import FastMCP
8
-
9
-
from pub_search._types import Document, PopularSearch, SearchResult, Stats, Tag
10
-
from pub_search.client import get_http_client
11
-
12
-
mcp = FastMCP("pub-search")
13
-
14
-
15
-
# -----------------------------------------------------------------------------
16
-
# prompts
17
-
# -----------------------------------------------------------------------------
18
-
19
-
20
-
@mcp.prompt("usage_guide")
21
-
def usage_guide() -> str:
22
-
"""instructions for using pub-search MCP tools."""
23
-
return """\
24
-
# pub-search MCP usage guide
25
-
26
-
search documents across ATProto publishing platforms including Leaflet, pckt, and others.
27
-
28
-
## core tools
29
-
30
-
- `search(query, tag)` - search documents and publications by text or tag
31
-
- `get_document(uri)` - get the full content of a document by its AT-URI
32
-
- `find_similar(uri)` - find documents similar to a given document
33
-
- `get_tags()` - list all available tags with document counts
34
-
- `get_stats()` - get index statistics (document/publication counts)
35
-
- `get_popular()` - see popular search queries
36
-
37
-
## workflow for research
38
-
39
-
1. use `search("your topic")` to find relevant documents
40
-
2. use `get_document(uri)` to retrieve full content of interesting results
41
-
3. use `find_similar(uri)` to discover related content
42
-
43
-
## result types
44
-
45
-
search returns three types of results:
46
-
- **publication**: a collection of articles (like a blog or magazine)
47
-
- **article**: a document that belongs to a publication
48
-
- **looseleaf**: a standalone document not part of a publication
49
-
50
-
## AT-URIs
51
-
52
-
documents are identified by AT-URIs like:
53
-
`at://did:plc:abc123/pub.leaflet.document/xyz789`
54
-
55
-
browse the web UI at pub-search.waow.tech
56
-
"""
57
-
58
-
59
-
@mcp.prompt("search_tips")
60
-
def search_tips() -> str:
61
-
"""tips for effective searching."""
62
-
return """\
63
-
# search tips
64
-
65
-
## text search
66
-
- searches both document titles and content
67
-
- uses FTS5 full-text search with prefix matching
68
-
- the last word gets prefix matching: "cat dog" matches "cat dogs"
69
-
70
-
## tag filtering
71
-
- combine text search with tag filter: `search("python", tag="programming")`
72
-
- use `get_tags()` to discover available tags
73
-
- tags are only applied to documents, not publications
74
-
75
-
## finding related content
76
-
- after finding an interesting document, use `find_similar(uri)`
77
-
- similarity is based on semantic embeddings (voyage-3-lite)
78
-
- great for exploring related topics
79
-
80
-
## browsing by popularity
81
-
- use `get_popular()` to see what others are searching for
82
-
- can inspire new research directions
83
-
"""
84
-
85
-
86
-
# -----------------------------------------------------------------------------
87
-
# tools
88
-
# -----------------------------------------------------------------------------
89
-
90
-
91
-
@mcp.tool
92
-
async def search(
93
-
query: str = "",
94
-
tag: str | None = None,
95
-
limit: int = 5,
96
-
) -> list[SearchResult]:
97
-
"""search documents and publications.
98
-
99
-
searches the full text of documents (titles and content) and publications.
100
-
results include a snippet showing where the match was found.
101
-
102
-
args:
103
-
query: search query (searches titles and content)
104
-
tag: optional tag to filter by (only applies to documents)
105
-
limit: max results to return (default 5, max 40)
106
-
107
-
returns:
108
-
list of search results with uri, title, snippet, and metadata
109
-
"""
110
-
if not query and not tag:
111
-
return []
112
-
113
-
params: dict[str, Any] = {}
114
-
if query:
115
-
params["q"] = query
116
-
if tag:
117
-
params["tag"] = tag
118
-
119
-
async with get_http_client() as client:
120
-
response = await client.get("/search", params=params)
121
-
response.raise_for_status()
122
-
results = response.json()
123
-
124
-
# apply client-side limit since API returns up to 40
125
-
return [SearchResult(**r) for r in results[:limit]]
126
-
127
-
128
-
@mcp.tool
129
-
async def get_document(uri: str) -> Document:
130
-
"""get the full content of a document by its AT-URI.
131
-
132
-
fetches the complete document from ATProto, including full text content.
133
-
use this after finding documents via search to get the complete text.
134
-
135
-
args:
136
-
uri: the AT-URI of the document (e.g., at://did:plc:.../pub.leaflet.document/...)
137
-
138
-
returns:
139
-
document with full content, title, tags, and metadata
140
-
"""
141
-
# use pdsx to fetch the actual record from ATProto
142
-
try:
143
-
from pdsx._internal.operations import get_record
144
-
from pdsx.mcp.client import get_atproto_client
145
-
except ImportError as e:
146
-
raise RuntimeError(
147
-
"pdsx is required for fetching full documents. install with: uv add pdsx"
148
-
) from e
149
-
150
-
# extract repo from URI for PDS discovery
151
-
# at://did:plc:xxx/collection/rkey
152
-
parts = uri.replace("at://", "").split("/")
153
-
if len(parts) < 3:
154
-
raise ValueError(f"invalid AT-URI: {uri}")
155
-
156
-
repo = parts[0]
157
-
158
-
async with get_atproto_client(target_repo=repo) as client:
159
-
record = await get_record(client, uri)
160
-
161
-
value = record.value
162
-
# DotDict doesn't have a working .get(), convert to dict first
163
-
if hasattr(value, "to_dict") and callable(value.to_dict):
164
-
value = value.to_dict()
165
-
elif not isinstance(value, dict):
166
-
value = dict(value)
167
-
168
-
# extract content from leaflet's block structure
169
-
# pages[].blocks[].block.plaintext
170
-
content_parts = []
171
-
for page in value.get("pages", []):
172
-
for block_wrapper in page.get("blocks", []):
173
-
block = block_wrapper.get("block", {})
174
-
plaintext = block.get("plaintext", "")
175
-
if plaintext:
176
-
content_parts.append(plaintext)
177
-
178
-
content = "\n\n".join(content_parts)
179
-
180
-
return Document(
181
-
uri=record.uri,
182
-
title=value.get("title", ""),
183
-
content=content,
184
-
createdAt=value.get("publishedAt", "") or value.get("createdAt", ""),
185
-
tags=value.get("tags", []),
186
-
publicationUri=value.get("publication", ""),
187
-
)
188
-
189
-
190
-
@mcp.tool
191
-
async def find_similar(uri: str, limit: int = 5) -> list[SearchResult]:
192
-
"""find documents similar to a given document.
193
-
194
-
uses vector similarity (voyage-3-lite embeddings) to find semantically
195
-
related documents. great for discovering related content after finding
196
-
an interesting document.
197
-
198
-
args:
199
-
uri: the AT-URI of the document to find similar content for
200
-
limit: max similar documents to return (default 5)
201
-
202
-
returns:
203
-
list of similar documents with uri, title, and metadata
204
-
"""
205
-
async with get_http_client() as client:
206
-
response = await client.get("/similar", params={"uri": uri})
207
-
response.raise_for_status()
208
-
results = response.json()
209
-
210
-
return [SearchResult(**r) for r in results[:limit]]
211
-
212
-
213
-
@mcp.tool
214
-
async def get_tags() -> list[Tag]:
215
-
"""list all available tags with document counts.
216
-
217
-
returns tags sorted by document count (most popular first).
218
-
useful for discovering topics and filtering searches.
219
-
220
-
returns:
221
-
list of tags with their document counts
222
-
"""
223
-
async with get_http_client() as client:
224
-
response = await client.get("/tags")
225
-
response.raise_for_status()
226
-
results = response.json()
227
-
228
-
return [Tag(**t) for t in results]
229
-
230
-
231
-
@mcp.tool
232
-
async def get_stats() -> Stats:
233
-
"""get index statistics.
234
-
235
-
returns:
236
-
document and publication counts
237
-
"""
238
-
async with get_http_client() as client:
239
-
response = await client.get("/stats")
240
-
response.raise_for_status()
241
-
return Stats(**response.json())
242
-
243
-
244
-
@mcp.tool
245
-
async def get_popular(limit: int = 5) -> list[PopularSearch]:
246
-
"""get popular search queries.
247
-
248
-
see what others are searching for.
249
-
can inspire new research directions.
250
-
251
-
args:
252
-
limit: max queries to return (default 5)
253
-
254
-
returns:
255
-
list of popular queries with search counts
256
-
"""
257
-
async with get_http_client() as client:
258
-
response = await client.get("/popular")
259
-
response.raise_for_status()
260
-
results = response.json()
261
-
262
-
return [PopularSearch(**p) for p in results[:limit]]
263
-
264
-
265
-
# -----------------------------------------------------------------------------
266
-
# resources
267
-
# -----------------------------------------------------------------------------
268
-
269
-
270
-
@mcp.resource("pub-search://stats")
271
-
async def stats_resource() -> str:
272
-
"""current index statistics."""
273
-
stats = await get_stats()
274
-
return f"pub search index: {stats.documents} documents, {stats.publications} publications"
275
-
276
-
277
-
# -----------------------------------------------------------------------------
278
-
# entrypoint
279
-
# -----------------------------------------------------------------------------
280
-
281
-
282
-
def main() -> None:
283
-
"""run the MCP server."""
284
-
mcp.run()
285
-
286
-
287
-
if __name__ == "__main__":
288
-
main()
+8
-8
mcp/tests/test_mcp.py
···
1
-
"""tests for pub-search MCP server."""
1
+
"""tests for leaflet MCP server."""
2
2
3
3
import pytest
4
4
from mcp.types import TextContent
···
6
6
from fastmcp.client import Client
7
7
from fastmcp.client.transports import FastMCPTransport
8
8
9
-
from pub_search._types import Document, PopularSearch, SearchResult, Stats, Tag
10
-
from pub_search.server import mcp
9
+
from leaflet_mcp._types import Document, PopularSearch, SearchResult, Stats, Tag
10
+
from leaflet_mcp.server import mcp
11
11
12
12
13
13
class TestTypes:
···
93
93
94
94
def test_mcp_server_imports(self):
95
95
"""mcp server can be imported without errors."""
96
-
from pub_search import mcp
96
+
from leaflet_mcp import mcp
97
97
98
-
assert mcp.name == "pub-search"
98
+
assert mcp.name == "leaflet"
99
99
100
100
def test_exports(self):
101
101
"""all expected exports are available."""
102
-
from pub_search import main, mcp
102
+
from leaflet_mcp import main, mcp
103
103
104
104
assert mcp is not None
105
105
assert main is not None
···
138
138
resources = await client.list_resources()
139
139
140
140
resource_uris = {str(r.uri) for r in resources}
141
-
assert "pub-search://stats" in resource_uris
141
+
assert "leaflet://stats" in resource_uris
142
142
143
143
async def test_usage_guide_prompt_content(self, client):
144
144
"""usage_guide prompt returns helpful content."""
···
148
148
assert len(result.messages) > 0
149
149
content = result.messages[0].content
150
150
assert isinstance(content, TextContent)
151
-
assert "pub-search" in content.text
151
+
assert "Leaflet" in content.text
152
152
assert "search" in content.text
153
153
154
154
async def test_search_tips_prompt_content(self, client):
+32
-32
mcp/uv.lock
···
691
691
]
692
692
693
693
[[package]]
694
+
name = "leaflet-mcp"
695
+
source = { editable = "." }
696
+
dependencies = [
697
+
{ name = "fastmcp" },
698
+
{ name = "httpx" },
699
+
{ name = "pdsx" },
700
+
]
701
+
702
+
[package.dev-dependencies]
703
+
dev = [
704
+
{ name = "pytest" },
705
+
{ name = "pytest-asyncio" },
706
+
{ name = "pytest-sugar" },
707
+
{ name = "ruff" },
708
+
]
709
+
710
+
[package.metadata]
711
+
requires-dist = [
712
+
{ name = "fastmcp", specifier = ">=2.0" },
713
+
{ name = "httpx", specifier = ">=0.28" },
714
+
{ name = "pdsx", git = "https://github.com/zzstoatzz/pdsx.git" },
715
+
]
716
+
717
+
[package.metadata.requires-dev]
718
+
dev = [
719
+
{ name = "pytest", specifier = ">=8.3.0" },
720
+
{ name = "pytest-asyncio", specifier = ">=0.25.0" },
721
+
{ name = "pytest-sugar" },
722
+
{ name = "ruff", specifier = ">=0.12.0" },
723
+
]
724
+
725
+
[[package]]
694
726
name = "libipld"
695
727
version = "3.3.2"
696
728
source = { registry = "https://pypi.org/simple" }
···
1043
1075
sdist = { url = "https://files.pythonhosted.org/packages/23/53/3edb5d68ecf6b38fcbcc1ad28391117d2a322d9a1a3eff04bfdb184d8c3b/prometheus_client-0.23.1.tar.gz", hash = "sha256:6ae8f9081eaaaf153a2e959d2e6c4f4fb57b12ef76c8c7980202f1e57b48b2ce", size = 80481, upload-time = "2025-09-18T20:47:25.043Z" }
1044
1076
wheels = [
1045
1077
{ url = "https://files.pythonhosted.org/packages/b8/db/14bafcb4af2139e046d03fd00dea7873e48eafe18b7d2797e73d6681f210/prometheus_client-0.23.1-py3-none-any.whl", hash = "sha256:dd1913e6e76b59cfe44e7a4b83e01afc9873c1bdfd2ed8739f1e76aeca115f99", size = 61145, upload-time = "2025-09-18T20:47:23.875Z" },
1046
-
]
1047
-
1048
-
[[package]]
1049
-
name = "pub-search"
1050
-
source = { editable = "." }
1051
-
dependencies = [
1052
-
{ name = "fastmcp" },
1053
-
{ name = "httpx" },
1054
-
{ name = "pdsx" },
1055
-
]
1056
-
1057
-
[package.dev-dependencies]
1058
-
dev = [
1059
-
{ name = "pytest" },
1060
-
{ name = "pytest-asyncio" },
1061
-
{ name = "pytest-sugar" },
1062
-
{ name = "ruff" },
1063
-
]
1064
-
1065
-
[package.metadata]
1066
-
requires-dist = [
1067
-
{ name = "fastmcp", specifier = ">=2.0" },
1068
-
{ name = "httpx", specifier = ">=0.28" },
1069
-
{ name = "pdsx", git = "https://github.com/zzstoatzz/pdsx.git" },
1070
-
]
1071
-
1072
-
[package.metadata.requires-dev]
1073
-
dev = [
1074
-
{ name = "pytest", specifier = ">=8.3.0" },
1075
-
{ name = "pytest-asyncio", specifier = ">=0.25.0" },
1076
-
{ name = "pytest-sugar" },
1077
-
{ name = "ruff", specifier = ">=0.12.0" },
1078
1078
]
1079
1079
1080
1080
[[package]]
-383
scripts/backfill-pds
···
1
-
#!/usr/bin/env -S uv run --script --quiet
2
-
# /// script
3
-
# requires-python = ">=3.12"
4
-
# dependencies = ["httpx", "pydantic-settings"]
5
-
# ///
6
-
"""
7
-
Backfill records directly from a PDS.
8
-
9
-
Usage:
10
-
./scripts/backfill-pds did:plc:mkqt76xvfgxuemlwlx6ruc3w
11
-
./scripts/backfill-pds zat.dev
12
-
"""
13
-
14
-
import argparse
15
-
import json
16
-
import os
17
-
import sys
18
-
19
-
import httpx
20
-
from pydantic_settings import BaseSettings, SettingsConfigDict
21
-
22
-
23
-
class Settings(BaseSettings):
24
-
model_config = SettingsConfigDict(
25
-
env_file=os.environ.get("ENV_FILE", ".env"), extra="ignore"
26
-
)
27
-
28
-
turso_url: str
29
-
turso_token: str
30
-
31
-
@property
32
-
def turso_host(self) -> str:
33
-
url = self.turso_url
34
-
if url.startswith("libsql://"):
35
-
url = url[len("libsql://") :]
36
-
return url
37
-
38
-
39
-
def resolve_handle(handle: str) -> str:
40
-
"""Resolve a handle to a DID."""
41
-
resp = httpx.get(
42
-
f"https://bsky.social/xrpc/com.atproto.identity.resolveHandle",
43
-
params={"handle": handle},
44
-
timeout=30,
45
-
)
46
-
resp.raise_for_status()
47
-
return resp.json()["did"]
48
-
49
-
50
-
def get_pds_endpoint(did: str) -> str:
51
-
"""Get PDS endpoint from PLC directory."""
52
-
resp = httpx.get(f"https://plc.directory/{did}", timeout=30)
53
-
resp.raise_for_status()
54
-
data = resp.json()
55
-
for service in data.get("service", []):
56
-
if service.get("type") == "AtprotoPersonalDataServer":
57
-
return service["serviceEndpoint"]
58
-
raise ValueError(f"No PDS endpoint found for {did}")
59
-
60
-
61
-
def list_records(pds: str, did: str, collection: str) -> list[dict]:
62
-
"""List all records from a collection."""
63
-
records = []
64
-
cursor = None
65
-
while True:
66
-
params = {"repo": did, "collection": collection, "limit": 100}
67
-
if cursor:
68
-
params["cursor"] = cursor
69
-
resp = httpx.get(
70
-
f"{pds}/xrpc/com.atproto.repo.listRecords", params=params, timeout=30
71
-
)
72
-
resp.raise_for_status()
73
-
data = resp.json()
74
-
records.extend(data.get("records", []))
75
-
cursor = data.get("cursor")
76
-
if not cursor:
77
-
break
78
-
return records
79
-
80
-
81
-
def turso_exec(settings: Settings, sql: str, args: list | None = None) -> None:
82
-
"""Execute a statement against Turso."""
83
-
stmt = {"sql": sql}
84
-
if args:
85
-
# Handle None values properly - use null type
86
-
stmt["args"] = []
87
-
for a in args:
88
-
if a is None:
89
-
stmt["args"].append({"type": "null"})
90
-
else:
91
-
stmt["args"].append({"type": "text", "value": str(a)})
92
-
93
-
response = httpx.post(
94
-
f"https://{settings.turso_host}/v2/pipeline",
95
-
headers={
96
-
"Authorization": f"Bearer {settings.turso_token}",
97
-
"Content-Type": "application/json",
98
-
},
99
-
json={"requests": [{"type": "execute", "stmt": stmt}, {"type": "close"}]},
100
-
timeout=30,
101
-
)
102
-
if response.status_code != 200:
103
-
print(f"Turso error: {response.text}", file=sys.stderr)
104
-
response.raise_for_status()
105
-
106
-
107
-
def extract_leaflet_blocks(pages: list) -> str:
108
-
"""Extract text from leaflet pages/blocks structure."""
109
-
texts = []
110
-
for page in pages:
111
-
if not isinstance(page, dict):
112
-
continue
113
-
blocks = page.get("blocks", [])
114
-
for wrapper in blocks:
115
-
if not isinstance(wrapper, dict):
116
-
continue
117
-
block = wrapper.get("block", {})
118
-
if not isinstance(block, dict):
119
-
continue
120
-
# Extract plaintext from text, header, blockquote, code blocks
121
-
block_type = block.get("$type", "")
122
-
if block_type in (
123
-
"pub.leaflet.blocks.text",
124
-
"pub.leaflet.blocks.header",
125
-
"pub.leaflet.blocks.blockquote",
126
-
"pub.leaflet.blocks.code",
127
-
):
128
-
plaintext = block.get("plaintext", "")
129
-
if plaintext:
130
-
texts.append(plaintext)
131
-
# Handle lists
132
-
elif block_type == "pub.leaflet.blocks.unorderedList":
133
-
texts.extend(extract_list_items(block.get("children", [])))
134
-
return " ".join(texts)
135
-
136
-
137
-
def extract_list_items(children: list) -> list[str]:
138
-
"""Recursively extract text from list items."""
139
-
texts = []
140
-
for child in children:
141
-
if not isinstance(child, dict):
142
-
continue
143
-
content = child.get("content", {})
144
-
if isinstance(content, dict):
145
-
plaintext = content.get("plaintext", "")
146
-
if plaintext:
147
-
texts.append(plaintext)
148
-
# Recurse into nested children
149
-
nested = child.get("children", [])
150
-
if nested:
151
-
texts.extend(extract_list_items(nested))
152
-
return texts
153
-
154
-
155
-
def extract_document(record: dict, collection: str) -> dict | None:
156
-
"""Extract document fields from a record."""
157
-
value = record.get("value", {})
158
-
159
-
# Get title
160
-
title = value.get("title")
161
-
if not title:
162
-
return None
163
-
164
-
# Get content - try textContent (site.standard), then leaflet blocks, then content/text
165
-
content = value.get("textContent") or ""
166
-
if not content:
167
-
# Try leaflet-style pages/blocks
168
-
pages = value.get("pages", [])
169
-
if pages:
170
-
content = extract_leaflet_blocks(pages)
171
-
if not content:
172
-
# Fall back to simple content/text fields
173
-
content = value.get("content") or value.get("text") or ""
174
-
if isinstance(content, dict):
175
-
# Handle richtext format
176
-
content = content.get("text", "")
177
-
178
-
# Get created_at
179
-
created_at = value.get("createdAt", "")
180
-
181
-
# Get publication reference - try "publication" (leaflet) then "site" (site.standard)
182
-
publication = value.get("publication") or value.get("site")
183
-
publication_uri = None
184
-
if publication:
185
-
if isinstance(publication, dict):
186
-
publication_uri = publication.get("uri")
187
-
elif isinstance(publication, str):
188
-
publication_uri = publication
189
-
190
-
# Get URL path (site.standard.document uses "path" field like "/001")
191
-
path = value.get("path")
192
-
193
-
# Get tags
194
-
tags = value.get("tags", [])
195
-
if not isinstance(tags, list):
196
-
tags = []
197
-
198
-
# Determine platform from collection (site.standard is a lexicon, not a platform)
199
-
if collection.startswith("pub.leaflet"):
200
-
platform = "leaflet"
201
-
elif collection.startswith("blog.pckt"):
202
-
platform = "pckt"
203
-
else:
204
-
# site.standard.* and others - platform will be detected from publication basePath
205
-
platform = "unknown"
206
-
207
-
return {
208
-
"title": title,
209
-
"content": content,
210
-
"created_at": created_at,
211
-
"publication_uri": publication_uri,
212
-
"tags": tags,
213
-
"platform": platform,
214
-
"collection": collection,
215
-
"path": path,
216
-
}
217
-
218
-
219
-
def main():
220
-
parser = argparse.ArgumentParser(description="Backfill records from a PDS")
221
-
parser.add_argument("identifier", help="DID or handle to backfill")
222
-
parser.add_argument("--dry-run", action="store_true", help="Show what would be done")
223
-
args = parser.parse_args()
224
-
225
-
try:
226
-
settings = Settings() # type: ignore
227
-
except Exception as e:
228
-
print(f"error loading settings: {e}", file=sys.stderr)
229
-
print("required env vars: TURSO_URL, TURSO_TOKEN", file=sys.stderr)
230
-
sys.exit(1)
231
-
232
-
# Resolve identifier to DID
233
-
identifier = args.identifier
234
-
if identifier.startswith("did:"):
235
-
did = identifier
236
-
else:
237
-
print(f"resolving handle {identifier}...")
238
-
did = resolve_handle(identifier)
239
-
print(f" -> {did}")
240
-
241
-
# Get PDS endpoint
242
-
print(f"looking up PDS for {did}...")
243
-
pds = get_pds_endpoint(did)
244
-
print(f" -> {pds}")
245
-
246
-
# Collections to fetch
247
-
collections = [
248
-
"pub.leaflet.document",
249
-
"pub.leaflet.publication",
250
-
"site.standard.document",
251
-
"site.standard.publication",
252
-
]
253
-
254
-
total_docs = 0
255
-
total_pubs = 0
256
-
257
-
for collection in collections:
258
-
print(f"fetching {collection}...")
259
-
try:
260
-
records = list_records(pds, did, collection)
261
-
except httpx.HTTPStatusError as e:
262
-
if e.response.status_code == 400:
263
-
print(f" (no records)")
264
-
continue
265
-
raise
266
-
267
-
if not records:
268
-
print(f" (no records)")
269
-
continue
270
-
271
-
print(f" found {len(records)} records")
272
-
273
-
for record in records:
274
-
uri = record["uri"]
275
-
# Parse rkey from URI: at://did/collection/rkey
276
-
parts = uri.split("/")
277
-
rkey = parts[-1]
278
-
279
-
if collection.endswith(".document"):
280
-
doc = extract_document(record, collection)
281
-
if not doc:
282
-
print(f" skip {uri} (no title)")
283
-
continue
284
-
285
-
if args.dry_run:
286
-
print(f" would insert: {doc['title'][:50]}...")
287
-
else:
288
-
# Insert document
289
-
turso_exec(
290
-
settings,
291
-
"""
292
-
INSERT INTO documents (uri, did, rkey, title, content, created_at, publication_uri, platform, source_collection, path)
293
-
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
294
-
ON CONFLICT(did, rkey) DO UPDATE SET
295
-
uri = excluded.uri,
296
-
title = excluded.title,
297
-
content = excluded.content,
298
-
created_at = excluded.created_at,
299
-
publication_uri = excluded.publication_uri,
300
-
platform = excluded.platform,
301
-
source_collection = excluded.source_collection,
302
-
path = excluded.path
303
-
""",
304
-
[uri, did, rkey, doc["title"], doc["content"], doc["created_at"], doc["publication_uri"], doc["platform"], doc["collection"], doc["path"]],
305
-
)
306
-
# Insert tags
307
-
for tag in doc["tags"]:
308
-
turso_exec(
309
-
settings,
310
-
"INSERT OR IGNORE INTO document_tags (document_uri, tag) VALUES (?, ?)",
311
-
[uri, tag],
312
-
)
313
-
# Update FTS index (delete then insert, FTS5 doesn't support ON CONFLICT)
314
-
turso_exec(settings, "DELETE FROM documents_fts WHERE uri = ?", [uri])
315
-
turso_exec(
316
-
settings,
317
-
"INSERT INTO documents_fts (uri, title, content) VALUES (?, ?, ?)",
318
-
[uri, doc["title"], doc["content"]],
319
-
)
320
-
print(f" indexed: {doc['title'][:50]}...")
321
-
total_docs += 1
322
-
323
-
elif collection.endswith(".publication"):
324
-
value = record["value"]
325
-
name = value.get("name", "")
326
-
description = value.get("description")
327
-
# base_path: try leaflet's "base_path", then strip scheme from site.standard's "url"
328
-
base_path = value.get("base_path")
329
-
if not base_path:
330
-
url = value.get("url")
331
-
if url:
332
-
# Strip https:// or http:// prefix
333
-
if url.startswith("https://"):
334
-
base_path = url[len("https://"):]
335
-
elif url.startswith("http://"):
336
-
base_path = url[len("http://"):]
337
-
else:
338
-
base_path = url
339
-
340
-
if args.dry_run:
341
-
print(f" would insert pub: {name}")
342
-
else:
343
-
turso_exec(
344
-
settings,
345
-
"""
346
-
INSERT INTO publications (uri, did, rkey, name, description, base_path)
347
-
VALUES (?, ?, ?, ?, ?, ?)
348
-
ON CONFLICT(uri) DO UPDATE SET
349
-
name = excluded.name,
350
-
description = excluded.description,
351
-
base_path = excluded.base_path
352
-
""",
353
-
[uri, did, rkey, name, description, base_path],
354
-
)
355
-
print(f" indexed pub: {name}")
356
-
total_pubs += 1
357
-
358
-
# post-process: detect platform from publication basePath
359
-
if not args.dry_run and (total_docs > 0 or total_pubs > 0):
360
-
print("detecting platforms from publication basePath...")
361
-
turso_exec(
362
-
settings,
363
-
"""
364
-
UPDATE documents SET platform = 'pckt'
365
-
WHERE platform IN ('standardsite', 'unknown')
366
-
AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%pckt.blog%')
367
-
""",
368
-
)
369
-
turso_exec(
370
-
settings,
371
-
"""
372
-
UPDATE documents SET platform = 'leaflet'
373
-
WHERE platform IN ('standardsite', 'unknown')
374
-
AND publication_uri IN (SELECT uri FROM publications WHERE base_path LIKE '%leaflet.pub%')
375
-
""",
376
-
)
377
-
print(" done")
378
-
379
-
print(f"\ndone! {total_docs} documents, {total_pubs} publications")
380
-
381
-
382
-
if __name__ == "__main__":
383
-
main()
-109
scripts/enumerate-standard-repos
···
1
-
#!/usr/bin/env -S uv run --script --quiet
2
-
# /// script
3
-
# requires-python = ">=3.12"
4
-
# dependencies = ["httpx"]
5
-
# ///
6
-
"""
7
-
Enumerate repos with site.standard.* records and add them to TAP.
8
-
9
-
TAP only signals on one collection, so we use this to discover repos
10
-
that use site.standard.publication (pckt, etc) and add them to TAP.
11
-
12
-
Usage:
13
-
./scripts/enumerate-standard-repos
14
-
./scripts/enumerate-standard-repos --dry-run
15
-
"""
16
-
17
-
import argparse
18
-
import sys
19
-
20
-
import httpx
21
-
22
-
RELAY_URL = "https://relay1.us-east.bsky.network"
23
-
TAP_URL = "http://leaflet-search-tap.internal:2480" # fly internal network
24
-
COLLECTION = "site.standard.publication"
25
-
26
-
27
-
def enumerate_repos(relay_url: str, collection: str) -> list[str]:
28
-
"""Enumerate all repos with records in the given collection."""
29
-
dids = []
30
-
cursor = None
31
-
32
-
print(f"enumerating repos with {collection}...")
33
-
34
-
while True:
35
-
params = {"collection": collection, "limit": 1000}
36
-
if cursor:
37
-
params["cursor"] = cursor
38
-
39
-
resp = httpx.get(
40
-
f"{relay_url}/xrpc/com.atproto.sync.listReposByCollection",
41
-
params=params,
42
-
timeout=60,
43
-
)
44
-
resp.raise_for_status()
45
-
data = resp.json()
46
-
47
-
repos = data.get("repos", [])
48
-
for repo in repos:
49
-
dids.append(repo["did"])
50
-
51
-
if not repos:
52
-
break
53
-
54
-
cursor = data.get("cursor")
55
-
if not cursor:
56
-
break
57
-
58
-
print(f" found {len(dids)} repos so far...")
59
-
60
-
return dids
61
-
62
-
63
-
def add_repos_to_tap(tap_url: str, dids: list[str]) -> None:
64
-
"""Add repos to TAP for syncing."""
65
-
if not dids:
66
-
return
67
-
68
-
# batch in chunks of 100
69
-
batch_size = 100
70
-
for i in range(0, len(dids), batch_size):
71
-
batch = dids[i:i + batch_size]
72
-
resp = httpx.post(
73
-
f"{tap_url}/repos/add",
74
-
json={"dids": batch},
75
-
timeout=30,
76
-
)
77
-
resp.raise_for_status()
78
-
print(f" added batch {i // batch_size + 1}: {len(batch)} repos")
79
-
80
-
81
-
def main():
82
-
parser = argparse.ArgumentParser(description="Enumerate and add standard.site repos to TAP")
83
-
parser.add_argument("--dry-run", action="store_true", help="Show what would be done")
84
-
parser.add_argument("--relay-url", default=RELAY_URL, help="Relay URL")
85
-
parser.add_argument("--tap-url", default=TAP_URL, help="TAP URL")
86
-
args = parser.parse_args()
87
-
88
-
dids = enumerate_repos(args.relay_url, COLLECTION)
89
-
print(f"found {len(dids)} repos with {COLLECTION}")
90
-
91
-
if not dids:
92
-
print("no repos to add")
93
-
return
94
-
95
-
if args.dry_run:
96
-
print("dry run - would add these repos to TAP:")
97
-
for did in dids[:10]:
98
-
print(f" {did}")
99
-
if len(dids) > 10:
100
-
print(f" ... and {len(dids) - 10} more")
101
-
return
102
-
103
-
print(f"adding {len(dids)} repos to TAP...")
104
-
add_repos_to_tap(args.tap_url, dids)
105
-
print("done!")
106
-
107
-
108
-
if __name__ == "__main__":
109
-
main()
-86
scripts/rebuild-pub-fts
···
1
-
#!/usr/bin/env -S uv run --script --quiet
2
-
# /// script
3
-
# requires-python = ">=3.12"
4
-
# dependencies = ["httpx", "pydantic-settings"]
5
-
# ///
6
-
"""Rebuild publications_fts with base_path column for subdomain search."""
7
-
import os
8
-
import httpx
9
-
from pydantic_settings import BaseSettings, SettingsConfigDict
10
-
11
-
12
-
class Settings(BaseSettings):
13
-
model_config = SettingsConfigDict(
14
-
env_file=os.environ.get("ENV_FILE", ".env"), extra="ignore"
15
-
)
16
-
turso_url: str
17
-
turso_token: str
18
-
19
-
@property
20
-
def turso_host(self) -> str:
21
-
url = self.turso_url
22
-
if url.startswith("libsql://"):
23
-
url = url[len("libsql://") :]
24
-
return url
25
-
26
-
27
-
settings = Settings() # type: ignore
28
-
29
-
print("Rebuilding publications_fts with base_path column...")
30
-
31
-
response = httpx.post(
32
-
f"https://{settings.turso_host}/v2/pipeline",
33
-
headers={
34
-
"Authorization": f"Bearer {settings.turso_token}",
35
-
"Content-Type": "application/json",
36
-
},
37
-
json={
38
-
"requests": [
39
-
{"type": "execute", "stmt": {"sql": "DROP TABLE IF EXISTS publications_fts"}},
40
-
{
41
-
"type": "execute",
42
-
"stmt": {
43
-
"sql": """
44
-
CREATE VIRTUAL TABLE publications_fts USING fts5(
45
-
uri UNINDEXED,
46
-
name,
47
-
description,
48
-
base_path
49
-
)
50
-
"""
51
-
},
52
-
},
53
-
{
54
-
"type": "execute",
55
-
"stmt": {
56
-
"sql": """
57
-
INSERT INTO publications_fts (uri, name, description, base_path)
58
-
SELECT uri, name, COALESCE(description, ''), COALESCE(base_path, '')
59
-
FROM publications
60
-
"""
61
-
},
62
-
},
63
-
{"type": "execute", "stmt": {"sql": "SELECT COUNT(*) FROM publications_fts"}},
64
-
{"type": "close"},
65
-
]
66
-
},
67
-
timeout=60,
68
-
)
69
-
response.raise_for_status()
70
-
data = response.json()
71
-
72
-
for i, result in enumerate(data["results"][:-1]): # skip close
73
-
if result["type"] == "error":
74
-
print(f"Step {i} error: {result['error']}")
75
-
elif result["type"] == "ok":
76
-
if i == 3: # count query
77
-
rows = result["response"]["result"].get("rows", [])
78
-
if rows:
79
-
count = (
80
-
rows[0][0].get("value", rows[0][0])
81
-
if isinstance(rows[0][0], dict)
82
-
else rows[0][0]
83
-
)
84
-
print(f"Rebuilt with {count} publications")
85
-
86
-
print("Done!")
+12
-5
site/dashboard.html
···
3
3
<head>
4
4
<meta charset="UTF-8">
5
5
<meta name="viewport" content="width=device-width, initial-scale=1.0">
6
-
<title>pub search / stats</title>
6
+
<title>leaflet search / stats</title>
7
7
<link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 32 32'><rect x='4' y='18' width='6' height='10' fill='%231B7340'/><rect x='13' y='12' width='6' height='16' fill='%231B7340'/><rect x='22' y='6' width='6' height='22' fill='%231B7340'/></svg>">
8
8
<link rel="stylesheet" href="dashboard.css">
9
9
</head>
10
10
<body>
11
11
<div class="container">
12
-
<h1><a href="https://pub-search.waow.tech" class="title">pub search</a> <span class="dim">/ stats</span></h1>
12
+
<h1><a href="https://leaflet-search.pages.dev" class="title">leaflet search</a> <span class="dim">/ stats</span></h1>
13
13
14
14
<section>
15
15
<div class="metrics">
···
30
30
</section>
31
31
32
32
<section>
33
-
<div class="section-title">documents by platform</div>
33
+
<div class="section-title">documents</div>
34
34
<div class="chart-box">
35
-
<div id="platforms"></div>
35
+
<div class="doc-row">
36
+
<span class="doc-type">articles</span>
37
+
<span class="doc-count" id="articles">--</span>
38
+
</div>
39
+
<div class="doc-row">
40
+
<span class="doc-type">looseleafs</span>
41
+
<span class="doc-count" id="looseleafs">--</span>
42
+
</div>
36
43
</div>
37
44
</section>
38
45
···
56
63
</section>
57
64
58
65
<footer>
59
-
<a href="https://pub-search.waow.tech">back</a> ยท source on <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search">tangled</a>
66
+
<a href="https://leaflet-search.pages.dev">back</a> ยท source on <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search">tangled</a>
60
67
</footer>
61
68
</div>
62
69
+3
-14
site/dashboard.js
···
57
57
if (!tags) return;
58
58
59
59
el.innerHTML = tags.slice(0, 20).map(t =>
60
-
'<a class="tag" href="https://pub-search.waow.tech/?tag=' + encodeURIComponent(t.tag) + '">' +
60
+
'<a class="tag" href="https://leaflet-search.pages.dev/?tag=' + encodeURIComponent(t.tag) + '">' +
61
61
escapeHtml(t.tag) + '<span class="n">' + t.count + '</span></a>'
62
62
).join('');
63
-
}
64
-
65
-
function renderPlatforms(platforms) {
66
-
const el = document.getElementById('platforms');
67
-
if (!platforms) return;
68
-
69
-
platforms.forEach(p => {
70
-
const row = document.createElement('div');
71
-
row.className = 'doc-row';
72
-
row.innerHTML = '<span class="doc-type">' + escapeHtml(p.platform) + '</span><span class="doc-count">' + p.count + '</span>';
73
-
el.appendChild(row);
74
-
});
75
63
}
76
64
77
65
function escapeHtml(str) {
···
95
83
96
84
document.getElementById('searches').textContent = data.searches;
97
85
document.getElementById('publications').textContent = data.publications;
86
+
document.getElementById('articles').textContent = data.articles;
87
+
document.getElementById('looseleafs').textContent = data.looseleafs;
98
88
99
-
renderPlatforms(data.platforms);
100
89
renderTimeline(data.timeline);
101
90
renderPubs(data.topPubs);
102
91
renderTags(data.tags);
+44
-316
site/index.html
···
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <link rel="icon" type="image/svg+xml" href="/favicon.svg">
-  <title>pub search</title>
-  <meta name="description" content="search atproto publishing platforms">
-  <meta property="og:title" content="pub search">
-  <meta property="og:description" content="search atproto publishing platforms">
+  <title>leaflet search</title>
+  <meta name="description" content="search for leaflet">
+  <meta property="og:title" content="leaflet search">
+  <meta property="og:description" content="search for leaflet">
   <meta property="og:type" content="website">
   <meta name="twitter:card" content="summary">
-  <meta name="twitter:title" content="pub search">
-  <meta name="twitter:description" content="search atproto publishing platforms">
+  <meta name="twitter:title" content="leaflet search">
+  <meta name="twitter:description" content="search for leaflet">
   <style>
     * { box-sizing: border-box; margin: 0; padding: 0; }
···
     flex: 1;
     padding: 0.5rem;
     font-family: monospace;
-    font-size: 16px; /* prevents iOS auto-zoom on focus */
+    font-size: 14px;
     background: #111;
     border: 1px solid #333;
     color: #ccc;
···
     .result-title {
       color: #fff;
       margin-bottom: 0.5rem;
-      /* prevent long titles from breaking layout */
-      display: -webkit-box;
-      -webkit-line-clamp: 2;
-      -webkit-box-orient: vertical;
-      overflow: hidden;
-      word-break: break-word;
     }

     .result-title a { color: inherit; }
···
       margin-left: 4px;
     }

-    .platform-filter {
-      margin-bottom: 1rem;
-    }
-
-    .platform-filter-label {
-      font-size: 11px;
-      color: #444;
-      margin-bottom: 0.5rem;
-    }
-
-    .platform-filter-list {
-      display: flex;
-      gap: 0.5rem;
-    }
-
-    .platform-option {
-      font-size: 11px;
-      padding: 3px 8px;
-      background: #151515;
-      border: 1px solid #252525;
-      border-radius: 3px;
-      cursor: pointer;
-      color: #777;
-    }
-
-    .platform-option:hover {
-      background: #1a1a1a;
-      border-color: #333;
-      color: #aaa;
-    }
-
-    .platform-option.active {
-      background: rgba(180, 100, 64, 0.2);
-      border-color: #d4956a;
-      color: #d4956a;
-    }
-
     .active-filter {
       display: flex;
       align-items: center;
···
     .active-filter .clear:hover {
       color: #c44;
     }
-
-    /* mobile improvements */
-    @media (max-width: 600px) {
-      body {
-        padding: 0.75rem;
-        font-size: 13px;
-      }
-
-      .container {
-        max-width: 100%;
-      }
-
-      /* ensure minimum 44px touch targets */
-      .tag, .platform-option, .suggestion {
-        min-height: 44px;
-        display: inline-flex;
-        align-items: center;
-        padding: 0.5rem 0.75rem;
-      }
-
-      button {
-        min-height: 44px;
-        padding: 0.5rem 0.75rem;
-      }
-
-      /* stack search box on very small screens */
-      .search-box {
-        flex-direction: column;
-        gap: 0.5rem;
-      }
-
-      .search-box input[type="text"] {
-        width: 100%;
-      }
-
-      .search-box button {
-        width: 100%;
-      }
-
-      /* result card mobile tweaks */
-      .result {
-        padding: 0.75rem 0;
-      }
-
-      .result:hover {
-        margin: 0 -0.75rem;
-        padding: 0.75rem;
-      }
-
-      .result-title {
-        font-size: 14px;
-        line-height: 1.4;
-      }
-
-      .result-snippet {
-        font-size: 12px;
-        line-height: 1.5;
-      }
-
-      /* badges inline on mobile */
-      .entity-type, .platform-badge {
-        font-size: 9px;
-        padding: 2px 5px;
-        margin-right: 6px;
-        vertical-align: middle;
-      }
-
-      /* tags wrap better on mobile */
-      .tags-list, .platform-filter-list {
-        gap: 0.5rem;
-      }
-
-      /* suggestions responsive */
-      .suggestions {
-        line-height: 2;
-      }
-
-      /* related items more compact */
-      .related-item {
-        max-width: 150px;
-        font-size: 11px;
-        padding: 0.5rem;
-      }
-    }
-
-    /* ensure touch targets on tablets too */
-    @media (hover: none) and (pointer: coarse) {
-      .tag, .platform-option, .suggestion, .related-item {
-        min-height: 44px;
-        display: inline-flex;
-        align-items: center;
-      }
-    }
   </style>
 </head>
 <body>
 <div class="container">
-  <h1><a href="/" class="title">pub search</a> <span class="by">by <a href="https://bsky.app/profile/zzstoatzz.io" target="_blank">@zzstoatzz.io</a></span> <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search" target="_blank" class="src">[src]</a></h1>
+  <h1><a href="/" class="title">leaflet search</a> <span class="by">by <a href="https://bsky.app/profile/zzstoatzz.io" target="_blank">@zzstoatzz.io</a></span> <a href="https://tangled.sh/@zzstoatzz.io/leaflet-search" target="_blank" class="src">[src]</a></h1>

   <div class="search-box">
     <input type="text" id="query" placeholder="search content..." autofocus>
···

   <div id="tags" class="tags"></div>

-  <div id="platform-filter" class="platform-filter"></div>
-
   <div id="results" class="results">
     <div class="empty-state">
-      <p>search atproto publishing platforms</p>
-      <p style="font-size:11px;margin-top:0.5rem"><a href="https://leaflet.pub" target="_blank">leaflet</a> · <a href="https://pckt.blog" target="_blank">pckt</a> · <a href="https://standard.site" target="_blank">standard.site</a></p>
+      <p>search for <a href="https://leaflet.pub" target="_blank">leaflet.pub</a></p>
     </div>
   </div>

···
   const tagsDiv = document.getElementById('tags');
   const activeFilterDiv = document.getElementById('active-filter');
   const suggestionsDiv = document.getElementById('suggestions');
-  const platformFilterDiv = document.getElementById('platform-filter');

   let currentTag = null;
-  let currentPlatform = null;
   let allTags = [];
   let popularSearches = [];

-  async function search(query, tag = null, platform = null) {
-    if (!query.trim() && !tag && !platform) return;
+  async function search(query, tag = null) {
+    if (!query.trim() && !tag) return;

     searchBtn.disabled = true;
     let searchUrl = `${API_URL}/search?q=${encodeURIComponent(query || '')}`;
     if (tag) searchUrl += `&tag=${encodeURIComponent(tag)}`;
-    if (platform) searchUrl += `&platform=${encodeURIComponent(platform)}`;
     resultsDiv.innerHTML = `<div class="status">searching...</div>`;

     try {
···
       if (results.length === 0) {
         resultsDiv.innerHTML = `
           <div class="empty-state">
-            <p>no results${query ? ` for ${formatQueryForDisplay(query)}` : ''}${tag ? ` in #${escapeHtml(tag)}` : ''}${platform ? ` on ${escapeHtml(platform)}` : ''}</p>
+            <p>no results${query ? ` for "${escapeHtml(query)}"` : ''}${tag ? ` in #${escapeHtml(tag)}` : ''}</p>
             <p>try different keywords</p>
           </div>
         `;
···

       for (const doc of results) {
         const entityType = doc.type || 'article';
-        const platform = doc.platform || 'leaflet';

-        // build URL based on entity type and platform
-        const docUrl = buildDocUrl(doc, entityType, platform);
-        // only show platform badge for actual platforms, not for lexicon-only records
-        const platformConfig = PLATFORM_CONFIG[platform];
-        const platformBadge = platformConfig
-          ? `<span class="platform-badge">${escapeHtml(platformConfig.label)}</span>`
-          : '';
-        const date = doc.createdAt ? new Date(doc.createdAt).toLocaleDateString() : '';
-
-        // platform home URL for meta link
-        const platformHome = getPlatformHome(platform, doc.basePath);
+        // build URL based on entity type
+        let leafletUrl = null;
+        if (entityType === 'publication') {
+          // publications link to their base path
+          leafletUrl = doc.basePath ? `https://${doc.basePath}` : null;
+        } else {
+          // articles and looseleafs link to specific document
+          leafletUrl = doc.basePath && doc.rkey
+            ? `https://${doc.basePath}/${doc.rkey}`
+            : (doc.did && doc.rkey ? `https://leaflet.pub/p/${doc.did}/${doc.rkey}` : null);
+        }

+        const date = doc.createdAt ? new Date(doc.createdAt).toLocaleDateString() : '';
+        const platform = doc.platform || 'leaflet';
+        const platformBadge = platform !== 'leaflet' ? `<span class="platform-badge">${escapeHtml(platform)}</span>` : '';
         html += `
           <div class="result">
             <div class="result-title">
               <span class="entity-type ${entityType}">${entityType}</span>${platformBadge}
-              ${docUrl
-                ? `<a href="${docUrl}" target="_blank">${escapeHtml(doc.title || 'Untitled')}</a>`
+              ${leafletUrl
+                ? `<a href="${leafletUrl}" target="_blank">${escapeHtml(doc.title || 'Untitled')}</a>`
                 : escapeHtml(doc.title || 'Untitled')}
             </div>
             <div class="result-snippet">${highlightTerms(doc.snippet, query)}</div>
             <div class="result-meta">
-              ${date ? `${date} | ` : ''}${platformHome.url
-                ? `<a href="${platformHome.url}" target="_blank">${platformHome.label}</a>`
-                : platformHome.label}
+              ${date ? `${date} | ` : ''}${doc.basePath
+                ? `<a href="https://${doc.basePath}" target="_blank">${doc.basePath}</a>`
+                : `<a href="https://leaflet.pub" target="_blank">leaflet.pub</a>`}
             </div>
           </div>
         `;
···
     })[c]);
   }

-  // display query without adding redundant quotes
-  function formatQueryForDisplay(query) {
-    if (!query) return '';
-    const escaped = escapeHtml(query);
-    // if query is already fully quoted, don't add more quotes
-    if (query.startsWith('"') && query.endsWith('"')) {
-      return escaped;
-    }
-    return `"${escaped}"`;
-  }
-
-  // platform-specific URL patterns
-  // note: some platforms use basePath from publication, which we prefer
-  // fallback docUrl() is used when basePath is missing
-  const PLATFORM_CONFIG = {
-    leaflet: {
-      home: 'https://leaflet.pub',
-      label: 'leaflet.pub',
-      // leaflet uses did/rkey pattern for fallback URLs
-      docUrl: (did, rkey) => `https://leaflet.pub/p/${did}/${rkey}`
-    },
-    pckt: {
-      home: 'https://pckt.blog',
-      label: 'pckt.blog',
-      // pckt uses blog slugs + path, not did/rkey - needs basePath from publication
-      docUrl: null
-    },
-    offprint: {
-      home: 'https://offprint.app',
-      label: 'offprint.app',
-      // offprint is in early beta, URL pattern unknown
-      docUrl: null
-    },
-  };
-
-  function buildDocUrl(doc, entityType, platform) {
-    if (entityType === 'publication') {
-      return doc.basePath ? `https://${doc.basePath}` : null;
-    }
-
-    // Platform-specific URL patterns:
-    // 1. Leaflet: basePath + rkey (e.g., https://dad.leaflet.pub/3mburumcnbs2m)
-    if (platform === 'leaflet' && doc.basePath && doc.rkey) {
-      return `https://${doc.basePath}/${doc.rkey}`;
-    }
-
-    // 2. pckt: basePath + path (e.g., https://devlog.pckt.blog/some-slug-abc123)
-    if (platform === 'pckt' && doc.basePath && doc.path) {
-      return `https://${doc.basePath}${doc.path}`;
-    }
-
-    // 3. Other platforms with path: basePath + path
-    if (doc.basePath && doc.path) {
-      return `https://${doc.basePath}${doc.path}`;
-    }
-
-    // 4. Platform-specific fallback URL (e.g., leaflet.pub/p/did/rkey)
-    const config = PLATFORM_CONFIG[platform];
-    if (config?.docUrl && doc.did && doc.rkey) {
-      return config.docUrl(doc.did, doc.rkey);
-    }
-
-    // 5. Fallback: pdsls.dev universal viewer (always works for any AT Protocol record)
-    if (doc.uri) {
-      return `https://pdsls.dev/${doc.uri}`;
-    }
-
-    return null;
-  }
-
-  function getPlatformHome(platform, basePath) {
-    if (basePath) {
-      return { url: `https://${basePath}`, label: basePath };
-    }
-    const config = PLATFORM_CONFIG[platform];
-    if (config) {
-      return { url: config.home, label: config.label };
-    }
-    // unknown platform using standard.site lexicon - link to standard.site
-    return { url: 'https://standard.site', label: 'standard.site' };
-  }
-
   function highlightTerms(text, query) {
     if (!text || !query) return escapeHtml(text);
     const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 0);
···
     const q = queryInput.value.trim();
     if (q) params.set('q', q);
     if (currentTag) params.set('tag', currentTag);
-    if (currentPlatform) params.set('platform', currentPlatform);
     const url = params.toString() ? `?${params}` : '/';
     history.pushState(null, '', url);
   }

   function doSearch() {
     updateUrl();
-    search(queryInput.value, currentTag, currentPlatform);
+    search(queryInput.value, currentTag);
   }

   function setTag(tag) {
-    if (currentTag === tag) {
-      clearTag();
-      return;
-    }
     currentTag = tag;
     renderActiveFilter();
     renderTags();
···
     renderActiveFilter();
     renderTags();
     updateUrl();
-    if (queryInput.value.trim() || currentPlatform) {
-      search(queryInput.value, null, currentPlatform);
-    } else {
-      renderEmptyState();
-    }
-  }
-
-  function setPlatform(platform) {
-    if (currentPlatform === platform) {
-      clearPlatform();
-      return;
-    }
-    currentPlatform = platform;
-    renderActiveFilter();
-    renderPlatformFilter();
-    doSearch();
-  }
-
-  function clearPlatform() {
-    currentPlatform = null;
-    renderActiveFilter();
-    renderPlatformFilter();
-    updateUrl();
-    if (queryInput.value.trim() || currentTag) {
-      search(queryInput.value, currentTag, null);
+    if (queryInput.value.trim()) {
+      search(queryInput.value, null);
     } else {
       renderEmptyState();
     }
   }

-  function renderPlatformFilter() {
-    const platforms = [
-      { id: 'leaflet', label: 'leaflet' },
-      { id: 'pckt', label: 'pckt' },
-    ];
-    const html = platforms.map(p => `
-      <span class="platform-option${currentPlatform === p.id ? ' active' : ''}" onclick="setPlatform('${p.id}')">${p.label}</span>
-    `).join('');
-    platformFilterDiv.innerHTML = `<div class="platform-filter-label">filter by platform:</div><div class="platform-filter-list">${html}</div>`;
-  }
-
   function renderActiveFilter() {
-    if (!currentTag && !currentPlatform) {
+    if (!currentTag) {
       activeFilterDiv.innerHTML = '';
       return;
     }
-    let parts = [];
-    if (currentTag) parts.push(`tag: <strong>#${escapeHtml(currentTag)}</strong>`);
-    if (currentPlatform) parts.push(`platform: <strong>${escapeHtml(currentPlatform)}</strong>`);
-    const clearActions = [];
-    if (currentTag) clearActions.push(`<span class="clear" onclick="clearTag()">× tag</span>`);
-    if (currentPlatform) clearActions.push(`<span class="clear" onclick="clearPlatform()">× platform</span>`);
     activeFilterDiv.innerHTML = `
       <div class="active-filter">
-        <span>filtering by ${parts.join(', ')} <span style="color:#666;font-size:10px">(documents only)</span></span>
-        ${clearActions.join(' ')}
+        <span>filtering by tag: <strong>#${escapeHtml(currentTag)}</strong> <span style="color:#666;font-size:10px">(documents only)</span></span>
+        <span class="clear" onclick="clearTag()">× clear</span>
       </div>
     `;
   }
···
   function renderEmptyState() {
     resultsDiv.innerHTML = `
       <div class="empty-state">
-        <p>search atproto publishing platforms</p>
-        <p style="font-size:11px;margin-top:0.5rem"><a href="https://leaflet.pub" target="_blank">leaflet</a> · <a href="https://pckt.blog" target="_blank">pckt</a> · <a href="https://standard.site" target="_blank">standard.site</a></p>
+        <p>search for <a href="https://leaflet.pub" target="_blank">leaflet.pub</a></p>
       </div>
     `;
   }
···
   const params = new URLSearchParams(location.search);
   queryInput.value = params.get('q') || '';
   currentTag = params.get('tag') || null;
-  currentPlatform = params.get('platform') || null;
   renderActiveFilter();
   renderTags();
-  renderPlatformFilter();
-  if (queryInput.value || currentTag || currentPlatform) search(queryInput.value, currentTag, currentPlatform);
+  if (queryInput.value || currentTag) search(queryInput.value, currentTag);
 });

 // init
 const initialParams = new URLSearchParams(location.search);
 const initialQuery = initialParams.get('q');
 const initialTag = initialParams.get('tag');
-const initialPlatform = initialParams.get('platform');
 if (initialQuery) queryInput.value = initialQuery;
 if (initialTag) currentTag = initialTag;
-if (initialPlatform) currentPlatform = initialPlatform;
 renderActiveFilter();
-renderPlatformFilter();

-if (initialQuery || initialTag || initialPlatform) {
-  search(initialQuery || '', initialTag, initialPlatform);
+if (initialQuery || initialTag) {
+  search(initialQuery || '', initialTag);
 }

 async function loadRelated(topResult) {
···
   if (filtered.length === 0) return;

   const items = filtered.map(doc => {
-    const platform = doc.platform || 'leaflet';
-    const url = buildDocUrl(doc, doc.type || 'article', platform);
+    const url = doc.basePath && doc.rkey
+      ? `https://${doc.basePath}/${doc.rkey}`
+      : (doc.did && doc.rkey ? `https://leaflet.pub/p/${doc.did}/${doc.rkey}` : null);
     return url
       ? `<a href="${url}" target="_blank" class="related-item">${escapeHtml(doc.title || 'Untitled')}</a>`
       : `<span class="related-item">${escapeHtml(doc.title || 'Untitled')}</span>`;
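with the platform-generic `buildDocUrl()` gone, the leaflet URL rule is now inlined in two places (the results renderer and `loadRelated()` above). pulled out as a standalone helper it would read roughly like this sketch — illustrative only, not a function this diff adds:

```js
// sketch: the URL rule the diff inlines twice (hypothetical helper)
function leafletUrl(doc, entityType = 'article') {
  // publications link to their base path
  if (entityType === 'publication') {
    return doc.basePath ? `https://${doc.basePath}` : null;
  }
  // articles and looseleafs link to a specific document,
  // falling back to leaflet.pub's did/rkey route
  if (doc.basePath && doc.rkey) return `https://${doc.basePath}/${doc.rkey}`;
  if (doc.did && doc.rkey) return `https://leaflet.pub/p/${doc.did}/${doc.rkey}`;
  return null;
}
```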
site/loading.js (+40 -32)
···
   const style = document.createElement('style');
   style.id = 'loader-styles';
   style.textContent = `
-    /* skeleton shimmer - subtle pulse */
+    /* skeleton shimmer for loading values */
     .loading .metric-value,
     .loading .doc-count,
     .loading .pub-count {
-      color: #333 !important;
-      animation: dim-pulse 2s ease-in-out infinite;
+      background: linear-gradient(90deg, #1a1a1a 25%, #252525 50%, #1a1a1a 75%);
+      background-size: 200% 100%;
+      animation: shimmer 1.5s infinite;
+      border-radius: 3px;
+      color: transparent !important;
+      min-width: 3ch;
+      display: inline-block;
     }

-    @keyframes dim-pulse {
-      0%, 100% { opacity: 0.3; }
-      50% { opacity: 0.6; }
+    @keyframes shimmer {
+      0% { background-position: 200% 0; }
+      100% { background-position: -200% 0; }
     }

-    /* wake message - terminal style, ephemeral */
+    /* wake message */
     .wake-message {
       position: fixed;
-      bottom: 1rem;
-      left: 1rem;
-      font-family: monospace;
+      top: 1rem;
+      right: 1rem;
       font-size: 11px;
-      color: #444;
+      color: #666;
+      background: #111;
+      border: 1px solid #222;
+      padding: 6px 12px;
+      border-radius: 4px;
+      display: flex;
+      align-items: center;
+      gap: 8px;
       z-index: 1000;
-      animation: fade-in 0.5s ease;
-    }
-
-    .wake-message::before {
-      content: '>';
-      margin-right: 6px;
-      opacity: 0.5;
+      animation: fade-in 0.2s ease;
     }

     .wake-dot {
-      display: inline-block;
-      width: 4px;
-      height: 4px;
-      background: #555;
+      width: 6px;
+      height: 6px;
+      background: #4ade80;
       border-radius: 50%;
-      margin-left: 4px;
-      animation: blink 1s step-end infinite;
+      animation: pulse-dot 1s infinite;
     }

-    @keyframes blink {
-      0%, 100% { opacity: 1; }
-      50% { opacity: 0; }
+    @keyframes pulse-dot {
+      0%, 100% { opacity: 0.3; }
+      50% { opacity: 1; }
     }

     @keyframes fade-in {
-      from { opacity: 0; }
-      to { opacity: 1; }
+      from { opacity: 0; transform: translateY(-4px); }
+      to { opacity: 1; transform: translateY(0); }
     }

     .wake-message.fade-out {
-      animation: fade-out 0.5s ease forwards;
+      animation: fade-out 0.3s ease forwards;
     }

     @keyframes fade-out {
-      to { opacity: 0; }
+      to { opacity: 0; transform: translateY(-4px); }
     }

     /* loaded transition */
     .loaded .metric-value,
     .loaded .doc-count,
     .loaded .pub-count {
-      animation: none;
+      animation: reveal 0.3s ease;
+    }
+
+    @keyframes reveal {
+      from { opacity: 0; }
+      to { opacity: 1; }
     }
   `;
   document.head.appendChild(style);
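these styles key off `.loading`/`.loaded` classes on an ancestor element; the code that toggles them sits outside this hunk. presumably the flow is something like the sketch below, where `fetchStats`, `renderStats`, and the `.container` selector are assumed names:

```js
// sketch of how the shimmer/reveal styles above are likely driven (names assumed)
const container = document.querySelector('.container');
container.classList.add('loading');     // counts render as shimmer placeholders
fetchStats().then(stats => {
  renderStats(stats);                   // fill .metric-value / .doc-count / .pub-count
  container.classList.replace('loading', 'loaded'); // triggers the reveal animation
});
```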
tap/fly.toml (+4 -5)
···
 app = 'leaflet-search-tap'
-primary_region = 'ewr'
+primary_region = 'iad'

 [build]
   image = 'ghcr.io/bluesky-social/indigo/tap:latest'
···
   TAP_BIND = ':2480'
   TAP_RELAY_URL = 'https://relay1.us-east.bsky.network'
   TAP_SIGNAL_COLLECTION = 'pub.leaflet.document'
-  TAP_COLLECTION_FILTERS = 'pub.leaflet.document,pub.leaflet.publication,site.standard.document,site.standard.publication'
+  TAP_COLLECTION_FILTERS = 'pub.leaflet.document,pub.leaflet.publication'
+  TAP_DISABLE_ACKS = 'true'
   TAP_LOG_LEVEL = 'info'
-  TAP_RESYNC_PARALLELISM = '2'
-  TAP_IDENT_CACHE_SIZE = '10000'
   TAP_CURSOR_SAVE_INTERVAL = '5s'
   TAP_REPO_FETCH_TIMEOUT = '600s'
···
   min_machines_running = 1

 [[vm]]
-  memory = '1gb'
+  memory = '2gb'
   cpu_kind = 'shared'
   cpus = 1