commits
std.Options.debug_io is backed by global_single_threaded, which silently
serializes all mutex/sleep/network ops. override it with Io.Threaded so
concurrent threads actually run concurrently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- JetstreamClient.init now takes io parameter
- subscribe returns Io.Cancelable!void
- includes websocket cross-platform fix and io threading
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
major API migrations:
- std.Thread.Mutex → std.Io.Mutex (lockUncancelable/unlock with io)
- std.time.timestamp/milliTimestamp → Io.Timestamp.now helpers
- std.posix.getenv → std.c.getenv + mem.span
- std.heap.GeneralPurposeAllocator → std.heap.smp_allocator
- std.net.Address → std.Io.net.IpAddress
- Thread.Pool → Thread.spawn + detach
- std.fs.* → std.Io.Dir.* (createDirPath, openFileAbsolute, readStreaming)
- std.http.Server.init takes *Io.Reader/*Io.Writer via stream.interface
- ArrayList init .{} → .empty
- HttpTransport.init now takes (io, allocator) per zat v0.3.0-alpha.4
- Dockerfile: zig 0.15.2 → 0.16.0-dev.3059
deps: zat v0.3.0-alpha.4, logfire-zig zig-0.16 branch, otel-zig zig-0.16 fork
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- subscribe to sh.tangled.actor.profile in ingester (jetstream)
- fetchProfileFromPds tries tangled profile collection as fallback
- store bare CIDs, reconstruct avatar URLs at search time using pds column
- sync pds column from turso to local SQLite for zig search backend
- enrichment: try PDS fallback for non-bsky PDS actors (not just takedowns)
- add admin auth bypass for /request-indexing rate limiter
- add scripts/index-domain.py for bulk domain handle discovery + indexing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explains: we respect bluesky's labels by default, but enrich banned accounts
via PDS-direct fallback so they can be un-hidden case-by-case. Slur handles
are always filtered. Links to atproto's credible exit philosophy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ports bluesky's 7 Unicode-aware slur regexes into hasExplicitSlur(), which
checks both the raw handle and a separator-stripped version. Integrated into
extractProfileFields() so all enrichment paths auto-hide slur handles.
Also adds !suspend to MOD_HIDE_VALS (was missing, functionally identical to !takedown).
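a minimal sketch of the check shape (the real regexes are intentionally not reproduced here; SLUR_RES stands in for the 7 ported Unicode-aware patterns, and everything except the hasExplicitSlur name is illustrative):

```typescript
// placeholder for the 7 ported Unicode-aware patterns (not reproduced)
const SLUR_RES: RegExp[] = [];

function hasExplicitSlur(handle: string): boolean {
  // test both the raw handle and a separator-stripped version, so
  // "bad-word.example.com" and "b.a.d.w.o.r.d" both hit the same pattern
  const stripped = handle.replace(/[.\-_]/g, "");
  return SLUR_RES.some((re) => re.test(handle) || re.test(stripped));
}
```

checking the stripped form is what closes the dotted/hyphenated evasion route without needing per-separator regex variants.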
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reads DIDs from a file, resolves PDS via slingshot in parallel,
writes back to Turso. used to bootstrap PDS data for the 44K actors
that enrichment tried but bsky refused to serve.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
three changes:
1. handleRequestIndexing now persists PDS from slingshot response
2. enrichment phase 1b: backfill PDS via slingshot for actors that
have handles but no PDS (100/run, 24h backoff)
3. enrichment phase 2 + refreshModeration: for DIDs not returned by
getProfiles, probe getProfile for AccountTakedown. if confirmed
and PDS is available, fetch profile directly — no show override
needed. just protocol-level data from their own PDS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the zig search backend unconditionally wrapped avatar_url in the CDN
prefix. now checks for https:// prefix and passes full URLs through
as-is, matching the worker's avatarUrl() behavior. also removes
temporary diagnostic logging from admin handler.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
extractAvatarCid was stripping PDS blob URLs down to a useless path
segment. now detects non-CDN https:// URLs and stores them as-is —
avatarUrl() already handles full URLs on the read path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bsky API returns nothing for banned/suspended accounts, so actors with
show overrides get indexed without avatar or displayName. now fetches
profile data directly from the actor's PDS via com.atproto.repo.getRecord
when bsky refuses and a show override exists. zero cost in the normal
case — only triggers for the intersection of bsky-missing + show-override.
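the PDS-direct fetch is a plain XRPC GET; a hedged sketch of the request shape (com.atproto.repo.getRecord with rkey "self" is the standard way to read a profile record, but the helper names here are illustrative):

```typescript
// build the XRPC URL for reading an actor's profile record
// straight from their own PDS, bypassing the bsky appview
function getRecordUrl(pds: string, did: string): string {
  const url = new URL("/xrpc/com.atproto.repo.getRecord", pds);
  url.searchParams.set("repo", did);
  url.searchParams.set("collection", "app.bsky.actor.profile");
  url.searchParams.set("rkey", "self");
  return url.toString();
}

async function fetchProfileFromPds(pds: string, did: string): Promise<Record<string, unknown> | null> {
  const res = await fetch(getRecordUrl(pds, did));
  if (!res.ok) return null; // account gone or record absent
  const body = (await res.json()) as { value?: Record<string, unknown> };
  return body.value ?? null; // { displayName?, avatar?, ... }
}
```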
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
update expected stats heading (5 min not hourly), allow labels/createdAt/
associated fields in actor response shape (intentional additions).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
add X-Client: bench header to bench.py, filter from pie chart query.
update architecture.md with two-phase search strategy docs.
add justfile, targeted-backfill script, gitignore cleanup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
extract bloom filter, ingest handler, and search into separate files.
main.zig is now just the entrypoint + thread orchestration (617→155 lines).
also includes two-phase search fix for FTS5 and LIKE prefix queries,
eliminating pathological full-scan sorts on broad prefixes.
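the actual fix lives in the zig backend's sqlite queries; this typescript sketch only illustrates the two-phase shape (names are illustrative):

```typescript
type Row = { rowid: number; handle: string };

// phase 1 collects a bounded candidate set via the cheap indexed
// predicate; phase 2 ranks only those candidates — instead of letting
// a broad prefix force a sort over every matching row
function twoPhaseSearch(rows: Row[], prefix: string, cap: number, limit: number): Row[] {
  // phase 1 (in sqlite: WHERE handle LIKE ? || '%' LIMIT cap)
  const candidates: Row[] = [];
  for (const r of rows) {
    if (r.handle.startsWith(prefix)) {
      candidates.push(r);
      if (candidates.length === cap) break;
    }
  }
  // phase 2 (in sqlite: WHERE rowid IN (...) ORDER BY ... LIMIT limit)
  return candidates.sort((a, b) => a.handle.length - b.handle.length).slice(0, limit);
}
```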
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cached (edge) vs cold (turso) avg latency in the same card.
cache hits were previously invisible — only turso round-trips
were recorded, making the displayed avg misleadingly high.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the 7-day window was dominated by write-contention spikes from
bulk operations, showing 439ms when current performance is ~111ms.
a 24h window lets those transient spikes age out within a day.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
background thread on :8080 responds with JSON status
(ingested count, deleted count, bloom filter size, RSS).
added [http_service] to fly.toml with shared IPv4.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- conditional FTS trigger: only fires when handle/display_name
actually change (was: every UPDATE, even labels-only)
- chunk handleDelete into batches of 200 with 100ms pauses
(was: all 10k DIDs in one transaction)
- bulk-enrich.py: WRITE_BATCH=50, WRITE_PAUSE=0.2
(was: 500/0 — hammered turso with no breathing room)
- refreshModeration: skip no-op updates when labels/hidden
unchanged, add correlation logging (skipped, ms)
- search: 3-tier ranking (exact handle → handle prefix → FTS)
fixes jay.bsky.team being buried under offor-jay.bsky.social
- tombstones table for deletion propagation
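the 3-tier ranking reduces to a simple tier function; a sketch (names illustrative, the production version is SQL-side):

```typescript
// exact handle match first, then handle-prefix matches, then FTS hits
function rankTier(handle: string, q: string): number {
  if (handle === q) return 0;         // exact
  if (handle.startsWith(q)) return 1; // prefix
  return 2;                           // matched elsewhere via FTS
}

function rankResults(handles: string[], q: string): string[] {
  return [...handles].sort((a, b) => rankTier(a, q) - rankTier(b, q));
}
```

this is what surfaces jay.bsky.team above offor-jay.bsky.social for the query "jay".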
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cron writes metrics/snapshots/deltas to KV hourly. /stats reads from
KV (edge-fast) + single turso query for real-time traffic pie chart.
~76ms cold miss, down from 4-6s.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- lazy singleton for turso client in worker isolate — eliminates
per-request endpoint discovery round-trip (~6s → ~900ms on cache miss)
- cleanup script: scan by rowid instead of WHERE handle='' to avoid
turso timeout on full table scan, filter client-side
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
refreshModeration now deletes empty-handle actors missing from getProfiles.
new cleanup-dead-actors.py script for bulk removal with --dry-run/--limit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CF's zone-level Browser Cache TTL was overriding our max-age=60 to
max-age=14400 (4 hours), causing stale stats in browsers. split cache
strategy: Cache API entry uses max-age=60 for edge TTL, browser gets
no-store so refreshes always revalidate through the edge.
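a sketch of the split (assuming the standard Workers pattern of one Response stored in the Cache API and a separate one returned to the client):

```typescript
// the edge copy carries max-age=60 so the Cache API expires it on
// schedule; the browser copy carries no-store so CF's zone-level
// Browser Cache TTL has nothing to override and every refresh
// revalidates through the edge
function splitCache(body: string): { edge: Response; browser: Response } {
  const edge = new Response(body, {
    headers: { "Cache-Control": "public, max-age=60" },
  });
  const browser = new Response(body, {
    headers: { "Cache-Control": "no-store" },
  });
  return { edge, browser };
}
```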
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- add created_at (ISO 8601) and associated (JSON) columns to actors table
- extract shared extractProfileFields() for 4 callsites (enrichment, cron, backfill, admin)
- cleanAssociated() strips zero/false fields to match bsky's compact typeahead shape
- search response now returns full profileViewBasic surface minus viewer
- db.batch() supports read/write mode param (was hardcoded to "write")
- stats handler uses read-mode batch (3-6s → 50-250ms warm)
- stats handler uses CF edge cache (60s TTL) with Server-Timing headers
- search handler cache.put moved to ctx.waitUntil (non-blocking)
- add PLC export streaming backfill script for bulk created_at population
- update docs, architecture, README for new fields
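a sketch of the cleanAssociated shape (the exact field set on the associated object is an assumption here; the point is dropping zero/false/null members so storage matches bsky's compact typeahead shape):

```typescript
function cleanAssociated(assoc: Record<string, unknown>): Record<string, unknown> | null {
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(assoc)) {
    if (v === 0 || v === false || v == null) continue; // strip empty-ish fields
    out[k] = v;
  }
  // collapse a fully-empty object to null so nothing is stored at all
  return Object.keys(out).length > 0 ? out : null;
}
```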
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bsky's ImageUriBuilder doesn't append a format suffix — the CDN
defaults to webp which is smaller. our @jpeg override was forcing
unnecessary JPEG transcoding and producing URLs that differed from
bsky's own searchActorsTypeahead response.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- add labels column to actors, hidden index to schema
- remove stale !no-unauthenticated self-label check from ingester
(only API-based paths can correctly determine hidden)
- fix toArrayList() → written() memory leak in ingester HTTP calls
- add RSS logging to ingester flush loop
- disable HTTP keep-alive (fly.io proxy compat)
- add add-labels-column.sql migration and backfill-profiles.py script
- update architecture docs for labels, enrichment phase 2, moderation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
break 1849-line single file into 16 modules with clean DAG
dependency structure. add backfill drift detection (discovered
counter). fix README and docs page to reflect that labels are
returned in search results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
replaces 3 ad-hoc std.http.Client instantiations with a single shared
HttpTransport from zat v0.2.18. gives connection keepalive to the worker,
centralizes the gzip workaround, and isolates the 0.16 migration surface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- move INCIDENT.md → docs/notes/ingester-oom-firehose-widening.md
- update architecture.md: 2x enrichment throughput, metrics section
(5-min actor deltas, 5-min search buckets, traffic normalization),
add actor_deltas migration to scripts list
- track add-enrichment-columns.sql migration script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- actor_deltas table tracks actors/handles/avatars at 5-min granularity
- instrument ingest, delete, and enrichment paths to record deltas
- stitch deltas after last hourly snapshot for interpolated trend points
- search metrics switch from hourly to 5-min buckets (LIMIT 2016)
- stats layout: pills co-located with their charts (3 under trend, 2 under sparkline)
- legend + tooltip sort dynamically by value descending
- fix x-axis timezone bug (UTC midnight labels were off by a day)
- normalize loopback IPs to "unknown" in traffic sources
- singular/plural in sparkline + pie tooltips
- 2x enrichment throughput (100 identity / 20 avatar per run)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
replace unbounded hash map dedup (grew to 256MB → OOM every ~4h) with a
fixed-size bloom filter (~1.2MB, 10M bits, 7 hashes). split bare-DID
events in worker to use INSERT OR IGNORE (0 Turso writes for known actors)
instead of full UPSERT that triggered FTS5 churn on every hit.
also clean up /docs page: accurate indexing description, remove speculative
comparisons, add syntax highlighting to code blocks.
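the bloom filter itself is in zig; a typescript sketch of the same construction (10M bits, k=7 via double hashing — hash choices here are illustrative). a false positive only costs a skipped redundant write, which is the right failure mode for dedup:

```typescript
const BITS = 10_000_000;
const K = 7;
const bits = new Uint8Array(Math.ceil(BITS / 8)); // ~1.2MB, fixed

// FNV-1a with a seed, used to derive two base hashes
function fnv1a(s: string, seed: number): number {
  let h = (2166136261 ^ seed) >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

// k indices via double hashing: idx_i = h1 + i*h2
function indices(did: string): number[] {
  const h1 = fnv1a(did, 0);
  const h2 = fnv1a(did, 0x9747b28c) | 1; // force odd stride
  const out: number[] = [];
  for (let i = 0; i < K; i++) out.push(((h1 + Math.imul(i, h2)) >>> 0) % BITS);
  return out;
}

// check-and-set: returns whether the did was (probably) seen before,
// and marks it as seen either way
function seen(did: string): boolean {
  let all = true;
  for (const idx of indices(did)) {
    const byte = idx >> 3;
    const mask = 1 << (idx & 7);
    if (!(bits[byte] & mask)) all = false;
    bits[byte] |= mask;
  }
  return all;
}
```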
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
previously only subscribed to app.bsky.actor.profile — now also
ingests app.bsky.feed.post, app.bsky.feed.like, app.bsky.graph.follow.
non-profile commits extract just the DID (bare upsert), with a bounded
dedup set (500k cap) to avoid redundant writes. also bumps hourly
handle resolution from 1000→5000 with 10-concurrent parallel calls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
same-origin fetches don't send Origin, so fall back to Referer to
detect searches from our own homepage. both "homepage" and "unknown"
get muted gray colors and aren't rendered as links in the legend.
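a sketch of the fallback (SELF_HOST is a hypothetical stand-in for our own domain):

```typescript
const SELF_HOST = "example.com"; // stand-in for the real homepage host

// Origin is absent on same-origin fetches, so fall back to Referer;
// anything unparseable or absent becomes "unknown"
function trafficSource(origin: string | null, referer: string | null): string {
  const raw = origin ?? referer;
  if (!raw) return "unknown";
  try {
    const host = new URL(raw).hostname;
    return host === SELF_HOST ? "homepage" : host;
  } catch {
    return "unknown";
  }
}
```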
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
non-unknown domains link to their site (subtle dotted underline,
brightens on hover). unknown stays plain text.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
prefer X-Client over Origin for tracking, works from any context
(server-side, CLI, native). documented in /docs with code example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
explains how to swap the base URL, documents the response field
differences, and includes the plyr.fm migration as a worked example.
linked from homepage footer and stats page pie chart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
track search API traffic by Origin header domain, display as an
animated donut chart with hover tooltips and a flex-wrap legend.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
!no-unauthenticated is not filtered — it applies to content, not
identity. only bluesky mod service labels (!hide, !takedown, spam)
hide actors. also removed stale ingester description about detecting
!no-unauthenticated self-labels.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
D1 abstraction layer (d1Db, dualWriteDb, getBackend, createDb,
StorageBackend) was migration scaffolding — Turso is serving all
traffic. also adds @libsql/client dep, gitignores .dev.vars,
updates smoke tests for !no-unauthenticated inclusion policy,
and includes the migration script for historical reference.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
moderation: hide actors with bluesky moderation labels (!hide, !takedown,
spam) and self-applied !no-unauthenticated from search results. ingester
detects self-labels on ingest; hourly cron refreshes labels by walking the
index. request-indexing endpoint returns JSON instead of HTML.
avatar CIDs: store ~59-byte CIDs instead of ~130-byte full CDN URLs in
avatar_url column. reconstruct full URLs at query time via avatarUrl().
helper handles both formats for safe deploy-then-migrate ordering.
saves ~70 bytes/row (~2.8GB at 40M actors vs the 10GB D1 ceiling).
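a sketch of the read-path helper that makes deploy-then-migrate safe (the CDN path template here is an assumption — preset name and URL shape follow bsky's avatar CDN convention but aren't confirmed by this commit):

```typescript
// stored value is either a bare CID (new rows) or a full URL (old
// rows / PDS blob URLs); expand CIDs, pass full URLs through as-is
function avatarUrl(did: string, stored: string | null): string | undefined {
  if (!stored) return undefined;
  if (stored.startsWith("https://")) return stored; // already a full URL
  return `https://cdn.bsky.app/img/avatar_thumbnail/plain/${did}/${stored}`;
}
```

because both formats are handled on read, the worker can deploy first and the column can be migrated to bare CIDs afterwards with no window of broken avatars.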
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the ingester adds ~2k actors/hour without handles; resolving
200/hour wasn't keeping up.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
iOS Safari auto-zooms on inputs below 16px.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
profile commits from Jetstream carry avatar/display_name but not
handles, so most ingested actors lack handles (22% coverage). the
hourly cron now resolves up to 200 missing handles per run via
slingshot, prioritizing recently updated actors. at 200/hour the
~22k backlog should clear in ~4-5 days.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
snapshot collection was piggybacking on uncached searches, so hours
with no traffic produced no data points. now a cron trigger runs
every hour at :00 and calls recordSnapshot directly. also removed
the KV-gated snapshot logic from recordMetric since the cron handles
it unconditionally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
replace the 4-metric grid on /stats with a canvas line chart showing
total actors, with handles, and with avatars over time. chart uses
multi-layered glow rendering inspired by relay-eval, with hover
crosshair + tooltip and touch support for mobile.
new snapshots table records actor counts hourly (first uncached search
per hour triggers a snapshot via KV flag). live counts appended as the
latest point so the chart always extends to now.
also fixes /request-indexing rate limiting: returns HTML with a friendly
message instead of raw JSON, and uses the standard rate limiter (60/min)
with a namespaced key instead of the strict limiter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
writeJsonEscaped + ~45 lines of hand-rolled brace/comma/quote
juggling → jw.write(...) with emit_null_optional_fields = false.
stdlib handles all escaping per RFC 8259.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
clarify that we match the endpoint shape but return a subset of
profileViewBasic fields (no moderation labels, viewer state, etc).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- server-rendered stats page: actor counts, sparkline (7d searches/hour),
avg latency, handle/avatar coverage with CSS tooltips
- metrics table + fire-and-forget hourly recording via ctx.waitUntil
- move handle != '' filter into SQL WHERE (before LIMIT) so results
aren't short-changed by empty-handle rows consuming limit slots
- smoke test for /stats endpoint
- stats link in homepage footer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- bind empty string instead of null for missing handles (fixes batch failures)
- use COALESCE(NULLIF(...)) to preserve existing handles on partial updates
- filter empty-handle actors from search results
- reject limit <= 0, NaN, and non-numeric values with 400
- restore SLINGSHOT_URL constant for /request-indexing
- trim readme to match project style
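the limit validation reduces to one guard; a sketch (the default of 25 is an assumption, not from this commit):

```typescript
// reject non-numeric, NaN, zero/negative, and >100 limits; the caller
// turns a null result into a 400
function parseLimit(raw: string | null): number | null {
  if (raw === null) return 25; // assumed default when param is absent
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0 || n > 100) return null;
  return n;
}
```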
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
unhandled KV quota errors (free tier: 1000 writes/day) were causing
the entire /admin/ingest endpoint to return 500, which made it look
like D1 was rejecting writes. the actual ingest was never reached.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
500 upserts with FTS5 triggers = ~1500 write ops per batch, which
trips D1's write rate limit (error 1101). 100 is well within limits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the previous retry logic accumulated all events into a single growing
buffer and re-sent the entire thing on every incoming event. D1 rejects
large batches (1101), so this created a death spiral: bigger batch →
rejection → buffer grows → even bigger batch.
now flush sends at most MAX_BATCH (500) items per attempt, shifts
remaining items forward, and waits 5 seconds before retrying after
a failure. during catch-up this drains the backlog incrementally
instead of choking.
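the bounded flush in sketch form (send/sleep are injected stand-ins for the real D1 call and timer):

```typescript
const MAX_BATCH = 500;
const RETRY_DELAY_MS = 5_000;

// send at most MAX_BATCH items per attempt; on success drop only what
// was sent, on failure back off and retry the same slice — the buffer
// never gets re-sent wholesale, so there is no growth spiral
async function flush(
  buffer: unknown[],
  send: (batch: unknown[]) => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
): Promise<void> {
  while (buffer.length > 0) {
    const batch = buffer.slice(0, MAX_BATCH);
    if (await send(batch)) {
      buffer.splice(0, batch.length); // shift remaining items forward
    } else {
      await sleep(RETRY_DELAY_MS);
    }
  }
}
```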
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ingester: retain failed buffers for retry instead of dropping them.
cursor only advances after both ingest and delete batches succeed.
backlog overflow cap (5000 events) prevents unbounded memory growth
when D1 is persistently unavailable.
cursor fetch retries 3 times with backoff (1s, 3s, 10s) before
falling back to live.
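the cursor retry in sketch form (fetchCursor/sleep injected for testability; names illustrative):

```typescript
const BACKOFF_MS = [1_000, 3_000, 10_000];

// up to three attempts with escalating waits; null means
// "start from live" rather than crashing the ingester
async function fetchCursorWithRetry(
  fetchCursor: () => Promise<number>,
  sleep: (ms: number) => Promise<void>,
): Promise<number | null> {
  for (const delay of BACKOFF_MS) {
    try {
      return await fetchCursor();
    } catch {
      await sleep(delay);
    }
  }
  return null;
}
```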
smoke tests: empty query now expects 400 (not 200), added test for
limit>100 returning 400.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
documents the response shape gap (missing labels, associated, createdAt)
and lack of moderation filtering as explicit known limitations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the ingester now fetches the last cursor from the worker on startup
and passes it to jetstream, so restarts resume where they left off
instead of starting from live.
removed synchronous slingshot call from profile commit handling —
identity events already carry handles, and the backfill/request-indexing
paths cover any gaps. this eliminates an external dependency from the
hot path and unblocks processing during slingshot downtime.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- schema.sql: avatar_cid → avatar_url to match production
- sanitize(): use unicode-aware \p{L}\p{N} instead of ASCII-only \w
- return 400 for empty query and limit>100 to match Bluesky's behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KV key "backfill" = "off" disables without redeploy. global rate
limit (10/min) caps total backfill writes regardless of user count.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the /request-indexing endpoint interpolated user input and slingshot
response data directly into html. added escHtml() for entity encoding.
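a sketch of the escHtml shape — entity-encoding the five characters that matter when interpolating untrusted text into an HTML document (the exact character set in the real helper is an assumption):

```typescript
function escHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")  // must run first so entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```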
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>