extracts named entities (people, organizations, places, events) from Bluesky posts and tracks how they cluster together. entities that get discussed together form edges. when a real-world event spans multiple topics, clusters merge - discourse percolates into a unified conversation.

how it works

NER bridge consumes the turbostream firehose, runs spaCy NER, extracts entities
labeler integration drops spam before it hits the graph (via Hailey's labeler)
entity graph tracks co-occurrences (entities in same post = edge), computes clusters via union-find
pheromone edges - edge weights decay exponentially and are reinforced on repeated co-occurrence (inspired by ant colony optimization)
surprise trending - entities ranked by statistical surprise vs baseline (a z-score-like measure), not raw counts
frontend visualizes entity activity, cluster structure, and firehose health
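
a minimal sketch of the surprise ranking described above. it assumes mention counts are roughly Poisson (variance ≈ expected count); the function names and the `eps` smoothing term are hypothetical and not taken from the backend:

```python
import math

def surprise(observed: int, baseline_rate: float, window_s: float, eps: float = 1.0) -> float:
    """z-like score: std devs by which the observed count exceeds expectation.

    assumption: counts are roughly Poisson, so variance ~= expected count.
    eps (hypothetical) keeps never-seen entities from dividing by zero.
    """
    expected = baseline_rate * window_s
    return (observed - expected) / math.sqrt(expected + eps)

def rank_by_surprise(counts: dict, baselines: dict, window_s: float) -> list:
    """order entities by surprise, most surprising first."""
    scores = {e: surprise(n, baselines.get(e, 0.0), window_s) for e, n in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)

counts = {"nasa": 40, "weather": 310}        # mentions in the last 5 minutes
baselines = {"nasa": 0.01, "weather": 1.0}   # long-run mentions/sec
print(rank_by_surprise(counts, baselines, window_s=300))
```

here "weather" has far more raw mentions, but "nasa" ranks first because 40 mentions against an expected 3 is a much larger deviation than 310 against an expected 300.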

theoretical background

the system draws from several sources:

percolation theory - we use the Newman-Ziff algorithm for efficient cluster detection. on regular lattices, percolation has a sharp phase transition at a critical occupation probability p_c (≈ 0.593 for site percolation on the 2D square lattice). our graph isn't a lattice, so we calibrate empirically.
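
the core idea of Newman-Ziff is to add edges one at a time and maintain clusters with union-find, so the largest cluster is tracked incrementally instead of being recomputed. a rough sketch of the bond version on an arbitrary graph follows; the class and function names are illustrative and this is not the zig backend's code:

```python
import random

class UnionFind:
    """disjoint sets with union by size and path halving."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1 if n else 0  # size of the biggest cluster so far

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach smaller tree under larger
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])

def largest_cluster_curve(n: int, edges: list, seed: int = 0) -> list:
    """add edges in random order; record largest-cluster fraction after each."""
    order = edges[:]
    random.Random(seed).shuffle(order)
    uf = UnionFind(n)
    curve = []
    for a, b in order:
        uf.union(a, b)
        curve.append(uf.largest / n)
    return curve

print(largest_cluster_curve(4, [(0, 1), (1, 2), (2, 3)]))
```

the phase transition shows up as the point where this curve jumps sharply. on a non-lattice graph like ours, that point has to be located empirically.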

heterogeneous activity - Xie et al. 2021 showed that real social networks percolate at ~1/10th the uniform-theory threshold due to heterogeneous user activity. following this insight, we plan to weight mentions by user activity rate (not enabled yet).

NER for topic detection - inspired by Hailey's trending topics. rather than embeddings on raw text (too noisy), extract structured entities to reduce surface area.

ATProto labeler system - spam filtering via com.atproto.label. we subscribe to Hailey's labeler stream and drop posts from accounts labeled as spam before NER processing.
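
a sketch of what the labeler gate could look like. label events are treated as dicts with the `uri`, `val`, and `neg` fields from the ATProto label schema. the `SPAM_VALUES` set, the `LabelGate` class, and the post shape (`{"did": ..., "text": ...}`) are assumptions for illustration, not the bridge's actual code:

```python
SPAM_VALUES = {"spam"}  # assumption: which label values count as spam

class LabelGate:
    """tracks accounts currently labeled as spam and filters their posts."""

    def __init__(self):
        self.spam_dids = set()

    def on_label(self, label: dict) -> None:
        """apply one label event from the labeler stream."""
        if label.get("val") not in SPAM_VALUES:
            return
        subject = label["uri"]
        if not subject.startswith("did:"):
            return  # record-level label; this sketch only tracks accounts
        if label.get("neg"):
            self.spam_dids.discard(subject)  # negation retracts the label
        else:
            self.spam_dids.add(subject)

    def allow(self, post: dict) -> bool:
        """True if the post should continue on to NER."""
        return post["did"] not in self.spam_dids

gate = LabelGate()
gate.on_label({"uri": "did:plc:spammer", "val": "spam"})
posts = [
    {"did": "did:plc:spammer", "text": "buy now"},
    {"did": "did:plc:alice", "text": "earthquake in LA"},
]
print([p["text"] for p in posts if gate.allow(p)])
```

running the gate before NER means spam never costs a spaCy pass and never creates edges in the graph.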

design decisions

these are documented as arbitrary choices to be revisited:

| decision | choice | why |
|---|---|---|
| edge definition | same-post co-occurrence | simplest, captures "discussed together" |
| edge weights | pheromone decay (configurable half-life) | ant colony inspired, recent co-occurrences matter more |
| activity threshold | 0.01 mentions/sec (~3 per 5 min) | rate normalizes across quiet/busy periods |
| trending metric | surprise vs baseline (UI), trend ratio (backend) | anomaly detection, not popularity contest |
| percolation threshold | largest_cluster / active > 50% | placeholder, needs empirical calibration |
| entity position | hash(text) → (x, y) | deterministic, stable, no semantic meaning yet |
| user weighting | planned (currently off) | power users count more (Xie 2021) |
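
to make the edge-weight row concrete, here is one way pheromone decay with a half-life could work: store each weight with its last update time and apply the decay lazily on read. the class name, the `deposit` amount, and the lazy-decay strategy are assumptions, not a description of the backend:

```python
from itertools import combinations

class PheromoneEdges:
    """co-occurrence edges whose weights halve every half_life_s seconds."""

    def __init__(self, half_life_s: float, deposit: float = 1.0):
        self.half_life_s = half_life_s
        self.deposit = deposit  # hypothetical reinforcement per co-occurrence
        self.edges = {}         # (a, b) with a <= b -> (weight, last_update_ts)

    def weight(self, a: str, b: str, now: float) -> float:
        """current weight, decayed from the last update to `now`."""
        key = (a, b) if a <= b else (b, a)
        w, ts = self.edges.get(key, (0.0, now))
        return w * 0.5 ** ((now - ts) / self.half_life_s)

    def observe_post(self, entities: list, now: float) -> None:
        """every pair of distinct entities in one post reinforces its edge."""
        for a, b in combinations(sorted(set(entities)), 2):
            self.edges[(a, b)] = (self.weight(a, b, now) + self.deposit, now)

g = PheromoneEdges(half_life_s=60)
g.observe_post(["earthquake", "LA"], now=0)
print(g.weight("earthquake", "LA", now=60))   # one half-life later
g.observe_post(["LA", "earthquake"], now=60)  # reinforced
print(g.weight("LA", "earthquake", now=120))
```

lazy decay avoids sweeping every edge on a timer. an edge only pays the decay cost when it is read or reinforced.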

see docs/02-semantic-percolation-plan.md for full rationale.

stack

ner (python): turbostream consumer + spaCy NER + labeler gate → POST to backend
backend (zig): entity graph + websocket server + SQLite persistence
site: static html/css/js on cloudflare pages

run locally

cd backend && zig build run                 # backend (entity graph + websocket)
cd ner && uv run coral-bridge               # NER bridge (turbostream → spaCy → backend)
cd site && npx wrangler pages dev .         # frontend

deploy

cd backend && fly deploy
cd ner && fly deploy
cd site && npx wrangler pages deploy . --project-name coral

future work

ideas being explored (not commitments):

semantic positioning - currently entities hash to arbitrary grid positions. could use embeddings to place semantically similar entities near each other, making the 2D layout a meaningful projection of topic space. unclear whether to embed entity names, representative posts, or cluster centroids.
temporal co-activity edges - entities that spike together might be related even without same-post co-occurrence. "earthquake" and "LA" could both trend during an event without always appearing together.
percolation calibration - the 50% threshold is arbitrary. need to correlate cluster merges with real-world events to understand what "discourse unification" actually looks like in the data.
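
for reference, the placeholder check that calibration would replace can be sketched as below. it combines the 0.01 mentions/sec activity threshold with the largest_cluster / active > 50% rule. counting the largest cluster over active entities only is an assumption, as are the function and argument names:

```python
from collections import Counter

ACTIVITY_THRESHOLD = 0.01    # mentions/sec (~3 per 5 min)
PERCOLATION_THRESHOLD = 0.5  # placeholder, to be calibrated

def percolation_state(mention_counts: dict, cluster_of: dict, window_s: float):
    """return (largest_cluster / active, percolating?) for one time window.

    mention_counts: entity -> mentions within the window
    cluster_of:     entity -> cluster id (e.g. a union-find root)
    """
    active = [e for e, n in mention_counts.items()
              if n / window_s >= ACTIVITY_THRESHOLD]
    if not active:
        return 0.0, False
    sizes = Counter(cluster_of[e] for e in active)
    ratio = max(sizes.values()) / len(active)
    return ratio, ratio > PERCOLATION_THRESHOLD

counts = {"earthquake": 30, "LA": 25, "nasa": 6, "bagel": 1}
clusters = {"earthquake": 0, "LA": 0, "nasa": 1, "bagel": 0}
print(percolation_state(counts, clusters, window_s=300))
```

logging this ratio over time next to known real-world events would be one way to see whether 50% is anywhere near a meaningful cutoff.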

references

Newman & Ziff, Efficient Monte Carlo algorithm and high-precision results for percolation, Phys. Rev. Lett. 85 (2000)
Xie et al., Detecting and Modelling Real Percolation and Phase Transitions of Information on Social Media, Nature Human Behaviour (2021)
Hailey, Bluesky Trending Topics - NER approach for topic detection
Stauffer & Aharony, Introduction to Percolation Theory - theoretical foundations
ATProto Labels - moderation architecture
