social graph filtering research#

research notes on filtering feed posts by social graph distance ("kevin bacon hops").

the idea#

instead of showing all music posts, show only posts from authors within N hops of the viewer in the follow graph. the goal is a feed that feels more relevant and trusted.

what we learned#

feed generators receive viewer identity#

feed requests (app.bsky.feed.getFeedSkeleton) made on behalf of a logged-in viewer include a JWT in the Authorization header, and the viewer's DID is the token's iss claim. this enables per-viewer personalization.
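a minimal sketch of pulling the DID out of the header, assuming it arrives as a Bearer token and the DID is the iss claim. the function name is made up for illustration. note that this only decodes the payload and does not verify the signature, which a real feed generator should do before trusting the value.

```python
import base64
import json
from typing import Optional


def viewer_did_from_auth_header(header: Optional[str]) -> Optional[str]:
    """Return the `iss` claim of a Bearer JWT, or None if absent or malformed.

    WARNING: decodes only; the signature is NOT verified here.
    """
    if not header or not header.startswith("Bearer "):
        return None
    parts = header[len("Bearer "):].split(".")
    if len(parts) != 3:  # a JWT is header.payload.signature
        return None
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    iss = claims.get("iss") if isinstance(claims, dict) else None
    return iss if isinstance(iss, str) else None
```

returning None for a missing or malformed header lets the caller fall back to the unfiltered feed for logged-out viewers.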

AT Protocol graph APIs#

  • app.bsky.graph.getFollows - paginated list of who a user follows
  • app.bsky.graph.getFollowers - paginated list of who follows a user
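both endpoints page through results with a cursor. a sketch of collecting a full follow list, assuming the public AppView host below and a page size of 100. the helper names are hypothetical, and the page fetcher is injectable so the loop can be exercised without the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional, Set

# Assumption: the public AppView host; swap in whichever service you query.
API = "https://public.api.bsky.app/xrpc/app.bsky.graph.getFollows"


def http_fetch_page(actor: str, cursor: Optional[str]) -> dict:
    """Fetch one page of follows for `actor` over HTTP."""
    params = {"actor": actor, "limit": "100"}
    if cursor:
        params["cursor"] = cursor
    url = f"{API}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_follows(
    actor: str,
    fetch_page: Callable[[str, Optional[str]], dict] = http_fetch_page,
    max_pages: int = 100,
) -> Set[str]:
    """Follow the cursor until it runs out, returning the set of followed DIDs."""
    dids: Set[str] = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(actor, cursor)
        dids.update(f["did"] for f in page.get("follows", []))
        cursor = page.get("cursor")
        if not cursor:
            break
    return dids
```

with these defaults the cap is 10,000 follows per viewer; raise max_pages if that is too low.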

jaz's graphd#

https://github.com/ericvolp12/bsky-experiments

jaz (works at bluesky) built an in-memory graph service using roaring bitmaps for fast set operations:

  • each user stored as two bitmaps: following and followers
  • endpoints: /moots, /intersect_followers, /follows_following, /does_follow
  • set operations (intersection, union) are very fast on compressed bitmaps
  • used to power the atlas visualization at https://bsky.jazco.dev/
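to make the two-set-per-user model concrete, here is a toy version using plain python sets as a stand-in for roaring bitmaps. this is not graphd's code: the class and method names are only loosely modeled on the endpoint names above, and their exact semantics are a guess. roaring bitmaps store integers, so a real service would also need a DID-to-integer mapping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class FollowGraph:
    """Toy in-memory follow graph: two sets per user (following, followers)."""

    def __init__(self) -> None:
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.followers: Dict[str, Set[str]] = defaultdict(set)

    def add_follow(self, actor: str, target: str) -> None:
        # Keep both directions in sync so either lookup is a single set read.
        self.following[actor].add(target)
        self.followers[target].add(actor)

    def does_follow(self, actor: str, target: str) -> bool:
        return target in self.following.get(actor, ())

    def mutuals(self, actor: str) -> Set[str]:
        """Users that `actor` follows and who follow `actor` back."""
        return self.following.get(actor, set()) & self.followers.get(actor, set())

    def intersect_followers(self, users: Iterable[str]) -> Set[str]:
        """Users who follow every user in `users` (assumed semantics)."""
        sets = [self.followers.get(u, set()) for u in users]
        return set.intersection(*sets) if sets else set()
```

every query reduces to a membership test or an intersection, which is the property that makes compressed bitmaps a good fit.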

spacecowboy's "for you" feed#

https://bsky.app/profile/spacecowboy17.bsky.social/feed/for-you

uses collaborative filtering: "finds people who liked the same posts as you, and shows you what else they liked recently." source code not public.

the core problem#

to answer "is post author within N hops of viewer?":

  • N=1 (direct follows): ~500-2000 people per user - manageable
  • N=2 (friends of friends): 100K-1M+ people
  • N=3: explodes to millions

key insight: we don't need exact distance, just "<= N or not"
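because only the yes/no answer matters, a breadth-first search can stop after N levels or as soon as the author shows up. a sketch over an in-memory adjacency map (the function name and the dict-of-sets shape are assumptions for illustration):

```python
from typing import Dict, Set


def within_hops(
    following: Dict[str, Set[str]], viewer: str, author: str, max_hops: int
) -> bool:
    """True if `author` is reachable from `viewer` in at most `max_hops` follows."""
    if viewer == author:
        return True
    seen = {viewer}
    frontier = {viewer}
    for _ in range(max_hops):
        next_frontier: Set[str] = set()
        for user in frontier:
            for followed in following.get(user, ()):
                if followed == author:
                    return True  # early exit: exact distance is irrelevant
                if followed not in seen:
                    seen.add(followed)
                    next_frontier.add(followed)
        if not next_frontier:
            return False  # ran out of graph before running out of hops
        frontier = next_frontier
    return False
```

the early exit helps when the author is close, but a miss at N=2 still touches the whole friends-of-friends set, which is the cost the numbers above describe.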

possible approaches#

  1. external graphd - deploy jaz's graphd, query at feed-serve time
  2. sqlite graph - store follows in sqlite, compute distance on-demand
  3. pre-computed neighborhoods - cache each viewer's N-hop set
  4. bloom filters - approximate set membership (false positives ok)
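as an illustration of approach 4, a small bloom filter built on the standard library. this is a sketch, not a tuned implementation: the class name, sizing, and the double-hashing scheme over sha256 are choices made for the example.

```python
import hashlib
from typing import List


class BloomFilter:
    """Approximate set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


# One filter per viewer, filled with that viewer's N-hop neighborhood.
neighborhood = BloomFilter(num_bits=8192, num_hashes=7)
for did in ("did:plc:alice", "did:plc:bob"):
    neighborhood.add(did)
```

for sizing, a 1% false-positive rate needs roughly 9.6 bits per member with 7 hash functions, so a 1M-member neighborhood costs about 1.2 MB per viewer. a false positive here means an occasional out-of-network post slips into the feed.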

status#

implemented - we added a "music (following)" feed that filters to N=1 (direct follows only).

implementation:

  • extract viewer DID from the JWT in the Authorization header
  • fetch viewer's follows via app.bsky.graph.getFollows (paginated)
  • store author_did on each post in sqlite
  • query with WHERE author_did IN (...) for the viewer's follows
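a sketch of the last step using python's sqlite3 module. only author_did comes from the notes above; the posts table name, the uri and indexed_at columns, and the function name are assumptions. the follow list is chunked because sqlite caps the number of bound parameters per statement (999 by default in older builds).

```python
import sqlite3
from typing import Iterable, List, Tuple

CHUNK = 500  # stays well under sqlite's bound-parameter limit


def posts_from_follows(
    conn: sqlite3.Connection, follows: Iterable[str], limit: int = 50
) -> List[Tuple[str, str, str]]:
    """Newest `limit` posts whose author is in `follows`."""
    follow_list = list(follows)
    rows: List[Tuple[str, str, str]] = []
    for i in range(0, len(follow_list), CHUNK):
        chunk = follow_list[i : i + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        # Only "?" placeholders are interpolated; the DIDs themselves are bound.
        rows += conn.execute(
            "SELECT uri, author_did, indexed_at FROM posts "
            f"WHERE author_did IN ({placeholders}) "
            "ORDER BY indexed_at DESC LIMIT ?",
            (*chunk, limit),
        ).fetchall()
    # Each chunk returned its own top `limit`; merge and keep the overall newest.
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:limit]
```

sorting on indexed_at as text works only if timestamps are stored in a sortable form such as ISO 8601. an index on author_did keeps the IN lookup cheap as the table grows.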

this avoids the complexity of N>=2 while providing useful personalization.