Template of a custom feed generator service for the Bluesky network in Ruby

+7 -2
Gemfile
··· 1 1 source "https://rubygems.org" 2 2 3 - gem 'blue_factory', '~> 0.1.5' 4 - gem 'skyfall', '~> 0.5' 3 + gem 'blue_factory', '~> 0.1.6' 4 + gem 'skyfall', '~> 0.6' 5 + gem 'didkit', '~> 0.2' 5 6 6 7 gem 'activerecord', '~> 8.0' 7 8 gem 'sinatra-activerecord', '~> 2.0' ··· 14 15 gem 'debug' 15 16 gem 'thin' 16 17 gem 'capistrano', '~> 2.0' 18 + 19 + # for net-ssh & capistrano 20 + gem 'ed25519', '>= 1.2', '< 2.0' 21 + gem 'bcrypt_pbkdf', '>= 1.0', '< 2.0' 17 22 end 18 23 19 24 gem 'ostruct' # for rake, to remove when rake is updated
+43 -33
Gemfile.lock
··· 21 21 tzinfo (~> 2.0, >= 2.0.5) 22 22 uri (>= 0.13.1) 23 23 base32 (0.3.4) 24 - base64 (0.2.0) 25 - benchmark (0.4.0) 26 - bigdecimal (3.1.9) 27 - blue_factory (0.1.5) 24 + base64 (0.3.0) 25 + bcrypt_pbkdf (1.1.1) 26 + benchmark (0.4.1) 27 + bigdecimal (3.2.2) 28 + blue_factory (0.1.6) 28 29 sinatra (~> 3.0) 29 30 capistrano (2.15.11) 30 31 highline ··· 34 35 net-ssh-gateway (>= 1.1.0) 35 36 cbor (0.5.9.8) 36 37 concurrent-ruby (1.3.5) 37 - connection_pool (2.5.0) 38 + connection_pool (2.5.3) 38 39 daemons (1.4.1) 39 40 date (3.4.1) 40 - debug (1.10.0) 41 + debug (1.11.0) 41 42 irb (~> 1.10) 42 43 reline (>= 0.3.8) 43 - drb (2.2.1) 44 + didkit (0.2.3) 45 + drb (2.2.3) 46 + ed25519 (1.4.0) 47 + erb (5.0.2) 44 48 eventmachine (1.2.7) 45 - faye-websocket (0.11.3) 49 + faye-websocket (0.12.0) 46 50 eventmachine (>= 0.12.0) 47 - websocket-driver (>= 0.5.1) 48 - highline (2.1.0) 51 + websocket-driver (>= 0.8.0) 52 + highline (3.1.2) 53 + reline 49 54 i18n (1.14.7) 50 55 concurrent-ruby (~> 1.0) 51 - io-console (0.8.0) 52 - irb (1.15.1) 56 + io-console (0.8.1) 57 + irb (1.15.2) 53 58 pp (>= 0.6.0) 54 59 rdoc (>= 4.0.0) 55 60 reline (>= 0.4.2) 56 - logger (1.6.6) 61 + logger (1.7.0) 57 62 minitest (5.25.5) 58 63 mustermann (3.0.3) 59 64 ruby2_keywords (~> 0.0.1) 60 - net-scp (4.0.0) 65 + net-scp (4.1.0) 61 66 net-ssh (>= 2.6.5, < 8.0.0) 62 67 net-sftp (4.0.0) 63 68 net-ssh (>= 5.0.0, < 8.0.0) 64 - net-ssh (7.2.0) 69 + net-ssh (7.3.0) 65 70 net-ssh-gateway (2.0.0) 66 71 net-ssh (>= 4.0.0) 67 - ostruct (0.6.1) 72 + ostruct (0.6.2) 68 73 pp (0.6.2) 69 74 prettyprint 70 75 prettyprint (0.2.0) 71 - psych (5.2.3) 76 + psych (5.2.6) 72 77 date 73 78 stringio 74 - rack (2.2.13) 79 + rack (2.2.17) 75 80 rack-protection (3.2.0) 76 81 base64 (>= 0.1.0) 77 82 rack (~> 2.2, >= 2.2.4) 78 83 rainbow (3.1.1) 79 - rake (13.2.1) 80 - rdoc (6.13.0) 84 + rake (13.3.0) 85 + rdoc (6.14.2) 86 + erb 81 87 psych (>= 4.0.0) 82 - reline (0.6.0) 88 + reline (0.6.1) 83 89 io-console (~> 0.5) 84 90 
ruby2_keywords (0.0.5) 85 91 securerandom (0.4.1) ··· 91 97 sinatra-activerecord (2.0.28) 92 98 activerecord (>= 4.1) 93 99 sinatra (>= 1.0) 94 - skyfall (0.5.0) 100 + skyfall (0.6.0) 95 101 base32 (~> 0.3, >= 0.3.4) 96 102 base64 (~> 0.1) 97 103 cbor (~> 0.5, >= 0.5.9.6) 98 104 eventmachine (~> 1.2, >= 1.2.7) 99 - faye-websocket (~> 0.11) 100 - sqlite3 (2.6.0-arm64-darwin) 101 - sqlite3 (2.6.0-x86_64-linux-gnu) 102 - stringio (3.1.6) 103 - thin (1.8.2) 105 + faye-websocket (~> 0.12) 106 + sqlite3 (2.7.2-arm64-darwin) 107 + sqlite3 (2.7.2-x86_64-linux-gnu) 108 + stringio (3.1.7) 109 + thin (2.0.1) 104 110 daemons (~> 1.0, >= 1.0.9) 105 111 eventmachine (~> 1.0, >= 1.0.4) 106 - rack (>= 1, < 3) 107 - tilt (2.6.0) 112 + logger 113 + rack (>= 1, < 4) 114 + tilt (2.6.1) 108 115 timeout (0.4.3) 109 116 tzinfo (2.0.6) 110 117 concurrent-ruby (~> 1.0) 111 118 uri (1.0.3) 112 - websocket-driver (0.7.7) 119 + websocket-driver (0.8.0) 113 120 base64 114 121 websocket-extensions (>= 0.1.0) 115 122 websocket-extensions (0.1.5) ··· 120 127 121 128 DEPENDENCIES 122 129 activerecord (~> 8.0) 123 - blue_factory (~> 0.1.5) 130 + bcrypt_pbkdf (>= 1.0, < 2.0) 131 + blue_factory (~> 0.1.6) 124 132 capistrano (~> 2.0) 125 133 debug 134 + didkit (~> 0.2) 135 + ed25519 (>= 1.2, < 2.0) 126 136 irb 127 137 ostruct 128 138 rainbow 129 139 rake 130 140 sinatra-activerecord (~> 2.0) 131 - skyfall (~> 0.5) 141 + skyfall (~> 0.6) 132 142 sqlite3 (~> 2.5) 133 143 thin 134 144 135 145 BUNDLED WITH 136 - 2.6.6 146 + 2.6.9
+158 -21
README.md
··· 1 - <h1>Bluesky feeds in Ruby <img src="https://github.com/mackuba/bluesky-feeds-rb/assets/28465/81159f5a-82f6-4520-82c1-434057905a2c" style="width: 28px; margin-left: 5px; position: relative; top: 1px;"></h1> 1 + <h1>Bluesky feeds in Ruby &nbsp;<img src="https://raw.githubusercontent.com/mackuba/bluesky-feeds-rb/refs/heads/master/images/ruby.png" width="26"></h1> 2 + 3 + This repo is an example or template that you can use to create a "feed generator" service for the Bluesky social network which hosts custom feeds. It's a reimplementation of the official TypeScript [feed-generator](https://github.com/bluesky-social/feed-generator) example in Ruby. 4 + 5 + This app is extracted from my personal feed service app running on [blue.mackuba.eu](https://blue.mackuba.eu) which hosts all my custom feeds. My own project has the exact same structure, it just has more feeds, models and stuff in it (and I recently migrated it to Postgres). 2 6 3 - This repo is an example or template that you can use to create a "feed generator" service for the Bluesky social network that hosts custom feeds. It's a reimplementation of the official TypeScript [feed-generator](https://github.com/bluesky-social/feed-generator) example in Ruby. 7 + 8 + ## How feed generators work 9 + 10 + This is well explained on the Bluesky documentation site, in the section "[Custom Feeds](https://docs.bsky.app/docs/starter-templates/custom-feeds)", and in the readme of the official TypeScript [feed-generator](https://github.com/bluesky-social/feed-generator) project. 
11 + 12 + The gist is this: 4 13 14 + - you (feed operator) run a service on your server, which implements a few specific XRPC endpoints 15 + - a feed record is uploaded to your account, including metadata and location of the feed generator service 16 + - when the user wants to load the feed, the AppView makes a request to your service on their behalf 17 + - your service looks at the request params, and returns a list of posts it selected in the form of at:// URIs 18 + - the AppView takes those URIs and maps them to full posts, which it returns to the user's app 5 19 6 - ## How do feed generators work 20 + How exactly those posts are selected to be returned in the given request is completely up to you, the only requirement is that these are posts that the AppView will have in its database, since you only send URIs, not actual post data. In most cases, these will be "X latest posts matching some condition". In the request, you get the URI of the specific feed (there can be, and usually is, more than one on the service), `limit`, `cursor`, and an authentication token from which you can extract the DID of the calling user (in case the feed is a personalized one). 7 21 8 - **\#TODO** - please read the README of the official [feed-generator](https://github.com/bluesky-social/feed-generator) project. 22 + It's not a strict requirement, but in order to be able to pick and return those post URIs, in almost all cases the feed service also needs to have a separate component that streams posts from the relay "firehose" and saves some or all of them to a local database. 9 23 10 24 11 25 ## Architecture of the app 12 26 13 27 The project can be divided into three major parts: 14 28 15 - 1. The "input" part, which subscribes to the firehose stream on the Bluesky server, reads and processes all incoming messages, and saves relevant posts and any other data to a local database. 29 + 1. 
The "input" part, which subscribes to the firehose stream on the Bluesky relay, reads and processes all incoming messages, and saves relevant posts and any other data to a local database. 16 30 2. The "output" part, which makes the list of posts available as a feed server that implements the required "feed generator" endpoints. 17 - 3. Everything else in the middle - the database, the models and feed classes. 31 + 3. Everything else in the middle – the database, the models and feed classes. 32 + 33 + The first two parts were mostly abstracted away in the forms of two Ruby gems, namely [skyfall](https://github.com/mackuba/skyfall) for connecting to the firehose and [blue_factory](https://github.com/mackuba/blue_factory) for hosting the feed generator interface. See the repositories of these two projects for more info on what they implement and how you can configure and use them. 34 + 35 + The part in the middle is mostly up to you, since it depends greatly on what exactly you want to achieve (what kind of feed algorithms to implement, what data you need to keep, what database to use and so on) – but you can use this project as a good starting point. 36 + 37 + (The rest of the readme assumes you know Ruby to some degree and are at least somewhat familiar with ActiveRecord.) 18 38 19 - The first two parts were mostly abstracted away in the forms of two Ruby gems, namely [skyfall](https://github.com/mackuba/skyfall) for connecting to the firehose and [blue_factory](https://github.com/mackuba/blue_factory) for hosting the feed generator interface. The part in the middle is mostly up to you, since it depends greatly on what exactly you want to achieve (what kind of feed algorithms to implement, what data you need to keep, what database to use and so on) - but you can use this project as a good starting point. 20 39 21 - See the repositories of these two projects for more info on what they implement and how you can configure and use them. 
40 + ### Feeds 41 + 42 + The Bluesky API allows you to run feeds using basically any algorithm you want, and there are several main categories of feeds: chronological feeds based on keywords or some simple conditions, "top of the week" feeds sorted by number of likes, or Discover-like feeds with a personalized algorithm and a random-ish order. 43 + 44 + The [blue_factory](https://github.com/mackuba/blue_factory) gem used here should allow you to build any kind of feed you want; its main API is a `get_posts` method that you implement in a feed class, which gets request params and returns an array of hashes with post URIs. The decision on how to pick these URIs is up to you. 45 + 46 + However, the sample code in this project is mostly targeted at the most common type of feeds, the keyword-based chronological ones. It defines a base feed class [Feed](app/feeds/feed.rb), which includes an implementation of `get_posts` that loads post records from the database, where they have been already assigned earlier to the given feed, so the request handling involves only a very simple query filtered by feed ID. The subclasses of `Feed` provide their versions of a `post_matches?` method, which is used by the firehose client process to determine where a post should be added. 47 + 48 + If you want to implement a different kind of feed, e.g. a Following-style feed like my "[Follows & Replies](https://bsky.app/profile/did:plc:oio4hkxaop4ao4wz2pp3f4cr/feed/follows-replies)", that should also be possible with this architecture, but you need to implement a custom version of `get_posts` in a given feed class that does some more complex queries. 49 + 50 + The feed classes also include a set of simple getter methods that return metadata about the given feed, like name, description, avatar etc. 51 + 52 + 53 + ### Database & models 54 + 55 + By default, the app is configured to use SQLite and to create database files in `db` directory. 
Using MySQL or PostgreSQL should also be possible with some minor changes in the code (I've tried both) – but SQLite has been working mostly fine for me in production with a database as large as 200+ GB (>200 mln post records). The important thing here is that there's only one "writer" (the firehose process), and the Sinatra server process(es) only read data, so you don't normally run into concurrent write issues, unless you add different unrelated features. 56 + 57 + There are two main tables/models: [Post](app/models/post.rb) and [FeedPost](app/models/feed_post.rb). `Post` stores post records as received from the Bluesky relay – the DID of the author, "rkey" of the post record, the post text, and other data as JSON. 58 + 59 + `FeedPost` records link specific posts into specific feeds. When a post is saved, the post instance is passed to all configured feed classes, and each of them checks (via `post_matches?`) if the post matches the feed's keywords and if it should be included in that feed. In such case, a matching `FeedPost` is also created. `feed_posts` is kind of like a many-to-many join table, except there is no `feeds` table, it's sort of virtual (feeds are defined in code). `FeedPost` records have a `post_id` referencing a `Post`, and a `feed_id` with the feed number, which is defined in subclasses of the `Feed` class (which is *not* an AR model). Each `Feed` class has one different `feed_id` assigned in code. 60 + 61 + The app can be configured to either save every single post, with only some of them having `FeedPost` records referencing them, or to save only the posts which have been added to at least one feed. The mode is selected by using the options `-da` or `-dm` respectively in [`bin/firehose`](bin/firehose). By default, the app uses `-da` (all) mode in development and the `-dm` (matching) mode in production. 
62 + 63 + Saving all posts allows you to rescan posts when you make changes to a feed and include older posts that were skipped before, but at the cost of storing orders of magnitude more data (around 4 mln posts daily as of July 2025). Saving only matching posts keeps the database much more compact and manageable, but without the ability to re-check missed older posts (or to build feeds using different algorithms than plain keyword matching, e.g. Following-style feeds). 64 + 65 + There is an additional `subscriptions` table, which stores the most recent cursor for the relay you're connecting to. This is used when reconnecting after network connection issues or downtime, so you can catch up the missed events added in the meantime since last known position. 66 + 67 + 68 + ### Firehose client 69 + 70 + The firehose client service, using the [skyfall](https://github.com/mackuba/skyfall) gem, is implemented in [`app/firehose_stream.rb`](app/firehose_stream.rb). Skyfall handles things like connecting to the websocket and parsing the returned messages; what you need is mostly to provide lifecycle callbacks (which mostly print logs), and to handle the incoming messages by checking the message type and included data. 71 + 72 + The most important message type is `:commit`, which includes record operations. For those messages, we check the record type and process the record accordingly – in this case, we're only really looking at post records (`:bsky_post`). For "delete" events we find and delete a `Post` record, for "create" events we build one from the provided data, then pass it through all configured feeds to see if it matches any, then optionally create `FeedPost` references. 73 + 74 + All processing here is done inline, single-threaded, within the event processing loop. 
This should be [more than fine](https://journal.mackuba.eu/2025/06/24/firehose-go-brrr/) in practice even with firehose traffic several times bigger than it is now, as long as you aren't doing (a lot of) network requests within the loop. This could be expanded into a multi-process setup with a Redis queue and multiple workers, but right now there's no need for that. 75 + 76 + 77 + ### XRPC Server 78 + 79 + The server implementation is technically in the [blue_factory](https://github.com/mackuba/blue_factory) gem. It's based on [Sinatra](https://sinatrarb.com), and the Sinatra class implementing the 3 required endpoints is included there in [`server.rb`](https://github.com/mackuba/blue_factory/blob/master/lib/blue_factory/server.rb) and can be used as is. It's configured using static methods on the `BlueFactory` module, which is done in [`app/config.rb`](app/config.rb) here. 80 + 81 + As an optional thing, the [`app/server.rb`](app/server.rb) file includes some slightly convoluted code that lets you run a block of code in the context of the Sinatra server class, where you can use any normal [Sinatra APIs](https://sinatrarb.com/intro.html) to define additional routes, helpers, change Sinatra configuration, and so on. The example there adds a root `/` endpoint, which returns simple HTML listing available feeds. 22 82 23 83 24 84 ## Setting up 25 85 26 - First, you need to set up the database. By default, the app is configured to use SQLite and to create database files in `db` directory. If you want to use e.g. MySQL or PostgreSQL, you need to add a different database adapter gem to the [`Gemfile`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/Gemfile) and change the configuration in [`config/database.yml`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/config/database.yml). 86 + This app should run on any somewhat recent version of Ruby, but of course it's recommended to run one that's still maintained, ideally the latest one. 
It's also recommended to install it with [YJIT support](https://www.leemeichin.com/posts/ruby-32-jit.html), and on Linux also with [jemalloc](https://scalingo.com/blog/improve-ruby-application-memory-jemalloc). 87 + 88 + First, you need to install the dependencies of course: 89 + 90 + ``` 91 + bundle install 92 + ``` 93 + 94 + Next, set up the SQLite database. If you want to use e.g. MySQL or PostgreSQL, you need to add a different database adapter gem to the [`Gemfile`](./Gemfile) and change the configuration in [`config/database.yml`](config/database.yml). 27 95 28 96 To create the database, run the migrations: 29 97 ··· 31 99 bundle exec rake db:migrate 32 100 ``` 33 101 34 - The feed configuration is done in [`app/config.rb`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/app/config.rb). You need to set there: 102 + 103 + ### Configuration 104 + 105 + The feed configuration is done in [`app/config.rb`](app/config.rb). You need to set there: 35 106 36 107 - the DID identifier of the publisher (your account) 37 108 - the hostname on which the service will be running 38 109 39 - Next, you need to create some feed classes in [`app/feeds`](https://github.com/mackuba/bluesky-feeds-rb/tree/master/app/feeds). See the included feeds like [`StarWarsFeed`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/app/feeds/star_wars_feed.rb) as an example. The [`Feed`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/app/feeds/feed.rb) superclass provides a `#get_posts` implementation which loads the posts in a feed in response to a request and passes the URIs to the server. 110 + Next, you need to create some feed classes in [`app/feeds`](app/feeds). See the included feeds like [StarWarsFeed](app/feeds/star_wars_feed.rb) as an example. 
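The `get_posts` contract described in this README (request params such as `limit` and `cursor` in, a list of at:// post URIs out) can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the project's `Feed` class: `SketchFeed`, `StoredPost`, the in-memory array and the timestamp-based cursor are all hypothetical, and the `{ posts:, cursor: }` return shape is assumed here rather than taken from the blue_factory documentation.

```ruby
require 'time'

# Hypothetical stand-in for a feed class; the real one loads FeedPost records from the database.
class SketchFeed
  StoredPost = Struct.new(:repo, :rkey, :time)

  def initialize(posts)
    # keep posts sorted newest-first, like a chronological feed
    @posts = posts.sort_by(&:time).reverse
  end

  # params: a hash with :limit (Integer or nil) and :cursor (ISO 8601 String or nil)
  def get_posts(params)
    limit = (params[:limit] || 50).to_i.clamp(1, 100)
    before = params[:cursor] && Time.iso8601(params[:cursor])

    # only posts strictly older than the cursor belong to the next page
    page = @posts.select { |p| before.nil? || p.time < before }.first(limit)
    uris = page.map { |p| "at://#{p.repo}/app.bsky.feed.post/#{p.rkey}" }

    # assumed cursor format: timestamp of the last returned post; nil when the page is empty
    { posts: uris, cursor: page.last&.time&.iso8601 }
  end
end
```

Passing the returned `cursor` back in the next request yields the following page, which is how a client would scroll through the feed in this sketch.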
40 111 41 112 Once you have the feeds prepared, configure them in `app/config.rb`: 42 113 ··· 44 115 BlueFactory.add_feed 'starwars', StarWarsFeed.new 45 116 ``` 46 117 118 + The first argument is the "rkey" which will be visible at the end of the feed URL. 119 + 120 + If you want to implement some kind of authentication or personalization in your feeds, uncomment the `:enable_unsafe_auth` line in the `config.rb`, and see the commented out alternative implementation of `get_posts` in [`app/feeds/feed.rb`](app/feeds/feed.rb#L68-L99). 47 121 48 - ### Running in development 122 + (Note: as the "unsafe" part in the name implies, this does not currently fully validate the user tokens – see the "[Authentication](https://github.com/mackuba/blue_factory#authentication)" section in the `blue_factory` readme for more info.) 123 + 124 + 125 + ## Running in development 126 + 127 + The app uses two separate processes, for the firehose stream client, and for the XRPC server that handles incoming feed requests. 49 128 50 - To run the firehose stream, use the [`firehose.rb`](https://github.com/mackuba/bluesky-feeds-rb/tree/master/firehose.rb) script. By default, it will save all posts to the database and print progress dots for every saved post, and will print the text of each post that matches any feed's conditions. See the options in the file to change this. 129 + To run the firehose client, use the [`bin/firehose`](bin/firehose) script. By default, it will save all posts to the database and print progress dots for every saved post, and will print the text of each post that matches any feed's conditions. See the options in the file or in `--help` output to change this. 51 130 52 - In another terminal window, use the [`server.rb`](https://github.com/mackuba/bluesky-feeds-rb/tree/master/server.rb) script to run the server. 
It should respond to such requests: 131 + The app uses one of Bluesky's official [Jetstream](https://github.com/bluesky-social/jetstream) servers as the source by default. If you want to use a different Jetstream server, edit `DEFAULT_JETSTREAM` in [`app/firehose_stream.rb`](app/firehose_stream.rb#L14), or pass a `FIREHOSE=...` env variable on the command line. You can also use a full ATProto relay instead – in that case you will also need to replace the [initializer in the `start` method](app/firehose_stream.rb#L39-L45). 132 + 133 + In another terminal window, use the [`bin/server`](bin/server) script to run the server. It should respond to such requests: 53 134 54 135 ``` 55 136 curl -i http://localhost:3000/.well-known/did.json ··· 57 138 curl -i http://localhost:3000/xrpc/app.bsky.feed.getFeedSkeleton?feed=at://did:plc:.../app.bsky.feed.generator/starwars 58 139 ``` 59 140 60 - ### Running in production 141 + ### Useful Rake tasks 142 + 143 + While working on feeds, you may find these two Rake tasks useful: 144 + 145 + ``` 146 + bundle exec rake print_feed KEY=starwars | less -r 147 + ``` 148 + 149 + This task prints the posts included in a feed in a readable format, reverse-chronologically. Optionally, add e.g. `N=500` to include more entries (default is 100). 150 + 151 + ``` 152 + bundle exec rake rebuild_feed ... 153 + ``` 154 + 155 + This task rescans the posts in the database after you edit some feed code, and adds/removes posts to the feed if they now match / no longer match. 
It has three main modes: 156 + 157 + ``` 158 + bundle exec rake rebuild_feed KEY=starwars DAYS=7 159 + ``` 160 + 161 + - removes all current posts from the feed, scans the given number of days back and re-adds matching posts to the feed 162 + 163 + ``` 164 + bundle exec rake rebuild_feed KEY=starwars DAYS=7 APPEND_ONLY=1 165 + ``` 166 + 167 + - scans the given number of days back and adds additional matching posts to the feed, but without touching existing posts in the feed 61 168 62 - First, you need to make sure that the firehose script is always running and is restarted if necessary. One option to do this could be writing a `systemd` service config file and adding it to `/etc/systemd/system`. You can find an example service file in [`dist/bsky_feeds.service`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/dist/bsky_feeds.service). 169 + ``` 170 + bundle exec rake rebuild_feed KEY=starwars ONLY_EXISTING=1 171 + ``` 63 172 64 - To run the server part, you need an HTTP server and a Ruby app server. The choice is up to you and the configuration will depend on your selected config. My recommendation is Nginx with either Passenger (runs your app automatically from Nginx) or something like Puma (needs to be started by e.g. `systemd` like the firehose). You can find an example of Nginx configuration for Passenger in [`dist/feeds-nginx.conf`](https://github.com/mackuba/bluesky-feeds-rb/blob/master/dist/feeds-nginx.conf). 173 + - quickly checks only posts included currently in the feed, and removes them if needed 65 174 175 + There are also `DRY_RUN`, `VERBOSE` and `UNSAFE` env options, see [`feeds.rake`](lib/tasks/feeds.rake) for more info. 66 176 67 - ## Publishing the feed 177 + 178 + ## Running in production 179 + 180 + In my Ruby projects I'm using the classic [Capistrano](https://capistranorb.com) tool to deploy to production servers (and in the ancient 2.x version, since I still haven't found the time to migrate my setup scripts to 3.x…). 
There is a sample [`deploy.rb`](config/deploy.rb) config file included in the `config` directory. To use something like Docker or a service like Heroku, you'll need to adapt the config for your specific setup. 181 + 182 + On the server, you need to make sure that the firehose process is always running and is restarted if necessary. One option to do this could be writing a systemd service config file and adding it to `/etc/systemd/system`. You can find an example service file in [`dist/bsky_feeds.service`](dist/bsky_feeds.service). 183 + 184 + To run the XRPC service, you need an HTTP server (reverse proxy) and a Ruby app server. The choice is up to you and the configuration will depend on your selected config. My recommendation is Nginx with either Passenger (runs your app automatically from Nginx) or something like Puma (needs to be started by e.g. systemd like the firehose). You can find an example of Nginx configuration for Passenger in [`dist/feeds-nginx.conf`](dist/feeds-nginx.conf). 185 + 186 + 187 + ### Publishing the feed 68 188 69 189 Once you have the feed deployed to the production server, you can use the `bluesky:publish` Rake task (from the [blue_factory](https://github.com/mackuba/blue_factory) gem) to upload the feed configuration to the Bluesky network. 70 190 71 191 You need to make sure that you have configured the feed's metadata in the feed class: 72 192 73 - - `display_name` (required) - the publicly visible name of your feed, e.g. "Star Wars Feed" (should be something short) 74 - - `description` (optional) - a longer (~1-2 lines) description of what the feed does, displayed on the feed page as the "bio" 75 - - `avatar_file` (optional) - path to an avatar image from the project's root (PNG or JPG) 193 + - `display_name` (required) – the publicly visible name of your feed, e.g. 
"Star Wars Feed" (should be something short) 194 + - `description` (optional) – a longer (~1-2 lines) description of what the feed does, displayed on the feed page as the "bio" 195 + - `avatar_file` (optional) – path to an avatar image from the project's root (PNG or JPG) 76 196 77 197 When you're ready, run the rake task passing the feed key (you will be asked for the uploader account's password): 78 198 79 199 ``` 80 200 bundle exec rake bluesky:publish KEY=starwars 201 + ``` 202 + 203 + 204 + ### App maintenance 205 + 206 + If you're running the app in "save all" mode in production, at some point you will probably need to start cleaning up older posts periodically. You can use this Rake task for this: 207 + 208 + ``` 209 + bundle exec rake cleanup_posts DAYS=30 210 + ``` 211 + 212 + This will delete posts older than 30 days, but only if they aren't assigned to any feed. 213 + 214 + Another Rake task lets you remove a specific post manually from a feed – this might be useful e.g. if you notice that something unexpected 🍆 has been included in your feed, and you want to quickly delete it from there without having to edit & redeploy the code: 215 + 216 + ``` 217 + bundle exec rake delete_feed_item URL=https://bsky.app/profile/example.com/post/xxx 81 218 ``` 82 219 83 220
+4
Rakefile
··· 4 4 require 'sinatra/activerecord/rake' 5 5 6 6 Rake.add_rakelib File.join(__dir__, 'lib', 'tasks') 7 + 8 + if ENV['ARLOG'] == '1' 9 + ActiveRecord::Base.logger = Logger.new(STDOUT) 10 + end
+16 -1
app/feeds/linux_feed.rb
··· 13 13 /the linux of/i, /linux (bros|nerds)/i, /ubuntu tv/i 14 14 ] 15 15 16 + LINK_EXCLUDES = [ 17 + /\bamzn\.to\b/i, /\bwww\.amazon\.com\b/i, /\bmercadolivre\.com\b/i, 18 + ] 19 + 16 20 MUTED_PROFILES = [ 17 21 'did:plc:35c6qworuvguvwnpjwfq3b5p', # Linux Kernel Releases 18 22 'did:plc:ppuqidjyabv5iwzeoxt4fq5o', # GitHub Trending JS/TS ··· 40 44 def post_matches?(post) 41 45 return false if MUTED_PROFILES.include?(post.repo) 42 46 43 - REGEXPS.any? { |r| post.text =~ r } && !(EXCLUDE.any? { |r| post.text =~ r }) 47 + REGEXPS.any? { |r| post.text =~ r } && !(EXCLUDE.any? { |r| post.text =~ r }) && !has_forbidden_links?(post) 48 + end 49 + 50 + def has_forbidden_links?(post) 51 + if embed = post.record['embed'] 52 + if link = (embed['external'] || embed['media'] && embed['media']['external']) 53 + return true if LINK_EXCLUDES.any? { |r| r =~ link['uri'] } 54 + end 55 + end 56 + 57 + return false 44 58 end 45 59 46 60 def colored_text(t) 47 61 text = t.dup 48 62 49 63 EXCLUDE.each { |r| text.gsub!(r) { |s| Rainbow(s).red }} 64 + LINK_EXCLUDES.each { |r| text.gsub!(r) { |s| Rainbow(s).red }} 50 65 REGEXPS.each { |r| text.gsub!(r) { |s| Rainbow(s).green }} 51 66 52 67 text
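The link check added in this hunk can be tried in isolation. The sketch below restates the same logic as a standalone method that takes the parsed record hash directly; the method name `forbidden_link?` and the sample records used to exercise it are made up for illustration, and it assumes, as the diff does, that a link card sits either at `embed.external` or at `embed.media.external`.

```ruby
LINK_EXCLUDES = [
  /\bamzn\.to\b/i, /\bwww\.amazon\.com\b/i, /\bmercadolivre\.com\b/i
]

# record: the parsed post record with string keys (what the diff reads via post.record)
def forbidden_link?(record)
  embed = record['embed']
  return false if embed.nil?

  # the link card is either the embed itself or nested under "media"
  link = embed['external'] || (embed['media'] && embed['media']['external'])
  return false if link.nil? || link['uri'].nil?

  LINK_EXCLUDES.any? { |r| r.match?(link['uri']) }
end
```

For example, a record whose embed is `{ 'external' => { 'uri' => 'https://amzn.to/abc' } }` is rejected, while a post with no embed at all, or with a link to an unrelated domain, passes through to the keyword checks.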
+39 -18
app/firehose_stream.rb
··· 6 6 require_relative 'models/feed_post' 7 7 require_relative 'models/post' 8 8 require_relative 'models/subscription' 9 + require_relative 'utils' 9 10 10 11 class FirehoseStream 11 12 attr_accessor :start_cursor, :show_progress, :log_status, :log_posts, :save_posts, :replay_events 12 13 13 14 DEFAULT_JETSTREAM = 'jetstream2.us-east.bsky.network' 14 15 POSTS_BATCH_SIZE = 100 16 + 17 + include Utils 15 18 16 19 def initialize(service = nil) 17 20 @env = (ENV['APP_ENV'] || ENV['RACK_ENV'] || :development).to_sym ··· 54 57 @sky.on_connecting { |u| log "Connecting to #{u}..." } 55 58 56 59 @sky.on_connect { 60 + @message_counter = 0 57 61 @replaying = !!(cursor) 58 62 log "Connected ✓" 59 63 } ··· 62 66 log "Disconnected." 63 67 } 64 68 65 - @sky.on_timeout { log "Trying to reconnect..." } 66 - @sky.on_reconnect { log "Connection lost, reconnecting..." } 69 + @sky.on_reconnect { 70 + log "Connection lost, reconnecting..." 71 + } 72 + 73 + @sky.on_timeout { 74 + log "Trying to reconnect..." 75 + } 76 + 67 77 @sky.on_error { |e| log "ERROR: #{e.class} #{e.message}" } 68 78 end 69 79 ··· 93 103 94 104 def process_message(msg) 95 105 if msg.type == :info 96 - # AtProto error, the only one right now is "OutdatedCursor" 106 + # ATProto error, the only one right now is "OutdatedCursor" 97 107 log "InfoMessage: #{msg}" 98 108 99 109 elsif msg.type == :identity ··· 104 114 # tracking account status changes, e.g. suspensions, deactivations and deletes 105 115 process_account_message(msg) 106 116 107 - elsif msg.is_a?(Skyfall::Firehose::UnknownMessage) 117 + elsif msg.unknown? 
108 118 log "Unknown message type: #{msg.type} (#{msg.seq})" 109 119 end 110 120 ··· 115 125 @replaying = false 116 126 end 117 127 128 + @message_counter += 1 129 + 130 + if @message_counter % 100 == 0 131 + # save current cursor every 100 events 132 + save_cursor(msg.seq) 133 + end 134 + 118 135 msg.operations.each do |op| 119 136 case op.type 120 137 when :bsky_post ··· 122 139 123 140 when :bsky_like, :bsky_repost 124 141 # if you want to use the number of likes and/or reposts for filtering or sorting: 125 - # add a likes/reposts column to feeds, then do +1 / -1 here depending on op.action 142 + # add a likes/reposts table, then add/remove records here depending on op.action 143 + # (you'll need to track like records and not just have a single numeric "likes" field, 144 + # because delete events only include the uri/rkey of the like, not of the liked post) 126 145 127 146 when :bsky_follow 128 147 # if you want to make a personalized feed that needs info about given user's follows/followers: ··· 170 189 return 171 190 end 172 191 173 - text = op.raw_record['text'] 192 + record = op.raw_record 193 + text = record['text'] 174 194 175 195 # to save space, delete redundant post text and type from the saved data JSON 176 - trimmed_record = op.raw_record.dup 177 - trimmed_record.delete('$type') 178 - trimmed_record.delete('text') 179 - trimmed_json = JSON.generate(trimmed_record) 196 + record.delete('$type') 197 + record.delete('text') 198 + trimmed_json = JSON.generate(record) 180 199 181 200 # tip: if you don't need full record data for debugging, delete the data column in posts 182 201 post = Post.new( ··· 185 204 text: text, 186 205 rkey: op.rkey, 187 206 data: trimmed_json, 188 - record: op.raw_record 207 + record: record 189 208 ) 190 209 191 210 if !post.valid? 
··· 213 232 puts text 214 233 end 215 234 216 - if @save_posts == :all || @save_posts && matched 235 + if @save_posts == :all 236 + # wait until we have 100 posts and then save them all in one insert, if possible 217 237 @post_queue << post 218 - end 219 238 220 - # wait until we have 100 posts and then save them all in one insert, if possible 221 - if @post_queue.length >= POSTS_BATCH_SIZE 222 - save_queued_posts 223 - save_cursor(@sky.cursor) 239 + if @post_queue.length >= POSTS_BATCH_SIZE 240 + save_queued_posts 241 + end 242 + elsif @save_posts == :matching && matched 243 + # save immediately because matched posts might be rare; we've already checked validations 244 + post.save!(validate: false) 224 245 end 225 246 226 247 print '.' if @show_progress && @log_posts != :all ··· 285 306 end 286 307 287 308 def inspect 288 - vars = instance_variables - [:@feeds, :@timer] 309 + vars = instance_variables - [:@feeds] 289 310 values = vars.map { |v| "#{v}=#{instance_variable_get(v).inspect}" }.join(", ") 290 311 "#<#{self.class}:0x#{object_id} #{values}>" 291 312 end
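These hunks change two behaviors: the cursor is now saved every 100 messages regardless of save mode, and in `:matching` mode a matched post is saved immediately instead of waiting in the batch queue. A minimal simulation of that bookkeeping, with no network or database, might look as follows; `CheckpointSketch`, `handle_event` and the recorder arrays are hypothetical names introduced only for this sketch.

```ruby
# Hypothetical stand-in for the firehose client's bookkeeping; it only records what would be written.
class CheckpointSketch
  CURSOR_EVERY = 100
  POSTS_BATCH_SIZE = 100

  attr_reader :saved_cursors, :inserted_batches, :saved_immediately

  def initialize(save_posts)
    @save_posts = save_posts   # :all or :matching
    @message_counter = 0
    @post_queue = []
    @saved_cursors = []
    @inserted_batches = []
    @saved_immediately = []
  end

  def handle_event(seq, post, matched)
    @message_counter += 1

    # checkpoint the cursor every 100 events, independent of how posts are saved
    @saved_cursors << seq if @message_counter % CURSOR_EVERY == 0

    if @save_posts == :all
      # queue every post and write the queue in one batch once it is full
      @post_queue << post
      flush_queue if @post_queue.length >= POSTS_BATCH_SIZE
    elsif @save_posts == :matching && matched
      # matched posts may be rare, so they are written right away
      @saved_immediately << post
    end
  end

  def flush_queue
    @inserted_batches << @post_queue.dup
    @post_queue.clear
  end
end
```

Decoupling the cursor from the batch flush matters mostly in `:matching` mode: with the old logic a quiet feed could go a long time without filling the queue, so the stored cursor would lag far behind the stream position.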
+2
app/models/post.rb
··· 1 1 require 'active_record' 2 2 require 'json' 3 3 4 + require_relative 'feed_post' 5 + 4 6 class Post < ActiveRecord::Base 5 7 validates_presence_of :repo, :time, :data, :rkey 6 8 validates :text, length: { minimum: 0, allow_nil: false }
+1 -2
app/post_console_printer.rb
··· 14 14 if langs.nil? 15 15 print '[nil] * ' 16 16 elsif langs != ['en'] 17 - label = langs.map { |ln| ln.upcase.tr('A-Z', "\u{1F1E6}-\u{1F1FF}") }.join 18 - print "#{label} * " 17 + print "[#{langs.join(', ')}] * " 19 18 end 20 19 21 20 puts Rainbow("https://bsky.app/profile/#{post.repo}/post/#{post.rkey}").darkgray
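The effect of this change on the printed prefix can be seen with a small standalone helper. `language_prefix` is a hypothetical name; it returns the string instead of printing it so the three cases are easy to compare.

```ruby
# Returns the prefix printed before the post URL, following the branches in the diff.
def language_prefix(langs)
  if langs.nil?
    '[nil] * '
  elsif langs != ['en']
    "[#{langs.join(', ')}] * "
  else
    ''   # English-only posts get no prefix
  end
end
```

With the new code, a post tagged `['pl', 'en']` is prefixed with `[pl, en] * ` as plain text, instead of the regional-indicator characters the removed `tr` call produced.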
+3 -8
app/utils.rb
··· 1 - require 'json' 2 - require 'open-uri' 1 + require 'didkit' 3 2 4 3 module Utils 5 4 def handle_from_did(did) 6 - url = "https://plc.directory/#{did}" 7 - json = JSON.parse(URI.open(url).read) 8 - json['alsoKnownAs'][0].gsub('at://', '') 5 + DID.new(did).get_validated_handle 9 6 end 10 7 11 8 def did_from_handle(handle) 12 - url = "https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=#{handle}" 13 - json = JSON.parse(URI.open(url).read) 14 - json['did'] 9 + DID.resolve_handle(handle).did 15 10 end 16 11 17 12 extend self
+6 -1
bin/firehose
··· 7 7 8 8 $stdout.sync = true 9 9 10 - ActiveRecord::Base.logger = nil 10 + if ENV['ARLOG'] == '1' 11 + ActiveRecord::Base.logger = Logger.new(STDOUT) 12 + else 13 + ActiveRecord::Base.logger = nil 14 + end 11 15 12 16 def print_help 13 17 puts "Usage: #{$0} [options...]" ··· 76 80 end 77 81 78 82 trap("SIGINT") { 83 + puts 79 84 firehose.log "Stopping..." 80 85 81 86 EM.add_timer(0) {
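Both the Rakefile and `bin/firehose` now switch SQL logging on when `ARLOG` is set to `1`. The same toggle can be expressed as a small function using only the standard library; `sql_logger_for` is a hypothetical helper, and the result is what the diff assigns to `ActiveRecord::Base.logger`.

```ruby
require 'logger'

# Returns a stdout logger when ARLOG is exactly "1", otherwise nil (logging disabled).
# env defaults to the process environment; any hash-like object works for testing.
def sql_logger_for(env = ENV)
  env['ARLOG'] == '1' ? Logger.new($stdout) : nil
end
```

Note that only the exact value `1` enables logging in the diff, so values such as `true` leave it off.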
images/kitkat.jpg

This is a binary file and will not be displayed.

images/ruby.png

This is a binary file and will not be displayed.