Don't forget to lycansubscribe

Compare changes


+16 -16
Gemfile.lock
···
7 7 GEM
8 8   remote: https://rubygems.org/
9 9   specs:
10 -     activemodel (7.2.2.2)
11 -       activesupport (= 7.2.2.2)
12 -     activerecord (7.2.2.2)
13 -       activemodel (= 7.2.2.2)
14 -       activesupport (= 7.2.2.2)
10 +     activemodel (7.2.3)
11 +       activesupport (= 7.2.3)
12 +     activerecord (7.2.3)
13 +       activemodel (= 7.2.3)
14 +       activesupport (= 7.2.3)
15 15       timeout (>= 0.4.0)
16 -     activesupport (7.2.2.2)
16 +     activesupport (7.2.3)
17 17       base64
18 18       benchmark (>= 0.3)
19 19       bigdecimal
···
41 41     concurrent-ruby (1.3.5)
42 42     connection_pool (2.5.4)
43 43     daemons (1.4.1)
44 -     date (3.4.1)
44 +     date (3.5.0)
45 45     dotenv (3.1.8)
46 46     drb (2.2.3)
47 47     ed25519 (1.4.0)
48 -     erb (5.1.1)
48 +     erb (6.0.0)
49 49     eventmachine (1.2.7)
50 50     faye-websocket (0.12.0)
51 51       eventmachine (>= 0.12.0)
···
55 55     i18n (1.14.7)
56 56       concurrent-ruby (~> 1.0)
57 57     io-console (0.8.1)
58 -     irb (1.15.2)
58 +     irb (1.15.3)
59 59       pp (>= 0.6.0)
60 60       rdoc (>= 4.0.0)
61 61       reline (>= 0.4.2)
···
64 64     logger (1.7.0)
65 65     minisky (0.5.0)
66 66       base64 (~> 0.1)
67 -     minitest (5.26.0)
67 +     minitest (5.26.1)
68 68     mustermann (3.0.4)
69 69       ruby2_keywords (~> 0.0.1)
70 70     net-scp (4.1.0)
···
87 87     psych (5.2.6)
88 88       date
89 89       stringio
90 -     rack (3.2.3)
90 +     rack (3.2.4)
91 91     rack-protection (4.2.1)
92 92       base64 (>= 0.1.0)
93 93       logger (>= 1.6.0)
···
98 98     rackup (2.2.1)
99 99       rack (>= 3)
100 100     rainbow (3.1.1)
101 -     rake (13.3.0)
102 -     rdoc (6.15.0)
101 +     rake (13.3.1)
102 +     rdoc (6.15.1)
103 103       erb
104 104       psych (>= 4.0.0)
105 105       tsort
106 -     reline (0.6.2)
106 +     reline (0.6.3)
107 107       io-console (~> 0.5)
108 108     ruby2_keywords (0.0.5)
109 109     securerandom (0.4.1)
···
123 123       cbor (~> 0.5, >= 0.5.9.6)
124 124       eventmachine (~> 1.2, >= 1.2.7)
125 125       faye-websocket (~> 0.12)
126 -     stringio (3.1.7)
126 +     stringio (3.1.8)
127 127     thin (2.0.1)
128 128       daemons (~> 1.0, >= 1.0.9)
129 129       eventmachine (~> 1.0, >= 1.0.4)
130 130       logger
131 131       rack (>= 1, < 4)
132 132     tilt (2.6.1)
133 -     timeout (0.4.3)
133 +     timeout (0.4.4)
134 134     tsort (0.2.0)
135 135     tzinfo (2.0.6)
136 136       concurrent-ruby (~> 1.0)
+3
Procfile
···
1 + server: bin/server
2 + firehose: bin/firehose
3 + worker: bin/worker
+110
README.md
···
1 + # Lycan 🐺
2 +
3 + A service which downloads and indexes the Bluesky posts you've liked, reposted, quoted or bookmarked, and allows you to search in that archive.
4 +
5 +
6 + ## How it works
7 +
8 + Lycan is kind of like a tiny specialized AppView, which only indexes some specific things from some specific people. To avoid having to keep a full-network AppView, it only indexes posts and likes on demand from people who request to use it. So the first time you want to use it, you need to ask it to run an import process, which can take anything between a few minutes and an hour, depending on how much data there is to download. After that, new likes are being indexed live from the firehose.
9 +
10 + At the moment, Lycan indexes four types of content:
11 +
12 + - posts you've liked
13 + - posts you've reposted
14 + - posts you've quoted
15 + - your old-style bookmarks (using the 📌 emoji method)
16 +
17 + New bookmarks are private data, so at the moment they can't be imported until support for OAuth is added.
18 +
19 + Lycan is written in Ruby, using Sinatra and ActiveRecord, with Postgres as the database. The official instance runs at [lycan.feeds.blue](https://lycan.feeds.blue) (this service only implements an XRPC API – the UI is implemented as part of [Skythread](https://skythread.mackuba.eu)).
20 +
21 + The service consists of three separate components:
22 +
23 + - a **firehose client**, which streams events from a relay/Jetstream and saves new data for the users whose data is/has been imported
24 + - a **background worker**, which runs the import process
25 + - an **HTTP server**, which serves the XRPC endpoints (currently there are 3: `startImport`, `getImportStatus` and `searchPosts`, plus a `did.json`); all the endpoints require service authentication through PDS proxying
26 +
27 +
28 + ## Setting up on localhost
29 +
30 + This app should run on any somewhat recent version of Ruby, though of course it's recommended to run one that's still getting maintenance updates, ideally the latest one. It's also recommended to install it with [YJIT support](https://shopify.engineering/ruby-yjit-is-production-ready), and on Linux also with [jemalloc](https://scalingo.com/blog/improve-ruby-application-memory-jemalloc). You will probably need to have some familiarity with the Ruby ecosystem in order to set it up and run it.
31 +
32 + A Postgres database is also required (again, any non-ancient version should work).
33 +
34 + Download or clone the repository, then install the dependencies:
35 +
36 + ```
37 + bundle install
38 + ```
39 +
40 + Next, create the database – the configuration is defined in [`config/database.yml`](config/database.yml), for development it's `lycan_development`. Create it either manually, or with a rake task:
41 +
42 + ```
43 + bundle exec rake db:create
44 + ```
45 +
46 + Then, run the migrations:
47 +
48 + ```
49 + bundle exec rake db:migrate
50 + ```
51 +
52 + To run an import, you will need to run three separate processes, probably in separate terminal tabs:
53 +
54 + 1) the firehose client, [`bin/firehose`](bin/firehose)
55 + 2) the background worker, [`bin/worker`](bin/worker)
56 + 3) the Sinatra HTTP server, [`bin/server`](bin/server)
57 +
58 + The UI can be accessed through Skythread, either on the official site on [skythread.mackuba.eu](https://skythread.mackuba.eu), or a copy you can download [from the repo](https://tangled.org/mackuba.eu/skythread). Log in and open "[Archive search](https://skythread.mackuba.eu/?page=search&mode=likes)" from the account menu – but importantly, to use the `localhost` Lycan instance, add `&lycan=local` to the URL.
59 +
60 + You should then be able to start an import from there, and see the worker process printing some logs as it starts to download the data. (The firehose process needs to be running too, because the import job needs to pass through it first.)
61 +
62 +
63 + ## Configuration
64 +
65 + There's a few things you can configure through ENV variables:
66 +
67 + - `RELAY_HOST` – hostname of the relay to use for the firehose (default: `bsky.network`)
68 + - `JETSTREAM_HOST` – alternatively, instead of `RELAY_HOST`, set this to a hostname of a [Jetstream](https://github.com/bluesky-social/jetstream) instance
69 + - `FIREHOSE_USER_AGENT` – when running in production, it's recommended that you set this to some name that identifies who is running the service
70 + - `APPVIEW_HOST` – hostname of the AppView used to download posts (default: `public.api.bsky.app`)
71 + - `SERVER_HOSTNAME` – hostname of the server on which you're running the service in production
72 +
73 +
74 + ## Rake tasks
75 +
76 + Some Rake tasks that might be useful:
77 +
78 + ```
79 + bundle exec rake enqueue_user DID=did:plc:qweqwe
80 + ```
81 +
82 + - request an import of the given account (to be handled by firehose + worker)
83 +
84 + ```
85 + bundle exec rake import_user DID=did:plc:qweqwe COLLECTION=likes/reposts/posts/all
86 + ```
87 +
88 + - run a complete import synchronously
89 +
90 + ```
91 + bundle exec rake process_posts
92 + ```
93 +
94 + - process all previously queued and unfinished or failed items
95 +
96 +
97 + ## Running in production
98 +
99 + This will probably heavily depend on where and how you prefer to run it, I'm using a Capistrano deploy config in [`config/deploy.rb`](config/deploy.rb) to deploy to a VPS at [lycan.feeds.blue](https://lycan.feeds.blue). To use something like Docker or a service like Fly or Railway, you'll need to adapt the config for your specific setup.
100 +
101 + On the server, you need to make sure that the firehose & worker processes are always running and are restarted if necessary. One option to do this (which I'm using) may be writing a `systemd` service config file and adding it to `/etc/systemd/system`. To run the HTTP server, you need Nginx/Caddy/Apache and a Ruby app server – my recommendation is Nginx with either Passenger (runs your app automatically from Nginx) or something like Puma (needs to be started by e.g. systemd like the firehose).
102 +
103 +
104 + ## Credits
105 +
106 + Copyright © 2025 Kuba Suder ([@mackuba.eu](https://bsky.app/profile/did:plc:oio4hkxaop4ao4wz2pp3f4cr)).
107 +
108 + The code is available under the terms of the [zlib license](https://choosealicense.com/licenses/zlib/) (permissive, similar to MIT).
109 +
110 + Bug reports and pull requests are welcome 😎
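The Configuration section of the README lists five ENV variables and two defaults. A minimal sketch of how such settings could be resolved is below. The `firehose_config` helper and the shape of the hash it returns are hypothetical and are not part of this diff; the precedence of `JETSTREAM_HOST` over `RELAY_HOST` when both are set is an assumption based on the README's "instead of" wording. The `SERVER_HOSTNAME` fallback mirrors the `HOSTNAME` constant visible in `app/server.rb`.

```ruby
# Hypothetical helper (not part of Lycan): resolves the settings described
# in the README from ENV or from any Hash with the same keys.
def firehose_config(env = ENV)
  # Assumption: a Jetstream host, when present, takes precedence over a relay.
  source =
    if env['JETSTREAM_HOST']
      { type: :jetstream, host: env['JETSTREAM_HOST'] }
    else
      # Default relay host as documented in the README.
      { type: :relay, host: env['RELAY_HOST'] || 'bsky.network' }
    end

  {
    source: source,
    user_agent: env['FIREHOSE_USER_AGENT'],
    # Default AppView host as documented in the README.
    appview_host: env['APPVIEW_HOST'] || 'public.api.bsky.app',
    # Same fallback as the HOSTNAME constant in app/server.rb.
    server_hostname: env['SERVER_HOSTNAME'] || 'lycan.feeds.blue'
  }
end
```

Passing a plain Hash instead of `ENV` makes the defaults easy to check in isolation, for example `firehose_config({})[:source]` yields the relay entry with `bsky.network`.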
+1 -1
app/importers/base_importer.rb
···
38 38   import_items
39 39
40 40   @import.update!(last_completed: @import.started_from) unless requested_time_limit
41 -   @import.update!(cursor: nil, started_from: nil)
41 +   @import.update!(cursor: nil, started_from: nil, fetched_until: nil)
42 42   @report&.update(importers: { importer_name => { :finished => true }})
43 43 end
44 44
+1 -1
app/importers/likes_importer.rb
···
33 33 @report&.update(importers: { importer_name => { :oldest_date => oldest_date }}) if oldest_date
34 34
35 35 params[:cursor] = cursor
36 - @import.update!(cursor: cursor)
36 + @import.update!(cursor: cursor, fetched_until: oldest_date)
37 37
38 38 break if !cursor
39 39 break if @time_limit && oldest_date && oldest_date < @time_limit
+1 -1
app/importers/posts_importer.rb
···
41 41 @report&.update(importers: { importer_name => { :oldest_date => oldest_date }}) if oldest_date
42 42
43 43 params[:cursor] = cursor
44 - @import.update!(cursor: cursor)
44 + @import.update!(cursor: cursor, fetched_until: oldest_date)
45 45
46 46 break if !cursor
47 47 break if @time_limit && oldest_date && oldest_date < @time_limit
+1 -1
app/importers/reposts_importer.rb
···
33 33 @report&.update(importers: { importer_name => { :oldest_date => oldest_date }}) if oldest_date
34 34
35 35 params[:cursor] = cursor
36 - @import.update!(cursor: cursor)
36 + @import.update!(cursor: cursor, fetched_until: oldest_date)
37 37
38 38 break if !cursor
39 39 break if @time_limit && oldest_date && oldest_date < @time_limit
+26
app/models/import.rb
···
9 9   validates_uniqueness_of :collection, scope: :user_id
10 10
11 11   scope :unfinished, -> { where('(started_from IS NOT NULL) OR (last_completed IS NULL)') }
12 +
13 +   IMPORT_END = Time.at(0)
14 +
15 +   def imported_until
16 +     return nil if cursor.nil? && last_completed.nil?
17 +
18 +     groups = case collection
19 +     when 'likes'
20 +       [:likes]
21 +     when 'reposts'
22 +       [:reposts]
23 +     when 'posts'
24 +       [:pins, :quotes]
25 +     end
26 +
27 +     newest_queued_items = groups.map { |g| user.send(g).where(queue: :import).order(:time).last }
28 +     newest_queued = newest_queued_items.compact.sort_by(&:time).last
29 +
30 +     if newest_queued
31 +       newest_queued.time
32 +     elsif fetched_until
33 +       fetched_until
34 +     else
35 +       IMPORT_END
36 +     end
37 +   end
12 38 end
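The fallback order in the new `Import#imported_until` can be sketched without a database. The `ImportState` struct and its `queued_times` field below are hypothetical stand-ins for the ActiveRecord model and for the `queue: :import` lookups; only the order of the checks is taken from the diff.

```ruby
# Sentinel meaning "import reached the very beginning", as in the diff.
IMPORT_END = Time.at(0)

# Hypothetical stand-in for the Import model. queued_times represents the
# timestamps of items still waiting in the import queue.
ImportState = Struct.new(:cursor, :last_completed, :fetched_until, :queued_times,
                         keyword_init: true) do
  def imported_until
    # Nothing fetched yet and no earlier completed run: position unknown.
    return nil if cursor.nil? && last_completed.nil?

    # 1. newest item still queued, 2. oldest date fetched so far,
    # 3. otherwise the import is treated as having reached the end.
    Array(queued_times).max || fetched_until || IMPORT_END
  end
end
```

Using `Time.at(0)` as the sentinel instead of the earlier `:end` symbol keeps every non-nil return value a `Time`, so the values from several imports can be sorted together.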
+3 -16
app/models/user.rb
···
50 50   end
51 51
52 52   def imported_until
53 -     return nil unless self.imports.exists?
54 -
55 -     oldest_imported_items = []
56 -     started = false
53 +     import_positions = self.imports.map(&:imported_until)
57 54
58 -     [:likes, :reposts, :pins, :quotes].each do |group|
59 -       if self.send(group).where(queue: :import).exists?
60 -         oldest_imported_items << self.send(group).where(queue: nil).order(:time).first
61 -       end
62 -     end
63 -
64 -     earliest_oldest = oldest_imported_items.compact.sort_by(&:time).last
65 -
66 -     if earliest_oldest
67 -       earliest_oldest.time
68 -     elsif self.imports.merge(Import.unfinished).exists?
55 +     if import_positions.empty? || import_positions.any? { |x| x.nil? }
69 56       nil
70 57     else
71 -       :end
58 +       import_positions.sort.last
72 59     end
73 60   end
74 61
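The rewritten `User#imported_until` reduces the per-import positions to a single value. A minimal sketch of that reduction over plain `Time` values follows; the `overall_imported_until` name is hypothetical. My reading, which the diff does not state, is that the latest timestamp is chosen because imports walk backwards in time, so the import that has gone back the least determines the overall position.

```ruby
# Hypothetical free-standing version of the aggregation in User#imported_until.
# positions is an array of Time values or nil, one entry per import.
def overall_imported_until(positions)
  # No imports at all, or at least one import without a known position.
  return nil if positions.empty? || positions.any?(&:nil?)

  # Equivalent to positions.sort.last in the diff.
  positions.max
end
```

Because finished imports report `Time.at(0)`, the result equals that sentinel only when every import has finished.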
+5 -1
app/post_downloader.rb
···
78 78 def save_post(post_uri, record)
79 79   did, _, rkey = AT_URI(post_uri)
80 80
81 -   author = User.find_or_create_by!(did: did)
81 +   begin
82 +     author = User.find_or_create_by!(did: did)
83 +   rescue ActiveRecord::RecordInvalid => e
84 +     raise InvalidRecordError
85 +   end
82 86
83 87   if post = Post.find_by(user: author, rkey: rkey)
84 88     return post
+2 -2
app/server.rb
···
8 8
9 9 class Server < Sinatra::Application
10 10   register Sinatra::ActiveRecordExtension
11 -   set :port, 3000
11 +   set :port, ENV['PORT'] || 3000
12 12
13 13   PAGE_LIMIT = 25
14 14   HOSTNAME = ENV['SERVER_HOSTNAME'] || 'lycan.feeds.blue'
···
162 162   else
163 163     json_response(status: 'not_started')
164 164   end
165 - when :end
165 + when Import::IMPORT_END
166 166   json_response(status: 'finished')
167 167 else
168 168   progress = 1 - (until_date - user.registered_at) / (Time.now - user.registered_at)
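The progress expression in the last context line of `app/server.rb` can be tried on its own. In the sketch below the `import_progress` name and the injectable `now` argument are hypothetical additions for illustration; the arithmetic is copied from the diff.

```ruby
# Fraction of the account's lifetime already covered by the import.
# until_date is the position returned by imported_until; registered_at is
# when the account was created. Subtracting two Time objects yields a Float.
def import_progress(until_date, registered_at, now = Time.now)
  1 - (until_date - registered_at) / (now - registered_at)
end
```

With an account registered at `Time.at(0)`, a current time of `Time.at(100)` and a position of `Time.at(25)`, the import has covered the newest three quarters of the timeline, so the result is 0.75.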
+5
db/migrate/20251027134657_add_fetched_until_to_imports.rb
···
1 + class AddFetchedUntilToImports < ActiveRecord::Migration[7.2]
2 +   def change
3 +     add_column :imports, :fetched_until, :datetime
4 +   end
5 + end
+2 -1
db/schema.rb
···
10 10 #
11 11 # It's strongly recommended that you check this file into your version control system.
12 12
13 - ActiveRecord::Schema[7.2].define(version: 2025_09_23_180153) do
13 + ActiveRecord::Schema[7.2].define(version: 2025_10_27_134657) do
14 14   # These are extensions that must be enabled in order to support this database
15 15   enable_extension "plpgsql"
16 16
···
25 25     t.datetime "started_from"
26 26     t.datetime "last_completed"
27 27     t.string "collection", limit: 20, null: false
28 +     t.datetime "fetched_until"
28 29     t.index ["user_id", "collection"], name: "index_imports_on_user_id_and_collection", unique: true
29 30   end
30 31