Fast and robust atproto CAR file processing in Rust

repo-stream

a futures stream of atproto records from a CAR file

current notes

  • just buffering all the blocks before walking is 2.5x faster than interleaving optimistic walking with block reads

    • at least, this holds for huge CARs given the current (stream-unfriendly) PDS export behaviour
  • the transform function is a little tricky: we can't know whether a block is a record or an MST node until we actually walk the tree to it (by which point all blocks are already buffered in memory anyway).

    • it might still be worth benchmarking optimistic block probing + transform as blocks stream in
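A minimal std-only sketch of the buffer-then-walk approach described in these notes. The string CID type, the line-based node encoding, and the names parse_node/visit/walk_buffered are hypothetical stand-ins (real MST nodes are DAG-CBOR and real CIDs are content hashes). What it illustrates is the tricky part of the transform: a block's role (node or record) is only decided by where the walk reaches it from, so the transform can only run during the walk.

```rust
use std::collections::HashMap;

// Hypothetical toy CID: real CIDs are content hashes, not short strings.
type Cid = String;

// Hypothetical toy node encoding (NOT DAG-CBOR): one item per line, in key
// order, either `tree <cid>` (subtree pointer) or `leaf <rkey> <cid>`.
enum Item {
    Tree(Cid),
    Leaf(String, Cid),
}

fn parse_node(bytes: &[u8]) -> Vec<Item> {
    let text = std::str::from_utf8(bytes).expect("toy node blocks are UTF-8");
    text.lines()
        .filter_map(|line| match line.split_whitespace().collect::<Vec<_>>().as_slice() {
            ["tree", cid] => Some(Item::Tree(cid.to_string())),
            ["leaf", rkey, cid] => Some(Item::Leaf(rkey.to_string(), cid.to_string())),
            _ => None,
        })
        .collect()
}

// In-order walk. A block's role is decided by where it is referenced from:
// subtree pointers lead to nodes, leaf entries lead to records.
fn visit<T, F: Fn(&[u8]) -> T>(
    blocks: &HashMap<Cid, Vec<u8>>,
    node: &Cid,
    transform: &F,
    out: &mut Vec<(String, T)>,
) {
    let bytes = blocks.get(node).expect("missing node block");
    for item in parse_node(bytes) {
        match item {
            Item::Tree(child) => visit(blocks, &child, transform, out),
            Item::Leaf(rkey, cid) => {
                let record = blocks.get(&cid).expect("missing record block");
                // Only now do we know this block is a record, so transform it here.
                out.push((rkey, transform(record.as_slice())));
            }
        }
    }
}

// Phase 2 of buffer-then-walk: every block is already in `blocks`.
fn walk_buffered<T, F: Fn(&[u8]) -> T>(
    blocks: &HashMap<Cid, Vec<u8>>,
    root: &Cid,
    transform: F,
) -> Vec<(String, T)> {
    let mut out = Vec::new();
    visit(blocks, root, &transform, &mut out);
    out
}

fn main() {
    // Phase 1: buffer every block in whatever order the CAR delivers them.
    let mut blocks: HashMap<Cid, Vec<u8>> = HashMap::new();
    blocks.insert("r2".into(), b"world!".to_vec());
    blocks.insert("root".into(), b"tree n1\nleaf b r2".to_vec());
    blocks.insert("n1".into(), b"leaf a r1".to_vec());
    blocks.insert("r1".into(), b"hi".to_vec());

    // Example transform: keep only each record's size.
    for (rkey, len) in walk_buffered(&blocks, &"root".to_string(), |b| b.len()) {
        println!("{rkey}: {len} bytes");
    }
}
```

Running it prints the records in key order (a, then b) even though their blocks were inserted out of order, because emission order comes from the tree walk, not from the CAR.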

original ideas:

  • tries to walk the MST and emit its records while streaming in the CAR
  • drops intermediate MST blocks after reading them to reduce total memory
  • user-provided transform function on record blocks from IPLD
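The original streaming idea can be sketched in the same toy setting: feed blocks in CAR order, advance the walk as far as the blocks received so far allow, and remove each block from the buffer as soon as the walk consumes it. StreamWalker, push_block, and the line-based node encoding are hypothetical. The sketch also assumes each CID is referenced only once, which real repos do not guarantee (identical records share a CID).

```rust
use std::collections::HashMap;

// Hypothetical toy CID and node encoding (NOT DAG-CBOR): each line of a node
// block is `tree <cid>` (subtree) or `leaf <rkey> <cid>` (record), in key order.
type Cid = String;

enum Step {
    Node(Cid),
    Record(String, Cid),
}

fn parse_node(bytes: &[u8]) -> Vec<Step> {
    let text = std::str::from_utf8(bytes).expect("toy node blocks are UTF-8");
    text.lines()
        .filter_map(|line| match line.split_whitespace().collect::<Vec<_>>().as_slice() {
            ["tree", cid] => Some(Step::Node(cid.to_string())),
            ["leaf", rkey, cid] => Some(Step::Record(rkey.to_string(), cid.to_string())),
            _ => None,
        })
        .collect()
}

// Hypothetical interleaved walker: blocks are fed in CAR order and the walk
// advances as far as the blocks received so far allow.
struct StreamWalker<F> {
    pending: HashMap<Cid, Vec<u8>>, // received but not yet reached by the walk
    stack: Vec<Step>,               // what the walk needs next; top = next
    transform: F,
}

impl<F> StreamWalker<F> {
    fn new(root: Cid, transform: F) -> Self {
        StreamWalker { pending: HashMap::new(), stack: vec![Step::Node(root)], transform }
    }

    // Feed one block; returns any records the walk could emit as a result.
    fn push_block<T>(&mut self, cid: Cid, bytes: Vec<u8>) -> Vec<(String, T)>
    where
        F: Fn(&[u8]) -> T,
    {
        self.pending.insert(cid, bytes);
        let mut emitted = Vec::new();
        while let Some(Step::Node(next) | Step::Record(_, next)) = self.stack.last() {
            // Removing the block from the buffer is what drops it after reading.
            // If it has not arrived yet, stop and wait for more input.
            let Some(block) = self.pending.remove(next) else { break };
            match self.stack.pop().expect("stack top was just inspected") {
                // Push children reversed so the leftmost is visited first.
                Step::Node(_) => self.stack.extend(parse_node(&block).into_iter().rev()),
                Step::Record(rkey, _) => emitted.push((rkey, (self.transform)(block.as_slice()))),
            }
        }
        emitted
    }

    fn is_done(&self) -> bool {
        self.stack.is_empty()
    }
}

fn main() {
    let mut walker = StreamWalker::new("root".to_string(), |b: &[u8]| b.len());
    // A stream-unfriendly order: r2 arrives long before the walk needs it.
    let car_order = [
        ("root", "tree n1\nleaf b r2"),
        ("r2", "world!"),
        ("n1", "leaf a r1"),
        ("r1", "hi"),
    ];
    for (cid, bytes) in car_order {
        for (rkey, len) in walker.push_block(cid.to_string(), bytes.as_bytes().to_vec()) {
            println!("{rkey}: {len} bytes");
        }
    }
    assert!(walker.is_done() && walker.pending.is_empty());
}
```

Blocks that arrive before the walk needs them (r2 above) have to sit in the buffer anyway, so with a stream-unfriendly export order this approach ends up holding much of the CAR in memory while doing extra bookkeeping per block.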

future work:

redb has an in-memory backend, so it could be used for block caching in every case. the user could choose whether to allow disk or stay memory-only, and then "spilling" from the cache to disk would be mostly free?
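As a rough illustration of the memory-or-disk choice, here is a std-only stand-in; it is not redb and uses none of redb's API. A hypothetical BlockCache keeps blocks in memory up to a byte budget and, only when the user's Policy allows disk, appends overflow blocks to a spill file. Policy, BlockCache, Slot, and the temp-file name are all assumptions for illustration.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

// Hypothetical user-facing choice: memory only, or memory with spill to disk.
enum Policy {
    MemoryOnly,
    SpillToDisk { memory_limit: usize, path: PathBuf },
}

// Where a cached block currently lives.
enum Slot {
    Mem(Vec<u8>),
    Disk { offset: u64, len: usize },
}

// Hypothetical std-only block cache (a stand-in for the redb idea, not redb).
struct BlockCache {
    policy: Policy,
    slots: HashMap<String, Slot>,
    mem_bytes: usize, // bytes currently held in memory
    file: Option<File>,
    file_len: u64, // append position in the spill file (space is never reclaimed)
}

impl BlockCache {
    fn new(policy: Policy) -> io::Result<Self> {
        let file = match &policy {
            Policy::MemoryOnly => None,
            Policy::SpillToDisk { path, .. } => Some(
                OpenOptions::new().read(true).write(true).create(true).truncate(true).open(path)?,
            ),
        };
        Ok(BlockCache { policy, slots: HashMap::new(), mem_bytes: 0, file, file_len: 0 })
    }

    fn put(&mut self, cid: String, bytes: Vec<u8>) -> io::Result<()> {
        let spill = match &self.policy {
            Policy::MemoryOnly => false,
            Policy::SpillToDisk { memory_limit, .. } => self.mem_bytes + bytes.len() > *memory_limit,
        };
        let slot = if spill {
            let file = self.file.as_mut().expect("SpillToDisk always opens a file");
            file.seek(SeekFrom::Start(self.file_len))?;
            file.write_all(&bytes)?;
            let slot = Slot::Disk { offset: self.file_len, len: bytes.len() };
            self.file_len += bytes.len() as u64;
            slot
        } else {
            self.mem_bytes += bytes.len();
            Slot::Mem(bytes)
        };
        // Keep the memory count right if a CID is inserted twice.
        if let Some(Slot::Mem(old)) = self.slots.insert(cid, slot) {
            self.mem_bytes -= old.len();
        }
        Ok(())
    }

    // Remove a block from the cache, reading it back if it was spilled.
    fn take(&mut self, cid: &str) -> io::Result<Option<Vec<u8>>> {
        match self.slots.remove(cid) {
            None => Ok(None),
            Some(Slot::Mem(bytes)) => {
                self.mem_bytes -= bytes.len();
                Ok(Some(bytes))
            }
            Some(Slot::Disk { offset, len }) => {
                let file = self.file.as_mut().expect("disk slots imply a spill file");
                file.seek(SeekFrom::Start(offset))?;
                let mut buf = vec![0u8; len];
                file.read_exact(&mut buf)?;
                Ok(Some(buf))
            }
        }
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("repo-stream-cache-sketch.bin");
    let mut cache = BlockCache::new(Policy::SpillToDisk { memory_limit: 8, path: path.clone() })?;
    cache.put("a".into(), b"12345".to_vec())?; // 5 bytes: stays in memory
    cache.put("b".into(), b"67890".to_vec())?; // 5 + 5 > 8: spills to disk
    assert!(matches!(cache.slots["b"], Slot::Disk { .. }));
    assert_eq!(cache.take("b")?, Some(b"67890".to_vec()));
    assert_eq!(cache.take("a")?, Some(b"12345".to_vec()));
    drop(cache); // close the file before deleting it
    std::fs::remove_file(path)
}
```

Spilled bytes are never reclaimed in this sketch; a real cache would need compaction or a proper embedded store, which is the appeal of using redb for both the in-memory and on-disk cases.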