Trap

Traverse records received from a Tap service and dump into a PostgreSQL database.

Example Usage

In this example we'll tap into (😉) everything under the "sh.tangled.*" NSID namespace, starting from the @tangled.org repo (the ATproto repo, not the git repo).

  1. Set up a PostgreSQL cluster and create a database

...

Let's assume you've created a DB called trap_tangled.

  2. Tap

Run Tap with a collection filter, bound to a local address:

TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run

trap will store every record the Tap service sends, so use Tap's TAP_COLLECTION_FILTERS variable to control which collections are collected.
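
To make the effect of a filter like "sh.tangled.*" concrete, here is a minimal Rust sketch of wildcard matching on NSIDs. It is not taken from Tap or trap: the function name matches_filter is hypothetical, and the semantics (a trailing ".*" matches any NSID below that prefix, anything else must match exactly) are an assumption about how such filters usually behave.

```rust
/// Returns true if `nsid` is selected by `pattern`.
///
/// ASSUMPTION: a trailing ".*" selects every NSID strictly below the prefix;
/// a pattern without a wildcard must match the NSID exactly.
/// This is an illustrative helper, not Tap's actual matching code.
fn matches_filter(pattern: &str, nsid: &str) -> bool {
    match pattern.strip_suffix(".*") {
        // Wildcard: the NSID must continue past the prefix with ".segment".
        Some(prefix) => nsid
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('.') && rest.len() > 1),
        // No wildcard: exact match only.
        None => pattern == nsid,
    }
}
```

Under these assumptions, matches_filter("sh.tangled.*", "sh.tangled.repo.pull") is true, while "app.bsky.feed.post" and the bare "sh.tangled" are both rejected.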

  3. Trap

Run trap, seeding from the DID of @tangled.org:

RUST_LOG=debug,sqlx=warn INDEX_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli

trap will submit the seed DIDs to the Tap service. Each record returned by Tap will be scanned, and any DIDs found in it will also be submitted to the Tap service.
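
The scan-and-resubmit loop described above can be sketched roughly as follows. This is an illustration only, not trap's actual code: find_dids and Frontier are hypothetical names, the scan works on raw record text rather than parsed JSON, and it assumes only did:plc and did:web identifiers are of interest.

```rust
use std::collections::HashSet;

/// Extracts substrings that look like `did:plc:` or `did:web:` identifiers.
/// ASSUMPTION: a DID ends at the first character outside a conservative
/// set (ASCII alphanumerics plus ':', '.', '-', '_', '%').
fn find_dids(text: &str) -> Vec<String> {
    let mut found = Vec::new();
    for method in ["did:plc:", "did:web:"] {
        let mut rest = text;
        while let Some(pos) = rest.find(method) {
            let tail = &rest[pos..];
            let end = tail
                .find(|c: char| {
                    !(c.is_ascii_alphanumeric() || matches!(c, ':' | '.' | '-' | '_' | '%'))
                })
                .unwrap_or(tail.len());
            // Skip a bare method prefix with no identifier after it.
            if end > method.len() {
                found.push(tail[..end].to_string());
            }
            rest = &tail[end..];
        }
    }
    found
}

/// Remembers every DID already submitted, so each is sent to Tap only once.
struct Frontier {
    seen: HashSet<String>,
}

impl Frontier {
    fn new() -> Self {
        Frontier { seen: HashSet::new() }
    }

    /// Returns only the DIDs not seen before, and records them as seen.
    fn admit(&mut self, dids: impl IntoIterator<Item = String>) -> Vec<String> {
        dids.into_iter()
            .filter(|d| self.seen.insert(d.clone()))
            .collect()
    }
}
```

In a loop like this, the seed DIDs would be passed through admit first, and the result of admit(find_dids(record_text)) for each incoming record is what would be submitted to Tap next. Deduplicating is what lets the traversal terminate once no unseen DIDs remain.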

  4. Wait.

Eventually, and I mean eventually, you'll end up with a table named record filled with every "sh.tangled.*" record reachable from the @tangled.org repo.

  5. Perform Data Science

Time to jump into psql!

The record_by_collection view counts how many records have been indexed for each collection.

trap_tangled=# select * from record_by_collection ;
          collection           | count
-------------------------------+-------
 sh.tangled.feed.star          |  5350
 sh.tangled.spindle.member     |  4821
 sh.tangled.graph.follow       |  4425
 sh.tangled.knot.member        |  3607
 sh.tangled.repo               |  2618
 sh.tangled.repo.pull          |  1785
 sh.tangled.repo.issue         |  1390
 sh.tangled.repo.issue.comment |  1386
 sh.tangled.publicKey          |  1298
 sh.tangled.repo.pull.comment  |  1127
 sh.tangled.actor.profile      |   713
 sh.tangled.label.op           |   628
 sh.tangled.feed.reaction      |   479
 sh.tangled.string             |   364
 sh.tangled.repo.issue.state   |   320
 sh.tangled.knot               |   158
 sh.tangled.repo.collaborator  |   146
 sh.tangled.label.definition   |   106
 sh.tangled.repo.artifact      |    69
 sh.tangled.spindle            |    51
(20 rows)

trap_tangled=#  

Analyse SSH public-key statistics:

trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
    count(*) AS count
   FROM record
  WHERE collection = 'sh.tangled.publicKey'
  GROUP BY (split_part(data->>'key', ' ', 1))
  ORDER BY (count(*)) DESC;
              key_type              | count
------------------------------------+-------
 ssh-ed25519                        |   989
 ssh-rsa                            |   239
 sk-ssh-ed25519@openssh.com         |    44
 ecdsa-sha2-nistp256                |    22
 sh-ed25519                         |     2
 sk-ecdsa-sha2-nistp256@openssh.com |     1
 ecdsa-sha2-nistp521                |     1
(7 rows)

trap_tangled=#

Fascinating!

Future work

????

Suggestions and PRs welcome!