Trap
Traverse records received from a Tap service and dump them into a PostgreSQL database.
Example Usage
In this example we'll tap into (😉) everything in the "sh.tangled.*" NSID starting from the @tangled.org repo (ATproto repo, not git repo).
- Set up a PostgreSQL cluster and create a database
...
Let's assume you've created a DB called trap_tangled.
- Tap
TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
trap will collect any records the Tap service sends. You can control which collections Tap sends with the TAP_COLLECTION_FILTERS variable.
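To make the filter concrete, here is a minimal Python sketch of how a wildcard such as "sh.tangled.*" could select collections. It is not Tap's implementation: the function names are hypothetical, and it assumes that a trailing ".*" matches every NSID under that prefix and that multiple filters are comma-separated.

```python
def matches_filter(nsid: str, pattern: str) -> bool:
    """Return True if the collection NSID matches one filter pattern.

    Assumption: a trailing ".*" matches any NSID under that prefix;
    any other pattern must match the NSID exactly.
    """
    if pattern.endswith(".*"):
        # Drop only the "*" so the trailing dot is kept and
        # "sh.tangled.*" does not match "sh.tangledx.foo".
        return nsid.startswith(pattern[:-1])
    return nsid == pattern


def matches_any(nsid: str, filters: str) -> bool:
    """Check an NSID against a comma-separated filter list (assumed format)."""
    patterns = [p.strip() for p in filters.split(",") if p.strip()]
    return any(matches_filter(nsid, p) for p in patterns)


if __name__ == "__main__":
    for nsid in ("sh.tangled.repo.issue", "app.bsky.feed.post"):
        print(nsid, matches_any(nsid, "sh.tangled.*"))
```

Under these assumptions, "sh.tangled.repo.issue" passes the filter and "app.bsky.feed.post" does not.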
- Trap
Run trap, seeding from the DID of @tangled.org:
RUST_LOG=debug,sqlx=warn INDEX_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
trap will submit the seed DIDs to the Tap service. Each record returned by Tap will be scanned, and any DIDs found will also be added to the Tap service.
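The seed-and-scan loop described above can be sketched roughly as follows. This is an illustration in Python, not trap's actual code: the regular expression for did:plc and did:web identifiers is an assumption, and `records_for` is a hypothetical stand-in for the records Tap delivers for a given repo.

```python
import re
from collections import deque

# Assumed DID shapes: did:plc with a 24-character base32 identifier,
# or did:web followed by a host (and optional path segments).
DID_RE = re.compile(r"did:plc:[a-z2-7]{24}\b|did:web:[A-Za-z0-9.%:-]+")


def find_dids(value) -> set:
    """Recursively collect DID strings from a decoded JSON value."""
    found = set()
    if isinstance(value, str):
        found.update(DID_RE.findall(value))
    elif isinstance(value, dict):
        for item in value.values():
            found |= find_dids(item)
    elif isinstance(value, list):
        for item in value:
            found |= find_dids(item)
    return found


def crawl(seeds, records_for) -> set:
    """Breadth-first traversal starting from the seed DIDs.

    `records_for(did)` is hypothetical: it yields the decoded records
    of one repo. Every DID not seen before is queued for a later visit.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        did = queue.popleft()
        for record in records_for(did):
            for new_did in find_dids(record) - seen:
                seen.add(new_did)
                queue.append(new_did)
    return seen
```

Tracking the `seen` set is what lets the traversal terminate: a DID that shows up in many records is only queued once.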
- Wait.
Eventually, and I mean eventually, you'll end up with a table named record filled with every "sh.tangled.*" record reachable from the @tangled.org repo.
- Perform Data Science
Time to jump into psql!
The record_by_collection view counts how many records have been indexed for each collection.
trap_tangled=# select * from record_by_collection ;
          collection           | count
-------------------------------+-------
 sh.tangled.feed.star          |  5350
 sh.tangled.spindle.member     |  4821
 sh.tangled.graph.follow       |  4425
 sh.tangled.knot.member        |  3607
 sh.tangled.repo               |  2618
 sh.tangled.repo.pull          |  1785
 sh.tangled.repo.issue         |  1390
 sh.tangled.repo.issue.comment |  1386
 sh.tangled.publicKey          |  1298
 sh.tangled.repo.pull.comment  |  1127
 sh.tangled.actor.profile      |   713
 sh.tangled.label.op           |   628
 sh.tangled.feed.reaction      |   479
 sh.tangled.string             |   364
 sh.tangled.repo.issue.state   |   320
 sh.tangled.knot               |   158
 sh.tangled.repo.collaborator  |   146
 sh.tangled.label.definition   |   106
 sh.tangled.repo.artifact      |    69
 sh.tangled.spindle            |    51
(20 rows)
trap_tangled=#
Analyse SSH public-key statistics:
trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
                      count(*) AS count
               FROM record
               WHERE collection = 'sh.tangled.publicKey'
               GROUP BY (split_part(data->>'key', ' ', 1))
               ORDER BY (count(*)) DESC;
              key_type              | count
------------------------------------+-------
 ssh-ed25519                        |   989
 ssh-rsa                            |   239
 sk-ssh-ed25519@openssh.com         |    44
 ecdsa-sha2-nistp256                |    22
 sh-ed25519                         |     2
 sk-ecdsa-sha2-nistp256@openssh.com |     1
 ecdsa-sha2-nistp521                |     1
(7 rows)
trap_tangled=#
Fascinating!
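If you would rather do this kind of analysis outside psql, the grouping in the query above is easy to mirror. The following Python sketch assumes you have already fetched the `key` strings from the record table; the function names are hypothetical and the sample keys are made-up placeholders, not real data.

```python
from collections import Counter


def key_type(public_key: str) -> str:
    """Mirror split_part(key, ' ', 1): everything before the first space."""
    return public_key.split(" ", 1)[0]


def count_key_types(keys):
    """Count keys per type, most common first (like ORDER BY count DESC)."""
    return Counter(key_type(k) for k in keys).most_common()


if __name__ == "__main__":
    # Placeholder keys for illustration only.
    sample = [
        "ssh-ed25519 AAAAplaceholder1 alice@example",
        "ssh-rsa AAAAplaceholder2",
        "ssh-ed25519 AAAAplaceholder3 bob@example",
    ]
    for name, count in count_key_types(sample):
        print(name, count)
```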
Future work
????
Suggestions and PRs welcome!