Trap
Traverses records received from a Tap service and dumps them into a PostgreSQL database.
Example Usage
In this example we'll tap into every record matching the "sh.tangled.*" NSID pattern, starting from the @tangled.org AT repo.
1. Set up a PostgreSQL cluster and create a database
...
Let's assume you've created a DB called trap_tangled.
2. Tap
```sh
TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
```
Trap collects every record the Tap service sends, so control which records you collect with Tap's TAP_COLLECTION_FILTERS variable.
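To make the filter's effect concrete, here is a small Python sketch of how a pattern like sh.tangled.* might be matched against collection NSIDs. It assumes that a trailing `.*` means "any NSID beneath this prefix" and that any other pattern must match exactly. This is an illustration of the idea, not Tap's actual matching code, and `matches_filter` / `keep_record` are hypothetical helpers.

```python
def matches_filter(nsid: str, pattern: str) -> bool:
    """Hypothetical NSID filter check (assumed semantics, not Tap's code).

    A pattern ending in ".*" matches any NSID that starts with the prefix
    before the wildcard and has at least one more segment after it; any
    other pattern must equal the NSID exactly.
    """
    if pattern.endswith(".*"):
        prefix = pattern[:-1]  # keep the trailing dot, e.g. "sh.tangled."
        return nsid.startswith(prefix) and len(nsid) > len(prefix)
    return nsid == pattern


def keep_record(nsid: str, patterns: list[str]) -> bool:
    """A record is kept if its collection matches any configured pattern."""
    return any(matches_filter(nsid, p) for p in patterns)


if __name__ == "__main__":
    filters = ["sh.tangled.*"]
    for nsid in ["sh.tangled.feed.star", "sh.tangled.repo", "app.bsky.feed.post"]:
        print(nsid, keep_record(nsid, filters))
```

Under these assumptions, both sh.tangled.* collections are kept and app.bsky.feed.post is dropped; note that the bare prefix "sh.tangled" itself would not match.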
3. Trap
Run trap, seeding from the DID of @tangled.org:
```sh
RUST_LOG=debug,sqlx=warn TRAP_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
```
Trap submits the seed DID to the Tap service. Each record Tap returns is scanned for DIDs, and any it finds are submitted to the Tap service as well, so the crawl keeps widening to every repo reachable from the seed.
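The crawl described above is essentially a breadth-first traversal with a visited set. The Python sketch below illustrates that idea under stated assumptions: `fetch_records` is a hypothetical stand-in for "add this DID to Tap and receive its records" (the real exchange with Tap is asynchronous), and `DID_RE` is a loose DID pattern rather than whatever validation Trap actually applies. It is not Trap's actual code.

```python
import re
from collections import deque
from typing import Callable, Iterable

# Loose DID pattern: "did:", a lowercase method name, then a method-specific id.
# An assumption for this sketch, not Trap's exact rule.
DID_RE = re.compile(r"did:[a-z]+:[A-Za-z0-9._:%-]+")


def find_dids(value) -> set[str]:
    """Recursively collect DID-looking substrings from a decoded JSON value.

    at:// URIs such as "at://did:plc:abc/sh.tangled.repo/xyz" also yield a
    DID, because the regex searches inside every string and stops at "/".
    """
    found: set[str] = set()
    if isinstance(value, str):
        found.update(DID_RE.findall(value))
    elif isinstance(value, dict):
        for v in value.values():
            found |= find_dids(v)
    elif isinstance(value, list):
        for v in value:
            found |= find_dids(v)
    return found


def crawl(seed: str, fetch_records: Callable[[str], Iterable[dict]]) -> set[str]:
    """Breadth-first walk: submit a DID, scan its records, queue unseen DIDs."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        did = queue.popleft()
        for record in fetch_records(did):  # hypothetical Tap round-trip
            for found in find_dids(record):
                if found not in seen:
                    seen.add(found)
                    queue.append(found)
    return seen


if __name__ == "__main__":
    # Toy in-memory "repos" standing in for Tap responses.
    repos = {
        "did:plc:seed": [{"$type": "sh.tangled.graph.follow", "subject": "did:plc:alice"}],
        "did:plc:alice": [{"$type": "sh.tangled.feed.star",
                           "subject": "at://did:plc:bob/sh.tangled.repo/3k"}],
        "did:plc:bob": [],
    }
    print(sorted(crawl("did:plc:seed", lambda d: repos.get(d, []))))
```

The visited set is what makes "eventually" finite: each DID is submitted once, no matter how many records point at it.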
4. Wait...
Eventually, and I mean eventually, you'll end up with a table named record filled with every "sh.tangled.*" record reachable from the @tangled.org repo.
5. Perform Data Science
Time to jump into psql!
The record_by_collection view counts how many records have been indexed for each collection:
```
trap_tangled=# select * from record_by_collection ;
          collection           | count
-------------------------------+-------
 sh.tangled.feed.star          |  6572
 sh.tangled.graph.follow       |  5530
 sh.tangled.spindle.member     |  4982
 sh.tangled.knot.member        |  3798
 sh.tangled.repo               |  3617
 sh.tangled.repo.pull          |  2006
 sh.tangled.publicKey          |  1960
 sh.tangled.repo.issue         |  1698
 sh.tangled.repo.issue.comment |  1626
 sh.tangled.repo.pull.comment  |  1208
 sh.tangled.actor.profile      |  1178
 sh.tangled.label.op           |   691
 sh.tangled.feed.reaction      |   600
 sh.tangled.string             |   498
 sh.tangled.repo.issue.state   |   345
 sh.tangled.knot               |   251
 sh.tangled.repo.collaborator  |   244
 sh.tangled.label.definition   |   133
 sh.tangled.repo.artifact      |    72
 sh.tangled.spindle            |    71
(20 rows)
trap_tangled=#
```
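Conceptually, a per-collection count like this is a GROUP BY on the collection column, ordered from most to least common. The Python sketch below mirrors that aggregation over a list of (collection, data) rows. The row shape and the `records_by_collection` function are hypothetical, and the view's real SQL definition is not reproduced here.

```python
from collections import Counter
from typing import Iterable


def records_by_collection(rows: Iterable[tuple[str, dict]]) -> list[tuple[str, int]]:
    """Count rows per collection NSID, most frequent first.

    Roughly equivalent to an assumed view definition along the lines of:
      SELECT collection, count(*) FROM record
      GROUP BY collection ORDER BY count(*) DESC
    """
    counts = Counter(collection for collection, _data in rows)
    return counts.most_common()  # sorted by count, descending


if __name__ == "__main__":
    rows = [
        ("sh.tangled.repo", {}),
        ("sh.tangled.feed.star", {}),
        ("sh.tangled.feed.star", {}),
    ]
    print(records_by_collection(rows))
```

`Counter.most_common()` with no argument returns every collection, highest count first, which matches the shape of the psql output.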
Analyse SSH public-key statistics:
```
trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
                      count(*) AS count
               FROM record
               WHERE collection = 'sh.tangled.publicKey'
               GROUP BY (split_part(data->>'key', ' ', 1))
               ORDER BY (count(*)) DESC;
              key_type              | count
------------------------------------+-------
 ssh-ed25519                        |  1528
 ssh-rsa                            |   348
 sk-ssh-ed25519@openssh.com         |    49
 ecdsa-sha2-nistp256                |    28
 ecdsa-sha2-nistp521                |     4
 sh-ed25519                         |     2
 sk-ecdsa-sha2-nistp256@openssh.com |     1
(7 rows)
trap_tangled=#
```
Fascinating!
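The query takes the first space-separated field of each stored key, which for an OpenSSH-format public key is the key type. The Python sketch below mirrors that logic and also flags types outside a known list; the two sh-ed25519 rows above, for instance, are not a key type OpenSSH defines. `KNOWN_KEY_TYPES` is an assumed, non-exhaustive list, and `key_type` / `tally_key_types` are hypothetical helpers.

```python
from collections import Counter

# Common OpenSSH public-key type names (non-exhaustive; an assumption for this sketch).
KNOWN_KEY_TYPES = {
    "ssh-ed25519",
    "ssh-rsa",
    "ssh-dss",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
    "sk-ssh-ed25519@openssh.com",
    "sk-ecdsa-sha2-nistp256@openssh.com",
}


def key_type(public_key: str) -> str:
    """First field split on a single space, like split_part(key, ' ', 1).

    Uses split(" ") rather than split() so a leading space yields "" just
    as split_part does, instead of collapsing whitespace.
    """
    return public_key.split(" ", 1)[0]


def tally_key_types(keys):
    """Return (counts per key type, counts for types not in KNOWN_KEY_TYPES)."""
    counts = Counter(key_type(k) for k in keys)
    unknown = {t: n for t, n in counts.items() if t not in KNOWN_KEY_TYPES}
    return counts, unknown


if __name__ == "__main__":
    keys = [
        "ssh-ed25519 AAAAexample alice@laptop",
        "sh-ed25519 AAAAexample",
        "ssh-rsa AAAAexample bob@desktop",
    ]
    counts, unknown = tally_key_types(keys)
    print(counts.most_common())
    print(unknown)
```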