# Trap

Traverse records received from a Tap service and dump them into a PostgreSQL database.

## Example Usage

In this example we'll *tap into* every record under the `sh.tangled.*` NSID namespace, starting from the @tangled.org AT Protocol repo.

### 1. Set up a PostgreSQL cluster and create a database

```bash
...
```

Let's assume you've created a DB called `trap_tangled`.

### 2. Tap

```bash
TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
```

Trap collects *every* record the Tap service sends, so use Tap's `TAP_COLLECTION_FILTERS` environment variable to control which records are collected.

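To make the effect of a filter such as `sh.tangled.*` concrete, here is a minimal Rust sketch of wildcard matching against collection NSIDs. It is an illustration only, not Tap's code: the function names `collection_matches` and `any_filter_matches` are hypothetical, and both the trailing `.*` namespace semantics and the comma-separated list format are assumptions.

```rust
/// Returns true if `nsid` is selected by `filter`.
/// ASSUMED semantics (not taken from Tap's source): a filter ending in `.*`
/// selects every NSID below that namespace; any other filter must match exactly.
fn collection_matches(filter: &str, nsid: &str) -> bool {
    match filter.strip_suffix(".*") {
        // Require a '.' plus at least one more character after the namespace,
        // so "sh.tangled.*" does not select "sh.tangled" or "sh.tangledx.repo".
        Some(namespace) => nsid
            .strip_prefix(namespace)
            .map_or(false, |rest| rest.starts_with('.') && rest.len() > 1),
        None => filter == nsid,
    }
}

/// Applies a list of filters to one NSID.
/// ASSUMPTION: the list is comma-separated; blank entries are ignored.
fn any_filter_matches(filters: &str, nsid: &str) -> bool {
    filters
        .split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .any(|f| collection_matches(f, nsid))
}
```

Under these assumptions, `collection_matches("sh.tangled.*", "sh.tangled.repo.pull")` is true, while the same filter rejects `app.bsky.feed.post`.
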
### 3. Trap

Run `trap`, seeding from the DID of @tangled.org:

```bash
RUST_LOG=debug,sqlx=warn TRAP_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
```

Trap submits the seed DID to the Tap service. It then scans each record Tap returns and adds any DIDs it finds to the Tap service as well.

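The scan-and-follow loop described above can be sketched in a few lines of Rust. This is a standard-library-only illustration, not Trap's actual implementation: `extract_dids`, `Frontier`, and `discover` are hypothetical names, the scanner treats the record as plain text rather than parsing JSON, and it only recognises the `did:plc:` and `did:web:` methods. Submitting the new DIDs to Tap is deliberately left to the caller.

```rust
use std::collections::BTreeSet;

/// Characters allowed after `did:` in this sketch: ASCII letters, digits,
/// and `. _ : % -`.
fn is_did_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | ':' | '%' | '-')
}

/// Scans arbitrary text (for example a record serialized as JSON) and returns
/// every distinct `did:plc:` or `did:web:` identifier found in it.
fn extract_dids(text: &str) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    let mut rest = text;
    while let Some(start) = rest.find("did:") {
        let candidate = &rest[start..];
        // Take the longest run of characters that may appear in a DID.
        let end = candidate
            .find(|c: char| !is_did_char(c))
            .unwrap_or(candidate.len());
        // Drop trailing ':' or '%', which cannot end a DID.
        let did = candidate[..end].trim_end_matches(|c: char| c == ':' || c == '%');
        let known_method = ["did:plc:", "did:web:"]
            .iter()
            .any(|prefix| did.len() > prefix.len() && did.starts_with(*prefix));
        if known_method {
            found.insert(did.to_string());
        }
        // Resume the search just past this occurrence of "did:".
        rest = &candidate[4..];
    }
    found
}

/// Remembers which DIDs have already been seen so each is followed only once.
#[derive(Default)]
struct Frontier {
    seen: BTreeSet<String>,
}

impl Frontier {
    /// Returns the DIDs in `record_text` that have not been seen before,
    /// i.e. the ones a caller would still need to submit to the Tap service.
    fn discover(&mut self, record_text: &str) -> Vec<String> {
        extract_dids(record_text)
            .into_iter()
            .filter(|did| self.seen.insert(did.clone()))
            .collect()
    }
}
```

Calling `discover` on each incoming record yields only DIDs not encountered earlier, so the crawl terminates once no record mentions a new repo.
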
### 4. Wait...

*Eventually*, and I mean *eventually*, you'll end up with a table named `record` filled with every `sh.tangled.*` record reachable from the @tangled.org repo.

### 5. Perform *Data Science*

Time to jump into psql!

The `record_by_collection` view counts how many records have been indexed for each collection:

```
trap_tangled=# select * from record_by_collection ;
          collection           | count
-------------------------------+-------
 sh.tangled.feed.star          |  6572
 sh.tangled.graph.follow       |  5530
 sh.tangled.spindle.member     |  4982
 sh.tangled.knot.member        |  3798
 sh.tangled.repo               |  3617
 sh.tangled.repo.pull          |  2006
 sh.tangled.publicKey          |  1960
 sh.tangled.repo.issue         |  1698
 sh.tangled.repo.issue.comment |  1626
 sh.tangled.repo.pull.comment  |  1208
 sh.tangled.actor.profile      |  1178
 sh.tangled.label.op           |   691
 sh.tangled.feed.reaction      |   600
 sh.tangled.string             |   498
 sh.tangled.repo.issue.state   |   345
 sh.tangled.knot               |   251
 sh.tangled.repo.collaborator  |   244
 sh.tangled.label.definition   |   133
 sh.tangled.repo.artifact      |    72
 sh.tangled.spindle            |    71
(20 rows)

trap_tangled=#
```

Analyse SSH public-key statistics:

```
trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
                      count(*) AS count
               FROM record
               WHERE collection = 'sh.tangled.publicKey'
               GROUP BY (split_part(data->>'key', ' ', 1))
               ORDER BY (count(*)) DESC;
              key_type              | count
------------------------------------+-------
 ssh-ed25519                        |  1528
 ssh-rsa                            |   348
 sk-ssh-ed25519@openssh.com         |    49
 ecdsa-sha2-nistp256                |    28
 ecdsa-sha2-nistp521                |     4
 sh-ed25519                         |     2
 sk-ecdsa-sha2-nistp256@openssh.com |     1
(7 rows)

trap_tangled=#
```

Fascinating!
