 ## Example Usage

-In this example we'll *tap into* (😉) everything in the "sh.tangled.*" NSID starting from the @tangled.org repo (ATproto repo, not git repo).
+In this example we'll *tap into* everything in the "sh.tangled.*" NSID starting from the @tangled.org AT repo.

-1. Setup a PostgreSQL cluster and create a database
+### 1. Set up a PostgreSQL cluster and create a database

-...
+ ```bash
+ ...
+ ```

-Let's assume you've created a DB called `trap_tangled`.
+ Let's assume you've created a DB called `trap_tangled`.

-2. Tap
+### 2. Tap

-```bash
-TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
-```
+ ```bash
+ TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
+ ```

-`trap` will collect *any* records the Tap service sends. You can control this with the `TAP_COLLECTION_FILTERS` variable.
+ Trap will collect *any* records the Tap service sends. Control the records you want to collect with the `TAP_COLLECTION_FILTERS` variable.

-3. Trap
+### 3. Trap

-Run `trap`, seeding from the DID of @tangled.org:
+ Run `trap`, seeding from the DID of @tangled.org:

-```bash
-RUST_LOG=debug,sqlx=warn INDEX_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
-```
+ ```bash
+ RUST_LOG=debug,sqlx=warn TRAP_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
+ ```

-`trap` will submit the seed DIDs to the Tap service. Each record return by Tap will be scanned, and any DIDs found will also be added to the Tap service.
+ Trap will submit the seed DID to the Tap service. Each record returned by Tap will be scanned, and any DIDs found will also be added to the Tap service.
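
The scan step described above can be approximated with a plain pattern match over a record's JSON. This is only a rough sketch, not Trap's actual implementation (which this diff does not show): the helper name `extract_dids` and the sample record are hypothetical, and the regular expression is a simplified take on the AT Protocol DID syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each distinct DID found on stdin, one per line.
# The pattern is a simplified approximation of the atproto DID syntax.
extract_dids() {
  grep -oE 'did:[a-z]+:[a-zA-Z0-9._:%-]+' | sort -u
}

# Made-up record body that mentions the same DID twice,
# once directly and once inside an at:// URI.
record='{"subject":"did:plc:wshs7t2adsemcrrd4snkeqli","repo":"at://did:plc:wshs7t2adsemcrrd4snkeqli/sh.tangled.repo/example"}'

printf '%s\n' "$record" | extract_dids
```

Each DID found this way would then be handed back to the Tap service, which is why the crawl keeps widening until no new DIDs turn up.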

-4. Wait.
+### 4. Wait...

-*Eventually*, and I mean *eventually*, you'll end up with a table named `record` filled with every "sh.tangled.*" record reachable from the @tangled.org repo.
+ *Eventually*, and I mean *eventually*, you'll end up with a table named `record` filled with every "sh.tangled.*" record reachable from the @tangled.org repo.

-5. Perform *Data Science*
+### 5. Perform *Data Science*

-Time to jump into `psql`!
+ Time to jump into psql!

-The `record_by_collection` view counts how many records have been indexed for each collection.
+ The `record_by_collection` view counts how many records have been indexed for each collection:

-```
-trap_tangled=# select * from record_by_collection ;
-          collection           | count
--------------------------------+-------
- sh.tangled.feed.star          |  5350
- sh.tangled.spindle.member     |  4821
- sh.tangled.graph.follow       |  4425
- sh.tangled.knot.member        |  3607
- sh.tangled.repo               |  2618
- sh.tangled.repo.pull          |  1785
- sh.tangled.repo.issue         |  1390
- sh.tangled.repo.issue.comment |  1386
- sh.tangled.publicKey          |  1298
- sh.tangled.repo.pull.comment  |  1127
- sh.tangled.actor.profile      |   713
- sh.tangled.label.op           |   628
- sh.tangled.feed.reaction      |   479
- sh.tangled.string             |   364
- sh.tangled.repo.issue.state   |   320
- sh.tangled.knot               |   158
- sh.tangled.repo.collaborator  |   146
- sh.tangled.label.definition   |   106
- sh.tangled.repo.artifact      |    69
- sh.tangled.spindle            |    51
-(20 rows)
+ ```
+ trap_tangled=# select * from record_by_collection ;
+           collection           | count
+ -------------------------------+-------
+  sh.tangled.feed.star          |  5350
+  sh.tangled.spindle.member     |  4821
+  sh.tangled.graph.follow       |  4425
+  sh.tangled.knot.member        |  3607
+  sh.tangled.repo               |  2618
+  sh.tangled.repo.pull          |  1785
+  sh.tangled.repo.issue         |  1390
+  sh.tangled.repo.issue.comment |  1386
+  sh.tangled.publicKey          |  1298
+  sh.tangled.repo.pull.comment  |  1127
+  sh.tangled.actor.profile      |   713
+  sh.tangled.label.op           |   628
+  sh.tangled.feed.reaction      |   479
+  sh.tangled.string             |   364
+  sh.tangled.repo.issue.state   |   320
+  sh.tangled.knot               |   158
+  sh.tangled.repo.collaborator  |   146
+  sh.tangled.label.definition   |   106
+  sh.tangled.repo.artifact      |    69
+  sh.tangled.spindle            |    51
+ (20 rows)

-trap_tangled=#
-```
+ trap_tangled=#
+ ```

-Analyse SSH public-key statistics:
+ Analyse SSH public-key statistics:

-```
-trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
- count(*) AS count
- FROM record
- WHERE collection = 'sh.tangled.publicKey'
- GROUP BY (split_part(data->>'key', ' ', 1))
- ORDER BY (count(*)) DESC;
-              key_type              | count
-------------------------------------+-------
- ssh-ed25519                        |   989
- ssh-rsa                            |   239
- sk-ssh-ed25519@openssh.com         |    44
- ecdsa-sha2-nistp256                |    22
- sh-ed25519                         |     2
- sk-ecdsa-sha2-nistp256@openssh.com |     1
- ecdsa-sha2-nistp521                |     1
-(7 rows)
+ ```
+ trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
+  count(*) AS count
+  FROM record
+  WHERE collection = 'sh.tangled.publicKey'
+  GROUP BY (split_part(data->>'key', ' ', 1))
+  ORDER BY (count(*)) DESC;
+               key_type              | count
+ ------------------------------------+-------
+  ssh-ed25519                        |   989
+  ssh-rsa                            |   239
+  sk-ssh-ed25519@openssh.com         |    44
+  ecdsa-sha2-nistp256                |    22
+  sh-ed25519                         |     2
+  sk-ecdsa-sha2-nistp256@openssh.com |     1
+  ecdsa-sha2-nistp521                |     1
+ (7 rows)

-trap_tangled=#
-```
+ trap_tangled=#
+ ```

-Fascinating!
+ Fascinating!
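
The grouping in the key-type query takes everything before the first space in each key and counts how often each prefix occurs. As a sketch of the same idea outside the database, assuming the keys have been exported one per line (for example with `psql -At`), the hypothetical helper `key_type_counts` below does the equivalent with standard tools. The sample keys are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one public key per line on stdin and print
# "<count> <key_type>" lines, most frequent first.
# cut takes the text before the first space, like split_part(key, ' ', 1);
# awk tallies each prefix; sort orders by count (descending), then by name.
key_type_counts() {
  cut -d' ' -f1 |
    awk '{ n[$0]++ } END { for (t in n) print n[t], t }' |
    LC_ALL=C sort -t' ' -k1,1nr -k2,2
}

# Made-up sample keys (key material replaced with placeholders).
printf '%s\n' \
  'ssh-ed25519 AAAAexample1 alice@laptop' \
  'ssh-rsa AAAAexample2' \
  'ssh-ed25519 AAAAexample3' \
  'sh-ed25519 AAAAexample4' |
  key_type_counts
```

Because the prefix is taken verbatim, a mistyped key such as one starting with `sh-ed25519` forms its own group, which is how that row ends up in the query result.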

 ## Future work
