+70
-68
README.md
## Example Usage

In this example we'll *tap into* everything under the "sh.tangled.*" NSID, starting from the @tangled.org AT repo.

### 1. Set up a PostgreSQL cluster and create a database

```bash
...
```

Let's assume you've created a DB called `trap_tangled`.

### 2. Tap

```bash
TAP_COLLECTION_FILTERS="sh.tangled.*" TAP_BIND=127.0.0.1:2480 tap run
```

Trap stores *any* record the Tap service sends it, so use Tap's `TAP_COLLECTION_FILTERS` variable to control which records you collect.
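
The counts in step 5 include nested collections such as `sh.tangled.repo.issue.comment`, so the trailing `*` in the filter evidently spans several NSID segments. As a rough mental model only (Tap's real matching rules may differ, so check its documentation), the filter behaves like a shell glob. The `matches_filter` helper below is hypothetical and is not part of Tap or trap:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Tap or trap: models a collection filter
# such as "sh.tangled.*" as a shell glob, where "*" matches any remaining
# NSID segments. Returns 0 if the NSID passes the filter, 1 otherwise.
matches_filter() {
  local filter="$1" nsid="$2"
  # $filter is deliberately unquoted so that `case` treats it as a pattern.
  case "$nsid" in
    $filter) return 0 ;;
    *) return 1 ;;
  esac
}

for nsid in sh.tangled.repo.issue.comment sh.tangled.feed.star app.bsky.feed.post; do
  if matches_filter 'sh.tangled.*' "$nsid"; then
    echo "collect $nsid"
  else
    echo "skip    $nsid"
  fi
done
# collect sh.tangled.repo.issue.comment
# collect sh.tangled.feed.star
# skip    app.bsky.feed.post
```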

### 3. Trap

Run `trap`, seeding from the DID of @tangled.org:

```bash
RUST_LOG=debug,sqlx=warn TRAP_DATABASE_URL=postgresql:///trap_tangled trap --seed did:plc:wshs7t2adsemcrrd4snkeqli
```

Trap submits the seed DID to the Tap service. Every record returned by Tap is scanned, and any DIDs found in it are also submitted to the Tap service.
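
Here is a minimal sketch of the scanning step. It assumes records arrive as JSON text, and it is not trap's actual implementation. The hypothetical `extract_dids` function recognises only `did:plc` identifiers (24 lowercase base32 characters) and simple `did:web` hostnames. It also finds DIDs embedded in `at://` URIs, and it de-duplicates the results before they would be handed to Tap:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not trap's code: print the unique DIDs found in JSON on stdin.
# Simplified patterns: did:plc uses 24 lowercase base32 characters (a-z, 2-7);
# did:web is approximated as a hostname. DIDs inside at:// URIs are found too.
extract_dids() {
  grep -oE 'did:(plc:[a-z2-7]{24}|web:[A-Za-z0-9.-]+)' | LC_ALL=C sort -u
}

# Example record with a made-up shape, for illustration only.
record='{"subject":"did:plc:wshs7t2adsemcrrd4snkeqli","repo":"at://did:plc:abcdefghijklmnopqrstuvwx/sh.tangled.repo/x"}'
printf '%s\n' "$record" | extract_dids
# did:plc:abcdefghijklmnopqrstuvwx
# did:plc:wshs7t2adsemcrrd4snkeqli
```

Matching on raw text rather than walking known record fields keeps the sketch schema-agnostic, at the cost of occasionally picking up DID-like strings from free text.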

### 4. Wait...

*Eventually*, and I mean *eventually*, you'll end up with a table named `record` filled with every "sh.tangled.*" record reachable from the @tangled.org repo.
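
Why *eventually*? "Reachable" means the transitive closure: the seed, every DID its records mention, every DID mentioned by *those* repos, and so on. As a conceptual model only, that closure is a breadth-first search. Trap itself feeds newly found DIDs back to Tap as records stream in. The `reachable_dids` function and its `<did> <referenced-did>` edge-list input are hypothetical, and the function needs bash 4+ for associative arrays:

```bash
#!/usr/bin/env bash
# Conceptual model only, not trap's implementation. Requires bash 4+.
# Reads hypothetical edges from a file, one "<did> <referenced-did>" pair per line,
# and prints every DID reachable from the seed in breadth-first order.
reachable_dids() {
  local seed="$1" edges_file="$2" current from to
  local -A seen=(["$seed"]=1)
  local -a queue=("$seed")
  while ((${#queue[@]} > 0)); do
    current="${queue[0]}"
    queue=("${queue[@]:1}")   # pop the front of the queue
    echo "$current"
    # Enqueue every DID referenced by the current one that we have not seen yet.
    while read -r from to; do
      if [[ "$from" == "$current" && -n "$to" && -z "${seen[$to]:-}" ]]; then
        seen["$to"]=1
        queue+=("$to")
      fi
    done < "$edges_file"
  done
}

edges="$(mktemp)"
printf '%s\n' 'did:a did:b' 'did:b did:c' 'did:c did:a' 'did:x did:y' > "$edges"
reachable_dids did:a "$edges"   # did:a, did:b, did:c; did:x and did:y are unreachable
rm -f "$edges"
```

Toy identifiers such as `did:a` stand in for real DIDs. The long wait comes from the fact that every newly discovered repo can reveal more DIDs to follow.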

### 5. Perform *Data Science*

Time to jump into `psql`!

The `record_by_collection` view counts how many records have been indexed for each collection:

```
trap_tangled=# select * from record_by_collection ;
          collection           | count
-------------------------------+-------
 sh.tangled.feed.star          |  5350
 sh.tangled.spindle.member     |  4821
 sh.tangled.graph.follow       |  4425
 sh.tangled.knot.member        |  3607
 sh.tangled.repo               |  2618
 sh.tangled.repo.pull          |  1785
 sh.tangled.repo.issue         |  1390
 sh.tangled.repo.issue.comment |  1386
 sh.tangled.publicKey          |  1298
 sh.tangled.repo.pull.comment  |  1127
 sh.tangled.actor.profile      |   713
 sh.tangled.label.op           |   628
 sh.tangled.feed.reaction      |   479
 sh.tangled.string             |   364
 sh.tangled.repo.issue.state   |   320
 sh.tangled.knot               |   158
 sh.tangled.repo.collaborator  |   146
 sh.tangled.label.definition   |   106
 sh.tangled.repo.artifact      |    69
 sh.tangled.spindle            |    51
(20 rows)

trap_tangled=#
```

Analyse SSH public-key statistics by counting `sh.tangled.publicKey` records per key type:

```
trap_tangled=# SELECT split_part(data->>'key', ' ', 1) AS key_type,
                      count(*) AS count
               FROM record
               WHERE collection = 'sh.tangled.publicKey'
               GROUP BY (split_part(data->>'key', ' ', 1))
               ORDER BY (count(*)) DESC;
              key_type              | count
------------------------------------+-------
 ssh-ed25519                        |   989
 ssh-rsa                            |   239
 sk-ssh-ed25519@openssh.com         |    44
 ecdsa-sha2-nistp256                |    22
 sh-ed25519                         |     2
 sk-ecdsa-sha2-nistp256@openssh.com |     1
 ecdsa-sha2-nistp521                |     1
(7 rows)

trap_tangled=#
```

Fascinating!
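
`split_part(data->>'key', ' ', 1)` keeps the first space-separated field of each OpenSSH public key (`<type> <base64> [comment]`), which is the key type. If the keys were exported to a plain text file, one per line, roughly the same tally could be done in the shell. The `key_type_counts` function below is a sketch under that assumption. It also flags types missing from a (non-exhaustive) list of common OpenSSH key types with `?`, which would catch entries such as `sh-ed25519`:

```bash
#!/usr/bin/env bash
# Sketch: tally OpenSSH public-key types, one key per line on stdin
# ("<type> <base64> [comment]"), most common first. The first field is the type,
# as with split_part(..., ' ', 1). Types not in the (non-exhaustive) list of
# common OpenSSH key types below are flagged with a trailing "?".
KNOWN_SSH_KEY_TYPES="ssh-ed25519 ssh-rsa ssh-dss ecdsa-sha2-nistp256 ecdsa-sha2-nistp384 ecdsa-sha2-nistp521 sk-ssh-ed25519@openssh.com sk-ecdsa-sha2-nistp256@openssh.com"

key_type_counts() {
  awk -v known_list="$KNOWN_SSH_KEY_TYPES" '
    BEGIN {
      n = split(known_list, names, " ")
      for (i = 1; i <= n; i++) known[names[i]] = 1
    }
    NF { count[$1]++ }   # skip blank lines; $1 is the key type
    END {
      for (t in count) printf "%d %s%s\n", count[t], t, ((t in known) ? "" : " ?")
    }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example with made-up, truncated keys:
printf '%s\n' 'ssh-ed25519 AAAAC3Nz alice' 'ssh-rsa AAAAB3Nz bob' 'ssh-ed25519 AAAAC3Nz carol' 'sh-ed25519 AAAAC3Nz dave' \
  | key_type_counts
# 2 ssh-ed25519
# 1 sh-ed25519 ?
# 1 ssh-rsa
```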
## Future work