# Osprey ATProto Ruleset

This is a ruleset for [Osprey](https://github.com/roostorg/osprey) for use with ATProto, and specifically for [Bluesky](https://bsky.app). It is the ruleset that is used live on
the [labeler that I personally run](https://bsky.app/profile/labeler.hailey.at). It may be used in conjunction with [my fork of Osprey](https://github.com/haileyok/osprey), which
has implemented various components required for these rules (ATProto labels output sink, ML model calling, Redis counter system, etc).

## Using This Ruleset

The easiest way to get started with these rules is to clone them into your Osprey rules directory, wherever that is located. For example, cloning down the official (or forked)
version of Osprey will leave you with an `example_rules/` directory. Replacing the contents of that directory with this repository's contents will allow these rules to run.

Again, note that you will need the sinks and UDFs that these rules require, which are maintained inside of my Osprey fork for the time being.

# Writing Rules

> [!NOTE]
> This documentation is a WIP for _generic_ rule writing, not ATProto-specific rules. More documentation will come for ATProto-specific rules.

Osprey rules are written in SML, a sort of subset of Python (think Starlark). You can write rules that are specific to certain types of events that happen on a network or rules that take effect regardless of event type, depending on the type of behavior or patterns you are looking for.

## Structuring Rules

You will likely find it useful to maintain two subdirectories inside of your main rules directory: a `rules` directory where actual logic will be added and a `models` directory for defining the various features that occur in any or specific event types. For example, your structure may look something like this:

```bash
example-rules/
|  rules/
|  |  record/
|  |  |  post/
|  |  |  |  first_post_link.sml
|  |  |  |  index.sml
|  |  |  like/
|  |  |  |  like_own_post.sml
|  |  |  |  index.sml
|  |  account/
|  |  |  signup/
|  |  |  |  high_risk_signup.sml
|  |  |  |  index.sml
|  |  index.sml
|  models/
|  |  record/
|  |  |  post.sml
|  |  |  like.sml
|  |  account/
|  |  |  signup.sml
|  main.sml
```

This sort of structure lets you define rules and models that are specific to certain event types, so that only the necessary rules are run for each event type. For example, you likely have some rules that should only be run on a `post` event, since only a `post` will have features like `text` or `mention_count`.

Inside of each directory, you may maintain an `index.sml` file that defines the conditions under which the rules inside that directory are included for execution. Although you could handle all of this conditional logic inside of a single file, maintaining a separate `index.sml` per directory keeps things neatly organized.

## Models

Before you actually write a rule, you’ll need to define a “model” for an event type. For this example, we will assume that you run a social media website that lets users create posts, either at the “top level” or as a reply to another top-level post. Each post may include text, mentions of other users on your network, and an optional link embed. Let’s say that the event’s JSON structure looks like this:

```json
{
  "eventType": "userPost",
  "user": {
    "userId": "user_id_789",
    "handle": "carol",
    "postCount": 3,
    "accountAgeSeconds": 9002
  },
  "postId": "abc123xyz",
  "replyId": null,
  "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
  "mentionIds": ["user_id_123", "user_id_456"],
  "embedLink": "https://youtube.com/watch?id=1"
}
```

Inside of our `models/record` directory, we should now create a `post.sml` file where we will define the features for a post.

```python
PostId: Entity[str] = EntityJson(
    type='PostId',
    path='$.postId',
)

PostText: str = JsonData(
    path='$.text',
)

MentionIds: List[str] = JsonData(
    path='$.mentionIds',
)

EmbedLink: Optional[str] = JsonData(
    path='$.embedLink',
    required=False,
)

# `replyId` is null for top-level posts, so this feature is optional
ReplyId: Optional[str] = JsonData(
    path='$.replyId',
    required=False,
)
```

The `JsonData` UDF (more on UDFs to follow) lets us take the event’s JSON and define features based on its contents. These features can then be referenced in any rule that imports the `models/record/post.sml` model. If a value may not always be present in your JSON object, set `required` to `False`, and the feature will be `None` whenever that value is missing.
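
To make the `path` and `required` behavior concrete, here is a minimal Python sketch of the lookup. It is not Osprey's implementation: `json_data` is a hypothetical helper that only understands simple `$.a.b` paths, and it assumes that a JSON `null` is treated the same as a missing key.

```python
from typing import Any

_MISSING = object()  # sentinel distinguishing "key absent" from a stored value


def json_data(event: dict, path: str, required: bool = True) -> Any:
    """Resolve a simple `$.a.b` path; hypothetical stand-in for `JsonData`."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value: Any = event
    for key in path[2:].split("."):
        if isinstance(value, dict) and key in value:
            value = value[key]
        else:
            value = _MISSING
            break
    # Assumption: a JSON null is treated the same as an absent key.
    if value is _MISSING or value is None:
        if required:
            raise KeyError(f"required feature missing at {path!r}")
        return None
    return value


# A trimmed copy of the example event from above
event = {
    "user": {"postCount": 3},
    "postId": "abc123xyz",
    "replyId": None,
    "embedLink": "https://youtube.com/watch?id=1",
}

print(json_data(event, "$.postId"))                   # abc123xyz
print(json_data(event, "$.user.postCount"))           # 3
print(json_data(event, "$.replyId", required=False))  # None
```

With `required` left at its default, the same lookup on `$.replyId` would raise instead of returning `None`, which is why optional values need `required=False`.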

Note that we did not actually create any features for things like `userId` or `handle`. That is because these values will be present in *any* event. It wouldn’t be very nice to have to copy these features into each event type’s model. Therefore, we will create a `base.sml` model that defines the features that are always present. Inside of `models/base.sml`, let’s define these.

```python
EventType: str = JsonData(
    path='$.eventType',
)

UserId: Entity[str] = EntityJson(
    type='UserId',
    path='$.user.userId',
)

Handle: Entity[str] = EntityJson(
    type='Handle',
    path='$.user.handle',
)

PostCount: int = JsonData(
    path='$.user.postCount',
)

AccountAgeSeconds: int = JsonData(
    path='$.user.accountAgeSeconds',
)
```

Here, instead of simply using `JsonData`, we use the `EntityJson` UDF. More on this later, but as a rule of thumb, you will likely want values such as a user’s ID to be entities. This pays off later, such as when doing data explorations within the Osprey UI.

### Model Hierarchy

In practice, you may find it useful to create a hierarchy of base models:

- `base.sml` for features present in every event (user IDs, handles, account stats, etc.)
- `account_base.sml` for features that appear only in account-related events, but appear in every account-related event. Similarly, you may add one like `record_base.sml` for features that appear in all record events.

This type of hierarchy prevents duplication (which Osprey does not allow) and ensures features are defined at the appropriate level of abstraction.
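
As a rough illustration of the duplication constraint, the Python sketch below reports feature names that are defined in more than one model file. How Osprey itself detects duplicates is not covered here; `find_duplicate_features` and the contents listed for `models/record/like.sml` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_features(models: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each feature name defined in more than one file to those files."""
    defined_in: Dict[str, List[str]] = defaultdict(list)
    for path, names in models.items():
        for name in names:
            defined_in[name].append(path)
    return {name: paths for name, paths in defined_in.items() if len(paths) > 1}


models = {
    "models/base.sml": ["EventType", "UserId", "Handle", "PostCount", "AccountAgeSeconds"],
    "models/record/post.sml": ["PostId", "PostText", "MentionIds", "EmbedLink", "ReplyId"],
    # Hypothetical mistake: UserId is redefined instead of imported from base.sml
    "models/record/like.sml": ["UserId", "LikeSubject"],
}

print(find_duplicate_features(models))
# {'UserId': ['models/base.sml', 'models/record/like.sml']}
```

Moving shared features such as `UserId` into `base.sml` and importing it wherever they are needed avoids this kind of collision.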

## Rules

More in-depth documentation on rule writing can be found in `docs/WRITING-RULES.md`, but we’ll offer a brief overview here.

Let's imagine we want to flag accounts whose first post mentions at least one user and includes a link. We’ll create a `.sml` file at `rules/record/post/first_post_link.sml` for our rule logic. This file will include both the conditions that cause the rule to evaluate to `True` and the actions that we want to take when it does.

```python
# First, import the models that you will need inside of this rule
Import(
    rules=[
        'models/base.sml',
        'models/record/post.sml',
    ],
)

# Next, define a variable that uses the `Rule` UDF
FirstPostLinkRule = Rule(
    # Set the conditions in which this rule will be `True`
    when_all=[
        PostCount == 1, # if this is the user's first post
        EmbedLink != None, # if there is a link inside of the post
        ListLength(list=MentionIds) >= 1, # if there is at least one mention in the post
    ],
    description='First post for user includes a link embed',
)

# Finally, set which effect UDFs (more on this later) will be triggered
WhenRules(
    rules_any=[FirstPostLinkRule],
    then=[
        ReportRecord(
            entity=PostId,
            comment='This was the first post by a user and included a link',
            severity=3,
        ),
    ],
)
```
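
To see how the `when_all` conditions combine, the following Python sketch evaluates the same three checks against plain dictionaries of feature values. It only mirrors the boolean logic; `SketchRule` is a hypothetical class, not part of Osprey, and effects such as `ReportRecord` are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, object]


@dataclass
class SketchRule:
    """Hypothetical stand-in for `Rule`: true only if every condition holds."""
    description: str
    when_all: List[Callable[[Features], bool]]

    def matches(self, features: Features) -> bool:
        return all(condition(features) for condition in self.when_all)


first_post_link_rule = SketchRule(
    description="First post for user includes a link embed",
    when_all=[
        lambda f: f["PostCount"] == 1,        # the user's first post
        lambda f: f["EmbedLink"] is not None,  # the post has a link
        lambda f: len(f["MentionIds"]) >= 1,   # at least one mention
    ],
)

# Feature values taken from the example event (postCount is 3 there)
sample = {
    "PostCount": 3,
    "EmbedLink": "https://youtube.com/watch?id=1",
    "MentionIds": ["user_id_123", "user_id_456"],
}
first_post = {**sample, "PostCount": 1}

print(first_post_link_rule.matches(sample))      # False
print(first_post_link_rule.matches(first_post))  # True
```

The example event from earlier does not match because its `postCount` is 3; the same post sent as the user's first would.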

We also want to make sure this rule runs *only* when the event is a post event. Since we have a well-defined project structure, this is pretty easy. We’ll start by modifying the `main.sml` at the project root to include a single, simple `Require` statement.

```python
Require(
    rule='rules/index.sml',
)
```

Next, inside of `rules/index.sml` we will define the conditions that result in post rules executing:

```python
Import(
    rules=[
        'models/base.sml',
    ],
)

Require(
    rule='rules/record/post/index.sml',
    require_if=EventType == 'userPost',
)
```
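
One way to picture the `Require` chain is as a walk over files in which a conditional `Require` is followed only when its `require_if` holds. The Python sketch below models just the two files shown so far; the `REQUIRES` table, the `files_to_run` helper, and the `userLike` event type are hypothetical and say nothing about how Osprey resolves requirements internally.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, object]
Condition = Optional[Callable[[Features], bool]]

# Each file maps to the (target, condition) pairs it requires.
# A condition of None means the target is always required.
REQUIRES: Dict[str, List[Tuple[str, Condition]]] = {
    "main.sml": [("rules/index.sml", None)],
    "rules/index.sml": [
        ("rules/record/post/index.sml", lambda f: f["EventType"] == "userPost"),
    ],
}


def files_to_run(entry: str, features: Features) -> List[str]:
    """Collect the files reachable from `entry` for this event's features."""
    selected = [entry]
    for target, condition in REQUIRES.get(entry, []):
        if condition is None or condition(features):
            selected.extend(files_to_run(target, features))
    return selected


print(files_to_run("main.sml", {"EventType": "userPost"}))
# ['main.sml', 'rules/index.sml', 'rules/record/post/index.sml']
print(files_to_run("main.sml", {"EventType": "userLike"}))
# ['main.sml', 'rules/index.sml']
```

For any event that is not a `userPost`, the post rules are never reached, so they cannot reference features that only exist on posts.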

Finally, inside of `rules/record/post/index.sml` we will require the new rule that we have written.

```python
Import(
    rules=[
        'models/base.sml',
        'models/record/post.sml',
    ],
)

Require(
    rule='rules/record/post/first_post_link.sml',
)
```
config/config.yaml
# For uris you can use {did}, {collection}, and {rkey} to parse at uris
ui_config:
  default_summary_features:
    - actions: ['operation#*']
      features:
        - UserId
        - Handle
        - DisplayName
        - Collection
        - AtUri
        - AccountCreatedAt
        - PdsHost
        - FollowersCount
        - FollowingCount
        - PostsCount
        - PostText
        - PostReplyRoot
        - PostReplyParent
        - PostExternalTitle
        - PostExternalDescription
        - PostExternalLink
        - SentimentScore
        - FollowSubjectDid
        - ListName
        - ListPurpose
        - ListitemList
        - ListitemSubjectDid
        - LikeSubject
        - LikeSubjectDid
        - RepostSubject
        - RepostSubjectDid
        - ProfileDisplayName
        - ProfileDescription

  # For uris you can use {did}, {collection}, and {rkey} to parse at uris
  external_links:
    UserId: 'https://bsky.app/profile/{entity_id}'
    AtUri: 'https://pdsls.dev/{entity_id}'
config/labels.yaml
labels:
  men-facet-abuse:
    valid_for: [UserId]
    connotation: neutral
    description: Account has been abusing facet mentions
  mass-follow-mid:
    valid_for: [UserId]
    connotation: neutral
    description: Account has followed 300+ accounts in 30 minutes
  mass-follow-high:
    valid_for: [UserId]
    connotation: neutral
    description: Account has followed 1000+ accounts in 30 minutes
  shopping-spam:
    valid_for: [UserId]
    connotation: neutral
    description: Account has posted 15+ shopping links in 30 minutes
  inauth-fundraising:
    valid_for: [UserId]
    connotation: neutral
    description: Account is likely performing inauthentic fundraising
  reply-link-spam:
    valid_for: [UserId]
    connotation: neutral
    description: Account has replied with a link twenty or more times in a 24 hour period
  stpk-creations:
    valid_for: [UserId]
    connotation: neutral
    description: Account has made more than two starterpacks in a week
  some-blocks:
    valid_for: [UserId]
    connotation: neutral
    description: Account was blocked 20+ times in 24 hours
  mass-blocks:
    valid_for: [UserId]
    connotation: neutral
    description: Account was blocked 100+ times in 24 hours
  handle-changed:
    valid_for: [UserId]
    connotation: neutral
    description: Account has changed their handle recently.
  many-handle-chgs:
    valid_for: [UserId]
    connotation: neutral
    description: Account has changed their handle 3+ times in a 24 hour period.
  suss-handle-change:
    valid_for: [UserId]
    connotation: neutral
    description: Suspicious handle change
  new-acct-replies:
    valid_for: [UserId]
    connotation: neutral
    description: Account made 10+ replies in their first hour with low top level count.
  new-acct-slurs:
    valid_for: [UserId]
    connotation: neutral
    description: Relatively new account found to be making slur posts.