
HOWTO: Write automod Rules
==========================

The short version is:

- identify a behavior pattern or type of content in the network, and an action that should be taken in response
- write a "rule" function, in golang, which will detect this pattern (usually start by copying an existing rule)
- register the new rule with the rule engine
- test triggering the rule, either in a test network or using "captured" content from a real network
- deploy the rule, first with reduced "effects" (actions) to monitor impact

The `automod/rules` package contains a set of example rules and some shared helper functions, and demonstrates some patterns for how to use counters, sets, filters, and account metadata to compose a rule pattern.

## How Rules Work

Automod rules are golang functions which get called every time a relevant event takes place in the network. Rule functions receive static metadata about the event; can fetch additional state or metadata as needed; and can optionally output "effects". These effects can include state mutations (such as incrementing counters), or taking moderation actions.

There are multiple rule function types (eg, specifically for bsky "posts", or for atproto identity updates), but they all receive a `c` "Context" argument as the primary API for the rules system, including both accessing metadata and recording effects.

Multiple rules for the same event may be run concurrently, or in arbitrary order. Effects *are not* visible between rule executions on the same event, and are only persisted after all rules have finished executing. This means that if one rule increments a counter or adds a label, other rules will not "see" that effect when processing the same event.

Effects are automatically de-duplicated by the rules engine, both between concurrent rules and against the current state of an effect's subject.
This means that rules can generally "trigger" continuously (eg, report an account on the basis of multiple posts), and the action will only take place once (not reported multiple times).

It is expected that some rules will act together, for example paired rules on record creation and record deletion.

The design philosophy of rules is that they mostly contain their own configuration, as code. Rules are not expected to be directly configurable, and changing the "effects" or action of a rule is a change to the rule code itself.


## Rule APIs

There are two general categories of rules and effects: at the account-level, and at the record-level, with the latter being a superset of the former.

Note that none of the Context methods return errors. If errors are encountered (for example, network faults), error state is persisted internally to the Context object, a placeholder value is returned, and no effects will be persisted for the overall event execution. This is to keep rule code simple and readable.


### Rule Types

The notable rule function types are:

- `type IdentityRuleFunc = func(c *AccountContext) error`: triggers on events like handle updates or account migrations
- `type RecordRuleFunc = func(c *RecordContext) error`: triggers on every repo operation: create, update, or delete. Triggers for every record type, including posts and profiles
- `type PostRuleFunc = func(c *RecordContext, post *appbsky.FeedPost) error`: triggers on creation or update of any `app.bsky.feed.post` record. The post record is de-serialized for convenience, but otherwise this is basically just `RecordRuleFunc`
- `type ProfileRuleFunc = func(c *RecordContext, profile *appbsky.ActorProfile) error`: same as `PostRuleFunc`, but for profile

The `PostRuleFunc` and `ProfileRuleFunc` are simply affordances so that rules for those common record types don't all need to filter and type-cast.
Rules for other record types (such as `app.bsky.graph.follow`) do need to use `RecordRuleFunc` and implement that filtering and type-casting.

### Pre-Hydrated Metadata

The `c *automod.AccountContext` parameter provides the following pre-hydrated metadata:

- `c.Account.Identity`: atproto identity for the account, including `DID` and `Handle` fields, and the PDS endpoint URL (if declared)
- `c.Account.Private` (optional): contains things like `.IndexedAt` (account first seen), `.Email` (the current registered account email), and `.EmailConfirmed` (boolean). Only hydrated when the rule engine is configured with admin privileges, and the account is on a PDS those privileges have access to
- `c.Account.Profile` (optional): a cached subset of the account's bsky profile record
- `c.Account.AccountLabels` (array of strings): cached view of any moderation labels applied to the account, by the relevant "local" moderation service
- `c.Account.AccountNegatedLabels` (array of strings)
- `c.Account.Takendown` (bool): if the account is currently taken down or not
- `c.Account.FollowersCount` (int64): cached
- `c.Account.PostsCount` (int64): cached

The `c *automod.RecordContext` parameter is a superset of `AccountContext` and also includes:

- `c.RecordOp.Action`: one of "create", "update", or "delete"
- `c.RecordOp.DID`
- `c.RecordOp.Collection`
- `c.RecordOp.RecordKey`
- `c.RecordOp.CID` (optional): not included for "delete"
- `c.RecordOp.Value` (optional): the record itself, usually as a pointer to an un-marshalled struct

### Counters

All `Context` objects provide access to counters. Rules don't need to pre-configure counter namespaces or values; they can just start using them. The default value for a counter which has never been incremented is `0`.

The datastore providing counters is an internal implementation/configuration detail of the rule engine, but is usually Redis.
Reads (`GetCount`) may hit the network but are pretty fast.

Incrementing a counter is an "effect" and is not persisted until the end of all rule execution for an event. That is, if you read, increment, and read again, you will read the same count.

The counter API has distinct "namespace" and "value" fields, which are combined to form a key. You generally choose a unique namespace specific to your rule and counter type, and then values are either a fixed string or a normalized field like a DID or hash. The keyspace is global, so rules can access and mutate each other's counters, and need to avoid namespace collisions.

Time periods for counters:

- `automod.PeriodHour`: time bucket of current hour
- `automod.PeriodDay`: time bucket of current day
- `automod.PeriodTotal`: all-time counts

Basic counters:

- `c.GetCount(<namespace-string>, <value-string>, <time-period>)`: reads count for the specific time period
- `c.Increment(<namespace-string>, <value-string>)`: increments all time periods
- `c.IncrementPeriod(<namespace-string>, <value-string>, <time-period>)`: increments only a single time period bucket, as a resource optimization. You should generally use the full `Increment` method.

"Distinct value" counters use a statistical data structure (hyperloglog) to estimate the number of unique strings incremented for the given bucket. These counters consume more memory (up to a couple KBytes per counter), though they are generally smaller for small-N buckets.

- `c.GetCountDistinct(<namespace>, <bucket>, <time-period>)`
- `c.IncrementDistinct(<namespace>, <bucket>, <value>)`

### Sets

Sets are a mechanism to separate configuration from rule implementation. They are simply named arrays of strings. Membership checks are very fast, and won't hit the network more than once per set per rule invocation.

- `c.InSet(<set-name>, <value>)`: checks if a string is in a named set, returning a `bool`

### Moderation Effects (Actions)

"Flags" are a concept invented for automod. They are essentially private labels: string values attached to a subject (account or record) and persisted.

Rules can take account-level actions using the following methods:

- `c.AddAccountFlag(val string)`
- `c.AddAccountLabel(val string)`
- `c.ReportAccount(reason string, comment string)`
- `c.TakedownAccount()`

The `RecordContext` additionally has record-level equivalents for all these methods.

### Other Stuff

- `c.Logger`: a `log/slog` logging interface. Logging currently happens immediately, instead of being accumulated as an "effect"
- `c.Directory()`: returns an `identity.Directory` (interface), which can be used for (cached) identity resolution

## Development Process

When deploying a new rule, it is recommended to start with a minimal action, like setting a flag or just logging. Any "action" (including new flag creation) can result in a Slack notification. You can gain confidence in the rule by running against the full firehose with these limited actions, tweaking the rule until it seems to have acceptable sensitivity (eg, few false positives), and then escalate the actions to reporting (adds to the human review queue), or action-and-report (label or takedown, and concurrently report for humans to review the action).

### Network Data

The `hepa` command provides `process-record` and `process-recent` sub-commands which will pull an existing individual record (by AT-URI) or all recent bsky posts for an account (by handle or DID), which can be helpful for testing.

There is also a `capture-recent` sub-command which will save a snapshot ("capture") of the current account identity and profile, and recent bsky posts, as JSON.
This can be combined with testing helpers (which will load the capture and push it through a mock rules engine) to test that new rules actually trigger as expected against real-world data.

Note that, of course, any real-world captures should have identifying or otherwise sensitive information redacted or replaced before committing to git.


## Examples

Here is a trivial post record rule:

```golang
// the GTUBE string is a special value historically used to test email spam filtering behavior
var gtubeString = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

func GtubePostRule(c *automod.RecordContext, post *appbsky.FeedPost) error {
	if strings.Contains(post.Text, gtubeString) {
		c.AddRecordLabel("spam")
	}
	return nil
}
```

Every new (or updated) post is checked for an exact string match, and is labeled "spam" if found.
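
A counter-based rule is a common next step, and the deferred-effect behavior of counters is easy to get wrong. The following is a minimal, self-contained sketch of that behavior, not the real automod package: `fakeContext`, `fakeStore`, `persist`, `HighVolumePostRule`, the `post-volume` namespace, the `high-volume-poster` flag, and the threshold are all hypothetical stand-ins invented for illustration. Only the method names `GetCount`, `Increment`, and `AddAccountFlag`, and the idea of hour/day/total periods, come from the API described in this document; a real rule would take a `*automod.RecordContext` instead.

```go
package main

// NOTE: every type here is a hypothetical stand-in for illustration.
// It mimics the shape of the counter and flag methods described in this
// document, but it is not the real automod package.

// Period names a counter time bucket (stand-in for automod.PeriodDay etc.).
type Period string

const (
	PeriodHour  Period = "hour"
	PeriodDay   Period = "day"
	PeriodTotal Period = "total"
)

// fakeStore holds persisted counter values, keyed by namespace/value/period.
type fakeStore map[string]int

func counterKey(ns, val string, p Period) string {
	return ns + "/" + val + "/" + string(p)
}

// fakeContext buffers effects during rule execution: reads only see state
// that was persisted by earlier events.
type fakeContext struct {
	DID        string
	store      fakeStore
	increments [][2]string // pending (namespace, value) increments
	flags      []string    // pending account flags
}

// GetCount reads the persisted count for one time period (0 if never set).
func (c *fakeContext) GetCount(ns, val string, p Period) int {
	return c.store[counterKey(ns, val, p)]
}

// Increment records a pending increment; nothing is written yet.
func (c *fakeContext) Increment(ns, val string) {
	c.increments = append(c.increments, [2]string{ns, val})
}

// AddAccountFlag records a pending account flag.
func (c *fakeContext) AddAccountFlag(flag string) {
	c.flags = append(c.flags, flag)
}

// persist applies buffered increments to every time period, standing in for
// what the engine does after all rules for an event have finished.
func (c *fakeContext) persist() {
	for _, inc := range c.increments {
		for _, p := range []Period{PeriodHour, PeriodDay, PeriodTotal} {
			c.store[counterKey(inc[0], inc[1], p)]++
		}
	}
	c.increments = nil
}

// dailyPostLimit is an arbitrary threshold chosen for this sketch.
const dailyPostLimit = 3

// HighVolumePostRule counts posts per account and flags the account once
// the persisted daily count reaches the limit.
func HighVolumePostRule(c *fakeContext) error {
	c.Increment("post-volume", c.DID)
	// The increment above is not visible yet, so this reads the count
	// as it stood before the current event.
	if c.GetCount("post-volume", c.DID, PeriodDay) >= dailyPostLimit {
		c.AddAccountFlag("high-volume-poster")
	}
	return nil
}
```

With this stand-in, running the rule once per event and calling `persist()` after each one adds the flag on the fourth event rather than the third, because each read reflects only the increments from earlier events. When picking thresholds for a real rule, account for that off-by-one, or compare against `count + 1`.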