Weighs the soul of incoming HTTP requests to stop AI crawlers

feat(lib): implement request weight (#621)

* feat(lib): implement request weight

Replaces #608

This is a big one: it is the change that turns Anubis into a generic web
application firewall. It introduces the WEIGH action, which lets
administrators write rules where facets of request metadata add or
remove "weight" (the level of suspicion). This really makes Anubis weigh
the soul of requests.

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): maintain legacy challenge behavior

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): make weight have dedicated checkers for the hashes

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(data): convert some rules over to weight points

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document request weight

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(CHANGELOG): spelling error

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: fix links to challenge information

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(policies): fix formatting

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): make default weight adjustment 5

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>

authored by Xe Iaso and committed by GitHub c6386531 0fe46b48

+2
.github/actions/spelling/expect.txt
@@ -158,6 +158,7 @@
 mojeekbot
 mozilla
 nbf
+netsurf
 nginx
 nobots
 NONINFRINGEMENT
@@ -170,6 +171,7 @@
 openai
 openrc
 pag
+palemoon
 Pangu
 parseable
 passthrough
+3 -1
data/botPolicies.yaml
@@ -55,7 +55,9 @@
 - name: generic-browser
   user_agent_regex: >-
     Mozilla|Opera
-  action: CHALLENGE
+  action: WEIGH
+  weight:
+    adjust: 10
 
 dnsbl: false
 
+23 -25
data/bots/aggressive-brazilian-scrapers.yaml
@@ -1,28 +1,26 @@
 - name: deny-aggressive-brazilian-scrapers
-  action: DENY
-  expression:
-    any:
-      # Internet Explorer should be out of support
-      - userAgent.contains("MSIE")
-      # Trident is the Internet Explorer browser engine
-      - userAgent.contains("Trident")
-      # Opera is a fork of chrome now
-      - userAgent.contains("Presto")
-      # Windows CE is discontinued
-      - userAgent.contains("Windows CE")
-      # Windows 95 is discontinued
-      - userAgent.contains("Windows 95")
-      # Windows 98 is discontinued
-      - userAgent.contains("Windows 98")
-      # Windows 9.x is discontinued
-      - userAgent.contains("Win 9x")
-      # Amazon does not have an Alexa Toolbar.
-      - userAgent.contains("Alexa Toolbar")
-- name: challenge-aggressive-brazilian-scrapers
-  action: CHALLENGE
+  action: WEIGH
+  weight:
+    adjust: 20
   expression:
     any:
-      # This is not released, even Windows 11 calls itself Windows 10
-      - userAgent.contains("Windows NT 11.0")
-      # iPods are not in common use
-      - userAgent.contains("iPod")
+      # Internet Explorer should be out of support
+      - userAgent.contains("MSIE")
+      # Trident is the Internet Explorer browser engine
+      - userAgent.contains("Trident")
+      # Opera is a fork of chrome now
+      - userAgent.contains("Presto")
+      # Windows CE is discontinued
+      - userAgent.contains("Windows CE")
+      # Windows 95 is discontinued
+      - userAgent.contains("Windows 95")
+      # Windows 98 is discontinued
+      - userAgent.contains("Windows 98")
+      # Windows 9.x is discontinued
+      - userAgent.contains("Win 9x")
+      # Amazon does not have an Alexa Toolbar.
+      - userAgent.contains("Alexa Toolbar")
+      # This is not released, even Windows 11 calls itself Windows 10
+      - userAgent.contains("Windows NT 11.0")
+      # iPods are not in common use
+      - userAgent.contains("iPod")
+3 -1
data/bots/cloudflare-workers.yaml
@@ -1,4 +1,6 @@
 - name: cloudflare-workers
   headers_regex:
     CF-Worker: .*
-  action: DENY
+  action: WEIGH
+  weight:
+    adjust: 15
+2
data/clients/small-internet-browsers/_permissive.yaml
@@ -0,0 +1,2 @@
+- import: (data)/clients/small-internet-browsers/netsurf.yaml
+- import: (data)/clients/small-internet-browsers/palemoon.yaml
+5
data/clients/small-internet-browsers/netsurf.yaml
@@ -0,0 +1,5 @@
+- name: "reduce-weight-netsurf"
+  user_agent_regex: "NetSurf"
+  action: WEIGH
+  weight:
+    adjust: -5
+5
data/clients/small-internet-browsers/palemoon.yaml
@@ -0,0 +1,5 @@
+- name: "reduce-weight-palemoon"
+  user_agent_regex: "PaleMoon"
+  action: WEIGH
+  weight:
+    adjust: -5
+3 -1
data/clients/x-firefox-ai.yaml
@@ -1,4 +1,6 @@
 # https://connect.mozilla.org/t5/firefox-labs/try-out-link-previews-in-firefox-labs-138-and-share-your/td-p/92012
 - name: x-firefox-ai
-  action: CHALLENGE
+  action: WEIGH
   expression: '"X-Firefox-Ai" in headers'
+  weight:
+    adjust: 5
+6 -6
data/common/allow-private-addresses.yaml
@@ -1,15 +1,15 @@
 - name: ipv4-rfc-1918
   action: ALLOW
   remote_addresses:
-  - 10.0.0.0/8
-  - 172.16.0.0/12
-  - 192.168.0.0/16
-  - 100.64.0.0/10
+    - 10.0.0.0/8
+    - 172.16.0.0/12
+    - 192.168.0.0/16
+    - 100.64.0.0/10
 - name: ipv6-ula
   action: ALLOW
   remote_addresses:
-  - fc00::/7
+    - fc00::/7
 - name: ipv6-link-local
   action: ALLOW
   remote_addresses:
-  - fe80::/10
+    - fe80::/10
+2 -2
docs/docs/CHANGELOG.md
@@ -10,11 +10,11 @@
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
+
 - Remove the unused `/test-error` endpoint and update the testing endpoint `/make-challenge` to only be enabled in
   development
-
-
 - Add `--xff-strip-private` flag/envvar to toggle skipping X-Forwarded-For private addresses or not
+- Requests can have their weight be adjusted, if a request weighs zero or less than it is allowed through
 - Refactor challenge presentation logic to use a challenge registry
 - Allow challenge implementations to register HTTP routes
 - Implement a no-JS challenge method: [`metarefresh`](./admin/configuration/challenges/metarefresh.mdx) ([#95](https://github.com/TecharoHQ/anubis/issues/95))
+36
docs/docs/admin/policies.mdx
@@ -244,3 +244,39 @@
 | `X-Anubis-Status` | The status and how strict Anubis was in its checks | `PASS` |
 
 Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can mess around with the syntax at [regex101.com](https://regex101.com), make sure to select the Golang option.
+
+## Request Weight
+
+Anubis rules can also add or remove "weight" from requests, allowing administrators to configure custom levels of suspicion. For example, if your application uses session tokens named `i_love_gitea`:
+
+```yaml
+- name: gitea-session-token
+  action: WEIGH
+  expression:
+    all:
+      - '"Cookie" in headers'
+      - headers["Cookie"].contains("i_love_gitea=")
+  # Remove 5 weight points
+  weight:
+    adjust: -5
+```
+
+This would remove five weight points from the request, making Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx).
+
+### Weight Thresholds
+
+Weight thresholds and challenge associations will be configurable with CEL expressions in the configuration file in an upcoming patch, for now here's how Anubis configures the weight thresholds:
+
+| Weight Expression | Action |
+| -----------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------- |
+| `weight < 0` (weight is less than 0) | Allow the request through. |
+| `weight < 10` (weight is less than 10) | Challenge the client with the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) at the default difficulty level. |
+| `weight >= 10` (weight is greater than or equal to 10) | Challenge the client with the [Proof of Work challenge](./configuration/challenges/proof-of-work.mdx) at the default difficulty level. |
+
+### Advice
+
+Weight is still very new and needs work. This is an experimental feature and should be treated as such. Here's some advice to help you better tune requests:
+
+- The default weight for browser-like clients is 10. This triggers an aggressive challenge.
+- Remove and add weight in multiples of five.
+- Be careful with how you configure weight.
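Read together with the `switch` at the end of `check` in `lib/anubis.go`, the threshold table is a three-way classification of the final weight. The Go sketch below is a standalone, hypothetical rendering of that mapping; the `decision` type and `classify` function are illustrative names, not Anubis APIs. It follows the code rather than the table on one edge case: the code allows a request when `weight <= 0`, so a weight of exactly zero is let through, whereas the table's first row reads `weight < 0`.

```go
package main

import "fmt"

// decision is a hypothetical summary of what happens to a request once
// every matching WEIGH rule has been applied.
type decision struct {
	Rule      string // "ALLOW" or "CHALLENGE"
	Algorithm string // challenge algorithm; empty when the request is allowed
}

// classify mirrors the thresholds in lib/anubis.go as of this commit:
// weight <= 0 is allowed, 1 through 9 gets the metarefresh challenge, and
// 10 or more gets the proof-of-work ("fast") challenge.
func classify(weight int) decision {
	switch {
	case weight <= 0:
		return decision{Rule: "ALLOW"}
	case weight < 10:
		return decision{Rule: "CHALLENGE", Algorithm: "metarefresh"}
	default:
		return decision{Rule: "CHALLENGE", Algorithm: "fast"}
	}
}

func main() {
	// 10 - 5: generic-browser (+10) combined with the gitea-session-token example (-5).
	for _, w := range []int{-5, 0, 10 - 5, 10} {
		fmt.Printf("weight=%d -> %+v\n", w, classify(w))
	}
}
```

With the shipped rules, a browser-like client that only matches `generic-browser` (+10) lands in the proof-of-work band, and the `gitea-session-token` example (-5) brings it down to 5, the metarefresh band.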
+50 -5
lib/anubis.go
@@ -402,13 +402,20 @@
 	http.Redirect(w, r, redir, http.StatusFound)
 }
 
-func cr(name string, rule config.Rule) policy.CheckResult {
+func cr(name string, rule config.Rule, weight int) policy.CheckResult {
 	return policy.CheckResult{
-		Name: name,
-		Rule: rule,
+		Name:   name,
+		Rule:   rule,
+		Weight: weight,
 	}
 }
 
+var (
+	weightOkayStatic    = policy.NewStaticHashChecker("weight/okay")
+	weightMildSusStatic = policy.NewStaticHashChecker("weight/mild-suspicion")
+	weightVerySusStatic = policy.NewStaticHashChecker("weight/extreme-suspicion")
+)
+
 // Check evaluates the list of rules, and returns the result
 func (s *Server) check(r *http.Request) (policy.CheckResult, *policy.Bot, error) {
 	host := r.Header.Get("X-Real-Ip")
@@ -421,6 +428,8 @@
 		return decaymap.Zilch[policy.CheckResult](), nil, fmt.Errorf("[misconfiguration] %q is not an IP address", host)
 	}
 
+	weight := 0
+
 	for _, b := range s.policy.Bots {
 		match, err := b.Rules.Check(r)
 		if err != nil {
@@ -428,11 +437,47 @@
 		}
 
 		if match {
-			return cr("bot/"+b.Name, b.Action), &b, nil
+			switch b.Action {
+			case config.RuleDeny, config.RuleAllow, config.RuleBenchmark, config.RuleChallenge:
+				return cr("bot/"+b.Name, b.Action, weight), &b, nil
+			case config.RuleWeigh:
+				slog.Debug("adjusting weight", "name", b.Name, "delta", b.Weight.Adjust)
+				weight += b.Weight.Adjust
+			}
 		}
 	}
 
-	return cr("default/allow", config.RuleAllow), &policy.Bot{
+	switch {
+	case weight <= 0:
+		return cr("weight/okay", config.RuleAllow, weight), &policy.Bot{
+			Challenge: &config.ChallengeRules{
+				Difficulty: s.policy.DefaultDifficulty,
+				ReportAs:   s.policy.DefaultDifficulty,
+				Algorithm:  config.DefaultAlgorithm,
+			},
+			Rules: weightOkayStatic,
+		}, nil
+	case weight > 0 && weight < 10:
+		return cr("weight/mild-suspicion", config.RuleChallenge, weight), &policy.Bot{
+			Challenge: &config.ChallengeRules{
+				Difficulty: s.policy.DefaultDifficulty,
+				ReportAs:   s.policy.DefaultDifficulty,
+				Algorithm:  "metarefresh",
+			},
+			Rules: weightMildSusStatic,
+		}, nil
+	case weight >= 10:
+		return cr("weight/extreme-suspicion", config.RuleChallenge, weight), &policy.Bot{
+			Challenge: &config.ChallengeRules{
+				Difficulty: s.policy.DefaultDifficulty,
+				ReportAs:   s.policy.DefaultDifficulty,
+				Algorithm:  "fast",
+			},
+			Rules: weightVerySusStatic,
+		}, nil
+	}
+
+	return cr("default/allow", config.RuleAllow, weight), &policy.Bot{
 		Challenge: &config.ChallengeRules{
 			Difficulty: s.policy.DefaultDifficulty,
 			ReportAs:   s.policy.DefaultDifficulty,
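The loop in `check` now treats matches in two ways: ALLOW, DENY, CHALLENGE, and DEBUG_BENCHMARK rules end evaluation immediately (carrying whatever weight has built up so far), while WEIGH rules only add their `Adjust` value to a running total that is classified after the loop. A minimal standalone sketch of that accumulation, assuming a simplified `rule` type that matches on the User-Agent alone. The `rule` type, the `evaluate` and `exampleRules` helpers, and the `deny-badbot` rule are hypothetical; the two WEIGH rules reuse the names and adjustments from this commit (only the `MSIE` check of the Brazilian-scraper rule is modeled).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a hypothetical, simplified stand-in for policy.Bot that only
// looks at the User-Agent header.
type rule struct {
	Name   string
	Action string // "ALLOW", "DENY", "CHALLENGE", or "WEIGH"
	Adjust int    // weight delta, only used when Action is "WEIGH"
	Match  func(userAgent string) bool
}

// exampleRules loosely models the data files in this commit, plus one
// hypothetical DENY rule to show short-circuiting.
func exampleRules() []rule {
	return []rule{
		{Name: "deny-badbot", Action: "DENY",
			Match: func(ua string) bool { return strings.Contains(ua, "BadBot") }},
		{Name: "deny-aggressive-brazilian-scrapers", Action: "WEIGH", Adjust: 20,
			Match: func(ua string) bool { return strings.Contains(ua, "MSIE") }},
		{Name: "generic-browser", Action: "WEIGH", Adjust: 10,
			Match: regexp.MustCompile("Mozilla|Opera").MatchString},
	}
}

// evaluate walks the rules in order. The first matching non-WEIGH rule is
// returned immediately with the weight accumulated so far; if none matches,
// terminal is empty and the caller classifies the total weight.
func evaluate(rules []rule, userAgent string) (terminal string, weight int) {
	for _, r := range rules {
		if !r.Match(userAgent) {
			continue
		}
		if r.Action == "WEIGH" {
			weight += r.Adjust
			continue
		}
		return "bot/" + r.Name, weight
	}
	return "", weight
}

func main() {
	for _, ua := range []string{
		"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)",
		"Mozilla/5.0 (X11; Linux x86_64)",
		"BadBot/1.0",
		"curl/8.0",
	} {
		terminal, weight := evaluate(exampleRules(), ua)
		fmt.Printf("%q terminal=%q weight=%d\n", ua, terminal, weight)
	}
}
```

Because WEIGH rules stack, an old-Internet-Explorer user agent that also contains `Mozilla` collects 20 + 10 = 30, while a plain browser stays at 10; rule order only matters relative to the terminal rules, which stop accumulation where they match.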
+1
lib/policy/bot.go
@@ -12,6 +12,7 @@
 	Challenge *config.ChallengeRules
 	Name      string
 	Action    config.Rule
+	Weight    *config.Weight
 }
 
 func (b Bot) Hash() string {
+14
lib/policy/checker.go
@@ -47,6 +47,20 @@
 	return internal.SHA256sum(sb.String())
 }
 
+type staticHashChecker struct {
+	hash string
+}
+
+func (staticHashChecker) Check(r *http.Request) (bool, error) {
+	return true, nil
+}
+
+func (s staticHashChecker) Hash() string { return s.hash }
+
+func NewStaticHashChecker(hashable string) Checker {
+	return staticHashChecker{hash: internal.SHA256sum(hashable)}
+}
+
 type RemoteAddrChecker struct {
 	ranger cidranger.Ranger
 	hash   string
+6 -3
lib/policy/checkresult.go
@@ -7,12 +7,15 @@
 )
 
 type CheckResult struct {
-	Name string
-	Rule config.Rule
+	Name   string
+	Rule   config.Rule
+	Weight int
 }
 
 func (cr CheckResult) LogValue() slog.Value {
 	return slog.GroupValue(
 		slog.String("name", cr.Name),
-		slog.String("rule", string(cr.Rule)))
+		slog.String("rule", string(cr.Rule)),
+		slog.Int("weight", cr.Weight),
+	)
 }
+13 -7
lib/policy/config/config.go
@@ -39,20 +39,22 @@
 	RuleAllow     Rule = "ALLOW"
 	RuleDeny      Rule = "DENY"
 	RuleChallenge Rule = "CHALLENGE"
+	RuleWeigh     Rule = "WEIGH"
 	RuleBenchmark Rule = "DEBUG_BENCHMARK"
 )
 
 const DefaultAlgorithm = "fast"
 
 type BotConfig struct {
-	UserAgentRegex *string           `json:"user_agent_regex"`
-	PathRegex      *string           `json:"path_regex"`
-	HeadersRegex   map[string]string `json:"headers_regex"`
-	Expression     *ExpressionOrList `json:"expression"`
+	UserAgentRegex *string           `json:"user_agent_regex,omitempty"`
+	PathRegex      *string           `json:"path_regex,omitempty"`
+	HeadersRegex   map[string]string `json:"headers_regex,omitempty"`
+	Expression     *ExpressionOrList `json:"expression,omitempty"`
 	Challenge      *ChallengeRules   `json:"challenge,omitempty"`
+	Weight         *Weight           `json:"weight,omitempty"`
 	Name           string            `json:"name"`
 	Action         Rule              `json:"action"`
-	RemoteAddr     []string          `json:"remote_addresses"`
+	RemoteAddr     []string          `json:"remote_addresses,omitempty"`
 }
 
 func (b BotConfig) Zero() bool {
@@ -73,7 +75,7 @@
 	return true
 }
 
-func (b BotConfig) Valid() error {
+func (b *BotConfig) Valid() error {
 	var errs []error
 
 	if b.Name == "" {
@@ -144,7 +146,7 @@
 	}
 
 	switch b.Action {
-	case RuleAllow, RuleBenchmark, RuleChallenge, RuleDeny:
+	case RuleAllow, RuleBenchmark, RuleChallenge, RuleDeny, RuleWeigh:
 		// okay
 	default:
 		errs = append(errs, fmt.Errorf("%w: %q", ErrUnknownAction, b.Action))
@@ -154,6 +156,10 @@
 		if err := b.Challenge.Valid(); err != nil {
 			errs = append(errs, err)
 		}
 	}
+
+	if b.Action == RuleWeigh && b.Weight == nil {
+		b.Weight = &Weight{Adjust: 5}
+	}
 
 	if len(errs) != 0 {
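`Valid` switches to a pointer receiver because it now fills in a default (`Weight{Adjust: 5}`) for WEIGH rules that omit `weight`; with a value receiver, that assignment would only modify a copy. A trimmed-down sketch contrasting the two receivers; the types keep only the relevant fields, and the method names `defaultWeightValue` and `defaultWeightPointer` are hypothetical:

```go
package main

import "fmt"

// Weight and BotConfig are trimmed-down copies of the types in
// lib/policy/config, keeping only what the defaulting logic touches.
type Weight struct {
	Adjust int `json:"adjust"`
}

type BotConfig struct {
	Name   string
	Action string
	Weight *Weight
}

// defaultWeightValue has a value receiver: it sets the default on a copy,
// so the caller's BotConfig is unchanged.
func (b BotConfig) defaultWeightValue() {
	if b.Action == "WEIGH" && b.Weight == nil {
		b.Weight = &Weight{Adjust: 5}
	}
}

// defaultWeightPointer has a pointer receiver, like Valid after this
// commit, so the default is visible to the caller.
func (b *BotConfig) defaultWeightPointer() {
	if b.Action == "WEIGH" && b.Weight == nil {
		b.Weight = &Weight{Adjust: 5}
	}
}

func main() {
	a := BotConfig{Name: "weight", Action: "WEIGH"}
	a.defaultWeightValue()
	fmt.Println(a.Weight == nil) // true: only the copy was modified

	b := BotConfig{Name: "weight", Action: "WEIGH"}
	b.defaultWeightPointer()
	fmt.Println(b.Weight.Adjust) // 5
}
```

The pointer receiver only helps when `Valid` is called on the value that is kept afterwards; calling it on a copy, such as a `range` loop variable, would still leave the stored config without the default.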
+19
lib/policy/config/config_test.go
@@ -168,6 +168,25 @@
 			},
 			err: nil,
 		},
+		{
+			name: "weight rule without weight",
+			bot: BotConfig{
+				Name:           "weight-adjust-if-mozilla",
+				Action:         RuleWeigh,
+				UserAgentRegex: p("Mozilla"),
+			},
+		},
+		{
+			name: "weight rule with weight adjust",
+			bot: BotConfig{
+				Name:           "weight-adjust-if-mozilla",
+				Action:         RuleWeigh,
+				UserAgentRegex: p("Mozilla"),
+				Weight: &Weight{
+					Adjust: 5,
+				},
+			},
+		},
 	}
 
 	for _, cs := range tests {
+2 -2
lib/policy/config/expressionorlist.go
@@ -14,8 +14,8 @@
 
 type ExpressionOrList struct {
 	Expression string   `json:"-"`
-	All        []string `json:"all"`
-	Any        []string `json:"any"`
+	All        []string `json:"all,omitempty"`
+	Any        []string `json:"any,omitempty"`
 }
 
 func (eol ExpressionOrList) Equal(rhs *ExpressionOrList) bool {
+6
lib/policy/config/testdata/good/simple-weight.yaml
@@ -0,0 +1,6 @@
+bots:
+  - name: simple-weight-adjust
+    action: WEIGH
+    user_agent_regex: Mozilla
+    weight:
+      adjust: 5
+4
lib/policy/config/testdata/good/weight-no-weight.yaml
@@ -0,0 +1,4 @@
+bots:
+  - name: weight
+    action: WEIGH
+    user_agent_regex: Mozilla
+5
lib/policy/config/weight.go
@@ -0,0 +1,5 @@
+package config
+
+type Weight struct {
+	Adjust int `json:"adjust"`
+}
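`Weight` is reached through a pointer in both `BotConfig` and `policy.Bot`, so an omitted `weight` key decodes to `nil` rather than to `Adjust: 0`; that is what lets `Valid` tell "no weight given" apart from an explicit zero. The sketch decodes JSON equivalents of the `netsurf.yaml` rule and of the single bot in `weight-no-weight.yaml` with `encoding/json`. It assumes the YAML loader honours the same `json` tags, which this diff does not show, and `botConfig` and `decode` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Weight struct {
	Adjust int `json:"adjust"`
}

// botConfig keeps a few BotConfig fields. Weight is a pointer, so a missing
// "weight" key leaves it nil instead of producing Adjust: 0.
type botConfig struct {
	Name           string  `json:"name"`
	Action         string  `json:"action"`
	UserAgentRegex *string `json:"user_agent_regex,omitempty"`
	Weight         *Weight `json:"weight,omitempty"`
}

// decode unmarshals one bot rule from JSON.
func decode(raw string) (botConfig, error) {
	var b botConfig
	err := json.Unmarshal([]byte(raw), &b)
	return b, err
}

func main() {
	netsurf, err := decode(`{"name":"reduce-weight-netsurf","user_agent_regex":"NetSurf","action":"WEIGH","weight":{"adjust":-5}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(netsurf.Name, netsurf.Weight.Adjust)

	bare, err := decode(`{"name":"weight","action":"WEIGH","user_agent_regex":"Mozilla"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(bare.Weight == nil) // no weight given: Valid would default it to 5
}
```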
+4
lib/policy/policy.go
@@ -117,6 +117,10 @@
 			}
 		}
 
+		if b.Weight != nil {
+			parsedBot.Weight = b.Weight
+		}
+
 		parsedBot.Rules = cl
 
 		result.Bots = append(result.Bots, parsedBot)