commits
I should have been using the record embed field the whole time,
since the postview embed field doesn't work sometimes... like in
the firehose monitor!
Migrate by changing your uses of misc-or-bonus to or:<LABEL>.
You can do this on the last commit, then update after the changes
are made, to avoid any issues.
* The `or:` prefix functions as a generic replacement for the
currently hard-coded "misc-or-bonus" pseudo-label. If any category
has a matching keyword, with a positive score, the score is applied
as bonus points. Otherwise, it applies as the prefixed label.
* The `bo:` prefix is "bonus-points-only" but it only applies to
the prefixed label. So, if the prefixed label has no other matching
keywords with a positive score, NO bonus points are applied.
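A rough sketch of how the two prefixes could be resolved, not the bot's actual code. `apply_prefixed` and the shape of `scores` (label name to the points earned from ordinary keyword matches) are hypothetical, and adding the `or:` bonus to every positively scored category is an assumption about what "applied as bonus points" means.

```python
def apply_prefixed(scores, prefix, label, points):
    """Return a copy of `scores` with one prefixed keyword hit applied.

    scores: dict of label -> points from ordinary keyword matches (hypothetical shape).
    prefix: "or" or "bo"; label: the label written after the prefix.
    """
    result = dict(scores)
    positive = [name for name, value in scores.items() if value > 0]

    if prefix == "or":
        if positive:
            # Some category already matched: treat the points as a bonus.
            # Assumption: the bonus goes to every positively scored category.
            for name in positive:
                result[name] += points
        else:
            # Nothing else matched: fall back to the prefixed label itself.
            result[label] = result.get(label, 0) + points
    elif prefix == "bo":
        # Bonus-only: counts only if the prefixed label already scored.
        if scores.get(label, 0) > 0:
            result[label] += points
    else:
        raise ValueError(f"unknown prefix: {prefix}")
    return result
```

With `{"uspol": 3}` as the starting scores, `or:misc` worth 2 points becomes a bonus on uspol, while `bo:ukpol` worth 2 points is dropped because ukpol has no positive score of its own.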
(the latter is mostly so that hidden tags for uspol get considered)
This technically uses the Bluesky search endpoint, so please set
your poll interval quite high.
(On my bot, the uspol hashtag will apply a label to EVERY result!)
I make a lot of changes to the filtering mechanisms, and when I make
those changes I don't want to interfere with the existing labeling
infrastructure. The dry-run mode makes it easier for me to test
most of it without interfering with the production labeler.
The clamp is done at the lookup so that it can be adjusted based on
each endpoint's limits. Right now they're all clamped to 1-100, since
that's what the API docs say each endpoint is limited to.
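A minimal sketch of clamping at the lookup, assuming a per-endpoint table of limits. The table keys and the `clamp_limit` helper are illustrative names, not taken from the bot; only the 1-100 range comes from the text above.

```python
# Hypothetical per-endpoint (min, max) limits. The keys are illustrative;
# every entry is 1-100 for now, matching the note above.
ENDPOINT_LIMITS = {
    "searchPosts": (1, 100),
    "getAuthorFeed": (1, 100),
}


def clamp_limit(endpoint: str, requested: int) -> int:
    """Clamp a requested page size to the range allowed for `endpoint`."""
    low, high = ENDPOINT_LIMITS[endpoint]
    return max(low, min(high, requested))
```

Keeping the range in the table means a single endpoint's limit can be changed later without touching the callers.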
ok that's far too condensed:
Moved the Redis "alreadyHandled" save down, to be called with the
bit that tags the post with auto-handled
This is wrapped in its own try-catch, AND it uses the public API
instead, so rate limits _shouldn't_ be a problem, and if they are,
it gracefully downgrades to just not crawling threads for a while.
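One way the "stop crawling for a while" fallback could look, as a sketch only. `ThreadCrawler`, the `fetch_thread` callable and the 300-second cooldown are all assumptions made for illustration; the commit only says the call is wrapped in its own try-catch.

```python
import time


class ThreadCrawler:
    """Crawl threads, pausing for a cooldown period after any failure."""

    def __init__(self, fetch_thread, cooldown_seconds=300.0, clock=time.monotonic):
        self._fetch_thread = fetch_thread   # hypothetical callable that hits the public API
        self._cooldown = cooldown_seconds   # assumed value, not from the bot
        self._clock = clock
        self._paused_until = 0.0

    def crawl(self, uri):
        """Return the thread for `uri`, or None while crawling is paused or failing."""
        if self._clock() < self._paused_until:
            return None  # still cooling down: skip thread crawling entirely
        try:
            return self._fetch_thread(uri)
        except Exception:
            # Rate limit or any other failure: back off instead of crashing the handler.
            self._paused_until = self._clock() + self._cooldown
            return None
```

Callers treat `None` as "no thread context available" and carry on labeling from the post alone.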
The logs previously threw long errors for every deleted post that
had an image, since its associated blob(s) can no longer be
found.
@bengotow shared the code he uses to filter out these errors.
Thanks Ben!
Including:
* Use it to store cursors, which should make them persist across
restarts.
* Check Redis for alreadyHandled posts before checking Ozone (when
it's first starting).
This is checked and set during the handler phase.
A keyword list can be restricted to a specific language with the prefix
"$lang:", followed by the two-letter language code.
A keyword list can be restricted to posts NOT in a specific language with
"-$lang:". Good for "dem,-$lang:de".
Also, "-keyword" should be usable for matching a keyword list only where
a certain word is not also present, e.g. "mtg,-play,-card,-game".
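A sketch of how such a comma-separated list might be parsed and checked. `KeywordRule`, `parse_rule` and `matches` are hypothetical names, and treating the plain keywords as "any one of them is enough" is an assumption. Other `$` terms are left out of this sketch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KeywordRule:
    keywords: list = field(default_factory=list)   # plain terms
    excluded: list = field(default_factory=list)   # "-keyword" terms
    langs: list = field(default_factory=list)      # "$lang:xx" terms
    not_langs: list = field(default_factory=list)  # "-$lang:xx" terms


def parse_rule(spec: str) -> KeywordRule:
    """Split a comma-separated keyword list into its term types."""
    rule = KeywordRule()
    for term in (part.strip() for part in spec.split(",")):
        if not term:
            continue
        if term.startswith("-$lang:"):
            rule.not_langs.append(term[len("-$lang:"):])
        elif term.startswith("$lang:"):
            rule.langs.append(term[len("$lang:"):])
        elif term.startswith("-"):
            rule.excluded.append(term[1:])
        else:
            rule.keywords.append(term)
    return rule


def matches(rule: KeywordRule, text: str, post_langs: list) -> bool:
    """Check a post's text and language codes against a parsed rule."""
    words = set(re.findall(r"\w+", text.lower()))
    if rule.langs and not any(lang in post_langs for lang in rule.langs):
        return False
    if any(lang in post_langs for lang in rule.not_langs):
        return False
    if any(word.lower() in words for word in rule.excluded):
        return False
    # Assumption: one matching plain keyword is enough.
    return any(word.lower() in words for word in rule.keywords)
```

For example, `parse_rule("dem,-$lang:de")` matches an English post containing "dem" but not a German one, and `parse_rule("mtg,-play,-card,-game")` skips a post that mentions "mtg" alongside "card".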
And, though it isn't useful (yet), self-labels can be checked with
"$", e.g. "$nudity" or "$!no-unauthenticated".
Two functions:
* Apply bonus points to users who are more or less likely to post
political content.
* Completely skip non-political accounts, like porn/art accounts.
this way, I only have to set the header once
The functionality is already covered by persistSession.
New sources have been added to what's checked for keywords:
+ "External" embedded link card titles
+ Descriptions from those link cards
+ Image alt text (from each image)
+ Video alt text
+ BridgyFed's original text field
Previously, phrases were not matched by words for some reason.
Now they are.
Also, handling longer keywords first will allow me to skip labels which
are included in other labels already.
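A sketch of whole-word phrase matching with longest-first ordering, under the assumption that a shorter keyword is skipped when its match sits entirely inside a longer keyword's match. `find_keywords` is a hypothetical helper, not the bot's function.

```python
import re


def find_keywords(text, keywords):
    """Return keywords found as whole words/phrases, longest keywords first."""
    matched = []
    covered = []  # (start, end) spans already claimed by longer keywords
    for keyword in sorted(keywords, key=len, reverse=True):
        # Match the phrase word by word, allowing any whitespace between words.
        body = r"\s+".join(re.escape(word) for word in keyword.split())
        # Lookarounds instead of \b so keywords like "#uspol" still match.
        pattern = r"(?<!\w)" + body + r"(?!\w)"
        for hit in re.finditer(pattern, text, flags=re.IGNORECASE):
            inside_longer = any(
                start <= hit.start() and hit.end() <= end for start, end in covered
            )
            if inside_longer:
                continue  # already covered by a longer keyword
            covered.append(hit.span())
            if keyword not in matched:
                matched.append(keyword)
    return matched
```

With the keywords "house" and "white house", a post that only says "The White House press briefing" yields just "white house", and "housekeeping tips" yields nothing because "house" is not a whole word there.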
sometimes people report an already labeled post as a way to
"appeal" it, if it's not theirs
it checks if it's already been addressed in a review or reported,
and then just ignores it if it has.
You'll need to add --poll-reports --firehose to get the previous
behavior.