[Rspamd-Users] What do you do with incorrect Bayes entries?

Fri Jul 5 00:14:13 UTC 2024

> Every now and then I get emails that have an incorrect Bayes value. That is, emails that are clearly spam but whose content does not immediately indicate spam and have Bayes -10.
> Is it enough to simply re-train this email using "rspamc learn_spam xx.eml" or does this lead to problems?

Just retrain if it's misclassified.

> I'm not entirely sure what happens when an email is trained for Bayes statistics.
> Is only the body of the email divided into tokens and then weighted for Bayes, or are scores from other checks also included in the Bayes evaluation?

I already covered which parts of an email Bayes considers in my last mail.

Bayes is a filter of its own: it calculates a score from previously learned tokens and does not draw on other checks such as multimap or SPF.
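
To illustrate the idea of scoring purely from stored tokens, here is a minimal sketch of a textbook naive Bayes combiner. It is an assumption for illustration only: Rspamd's own tokenizer and its way of combining token statistics differ, and the function name, the smoothing and the sample counts below are all hypothetical.

```python
import math

def spam_probability(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Textbook naive Bayes in log-odds form (NOT Rspamd's actual algorithm).

    spam_counts / ham_counts map a token to the number of learned
    spam / ham messages that contained it (hypothetical storage layout).
    """
    # Prior from how many messages of each class were learned.
    log_odds = math.log(n_spam / n_ham)
    for tok in set(tokens):
        s = spam_counts.get(tok, 0)
        h = ham_counts.get(tok, 0)
        if s + h == 0:
            continue  # never-learned token: contributes no evidence
        # Laplace smoothing keeps both probabilities away from zero.
        p_tok_spam = (s + 1) / (n_spam + 2)
        p_tok_ham = (h + 1) / (n_ham + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical learned statistics: 10 spam and 10 ham messages.
spam_counts = {"viagra": 9}
ham_counts = {"meeting": 9}

p_spammy = spam_probability(["viagra"], spam_counts, ham_counts, 10, 10)
p_hammy = spam_probability(["meeting"], spam_counts, ham_counts, 10, 10)
p_unknown = spam_probability(["unknown"], spam_counts, ham_counts, 10, 10)
```

Note that nothing besides the token statistics enters the calculation: a message made up only of tokens previously learned as ham comes out as ham, whatever other checks say about it.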

> An email that has no SPF, missing_mx, BAD_REP_POLICIES and much more, i.e. is actually spam, but still has BAYES_HAM -10 because the content is clean?

Because Bayes classifies the content as ham based on the tokens it has stored/learned so far. You should retrain/learn it, so that more meaningful tokens are available next time.
You could additionally configure the fuzzy module to recognize this specific spam, and slight variations of it, in the future.

> What still surprises me is that a number of new emails that come in are not checked at all and do not receive a BAYES score.

Enable debug logging to verify whether it really "does nothing".

Bayes operates with probabilities. Which symbol would you insert if the tokens suggest a 50% chance of being spam and therefore also a 50% chance of being ham?
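
As a rough sketch of why an undecided result yields no symbol: a classifier can map the probability to a confidence value and insert a symbol only when that confidence is high enough. The function name and the `min_confidence` cutoff are hypothetical and chosen for illustration; they are not taken from Rspamd's code or defaults.

```python
def bayes_symbol(p_spam, min_confidence=0.5):
    """Return a symbol name, or None when the result is too close to 50/50.

    min_confidence is a hypothetical cutoff, not an Rspamd default.
    """
    # 0.0 at exactly 50/50, 1.0 when the classifier is certain.
    confidence = abs(p_spam - 0.5) * 2
    if confidence < min_confidence:
        return None  # undecided: no BAYES symbol is inserted
    return "BAYES_SPAM" if p_spam > 0.5 else "BAYES_HAM"

undecided = bayes_symbol(0.5)
weak = bayes_symbol(0.6)
spam = bayes_symbol(0.99)
ham = bayes_symbol(0.01)
```

Under these assumptions a message without a BAYES symbol was still evaluated; the stored tokens simply did not point clearly in either direction.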

> So I have emails in my inbox that were weighted by Rspamd but don't have BAYES scoring. I'm currently taking these and re-learning them manually with rspamc.

Good.

> What does it depend on when or if an email is learned at all?

Usually on the message's score and the autolearn configuration.

> Can I also specify that if an email has several positive symbols then it is learned as BAYES_HAM, or as BAYES_SPAM if not.

Sure, just set the symbols' scores high enough so that they exceed your respective autolearn thresholds.
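
A minimal sketch of that reasoning, assuming autolearn is driven by a pair of score thresholds. The threshold values, the symbol weights and the exact comparison are assumptions for illustration; the real conditions depend on your classifier configuration.

```python
def autolearn_class(total_score, ham_threshold=-5.0, spam_threshold=5.0):
    """Decide whether a message would be auto-learned, and as what.

    Thresholds are hypothetical example values.
    """
    if total_score >= spam_threshold:
        return "spam"
    if total_score <= ham_threshold:
        return "ham"
    return None  # between the thresholds: not learned automatically

# Hypothetical symbol weights for a message that is actually spam.
symbols = {"BAD_REP_POLICIES": 0.1, "MISSING_MX": 2.0}
before = autolearn_class(sum(symbols.values()))

# Raising one symbol's score pushes the total past the spam threshold.
symbols["BAD_REP_POLICIES"] = 4.0
after = autolearn_class(sum(symbols.values()))
```

With the original weights the total of 2.1 stays between the thresholds and nothing is learned; with the raised weight the total of 6.0 crosses the spam threshold.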

Best regards,
Gerald