[Rspamd-Users] Bayes questions and observations
Vsevolod Stakhov
vsevolod at rspamd.com
Thu Mar 14 17:51:29 UTC 2024
On 14/03/2024 16:56, christian via Users wrote:
> Hello,
> I've been trying to optimize my RSPAMD for a few weeks now and continue
> to learn how everything is connected.
> Please excuse my stupid questions.
> I have now looked more into Bayes and came across the following and
> still have a few questions about it.
>
> 1. There appears to be a difference between BAYES_SPAM/HAM and
> spamassassin. The BAYES_SPAM/HAM variant is integrated under the name
> “statistic”. It is configured under statistic.conf and
> classifier-bayes.conf. The results are saved in Redis and displayed in
> the web frontend under Status/Bayesian statistics.
> The data is learned when the emails and previously generated scores from
> RBL, reputation, fuzzy and much more are delivered.
> I'm not too happy with the results because I often get ham scores even
> though all other checks declare the email as spam. The content of an
> email can look quite reasonable even though it is spam. I don't have
> good experience with these results and that's why I only specified -2
> and +2. Emails can also be learned using rspamc learn_spam/ham. I have
> learned about 10,000 emails - spam and ham.
> Please correct me, if I am wrong.
>
> 2. The next way to improve the results is via the external Spamassassin.
> There is also spamassassin.conf (SA), or you can integrate it via
> external_services.conf (SPAMD). The advantage is that external filter
> sources (Heinlein, Schaal-it,...) can be used. The filter can then be
> further learned and improved using spamc --spam/ham.
> Please correct me, if I am wrong.
>
> Now I have via rspamd spamassassin.conf:
> ruleset = "/etc/spamassassin/local.cf";
> base_ruleset = "/var/lib/spamassassin/4.000000/*.cf";
> # Limit search size to 100 kilobytes for all regular expressions
> match_limit = 100k;
>
> sa-update is working
>
> In the SA local.cf I have specified:
> use_bayes 1
> bayes_auto_learn 1
> bayes_file_mode 777
> bayes_path /var/lib/spamassassin/bayes_db
>
> but I can't find out whether these are also used by rspamd.
> spamassassin itself does not generate any logs. I can't find anything
> about this in the RSPAMD logs (debug mode). There is also no symbol for
> spamassassin. How are these SA results processed? spamc --spam email.eml
> works and learns the email, but I don't know where the results are
> saved. I can't come up with a solution to this.
>
> Thank you very much for your help
> Christian
This looks like an XY problem to me: why do you need SA for Bayes, given
that it uses a much cruder algorithm for it? In any case, your whole
problem looks very weird to me. The *only* reasons the SA integration
exists are testing and legacy concerns (not Bayes or regexps, where
Rspamd can do a much better job).
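
For comparison, here is a minimal sketch of what a native Rspamd Bayes
setup might look like, so that SA is not needed for this part at all. It
assumes the stock layout (local.d/classifier-bayes.conf) and a local
Redis instance. The server address and the min_learns value are
illustrative assumptions, not recommendations, and should be adjusted to
the actual installation:

```
# /etc/rspamd/local.d/classifier-bayes.conf (path assumed from the stock layout)
backend = "redis";            # keep Bayes tokens in Redis
servers = "127.0.0.1:6379";   # assumption: Redis runs locally on the default port
min_learns = 200;             # assumption: classify only after this many learns per class
autolearn = true;             # learn automatically from clearly scored messages
```

Manual training can then continue with rspamc learn_spam and rspamc
learn_ham, as described in the original message.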