[Rspamd-Users] Bayes questions and observations

Thu Mar 14 17:51:29 UTC 2024

On 14/03/2024 16:56, christian via Users wrote:
> Hello,
> I've been trying to optimize my Rspamd setup for a few weeks now and am
> still learning how everything fits together.
> Please excuse my stupid questions.
> I have now looked more closely at Bayes, came across the following, and
> still have a few questions about it.
> 
> 1. There appears to be a difference between BAYES_SPAM/HAM and
> SpamAssassin. The BAYES_SPAM/HAM variant is built into Rspamd under the
> name “statistic”. It is configured in statistic.conf and
> classifier-bayes.conf. The results are stored in Redis and displayed in
> the web frontend under Status/Bayesian statistics.
> Messages appear to be learned on delivery, together with the scores
> already generated by RBL, reputation, fuzzy and many other checks.
> I'm not too happy with the results because I often get ham scores even
> though all other checks classify the email as spam. The content of an
> email can look quite reasonable even though it is spam. I haven't had
> good experience with these results, which is why I set the weights to
> only -2 and +2. Emails can also be learned using rspamc learn_spam/ham.
> I have trained it on about 10,000 emails, both spam and ham.
> Please correct me if I am wrong.
> 
> 2. The next way to improve the results is an external SpamAssassin.
> It can be integrated either via spamassassin.conf (SA) or via
> external_services.conf (spamd). The advantage is that external rule
> sources (Heinlein, Schaal-it, ...) can be used. The filter can then be
> trained and improved further using spamc --spam/ham.
> Please correct me if I am wrong.
> 
> In Rspamd's spamassassin.conf I now have:
> ruleset = "/etc/spamassassin/local.cf";
> base_ruleset = "/var/lib/spamassassin/4.000000/*.cf";
> # Limit search size to 100 kilobytes for all regular expressions
> match_limit = 100k;
> 
> sa-update is working.
> 
> My SA local.cf contains:
> use_bayes 1
> bayes_auto_learn 1
> bayes_file_mode 777
> bayes_path /var/lib/spamassassin/bayes_db
> 
> I can't find out whether these settings are also used by Rspamd.
> SpamAssassin itself does not generate any logs, and I can't find
> anything about this in the Rspamd logs (debug mode). There is also no
> symbol for SpamAssassin. How are these SA results processed?
> spamc --spam email.eml works and learns the email, but I don't know
> where the results are saved. I can't figure out a solution to this.
> 
> Thank you very much for your help
> Christian

This looks like an XY problem to me: why do you need SA for Bayes 
classification when it uses a much cruder algorithm? Frankly, your whole 
setup looks very weird to me. The *only* reasons the SA integration 
exists are testing and legacy concerns (not Bayes or regexps, where 
Rspamd does a much better job).
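To illustrate the -2/+2 weighting described in point 1 of the question, here is a minimal Python sketch. It assumes a simplified model in which the final message score is the plain sum of symbol scores, compared against a single spam threshold. The symbol names, scores and threshold are hypothetical illustration values, not Rspamd defaults. The Bayes symbol is assumed to contribute its full configured weight, which is a worst case; real Rspamd output may scale it by classifier confidence. The sketch shows why a capped BAYES_HAM weight cannot flip a message that the other checks already score as spam, while a larger weight can.

```python
"""Sketch: how a capped Bayes weight limits the damage of a wrong verdict.

Simplified model (an assumption, not Rspamd's exact scoring): the final
score is the plain sum of symbol scores. All symbol names, scores and
the threshold are hypothetical illustration values.
"""

# Hypothetical scores from the non-Bayes checks for a spam message.
OTHER_SYMBOLS = {
    "HYPOTHETICAL_RBL_HIT": 4.0,
    "HYPOTHETICAL_FUZZY_HIT": 3.0,
    "HYPOTHETICAL_BAD_REPUTATION": 2.0,
}

# Hypothetical score at or above which the message counts as spam.
SPAM_THRESHOLD = 6.0


def total_score(symbols: dict[str, float], bayes_ham_weight: float) -> float:
    """Sum the other symbol scores plus a (wrong) BAYES_HAM contribution.

    The Bayes weight is added at full strength (worst case).
    """
    return sum(symbols.values()) + bayes_ham_weight


def is_spam(score: float, threshold: float = SPAM_THRESHOLD) -> bool:
    """Return True if the summed score reaches the spam threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Compare the capped weight (-2) with a hypothetical larger one (-5).
    for weight in (-2.0, -5.0):
        score = total_score(OTHER_SYMBOLS, weight)
        print(f"BAYES_HAM weight {weight:+.1f}: score {score:.1f}, spam={is_spam(score)}")
```

Running it reports a spam verdict for the -2.0 weight (score 7.0) and a ham verdict for the -5.0 weight (score 4.0). The trade-off of a small cap is that a correct Bayes verdict also carries less weight.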