[Rspamd-Users] Bayes questions and observations

Thu Mar 14 16:56:37 UTC 2024

Hello,
I've been tuning my Rspamd setup for a few weeks now and am still
learning how everything fits together.
Please excuse my stupid questions.
I have now looked more closely at Bayes and have a few observations
and questions about it.

1. There appears to be a difference between BAYES_SPAM/HAM and
SpamAssassin. The BAYES_SPAM/HAM variant is integrated under the name
“statistic”. It is configured in statistic.conf and
classifier-bayes.conf. The results are stored in Redis and displayed in
the web frontend under Status/Bayesian statistics.
As far as I understand, the data is learned as emails are delivered,
together with the scores previously generated by RBL, reputation, fuzzy
and other checks.
I'm not too happy with the results, because I often get ham scores even
though all other checks declare the email as spam. The content of an
email can look quite reasonable even though it is spam. My experience
with these results has not been good, which is why I set the scores to
only -2 and +2. Emails can also be learned using rspamc learn_spam/ham.
I have learned about 10,000 emails so far, both spam and ham.
Please correct me if I am wrong.
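
For reference, this is roughly what the reduced Bayes scores look like
as a config fragment. It is only a sketch: I am assuming that
local.d/statistics_group.conf is the right place for such an override,
and the values are simply the -2 and +2 mentioned above.

```conf
# local.d/statistics_group.conf (assumed location for the override)
symbols = {
  "BAYES_SPAM" {
    weight = 2.0;   # reduced score when Bayes says spam
  }
  "BAYES_HAM" {
    weight = -2.0;  # reduced score when Bayes says ham
  }
}
```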

2. The next way to improve the results is via an external SpamAssassin.
There is spamassassin.conf (SA), or you can integrate it via
external_services.conf (SPAMD). The advantage is that external filter
sources (Heinlein, Schaal-it,...) can be used. The filter can then be
trained further and improved using spamc --spam/ham.
Please correct me if I am wrong.
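
To make the second variant concrete, here is a minimal sketch of how I
understand the SPAMD integration would be configured. The file location,
the server address and the option names are my assumptions, not
something I have verified in my setup; it also assumes that a spamd
daemon is already listening on its usual port.

```conf
# local.d/external_services.conf (assumed location)
spamassassin {
  # address of a running spamd; 783 is the usual spamd port
  servers = "127.0.0.1:783";
  # symbol to insert when spamd reports the message as spam
  symbol = "SPAMD";
}
```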

I now have the following in rspamd's spamassassin.conf:
ruleset = "/etc/spamassassin/local.cf";
base_ruleset = "/var/lib/spamassassin/4.000000/*.cf";
# Limit search size to 100 kilobytes for all regular expressions
match_limit = 100k;

sa-update is working

In the SA local.cf I have specified:
use_bayes 1
bayes_auto_learn 1
bayes_file_mode 777
bayes_path /var/lib/spamassassin/bayes_db

However, I can't find out whether these settings are also used by
rspamd. SpamAssassin itself does not generate any logs, and I can't find
anything about it in the rspamd logs (debug mode). There is also no
symbol for SpamAssassin. How are these SA results processed?
spamc --spam email.eml works and learns the email, but I don't know
where the results are saved. I can't come up with a solution to this.

Thank you very much for your help
Christian