[Rspamd-Users] How to improve Bayes effectiveness?

Tue Sep 20 19:51:09 UTC 2022

Hi,

My rspamd deployment is almost a year old (currently on version 3.2), but
I cannot get the Bayes classifier to work effectively and reliably. The
spam I get is pretty "dumb": about 95% of it is one of 5 text templates
with a few words changed. I have learning configured so that when a user
moves a message into or out of the spam folder, it supposedly trains the
classifier (I checked the logs and, as far as I can tell, this is really
happening). I would expect that as soon as a message is learned,
subsequent messages based on the same template would be classified
correctly, but unfortunately I cannot get consistent behavior. I am
resorting to domain blacklisting as a stopgap, but this is suboptimal.
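
To make that expectation concrete, here is a toy sketch in Python. It is
only an illustration of the behavior I expect, not rspamd's actual
algorithm: rspamd uses OSB tokens and its own statistics, and it also
applies the min_learns threshold shown in my configuration. The class
name ToyBayes and the sample texts are made up for this example.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny word-level naive Bayes; hypothetical, NOT rspamd's implementation."""

    def __init__(self):
        # Per-class count of learned messages that contained each token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        # Number of messages learned per class.
        self.learns = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Count each distinct token once per learned message.
        self.tokens[label].update(set(text.lower().split()))
        self.learns[label] += 1

    def spam_probability(self, text):
        # Sum per-token log-odds, using add-one smoothing so that
        # unseen tokens contribute nothing instead of dividing by zero.
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.learns["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.learns["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Made-up sample messages: one ham, one spam template, one spam variant.
ham = "meeting notes attached for tomorrow please review"
spam = "cheap pills buy now limited offer click here"
variant = "cheap pills buy today limited offer click here"

clf = ToyBayes()
before = clf.spam_probability(variant)  # nothing learned yet

clf.learn(ham, "ham")
clf.learn(spam, "spam")
after = clf.spam_probability(variant)   # one word differs from the template

print(before, after)
```

In this toy model a single learned spam message is enough to push a
near-identical variant well above 0.9, which is the kind of behavior I
was hoping to see from the real classifier.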

I know that Bayes can be quite effective, since I used Thunderbird's
built-in Bayes classifier a long time ago and it worked well. So I wonder
what I am missing. My configuration follows:

*****

classifier {
     bayes {
         learn_condition = "return require(\"lua_bayes_learn\").can_learn";
         new_schema = true;
         autolearn [
             -3,
             5,
         ]
         backend = "redis";
         cache {
             backend = "redis";
         }
         expire = 2144448000;
         tokenizer {
             name = "osb";
         }
         statfile {
             spam = false;
             symbol = "BAYES_HAM";
         }
         statfile {
             spam = true;
             symbol = "BAYES_SPAM";
         }
         store_tokens = true;
         signatures = true;
         min_tokens = 11;
         min_learns = 200;
     }
}

*****

Thanks,

Andrei