[Rspamd-Users] Messages learned as SPAM but still delivered as not SPAM

Wed Jan 12 20:15:36 UTC 2022

On 12/01/2022 04:44, Nihad @ RSPAMD maillist via Users wrote:
>> 2022-01-10 18:30:43 #923631(controller) <93604f>; csession;
>> rspamd_task_process: skip learning:
>> <q514n4g4148643n48406j5c434a4s5d4s254m4 at wj.novaoportunidadesim.com> is
>> skipped for bayes classifier: already in class spam; probability 100.00%
>>
>> Which, if I understand correctly, means that the message is already
>> considered SPAM by the classifier. However, the message is still
>> delivered to the inbox without any BAYES* symbol.
> I am wondering whether it could be your action score values. Maybe Bayes is adding a "spam" symbol, but your action threshold is too high and therefore no spam action is triggered.
> Check the overall score of the message and compare it to the action thresholds: is it below or above?

I don't think that is the case. My action scores are:

actions {
     greylist = 5;
     add_header = 2;
     reject = 150;
}

I also checked the headers of spam messages that hit the inbox, and they
don't have the BAYES_SPAM symbol.
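
To make the comparison suggested above concrete, here is a minimal Python sketch using the thresholds above. It assumes a simplified model in which the applied action is the one with the highest threshold that the score reaches; Rspamd's real selection logic has more cases, and the name action_for is only illustrative.

```python
# Thresholds copied from the actions block above.
ACTIONS = {"greylist": 5, "add_header": 2, "reject": 150}


def action_for(score, actions=ACTIONS):
    """Return the action with the highest threshold that `score` reaches.

    Simplified model (an assumption, not Rspamd's exact algorithm):
    a score below every threshold yields "no action".
    """
    reached = [(limit, name) for name, limit in actions.items() if score >= limit]
    if not reached:
        return "no action"
    # max() compares the threshold first, so the highest reached limit wins.
    return max(reached)[1]
```

Under this model, any message scoring 2 or more would at least get a header added, so a threshold that is too high does not explain spam arriving without BAYES_SPAM.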

>> I wonder if it is possible that the redis database got "poisoned" in any
>> way, thus impacting Rspamd efficiency. Several months ago I had issues
>> with storage capacity (saw a bunch of "OOM command not allowed when used
>> memory > 'maxmemory'" in the logs), but I increased the redis database
>> capacity and since then the messages disappeared.
> It is possible that your learned ham/spam database is giving mixed signals and producing false positives.
> I am not sure whether you rely only on "fuzzy" to score your messages or on other things as well, but I almost never get any spam in my inbox. A few times a week a message sneaks into the spam folder. For the most part, though, spam is rejected either by Postfix before it reaches Rspamd or by the Rspamd setup described below.

The fuzzy module is enabled and occasionally I get the FUZZY_DENIED
symbol. I also enabled the bayes classifier as follows:

classifier {
     bayes {
         expire = 2144448000;
         backend = "redis";
         cache {
             backend = "redis";
         }
         tokenizer {
             name = "osb";
         }
         statfile {
             spam = false;
             symbol = "BAYES_HAM";
         }
         statfile {
             spam = true;
             symbol = "BAYES_SPAM";
         }
         store_tokens = false;
         signatures = false;
         min_tokens = 11;
         min_learns = 200;
         learn_condition = "return require(\"lua_bayes_learn\").can_learn";
         new_schema = true;
         users_enabled = true;
         autolearn [
             -3,
             5,
         ]
     }
}
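
One thing that may be worth checking with this configuration: as far as I understand the min_learns option, Bayes results are only produced once both the spam and the ham class have been learned at least that many times, and with users_enabled the learn counts may be tracked per user. A minimal Python sketch of that gate, with a hypothetical function name and made-up counts:

```python
def bayes_can_classify(spam_learns, ham_learns, min_learns=200):
    """Sketch of the min_learns gate (my reading of the option, not Rspamd source).

    Classification is skipped, and no BAYES_* symbol is added,
    until BOTH classes have at least `min_learns` learned messages.
    """
    return spam_learns >= min_learns and ham_learns >= min_learns


# Made-up example: plenty of spam learned, but too little ham.
print(bayes_can_classify(spam_learns=5000, ham_learns=150))
```

The actual learn counts per statfile are reported by "rspamc stat".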

>
> I use some of my own rules/maps and multiple DNSBL databases to score my messages, mostly multimap rules based on ASN, country, and TLD. This takes out most spam. E.g. I do not expect mail from China, so China is on the blacklist. (The same with Brazil. 😄) The same goes for new TLDs such as .blog, .news, .travel…
> One of the rule sets I use is mail.baby (https://github.com/mailbaby/rspamd-rules)
> + Abusix (https://docs.abusix.com/105726-setup-abusix-mail-intelligence/rspamd-configuration)
> + Spamhaus (https://github.com/spamhaus/rspamd-dqs/)
>
> My experience is that most of the rules above trigger spam scoring on different aspects of a message than the fuzzy module does.
> My experience is also that spammers are adept at circumventing the fuzzy method, since a word can be written and encoded in a multitude of ways that will not always produce a matching lookup.
> Even though you see "word" as a word in your email client, it could have been written in HTML and encoded, or encoded with a different ISO charset, so it does not match "word" in fuzzy.
>
> Looking at the symbols in my spam messages, they are almost never marked by the fuzzy module, or the fuzzy score is so low that it would not trigger a spam action. That could also explain the behaviour you see.
>
> Or maybe my fuzzy module is not set up correctly, either. :D I am not an expert in Rspamd at all.

I think that adopting those modules could enhance an already working configuration, but it seems that something is fundamentally broken in my deployment. Regarding the fuzzy score, in my experience it doesn't hit that often, but when it hits, it hits with a hammer.