[Rspamd-Users] Email hits BAYES_* after a few times

Sun Jun 2 20:39:35 UTC 2019

> On 2 Jun 2019, at 22:33, Sophie Loewenthal <sophie at klunky.co.uk> wrote:
> 
> 
>> On 2 Jun 2019, at 22:17, Tim Harman via Users <users at lists.rspamd.com> wrote:
>> 
>> On 03/06/2019 6:47 am, Sophie Loewenthal wrote:
>>> Hi,
>>> For some reason, emails that come in more than twice start hitting the
>>> BAYES_* rules, but these emails were not processed by 'rspamc
>>> learn_spam' or 'rspamc learn_ham', so those can be discounted.  How does
>>> this email get into BAYES when I didn’t feed any emails from the
>>> sender into rspamc learn_spam?
>> 
>> <snip>
>> 
>>> It’s a bit rum: how could I investigate this?
>>> Thanks, Sophie
>> 
>> What does "rspamadm configdump classifier" tell you?
>> Probably you have autolearn enabled, thus rspamd is automatically learning your ham/spam.
>> 
>> Suggested Reading: https://rspamd.com/doc/configuration/statistic.html
> 
> 
> Hi Tim,
> 
> I thought autolearn was disabled, unless it’s on by default.  I don’t have autolearn = true in my config that I know of.   Bayes should be autolearning and configdump didn’t shed any light.
> 
> 
> # rspamadm configdump classifier
> *** Section classifier ***
> bayes {
>    backend = "sqlite3";
>    min_tokens = 11;
>    languages_enabled = true;
>    cache {
>        path = "/var/lib/rspamd/learn_cache.sqlite";
>    }
>    statfile {
>        path = "/var/lib/rspamd/bayes.ham.sqlite";
>        spam = false;
>        symbol = "BAYES_HAM";
>    }
>    statfile {
>        path = "/var/lib/rspamd/bayes.spam.sqlite";
>        spam = true;
>        symbol = "BAYES_SPAM";
>    }
>    tokenizer {
>        name = "osb";
>    }
>    learn_condition = <<EOD
> return function(task, is_spam, is_unlearn)
>  local learn_type = task:get_request_header('Learn-Type')
> 
>  if not (learn_type and tostring(learn_type) == 'bulk') then
>    local prob = task:get_mempool():get_variable('bayes_prob', 'double')
> 
>    if prob then
>      local in_class = false
>      local cl
>      if is_spam then
>        cl = 'spam'
>        in_class = prob >= 0.95
>      else
>        cl = 'ham'
>        in_class = prob <= 0.05
>      end
> 
>      if in_class then
>        return false,string.format('already in class %s; probability %.2f%%',
>          cl, math.abs((prob - 0.5) * 200.0))
>      end
>    end
>  end
> 
>  return true
> end
> EOD;
>    min_learns = 200;
> }
> 
> *** End of section classifier ***


Big typo above:
>  Bayes should *not* be autolearning and configdump didn’t shed any light.
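
For reference, as far as I can read it, the learn_condition in the dump above only decides whether a learn request is accepted or refused; nothing in it starts a learn by itself. Below is a rough Python rendering of that logic, purely as a sketch: the function name and its plain arguments are hypothetical stand-ins for the Lua task object, the 'Learn-Type' request header and the 'bayes_prob' mempool variable.

```python
def learn_condition(learn_type, prob, is_spam):
    """Hypothetical Python mirror of the Lua learn_condition in the dump.

    learn_type: value of the 'Learn-Type' request header, or None if absent
    prob:       the 'bayes_prob' value, or None if it was never set
    is_spam:    True for a spam learn, False for a ham learn
    Returns (allowed, reason); reason is None when the learn is allowed.
    """
    # Bulk learns skip the probability check entirely.
    if learn_type != "bulk" and prob is not None:
        if is_spam:
            cl = "spam"
            in_class = prob >= 0.95
        else:
            cl = "ham"
            in_class = prob <= 0.05

        if in_class:
            # The message is already classified confidently: refuse the learn.
            reason = "already in class %s; probability %.2f%%" % (
                cl, abs((prob - 0.5) * 200.0))
            return False, reason

    return True, None


if __name__ == "__main__":
    print(learn_condition(None, 0.97, True))    # refused: already spam
    print(learn_condition(None, 0.50, True))    # allowed: undecided message
    print(learn_condition("bulk", 0.99, True))  # allowed: bulk learn
```

So this function alone would not explain mails ending up in the statistics without an explicit learn; it only filters learns that something else has requested.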