[Rspamd-Users] Rspamd bayes statistic per_user does not work as expected.

Thu May 23 09:34:00 UTC 2019

Hi,
I'm trying to configure a per-user Bayes classifier following the documentation:
https://rspamd.com/doc/configuration/statistic.html

I have set up two classifiers with the Redis backend, one global and the
other per-user, configuring the Lua script as the documentation describes.

Here is my classifier configuration:
~# rspamadm configdump
classifier {
    bayes {
        backend = "redis";
        min_tokens = 11;
        min_learns = 200;
        servers = "127.0.0.1:6379";
        statfile {
            spam = false;
            symbol = "BAYES_HAM";
        }
        statfile {
            spam = true;
            symbol = "BAYES_SPAM";
        }
        learn_condition = <<EOD
return function(task, is_spam, is_unlearn)
  local learn_type = task:get_request_header('Learn-Type')

  if not (learn_type and tostring(learn_type) == 'bulk') then
    local prob = task:get_mempool():get_variable('bayes_prob', 'double')

    if prob then
      local in_class = false
      local cl
      if is_spam then
        cl = 'spam'
        in_class = prob >= 0.95
      else
        cl = 'ham'
        in_class = prob <= 0.05
      end

      if in_class then
        return false, string.format('already in class %s; probability %.2f%%',
          cl, math.abs((prob - 0.5) * 200.0))
      end
    end
  end

  return true
end
EOD;
        tokenizer {
            name = "osb";
        }
        name = "global";
        languages_enabled = true;
    }
}
classifier {
    bayes {
        backend = "redis";
        min_tokens = 11;
        users_enabled = true;
        min_learns = 200;
        servers = "127.0.0.1:6379";
        learn_condition = <<EOD
return function(task, is_spam, is_unlearn)
  local learn_type = task:get_request_header('Learn-Type')

  if not (learn_type and tostring(learn_type) == 'bulk') then
    local prob = task:get_mempool():get_variable('bayes_prob', 'double')

    if prob then
      local in_class = false
      local cl
      if is_spam then
        cl = 'spam'
        in_class = prob >= 0.95
      else
        cl = 'ham'
        in_class = prob <= 0.05
      end

      if in_class then
        return false, string.format('already in class %s; probability %.2f%%',
          cl, math.abs((prob - 0.5) * 200.0))
      end
    end
  end

  return true
end
EOD;
        statfile {
            spam = false;
            symbol = "BAYES_HAM_USER";
        }
        statfile {
            spam = true;
            symbol = "BAYES_SPAM_USER";
        }
        languages_enabled = true;
        tokenizer {
            name = "osb";
        }
        name = "peruser";
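        per_user = <<EOD
return function(task)
  -- Take the SMTP envelope recipients (type 1)
  local rcpt = task:get_recipients(1)

  if rcpt then
    -- Use the first recipient's domain as the per-user key
    local one_rcpt = rcpt[1]
    if one_rcpt and one_rcpt['domain'] then
      return one_rcpt['domain']
    end
  end

  return nil
end
EOD;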
    }
}
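The learn_condition function in both classifiers refuses to learn a message that Bayes already places firmly in the target class (probability >= 0.95 for spam, <= 0.05 for ham), unless the Learn-Type request header is `bulk`. A minimal standalone sketch of that decision may make the thresholds easier to check; it assumes the rspamd task object is replaced by plain arguments, and `learn_decision` and its parameters are hypothetical names, not part of the rspamd API.

```lua
-- Hypothetical standalone copy of the learn_condition logic above.
-- prob       : stands in for the 'bayes_prob' mempool variable (may be nil)
-- is_spam    : true for learn_spam, false for learn_ham
-- learn_type : stands in for the 'Learn-Type' request header (may be nil)
local function learn_decision(prob, is_spam, learn_type)
  -- Bulk learns bypass the "already in class" check
  if learn_type == 'bulk' then
    return true
  end

  if prob then
    local cl, in_class
    if is_spam then
      cl = 'spam'
      in_class = prob >= 0.95
    else
      cl = 'ham'
      in_class = prob <= 0.05
    end

    if in_class then
      -- Distance from 0.5, rescaled to a 0..100% confidence figure
      return false, string.format('already in class %s; probability %.2f%%',
        cl, math.abs((prob - 0.5) * 200.0))
    end
  end

  return true
end

print(learn_decision(0.97, true, nil))    -- refused: already spam
print(learn_decision(0.97, true, 'bulk')) -- bulk learn: accepted
print(learn_decision(0.50, false, nil))   -- uncertain: accepted
```

For a spam learn with a probability of 0.97, the sketch returns false with the reason "already in class spam; probability 94.00%"; the same probability with a bulk learn, or an uncertain 0.50, is accepted.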

And my versions:
~# rspamd --version
Rspamd daemon version 1.8.3

~# redis-server --version
Redis server v=3.2.6

Here is the rspamc stat output:
~# rspamc stat
Results for command: stat (4.008 seconds)
Messages scanned: 159941
Messages with action reject: 0, 0.00%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 9947, 6.22%
Messages with action greylist: 0, 0.00%
Messages with action no action: 149994, 93.78%
Messages treated as spam: 9947, 6.22%
Messages treated as ham: 149994, 93.78%
Messages learned: 11323
Connections count: 344
Control connections count: 3951
Pools allocated: 4350
Pools freed: 4451
Bytes allocated: 3.90G
Memory chunks allocated: 4294966813
Shared chunks allocated: 39
Chunks freed: 0
Oversized chunks: 857
Statfile: BAYES_SPAM_USER type: redis; length: 18.57M; free blocks: 0; total blocks: 488.84k; free: 0.00%; learned: 996; users: 1; languages: 0
Statfile: BAYES_HAM_USER type: redis; length: 6.96M; free blocks: 0; total blocks: 183.17k; free: 0.00%; learned: 235; users: 1; languages: 0
Statfile: BAYES_SPAM type: redis; length: 3.21M; free blocks: 0; total blocks: 84.41k; free: 0.00%; learned: 326; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 4.32M; free blocks: 0; total blocks: 113.66k; free: 0.00%; learned: 218; users: 1; languages: 0
Total learns: 1775

~# redis-cli
127.0.0.1:6379> keys *
1) "BAYES_SPAM_USER"
2) "BAYES_SPAM_keys"
3) "BAYES_HAM"
4) "BAYES_HAM_USER"
5) "BAYES_HAM_USER_keys"
6) "BAYES_SPAM_USER_keys"
7) "BAYES_SPAM"
8) "learned_ids"
9) "BAYES_HAM_keys"

What looks strange to me is that the users value is 1 for every
statfile. Am I right that this is unexpected?
learned: 996; users: 1;
learned: 235; users: 1;
learned: 326; users: 1;
learned: 218; users: 1;

So the per-user Bayes classifier does not work as expected: the
BAYES_SPAM_USER and BAYES_HAM_USER symbols are also applied to messages
for different recipients and domains!
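The per_user function is meant to return a different key for each recipient domain, so the two test recipients below (test0@inboundcm.eu and test1@potterbot.eu) would be expected to map to separate keys. A sketch that runs the same domain-extraction logic against a mock task shows the keys it should produce. The mock and its recipient layout (tables with `addr`, `user` and `domain` fields) are illustrative assumptions, not rspamd itself, and this version declares `one_rcpt` local and guards against an empty recipient list.

```lua
-- Hypothetical mock of an rspamd task: only get_recipients() is emulated.
-- Each recipient is assumed to be a table with addr/user/domain fields.
local function mock_task(rcpts)
  return {
    get_recipients = function(self, rcpt_type)
      return rcpts
    end,
  }
end

-- Same logic as the per_user function, with one_rcpt declared local
-- and a guard for an empty recipient list
local function per_user(task)
  local rcpt = task:get_recipients(1)
  if rcpt then
    local one_rcpt = rcpt[1]
    if one_rcpt and one_rcpt['domain'] then
      return one_rcpt['domain']
    end
  end
  return nil
end

local t0 = mock_task({ { addr = 'test0@inboundcm.eu', user = 'test0', domain = 'inboundcm.eu' } })
local t1 = mock_task({ { addr = 'test1@potterbot.eu', user = 'test1', domain = 'potterbot.eu' } })

print(per_user(t0))              --> inboundcm.eu
print(per_user(t1))              --> potterbot.eu
print(per_user(mock_task(nil)))  --> nil
```

With the guard in place, an empty recipient list yields nil instead of an indexing error on `rcpt[1]['domain']`.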

Here is an example.
Learning a message as spam:
~# cat testemail | rspamc -c peruser -d test0@inboundcm.eu learn_spam
Results for file: stdin (0.004 seconds)
success = true;
filename = "stdin";
scan_time = 0.004000;

Recipient:
~# grep To: testemail
To: test0@inboundcm.eu

~# rspamc < testemail
Results for file: stdin (0.008 seconds)
[Metric: default]
Action: no action
Spam: false
Score: 3.24 / 2000.00
Symbol: ARC_NA (0.00)
Symbol: ASN (0.00)
Symbol: BAYES_HAM (-0.03)[55.79%]
Symbol: BAYES_SPAM_USER (1.67)[86.96%]
Symbol: DATE_IN_PAST (1.00)
Symbol: FROM_NO_DN (0.00)
Symbol: MID_RHS_NOT_FQDN (0.50)
Symbol: MIME_GOOD (-0.10)[text/plain]
Symbol: MIME_TRACE (0.00)[0:+]
Symbol: ONCE_RECEIVED (0.10)
Symbol: RCPT_COUNT_ONE (0.00)[1]
Symbol: RCVD_COUNT_ONE (0.00)[1]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: R_DKIM_NA (0.00)
Symbol: TO_DN_NONE (0.00)

Changed recipient and domain:
~# grep To: testemail
To: test1@potterbot.eu

~# rspamc < testemail
Results for file: stdin (0.004 seconds)
[Metric: default]
Action: no action
Spam: false
Score: 3.24 / 2000.00
Symbol: ARC_NA (0.00)
Symbol: ASN (0.00)
Symbol: BAYES_HAM (-0.03)[55.79%]
Symbol: BAYES_SPAM_USER (1.67)[86.96%]
Symbol: DATE_IN_PAST (1.00)
Symbol: FROM_NO_DN (0.00)
Symbol: MID_RHS_NOT_FQDN (0.50)
Symbol: MIME_GOOD (-0.10)[text/plain]
Symbol: MIME_TRACE (0.00)[0:+]
Symbol: ONCE_RECEIVED (0.10)
Symbol: RCPT_COUNT_ONE (0.00)[1]
Symbol: RCVD_COUNT_ONE (0.00)[1]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: R_DKIM_NA (0.00)
Symbol: TO_DN_NONE (0.00)

So what is wrong?
Is there something wrong in the configuration?
Is there something wrong in the per-user Lua function?
Could you please help me configure rspamd so that per-user (per-recipient)
Bayes works properly?

Thanks in advance
Regards
Gabriele Nencioni