[Rspamd-Users] Rspamd bayes statistic per_user does not work as expected.
Gabriele Nencioni
gabriele.nencioni at register.it
Thu May 23 09:34:00 UTC 2019
Hi,
I'm trying to configure per-user Bayes classification, following the documentation:
https://rspamd.com/doc/configuration/statistic.html
I have set up two classifiers with the redis backend, one global and
one per-user, and configured the Lua script as the documentation says.
Here is my classifier configuration:
~# rspamadm configdump
classifier {
    bayes {
        backend = "redis";
        min_tokens = 11;
        min_learns = 200;
        servers = "127.0.0.1:6379";
        statfile {
            spam = false;
            symbol = "BAYES_HAM";
        }
        statfile {
            spam = true;
            symbol = "BAYES_SPAM";
        }
        learn_condition = <<EOD
return function(task, is_spam, is_unlearn)
  local learn_type = task:get_request_header('Learn-Type')
  if not (learn_type and tostring(learn_type) == 'bulk') then
    local prob = task:get_mempool():get_variable('bayes_prob', 'double')
    if prob then
      local in_class = false
      local cl
      if is_spam then
        cl = 'spam'
        in_class = prob >= 0.95
      else
        cl = 'ham'
        in_class = prob <= 0.05
      end
      if in_class then
        return false,string.format('already in class %s; probability %.2f%%',
          cl, math.abs((prob - 0.5) * 200.0))
      end
    end
  end
  return true
end
EOD;
        tokenizer {
            name = "osb";
        }
        name = "global";
        languages_enabled = true;
    }
}
classifier {
    bayes {
        backend = "redis";
        min_tokens = 11;
        users_enabled = true;
        min_learns = 200;
        servers = "127.0.0.1:6379";
        learn_condition = <<EOD
return function(task, is_spam, is_unlearn)
  local learn_type = task:get_request_header('Learn-Type')
  if not (learn_type and tostring(learn_type) == 'bulk') then
    local prob = task:get_mempool():get_variable('bayes_prob', 'double')
    if prob then
      local in_class = false
      local cl
      if is_spam then
        cl = 'spam'
        in_class = prob >= 0.95
      else
        cl = 'ham'
        in_class = prob <= 0.05
      end
      if in_class then
        return false,string.format('already in class %s; probability %.2f%%',
          cl, math.abs((prob - 0.5) * 200.0))
      end
    end
  end
  return true
end
EOD;
        statfile {
            spam = false;
            symbol = "BAYES_HAM_USER";
        }
        statfile {
            spam = true;
            symbol = "BAYES_SPAM_USER";
        }
        languages_enabled = true;
        tokenizer {
            name = "osb";
        }
        name = "peruser";
        per_user = <<EOD
return function(task)
  local rcpt = task:get_recipients(1)
  if rcpt then
    one_rcpt = rcpt[1]
    if one_rcpt['domain'] then
      return one_rcpt['domain']
    end
  end
  return nil
end
EOD;
    }
}
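
To make the intent of the per_user function explicit, its selection logic can be modelled outside Rspamd. The Python sketch below is not Rspamd code: `per_user_key` is a hypothetical stand-in for the Lua function, and the recipient dictionaries only imitate the tables that task:get_recipients(1) is assumed to return (as far as I understand the Lua API, type 1 selects the SMTP envelope recipients, so a scan without an envelope recipient would yield no key at all).

```python
from typing import Mapping, Optional, Sequence


def per_user_key(
    recipients: Optional[Sequence[Mapping[str, str]]],
) -> Optional[str]:
    """Model of the Lua per_user function: domain of the first recipient.

    Assumption: `recipients` imitates the result of task:get_recipients(1),
    i.e. None when there are no recipients, else a list of tables with
    'user' and 'domain' fields.
    """
    if recipients is None:
        return None
    if len(recipients) == 0:
        # The Lua original would index a nil value here; the model
        # simply reports "no key".
        return None
    domain = recipients[0].get("domain")
    if domain is not None:
        return domain
    return None


# The two recipients used in the test further down should map to two
# different keys, and a message without envelope recipients to none.
print(per_user_key([{"user": "test0", "domain": "inboundcm.eu"}]))
print(per_user_key([{"user": "test1", "domain": "potterbot.eu"}]))
print(per_user_key(None))
```

If the function behaved like this model inside Rspamd, the two test domains would be expected to end up as two separate users in the statistics.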
These are my versions:
~# rspamd --version
Rspamd daemon version 1.8.3
~# redis-server --version
Redis server v=3.2.6
Here is the rspamc stat output:
~# rspamc stat
Results for command: stat (4.008 seconds)
Messages scanned: 159941
Messages with action reject: 0, 0.00%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 9947, 6.22%
Messages with action greylist: 0, 0.00%
Messages with action no action: 149994, 93.78%
Messages treated as spam: 9947, 6.22%
Messages treated as ham: 149994, 93.78%
Messages learned: 11323
Connections count: 344
Control connections count: 3951
Pools allocated: 4350
Pools freed: 4451
Bytes allocated: 3.90G
Memory chunks allocated: 4294966813
Shared chunks allocated: 39
Chunks freed: 0
Oversized chunks: 857
Statfile: BAYES_SPAM_USER type: redis; length: 18.57M; free blocks: 0; total blocks: 488.84k; free: 0.00%; learned: 996; users: 1; languages: 0
Statfile: BAYES_HAM_USER type: redis; length: 6.96M; free blocks: 0; total blocks: 183.17k; free: 0.00%; learned: 235; users: 1; languages: 0
Statfile: BAYES_SPAM type: redis; length: 3.21M; free blocks: 0; total blocks: 84.41k; free: 0.00%; learned: 326; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 4.32M; free blocks: 0; total blocks: 113.66k; free: 0.00%; learned: 218; users: 1; languages: 0
Total learns: 1775
~# redis-cli
127.0.0.1:6379> keys *
1) "BAYES_SPAM_USER"
2) "BAYES_SPAM_keys"
3) "BAYES_HAM"
4) "BAYES_HAM_USER"
5) "BAYES_HAM_USER_keys"
6) "BAYES_SPAM_USER_keys"
7) "BAYES_SPAM"
8) "learned_ids"
9) "BAYES_HAM_keys"
What looks strange to me is that the users value is 1 for every
statfile, including the per-user ones. Am I right?
learned: 996; users: 1;
learned: 235; users: 1;
learned: 326; users: 1;
learned: 218; users: 1;
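
The same figures can be pulled out of the stat output mechanically. The Python sketch below is only an illustration: the sample text is copied from the output above, while `statfile_counts` and the regular expression are hypothetical helpers that assume the "learned: N; users: N" layout shown there.

```python
import re

# Statfile lines copied from the `rspamc stat` output above.
STAT_OUTPUT = """\
Statfile: BAYES_SPAM_USER type: redis; length: 18.57M; free blocks: 0;
total blocks: 488.84k; free: 0.00%; learned: 996; users: 1; languages: 0
Statfile: BAYES_HAM_USER type: redis; length: 6.96M; free blocks: 0;
total blocks: 183.17k; free: 0.00%; learned: 235; users: 1; languages: 0
Statfile: BAYES_SPAM type: redis; length: 3.21M; free blocks: 0; total
blocks: 84.41k; free: 0.00%; learned: 326; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 4.32M; free blocks: 0; total
blocks: 113.66k; free: 0.00%; learned: 218; users: 1; languages: 0
"""

# Symbol name, then the nearest following "learned" and "users" counters.
STATFILE_RE = re.compile(
    r"Statfile:\s+(\S+)\s+type:.*?learned:\s+(\d+);\s+users:\s+(\d+)"
)


def statfile_counts(text: str) -> dict:
    """Return {symbol: (learned, users)} for every Statfile entry."""
    # Undo mail line wrapping so each entry sits on one logical line.
    flat = " ".join(text.split())
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in STATFILE_RE.finditer(flat)
    }


counts = statfile_counts(STAT_OUTPUT)
for symbol, (learned, users) in counts.items():
    print(f"{symbol}: learned={learned} users={users}")
```

The learned counters add up to the reported "Total learns: 1775", and every statfile, including the two per-user ones, reports a single user.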
So the per_user Bayes classifier doesn't work as expected: the
BAYES_SPAM_USER and BAYES_HAM_USER symbols are also applied to messages
for different recipients or domains!
Here is an example.
Learn a message as spam:
~# cat testemail | rspamc -c peruser -d test0@inboundcm.eu learn_spam
Results for file: stdin (0.004 seconds)
success = true;
filename = "stdin";
scan_time = 0.004000;
Recipient:
~# grep To: testemail
To: test0@inboundcm.eu
~# rspamc < testemail
Results for file: stdin (0.008 seconds)
[Metric: default]
Action: no action
Spam: false
Score: 3.24 / 2000.00
Symbol: ARC_NA (0.00)
Symbol: ASN (0.00)
Symbol: BAYES_HAM (-0.03)[55.79%]
Symbol: BAYES_SPAM_USER (1.67)[86.96%]
Symbol: DATE_IN_PAST (1.00)
Symbol: FROM_NO_DN (0.00)
Symbol: MID_RHS_NOT_FQDN (0.50)
Symbol: MIME_GOOD (-0.10)[text/plain]
Symbol: MIME_TRACE (0.00)[0:+]
Symbol: ONCE_RECEIVED (0.10)
Symbol: RCPT_COUNT_ONE (0.00)[1]
Symbol: RCVD_COUNT_ONE (0.00)[1]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: R_DKIM_NA (0.00)
Symbol: TO_DN_NONE (0.00)
After changing the recipient and domain:
~# grep To: testemail
To: test1@potterbot.eu
~# rspamc < testemail
Results for file: stdin (0.004 seconds)
[Metric: default]
Action: no action
Spam: false
Score: 3.24 / 2000.00
Symbol: ARC_NA (0.00)
Symbol: ASN (0.00)
Symbol: BAYES_HAM (-0.03)[55.79%]
Symbol: BAYES_SPAM_USER (1.67)[86.96%]
Symbol: DATE_IN_PAST (1.00)
Symbol: FROM_NO_DN (0.00)
Symbol: MID_RHS_NOT_FQDN (0.50)
Symbol: MIME_GOOD (-0.10)[text/plain]
Symbol: MIME_TRACE (0.00)[0:+]
Symbol: ONCE_RECEIVED (0.10)
Symbol: RCPT_COUNT_ONE (0.00)[1]
Symbol: RCVD_COUNT_ONE (0.00)[1]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: R_DKIM_NA (0.00)
Symbol: TO_DN_NONE (0.00)
So what's wrong?
Is there something wrong in the configuration?
Is there something wrong in the per_user Lua function?
Could you please help me configure rspamd so that per-user
(per-recipient) Bayes works properly?
Thanks in advance
Regards
Gabriele Nencioni