[Rspamd-Users] Bayes questions and observations
Vsevolod Stakhov
vsevolod at rspamd.com
Fri Mar 15 12:14:52 UTC 2024
On 15/03/2024 09:55, christian via Users wrote:
> On 14.03.2024 at 18:51, Vsevolod Stakhov wrote:
>
>> Looks like an XY problem to me: why do you need SA for Bayes
>> classification when it uses a much more primitive algorithm? Frankly,
>> your whole problem looks very weird to me. The *only* reasons SA
>> integration exists are testing and legacy concerns (not Bayes or
>> regexps, where Rspamd can do a much better job).
>
> I still get a lot of spam that isn't recognized. There are batches of
> spam campaigns from different senders in different countries, with the
> same appearance but different wording on the same topic (financial,
> "hoonky" kitchen knives), which I can currently only block with
> multimap and regex. But after two days a new wave arrives.
> The statistical classifier (BAYES_SPAM) is of no help because its
> results are wrong. An email reaches a score of 20 through ASN, RBL,
> Neural and Reputation, and then Bayes comes along and rates it as
> ham (-2). Learning doesn't help: I now re-learn every spam email with
> rspamc learn_spam, and the results do not improve.
>
> How do you solve this?
> Christian
That's very interesting and I would like to investigate further. In fact,
both SA and Rspamd use more or less the same Bayes algorithm, with only
slight differences in tokenisation logic.
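To illustrate what "differences in tokenisation" can mean in practice, here is a minimal Python sketch contrasting one-token-per-word splitting with orthogonal sparse bigrams (OSB), the pair-based scheme behind Rspamd's "osb" tokenizer (assumed here to be the default). The function names and the window size of 5 are illustrative assumptions; this is not Rspamd's or SpamAssassin's actual code, which also hashes tokens and treats headers, URLs and normalisation separately.

```python
from typing import List, Tuple


def unigram_tokens(words: List[str]) -> List[str]:
    """One token per word, roughly the classic word-based approach."""
    return list(words)


def osb_tokens(words: List[str], window: int = 5) -> List[Tuple[str, str, int]]:
    """Orthogonal sparse bigrams: pair each word with each of the next
    (window - 1) words and record the distance between them."""
    tokens = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            j = i + gap
            if j >= len(words):
                break  # no more words to pair with
            tokens.append((first, words[j], gap))
    return tokens


if __name__ == "__main__":
    # Hypothetical sample text, only to show the two token streams side by side.
    words = "buy our premium kitchen knife today".split()
    print(unigram_tokens(words))
    print(osb_tokens(words))
```

Because each word contributes up to window - 1 pair tokens that also encode relative position, the same message yields more, and more specific, features than with single-word tokens.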
If you have misclassified samples, could you please do the
following:
1) Enable "bayes" debugging (add "bayes" to the `debug_modules` array
in local.d/logging.inc)
2) Collect all log lines tagged "bayes" while scanning those messages
and send them to me (preferably by private email if they contain
confidential data or large attachments)
3) Send me both the samples and your Redis dump so I can experiment
with them
(3) may well be overkill in terms of privacy and the amount of data,
so I would appreciate it if you could do 1-2.
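For step 2, a small helper like the following could pull the bayes-tagged lines out of a busy log. It is a sketch that assumes the usual Rspamd log layout (timestamp, `#pid(process)`, `<uid>;`, then the module name such as `bayes;`); the script name and log path in the usage line are assumptions, so adjust the regex and path if your setup differs.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed log layout: "<date> <time> #<pid>(<process>) <<uid>>; <module>; <func>: <msg>"
LINE_RE = re.compile(r"^\S+ \S+ #\d+\(\w+\) <[^>]*>; (?P<module>[\w.-]+); ")


def module_lines(lines: Iterable[str], module: str = "bayes") -> Iterator[str]:
    """Yield only the log lines whose module field equals `module`."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("module") == module:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the log file given on the command line and print the matching lines.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line in module_lines(fh):
            print(line)
```

Usage (hypothetical script name and typical log path): `python3 bayes_log_filter.py /var/log/rspamd/rspamd.log > bayes-debug.txt`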
Thanks in advance!