[Rspamd-Users] Multimap and syntax...

Sun Mar 3 23:16:20 UTC 2024

> When I take a closer look at your answers, it seems that the income filtering is mainly done by Bayes

No. Spam filtering consists of many tests, each of which adds to a global score. A mail is considered spam if that global score is high enough.
Bayes is just one of those tests. Don't forget blocklists (Spamhaus, ...), the whitelist module, multimaps, neural checks, DKIM checks, antivirus, fuzzy, reputation ...
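
To make the additive scoring concrete, here is a minimal Python sketch (not rspamd code) that sums the symbol scores from the X-Spamd-Result header quoted later in this mail and compares the total with the required score from that header. The names total_score, SYMBOL_RE and REQUIRED are my own, purely for illustration.

```python
import re

# Symbol lines copied from the X-Spamd-Result header quoted later in this mail.
HEADER = """\
default: False [1.87 / 30.00];
INFO_TO_INFO_LU(2.00)[];
SUBJECT_HAS_CURRENCY(1.00)[];
DMARC_POLICY_ALLOW(-0.50)[unitedplugins.com,reject];
R_DKIM_ALLOW(-0.20)[unitedplugins.com:s=mailjet];
R_SPF_ALLOW(-0.20)[+ip4:185.250.236.0/22];
MAILLIST(-0.11)[generic];
MIME_GOOD(-0.10)[multipart/alternative,text/plain];
MX_GOOD(-0.01)[];
HAS_LIST_UNSUB(-0.01)[];
DKIM_TRACE(0.00)[unitedplugins.com:+];
RCPT_COUNT_ONE(0.00)[1];
TO_MATCH_ENVRCPT_ALL(0.00)[];
SPF_REPUTATION_HAM(0.00)[-0.51883337370734];
IP_REPUTATION_HAM(0.00)[asn: 200069(-0.21), country: FR(0.00), ip: 185.250.237.60(0.00)];
"""

# Matches "SYMBOL(score)" at the start of a line only.
# Illustrative parsing, not rspamd's own header format parser.
SYMBOL_RE = re.compile(r"^\s*([A-Z0-9_]+)\((-?\d+\.\d+)\)", re.MULTILINE)


def total_score(header: str) -> float:
    """Add up the per-symbol scores, rounded to two decimals like the header."""
    return round(sum(float(score) for _, score in SYMBOL_RE.findall(header)), 2)


REQUIRED = 30.00  # second number in "[1.87 / 30.00]"
score = total_score(HEADER)
print(score, "spam" if score >= REQUIRED else "not spam")
```

The sum reproduces the 1.87 shown in the header. No BAYES_* symbol contributes to it, and the mail still gets a score from the other tests.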

> and you train this filter.

Yes, Bayes has to be trained. It won't work until it has been trained sufficiently.
https://rspamd.com/doc/configuration/statistic.html
(min_learns = 200;)

> The decisive factor is the score of an email as to whether it is listed as spam or ham in the Bayes filter.

I don't know what you mean by that. When an email is learned as spam, its text is tokenized (roughly: split into words) and those tokens are then associated with spam.
Any new mail is tokenized and compared with the existing spam/ham tokens. The score the Bayes filter calculates from that tells you how likely it considers the email to be spam.
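
As a rough illustration of "tokenize, learn, compare", here is a toy Python classifier. It is only a sketch: rspamd's real tokenizer and the way it combines token probabilities are different and more elaborate. The class name ToyBayes and its methods are made up, and I assume here that the min_learns threshold mentioned above has to be reached for both ham and spam.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy word-count classifier; an illustration, not rspamd's implementation."""

    def __init__(self, min_learns=200):
        self.min_learns = min_learns
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.learns = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Simplification: a token is a lowercase run of letters or digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def learn(self, label, text):
        # Associate every token of this mail with the given class.
        self.tokens[label].update(self.tokenize(text))
        self.learns[label] += 1

    def spam_probability(self, text):
        # Refuse to classify until both classes have enough learned mails.
        if min(self.learns.values()) < self.min_learns:
            return None
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        totals = {label: sum(c.values()) for label, c in self.tokens.items()}
        log_odds = 0.0
        for tok in self.tokenize(text):
            # Add-one smoothing so unseen tokens do not produce zero probabilities.
            p_spam = (self.tokens["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.tokens["ham"][tok] + 1) / (totals["ham"] + vocab)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = ToyBayes(min_learns=2)  # tiny threshold so the demo stays short
bayes.learn("spam", "cheap pills buy now")
print(bayes.spam_probability("buy cheap pills"))  # None: not enough learns yet
bayes.learn("spam", "buy cheap watches now")
bayes.learn("ham", "meeting agenda for monday")
bayes.learn("ham", "lunch on monday with the team")
print(bayes.spam_probability("buy cheap pills") > 0.5)
```

The point of the sketch is the first branch in spam_probability: as long as one class is below the threshold, no Bayes result is produced at all.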

As I mentioned before, the Bayes filter is just one test. Many other tests may add their scores as well.

> I completely deleted the redis entries for rspamd and started learning from scratch. But after a few hours I have a large surplus of Ham entries - about 100:10. I don't think that's the point of the matter. After one day I have 5000 BAYES_HAM entries and 600 BAYES_SPAM.

For initial learning you should manually train a corpus of ham and spam mails (more than min_learns of each). It's best to train the same number of ham mails and spam mails.
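
A minimal Python sketch of such a balanced initial training run. It only builds the rspamc command lines (learn_ham / learn_spam) and leaves running them to you. The function name balanced_learn_commands and the file paths are hypothetical.

```python
def balanced_learn_commands(ham_files, spam_files, min_learns=200):
    """Build rspamc commands that learn the same number of ham and spam mails."""
    n = min(len(ham_files), len(spam_files))
    if n <= min_learns:
        raise ValueError(
            f"only {n} mails per class available, need more than {min_learns}"
        )
    commands = []
    # Alternate ham and spam so both classes grow at the same rate.
    for ham, spam in zip(ham_files[:n], spam_files[:n]):
        commands.append(["rspamc", "learn_ham", str(ham)])
        commands.append(["rspamc", "learn_spam", str(spam)])
    return commands


# Hypothetical corpus; min_learns is lowered only to keep the demo small.
ham = [f"/path/to/ham/{i}.eml" for i in range(5)]
spam = [f"/path/to/spam/{i}.eml" for i in range(3)]
commands = balanced_learn_commands(ham, spam, min_learns=2)
print(len(commands), "commands,", commands[0])

# To actually train, run each command, for example:
#   import subprocess
#   for cmd in commands:
#       subprocess.run(cmd, check=True)
```

The surplus ham mails are simply left out, so the corpus stays at a 1:1 ratio instead of drifting toward the 100:10 you observed.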

> But when I look at spam emails that get through, BAYES_SPAM/HAM is not checked at all.

Then you should manually train that mail as spam (rspamc learn_spam /path/to/spammail.eml).

> Here is an example of Spam:
> The sender Email ist on my multimap blacklist. No Multimap test and no BAYES Test.

For multimap I see two possibilities:

- config is wrong (checking for wrong selector or something like that)
- with regex: the regex does not match (wrong regex definition)
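
To rule out the second possibility, you can test your patterns against the sender address outside of rspamd. A rough Python sketch follows. Python's re module is not the regex engine rspamd uses, so treat the result only as a hint. The addresses, the patterns and the helper name matching_patterns are made up.

```python
import re


def matching_patterns(value, patterns):
    """Return the patterns that match somewhere in value."""
    return [p for p in patterns if re.search(p, value)]


patterns = [
    r"^spammer@example\.com$",  # exact address, dot escaped
    r"@example\.org$",          # different TLD, will not match this sender
]
print(matching_patterns("spammer@example.com", patterns))
```

If a pattern you expected to match is missing from the result, the regex definition is the first thing to check.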

> Here is an example of a non-spam:
> X-Spamd-Result: default: False [1.87 / 30.00];
> 	INFO_TO_INFO_LU(2.00)[];
> 	SUBJECT_HAS_CURRENCY(1.00)[];
> 	DMARC_POLICY_ALLOW(-0.50)[unitedplugins.com,reject];
> 	R_DKIM_ALLOW(-0.20)[unitedplugins.com:s=mailjet];
> 	R_SPF_ALLOW(-0.20)[+ip4:185.250.236.0/22];
> 	MAILLIST(-0.11)[generic];
> 	MIME_GOOD(-0.10)[multipart/alternative,text/plain];
> 	MX_GOOD(-0.01)[];
> 	HAS_LIST_UNSUB(-0.01)[];
> 	DKIM_TRACE(0.00)[unitedplugins.com:+];
> 	RCPT_COUNT_ONE(0.00)[1];
> 	TO_MATCH_ENVRCPT_ALL(0.00)[];
> 	SPF_REPUTATION_HAM(0.00)[-0.51883337370734];
> 	IP_REPUTATION_HAM(0.00)[asn: 200069(-0.21), country: FR(0.00), 	 ip: 185.250.237.60(0.00)];
> 
> I trained the email as HAM. But no BAYES entry appears.

Check your logs to see whether rspamd complains that Bayes has not been trained enough.

Otherwise learn the message again: rspamc learn_ham /path/to/hammail.eml (or, for a spam mail, rspamc learn_spam /path/to/spammail.eml)
Then check if it's recognized: rspamc /path/to/mail.eml  (BAYES_HAM or BAYES_SPAM should be listed)

You do not train spam per user, right?
#per_user = true; # Enable per user classifier
https://rspamd.com/doc/configuration/statistic.html

> In addition, the domain is in a multimap whitelist which is also not displayed. The email is accepted, but only just.

Then one of your definition, selector, or regex is wrong. Multimap works if configured correctly.

>> Rspamd includes the public suffix list (see https://publicsuffix.org/list/).
>> https://github.com/rspamd/rspamd/blob/master/contrib/publicsuffix/effective_tld_names.dat
> 
> Ok, then I don't have to worry about the multiple TLDs. Rspamd does this automatically.
> 
>> Try to be more precise when reading the documentation.
> 
> Unfortunately, the documentation is very confusing and not very structured. You don't recognize the connections.

As I wrote before:

  You've copied the example "email:domain:tld" which converts user@foo.example.com to example.com.
  So user@cmp.dotmail.co.uk will be converted to dotmail.co.uk, which is not in your list and therefore does not match.

You've added "email:domain" style domains to your multimap but configured "email:domain:tld", and then wondered why it did not work.
The example in the documentation was clear about that, which is why I wrote that you should try to be more precise when reading the documentation.
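
The difference between the two filters can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not rspamd's code: PUBLIC_SUFFIXES is a tiny hand-picked subset of the public suffix list (the real list is much larger and also has wildcard and exception rules that are ignored here), and the function names are made up.

```python
# Tiny hand-picked subset of the public suffix list, for illustration only.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}


def email_domain(address: str) -> str:
    """Rough equivalent of "email:domain": everything after the @."""
    return address.rsplit("@", 1)[1].lower()


def effective_domain(address: str) -> str:
    """Rough equivalent of "email:domain:tld": public suffix plus one label."""
    labels = email_domain(address).split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return ".".join(labels)


# Map content written in "email:domain" style, as in your list.
map_entries = {"cmp.dotmail.co.uk"}

sender = "user@cmp.dotmail.co.uk"
print(email_domain(sender), email_domain(sender) in map_entries)
print(effective_domain(sender), effective_domain(sender) in map_entries)
```

With the full domain the lookup finds the map entry. With the reduced domain (dotmail.co.uk) it does not, which is exactly the mismatch in your setup. Either list dotmail.co.uk in the map or switch the filter to "email:domain".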

>> Just a hint: if you add e.g. adidas.com to your whitelist, any spammer that sends with @adidas.com is probably whitelisted due to score -20.
>> I'd rather train rspamd to filter spam and use those maps to assist learning. Otherwise a spammail with an added score of -20 will probably be learned as ham, which can ruin your bayes filter.
> 
> 
> Should an email that does not actually come from adidas.com not be checked further and be assessed differently as phishing? Check against DKIM and MX. This makes it clear that the email doesn't really come from adidias.com, right? OK, maybe -20 is a bit much.

This was just an example of what can happen when you set extreme scores like -20, it was not about the domain adidas.com.

Of course there are other tests, and rspamd will check DKIM/DMARC/... if configured.

> But what always surprises me is that it's hard to understand why sometimes my multimaps work and the next email doesn't.

That means rspamd generally knows about the multimap, otherwise it would never match.
If it matches only sometimes, either the selector/type is not configured correctly or the multimap content does not match (errors in the regex, incomplete domain names, ...).

> Why I can see that Bayesian statistics counts up for incoming emails, but no check is displayed in the email fields.

Check the logs to see whether rspamd complains about too few learned emails.

Best regards,
Gerald