[Rspamd-Users] multimap and header lines
Gerald Galster
list+rspamd at gcore.biz
Tue Nov 19 00:08:30 UTC 2024
> I wanted to just match one header; but wanted to be able to add more headers if necessary. Now my question is: what is the correct way of matching a single header line, from start to end?
>
> I now have: multimap.conf
> NOTACCEPTABLE { type = "content"; filter = "headers";map = "/tmp/headerblock.map"; regexp = true; action = "reject";
> message = "no thanks"; }
>
> With headerblock.map saying:
> /(*ANYCRLF)(^|\R)X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\R/
>
> which pretty much matches a regular e-mail that has a header
> X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes
>
> ... but it still doesn't seem to match a <CR><LF> header.
Chances are your rspamd uses Hyperscan for regular expressions,
which implements only a subset of PCRE. \R is not supported:
https://intel.github.io/hyperscan/dev-reference/compilation.html#unsupported-constructs
Moreover, \R was introduced in Perl 5.10, following a Unicode
recommendation, whereas e-mail headers have historically been strictly
7-bit ASCII (this might change very slowly with SMTPUTF8).
https://perldoc.perl.org/perlrebackslash#%5CR
Therefore I would explicitly state what to expect: \r?\n
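As a rough sanity check outside rspamd, the explicit line-ending pattern can be tried with Python's re module. This is only a sketch: Python's engine is neither Hyperscan nor PCRE, the header name is the one from the question, and the sample header blobs are made up for illustration.

```python
import re

# Explicit line ending instead of \R: an optional CR followed by LF.
HEADER = re.compile(
    r"(?:^|\r?\n)X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r?\n"
)

# Hypothetical header blobs: CRLF endings, bare LF endings, and a near miss.
crlf_headers = (
    "Subject: hi\r\n"
    "X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r\n"
    "To: a@example.com\r\n"
)
lf_headers = (
    "Subject: hi\n"
    "X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\n"
    "To: a@example.com\n"
)
other_headers = (
    "Subject: hi\r\n"
    "X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes please\r\n"
)


def matches(blob: str) -> bool:
    """Return True if the header line occurs with a CRLF or bare LF ending."""
    return HEADER.search(blob) is not None


print(matches(crlf_headers), matches(lf_headers), matches(other_headers))
```

The same \r?\n idea should carry over to the map entry, but whether rspamd's engine treats it identically is best confirmed with the debug logging described next.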
You may enable multimap debugging in local.d/logging.inc:
debug_modules=['multimap'];
Then you will see the input that will be searched by your multimap regexes.
With type = "content"; and filter = "headers"; the map receives undecoded headers.
(https://rspamd.com/doc/modules/multimap.html#content-filters)
It might look like this:
rspamd[1927161]: <4Xxk1b>; multimap; multimap.lua:563: check value Received: from localhost (localhost.localdomain [127.0.0.1])\x0A\x09by mx1.example.com (Postfix) with ESMTP id 4X2kAV5t2qzvR7n\x0A\x09for <user at example.com>; Tue, 19 Nov 2024 00:18:15 +0100 (CET)\x0D\x0AX-Virus-Scanned: amavisd-new at example.com\x0D\x0AX....
So this is one long string containing ALL headers in undecoded form.
To check for a special header you could use, e.g.
/^X-Virus-Scanned: amavisd-new at/m MSEU_HWL:1.23
Note the /m switch, which is a regexp modifier that switches to multiline
mode, so that you can match the start and end of each header line using
the ^ and $ symbols. Without multiline (or another suitable modifier),
^ matches only at the very start of the string and $ only at its very end,
so an anchored pattern would never match any header after the first one.
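The effect of the multiline modifier on ^ can be illustrated with Python's re module, where re.MULTILINE plays the role of /m. This is a sketch under assumptions: the header blob is invented to resemble the debug output above, and Python's regex engine is not the one rspamd uses.

```python
import re

# Made-up header blob shaped like the debug output: one string, CRLF-separated.
blob = (
    "Received: from localhost (localhost.localdomain [127.0.0.1])\r\n"
    "X-Virus-Scanned: amavisd-new at example.com\r\n"
    "Subject: test\r\n"
)

pattern = r"^X-Virus-Scanned: amavisd-new at"

# Without the multiline flag, ^ can only match at offset 0 of the whole string.
without_m = re.search(pattern, blob)

# With the multiline flag, ^ also matches right after every \n.
with_m = re.search(pattern, blob, re.MULTILINE)

print("without /m:", without_m is not None)
print("with /m:   ", with_m is not None)
```

Because the wanted header is the second line of the blob, only the multiline search finds it.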
Keep in mind filter = "headers" is special as it receives undecoded
headers. Other types and filters might decode and present UTF-8, which would
require the /u (unicode) modifier to search for non-ascii unicode chars
(like umlauts or grapheme clusters).
> Yeah I know that SMTP has strict CRLF rules, but I also know that there's no real penalty.
It depends on the MTA. IIRC qmail does not accept bare LF.
Other MTAs like Postfix make that configurable (smtpd_forbid_bare_newline)
or even fix incomplete line endings (sendmail_fix_line_endings), and other
software might add to the spam score.
> So:
> - what is the correct way to match a specific header line from beginning to end?
NOTACCEPTABLE {
  type = "content";
  filter = "headers";
  map = "/etc/rspamd/local.d/maps.d/headerblock.map";
  regexp = true;
  action = "reject";
  message = "no thanks";
}
Try adding this to headerblock.map and reload (or wait until rspamd picks it up):
/^X-sender: postmaster at salesforce\.com/m
(Use /im for case insensitive checking)
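To match a whole header line from start to end regardless of CRLF or bare LF, one option is to let \r? absorb an optional CR before $. The sketch below checks that idea in Python; it assumes that $ in multiline mode matches just before \n (true for Python, and for PCRE with its default newline setting), and the helper name has_header_line is hypothetical. A corresponding map entry would be something like /^X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r?$/m, which is untested against rspamd here and worth verifying with the multimap debug log.

```python
import re

# Whole-line match: ^ at the start of a line, \r? absorbs the CR of a CRLF
# ending, and $ (in multiline mode) matches just before the LF or at the end.
LINE = re.compile(
    r"^X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r?$",
    re.MULTILINE | re.IGNORECASE,  # comparable to the /im modifiers
)


def has_header_line(blob: str) -> bool:
    """Return True if the exact header line occurs anywhere in the blob."""
    return LINE.search(blob) is not None


name = "X-fc9822d6-c227-4fb2-a50a-c86656e68129"
print(has_header_line(f"Subject: hi\r\n{name}: yes\r\nTo: a@example.com\r\n"))
print(has_header_line(f"Subject: hi\n{name}: yes\nTo: a@example.com\n"))
print(has_header_line(f"Subject: hi\r\n{name}: yes please\r\n"))
```

Anchoring both ends means a longer value such as "yes please" or a header whose name merely ends in the same string does not trigger the rule.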
There are of course other ways, just to add to your choice:
See the local.d/multimap.conf example INVALUEMENT_SENDGRID_ID that uses selectors:
https://rspamd.com/doc/configuration/selectors.html
Or you could write a lua rule:
https://rspamd.com/doc/modules/regexp.html#regular-expressions
/etc/rspamd/rspamd.local.lua:
config['regexp']['X_SENDER_SALESFORCE'] = {
  -- Backslash is doubled so that the regexp engine sees \. (a literal dot)
  re = 'X-Sender=/@salesforce\\.com/iumxs{header}',
  score = 5.5,
  description = 'sender is salesforce'
}
You could then also use the X_SENDER_SALESFORCE symbol in composites.conf
or force_actions.conf.
Best regards,
Gerald