[Rspamd-Users] multimap and header lines

Tue Nov 19 00:08:30 UTC 2024

> I wanted to just match one header; but wanted to be able to add more headers if necessary. Now my question is: what is the correct way of matching a single header line, from start to end?
> 
> I now have: multimap.conf
> NOTACCEPTABLE { type = "content"; filter = "headers";map = "/tmp/headerblock.map"; regexp = true; action = "reject";
> message = "no thanks"; }
> 
> With headerblock.map saying:
> /(*ANYCRLF)(^|\R)X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\R/
> 
> which pretty much matches a regular e-mail that has a header
> X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes
> 
> ... but it still doesn't seem to match a <CR><LF> header.

Chances are your rspamd uses Hyperscan for regular expressions,
which implements only a subset of PCRE. \R is not supported:

https://intel.github.io/hyperscan/dev-reference/compilation.html#unsupported-constructs

Moreover, \R was introduced in Perl 5.10 following a Unicode
recommendation, whereas e-mail headers have historically been strictly
7-bit ASCII (this might change very slowly with SMTPUTF8).

https://perldoc.perl.org/perlrebackslash#%5CR

Therefore I would explicitly state what to expect: \r?\n
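
To illustrate the explicit \r?\n ending, here is a minimal sketch in Python.
Python's re module only stands in for the regex engine here (it is not what
rspamd uses), and the has_header helper and the sample header strings are
made up for the illustration.

```python
import re

# A line start (start of input or a preceding LF), the header itself,
# then an explicit line ending: CRLF or bare LF.
HEADER_RE = re.compile(
    r"(?:^|\n)X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r?\n"
)

def has_header(raw_headers: str) -> bool:
    """Return True if the exact header line occurs in the raw header block."""
    return HEADER_RE.search(raw_headers) is not None

# Hypothetical sample inputs.
crlf_headers = (
    "Subject: test\r\n"
    "X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes\r\n"
    "To: a@example.com\r\n"
)
lf_headers = crlf_headers.replace("\r\n", "\n")
other_headers = (
    "Subject: test\r\n"
    "X-fc9822d6-c227-4fb2-a50a-c86656e68129: yes please\r\n"
)

print(has_header(crlf_headers), has_header(lf_headers), has_header(other_headers))
```

The same pattern accepts both CRLF and bare LF endings, and it rejects a
header whose value merely starts with "yes".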

You may enable multimap debugging in local.d/logging.inc:
debug_modules=['multimap'];

Then you will see the input that will be searched by your multimap regexes.
type = "content"; combined with filter = "headers"; receives the undecoded headers.
(https://rspamd.com/doc/modules/multimap.html#content-filters)

It might look like this:

rspamd[1927161]: <4Xxk1b>; multimap; multimap.lua:563: check value Received: from localhost (localhost.localdomain [127.0.0.1])\x0A\x09by mx1.example.com (Postfix) with ESMTP id 4X2kAV5t2qzvR7n\x0A\x09for <user@example.com>; Tue, 19 Nov 2024 00:18:15 +0100 (CET)\x0D\x0AX-Virus-Scanned: amavisd-new at example.com\x0D\x0AX....

So this is one long string containing ALL headers in undecoded form.
To check for a specific header you could use, e.g.

/^X-Virus-Scanned: amavisd-new at/m MSEU_HWL:1.23

Note the /m switch, which is a regexp modifier that switches to multiline
mode, so that you can match the start and end of each header line using
the ^ and $ symbols. Without multiline (or another suitable modifier)
^ only matches at the very beginning of the whole string and $ only at
its very end, so an anchored pattern would never match any header after
the first one.
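
The effect of the /m switch can be reproduced outside rspamd. The following
is a small sketch that uses Python's re module as a stand-in (an assumption:
rspamd uses PCRE or Hyperscan, not Python), where re.MULTILINE corresponds
to /m. The header block is a shortened, made-up variant of the log excerpt.

```python
import re

# One long string containing all headers, as the multimap would see it.
raw = (
    "Received: from localhost (localhost.localdomain [127.0.0.1])\r\n"
    "X-Virus-Scanned: amavisd-new at example.com\r\n"
    "Subject: hello\r\n"
)

pattern = r"^X-Virus-Scanned: amavisd-new at"

# Without multiline mode, ^ is tried only at the start of the whole string.
without_m = re.search(pattern, raw)

# With multiline mode, ^ also matches right after each newline.
with_m = re.search(pattern, raw, re.MULTILINE)

print("without /m:", without_m is not None)
print("with /m:   ", with_m is not None)
```

Only the multiline search finds the second header, because the pattern is
anchored with ^ and the header does not sit at the start of the string.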

Keep in mind that filter = "headers" is special as it receives undecoded
headers. Other types and filters might decode and present UTF-8, which would
require the /u (unicode) modifier to search for non-ASCII Unicode characters
(like umlauts or grapheme clusters).

> Yeah I know that SMTP has strict CRLF rules, but I also know that there's no real penalty.

It depends on the MTA. IIRC qmail does not accept bare LF.
Other MTAs like Postfix make that configurable (smtpd_forbid_bare_newline)
or even fix incomplete line endings (sendmail_fix_line_endings), and other
software might add to the spam score.

> So:
> - what is the correct way to match a specific header line from beginning to end?

NOTACCEPTABLE {
  type = "content";
  filter = "headers";
  map = "/etc/rspamd/local.d/maps.d/headerblock.map";
  regexp = true;
  action = "reject";
  message = "no thanks";
}

Try adding this to headerblock.map, then reload (or wait until rspamd picks it up):

/^X-sender: postmaster@salesforce\.com/m

(Use /im for case insensitive checking)
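
If the line should match from beginning to end, the pattern can also be
anchored at the end with \r?$ so that an optional CR before the LF is
tolerated. This is a sketch in Python (a stand-in for the real engine; that
rspamd treats $ the same way in multiline mode is an assumption worth
verifying with multimap debugging). The address is assumed to be written
with "@", and matches_sender_line is a hypothetical helper name.

```python
import re

# re.IGNORECASE | re.MULTILINE corresponds to the /im modifiers.
LINE_RE = re.compile(
    r"^X-Sender: postmaster@salesforce\.com\r?$",
    re.IGNORECASE | re.MULTILINE,
)

def matches_sender_line(raw_headers: str) -> bool:
    """Return True if one complete header line equals the expected value."""
    return LINE_RE.search(raw_headers) is not None

# Hypothetical sample inputs.
exact = "Subject: hi\r\nX-sender: postmaster@salesforce.com\r\nTo: a@example.com\r\n"
longer = "Subject: hi\r\nX-sender: postmaster@salesforce.com.example.net\r\n"
bare_lf = exact.replace("\r\n", "\n")

print(matches_sender_line(exact), matches_sender_line(longer), matches_sender_line(bare_lf))
```

The end anchor is what rejects the longer value. Without it, the pattern
is only a prefix match on the header line.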

There are of course other ways, just to add to your choice:

See the local.d/multimap.conf example INVALUEMENT_SENDGRID_ID that uses selectors:
https://rspamd.com/doc/configuration/selectors.html

Or you could write a Lua rule using the regexp module:
https://rspamd.com/doc/modules/regexp.html#regular-expressions

/etc/rspamd/rspamd.local.lua:

config['regexp']['X_SENDER_SALESFORCE'] = {
  re = 'X-Sender=/@salesforce\\.com/iumxs{header}', -- backslash doubled so the Lua string keeps "\."
  score = 5.5,
  description = 'sender is salesforce'
}

You could then also use the X_SENDER_SALESFORCE symbol in composites.conf
or force_actions.conf.

Best regards,
Gerald