[Rspamd-Users] How to handle MIME encoded headers?

Gerald Galster list+rspamd at gcore.biz
Mon Jun 17 17:28:18 UTC 2024


> Been using rspamd for a couple of weeks now, and it works just fine.  The only issue I'm having is somehow getting tons of financial clickbait articles that score low on all defaults, so every day I have to delete dozens of these.  They get sneaky and encode the subject lines so instead of seeing things like 'subject: Taiwan in Dаnger Amіd Chinese Drіlls' it is 'subject: =?UTF-8?B?VGFpd2FuIGluIETQsG5nZXIgQW3RlmQgQ2hpbmVzZSBEctGWbGxz?='.  So the normal header examination setup in multimap.conf won't work because the headers have been decoded.  I get not authentic emails with the subjects encoded this way so I'd like to flag these as spam, but not high enough to be outright rejected.  So I'd like to look at the undecoded subject headers and if I see a regex like '=\?UTF-8\?.*\?=' it would add 7.0 to the score.  Except as said, the headers are decoded.  It seems like the following would work (if the subject header was undecoded):
> 
> mime_subject_spam {
>        type = "header"; <=== needs changing?
>        header = "subject";
>        filter = "regexp:/.*UTF\-8\?.*\?=/i";
>        map = "/var/rspamd/maps/mime_subject_spam.map"; <=== don't need a map but it complains, so an empty file?
>        symbol = "MIME_SUBJECT_SPAM";
>        description = "Detect mime-encoded spam subjects";
>        score = 7.0;
>        regexp = true;
> }


See https://rspamd.com/doc/modules/multimap.html#content-filters

  For content maps, the following filters are supported
  - headers -> undecoded headers

You could try something like this (untested):

MIME_SUBJECT_SPAM {
  type = "content";
  filter = "headers";
  map = "/etc/rspamd/local.d/maps.d/mime_subject_spam.map";
  description = "Detect mime-encoded spam subjects";
  score = 7.0;
  regexp = true;
}

/etc/rspamd/local.d/maps.d/mime_subject_spam.map:
/^Subject:.*?=\?UTF-8\?(B|Q)\?/
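
Before deploying the map, the pattern can be checked offline. The following is a minimal Python sketch, not rspamd itself. It assumes the "headers" filter hands the regexp the whole undecoded header block, and it uses Python's re module as a stand-in for rspamd's regexp engine. The sample header block and the helper name matches_encoded_subject are made up for illustration.

```python
import re

# The map pattern from above, without the surrounding slashes.
PATTERN = r"^Subject:.*?=\?UTF-8\?(B|Q)\?"

# Hypothetical raw header block; the Subject value is the one quoted earlier.
RAW_HEADERS = (
    "From: sender@example.com\r\n"
    "Subject: =?UTF-8?B?VGFpd2FuIGluIETQsG5nZXIgQW3RlmQgQ2hpbmVzZSBEctGWbGxz?=\r\n"
    "To: you@example.org\r\n"
)


def matches_encoded_subject(raw_headers: str, flags: int = 0) -> bool:
    """Return True if the pattern finds an encoded-word Subject line."""
    return re.search(PATTERN, raw_headers, flags) is not None


# Without multiline mode, ^ only matches at the very start of the block,
# so a Subject header that is not the first header is missed.
print(matches_encoded_subject(RAW_HEADERS))

# With multiline mode, ^ matches at the start of every header line.
print(matches_encoded_subject(RAW_HEADERS, re.MULTILINE))

# RFC 2047 treats charset and encoding as case-insensitive, so a
# lowercase "utf-8?b" variant needs case-insensitive matching.
LOWERCASE = "Subject: =?utf-8?b?SGk=?=\r\n"
print(matches_encoded_subject(LOWERCASE, re.MULTILINE))
print(matches_encoded_subject(LOWERCASE, re.MULTILINE | re.IGNORECASE))
```

If rspamd behaves the same way on the header block, the map entry may need the multiline and case-insensitive flags; that is an assumption worth verifying against a real message.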

For the encoded-word syntax in MIME (B = base64, Q = quoted-printable) see
https://en.wikipedia.org/wiki/MIME#Encoded-Word
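
To see what such a subject actually contains, the encoded word can be decoded with Python's standard email.header module. This is only an illustration using the subject quoted above; the variable names are arbitrary.

```python
from email.header import decode_header, make_header

# The encoded Subject value quoted in the original question.
RAW_SUBJECT = "=?UTF-8?B?VGFpd2FuIGluIETQsG5nZXIgQW3RlmQgQ2hpbmVzZSBEctGWbGxz?="

# decode_header splits the value into (bytes, charset) parts;
# make_header joins them back into readable text.
decoded = str(make_header(decode_header(RAW_SUBJECT)))
print(decoded)

# List the characters outside ASCII: these are what forced the encoding.
non_ascii = [(ch, f"U+{ord(ch):04X}") for ch in decoded if ord(ch) > 127]
print(non_ascii)
```

The non-ASCII characters here are Cyrillic look-alikes (U+0430, U+0456) standing in for Latin "a" and "i", so the sender had to encode the subject; a subject that decodes to plain ASCII would not need encoding at all.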


Besides that, have you checked your logs to see which symbols match for these messages?

https://github.com/rspamd/rspamd/blob/master/rules/regexp/headers.lua#L284-L299

There are Lua rules for SUBJ_EXCESS_BASE64 and SUBJ_EXCESS_QP that should help. Both are described as:
"Subject header is unnecessarily encoded in base64/quoted-printable"

It might be sufficient to just bump those symbols' scores up.

It's also possible to write a Lua rule (place it in /etc/rspamd/rspamd.local.lua):
https://rspamd.com/doc/developers/writing_rules.html#configuration-files
... and other examples on that site.


Best regards,
Gerald




