[Rspamd-Users] Control rspamd depending on subject content

G.W. Haywood rspamd at jubileegroup.co.uk
Thu Feb 29 17:19:22 UTC 2024


Hi there,

On Thu, 29 Feb 2024, Andreas wrote:

> ... /etc/rspamd/local.d/maps.d/banned_subjects.map:
>
> /*recipe-for-egg*/ BLOCK_SUBJECT:4.5

You haven't used the 'i' modifier in the regexes which you've shown
to make them case insensitive.  That may be deliberate, but most of
the time I use it in, er, case the spammers use the 'shift' keys.

Be aware of the rules for constructing regexes.  They're a bit quirky.
Just as the character '*' is special in filename globs, it's special
in regexes too, but in a different way.  In a filename glob it more or
less means "anything".  In a regex, unless it is 'escaped', it means
"match if the character immediately preceding the asterisk is repeated
zero or more times".  I'm not sure your regex will do what you want it
to do.  Perhaps you mean something like

/.*recipe-for-egg.*/
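
To see the difference between a glob '*' and a regex '*', here is a
minimal sketch using Python's 're' module.  Python's regexes are not
PCRE, but as far as I know they treat these particular constructs the
same way.  The subject string is made up for illustration, and the
sketch assumes the matcher searches anywhere in the string rather than
anchoring at both ends.

```python
import re

# Hypothetical subject line, invented for this illustration.
subject = "Best Recipe-For-Egg salad"


def compiles(pattern):
    """Return True if the pattern is a valid regex for Python's re."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A leading '*' has no preceding character to repeat, so it is rejected.
leading_star_ok = compiles("*recipe-for-egg*")

# A trailing '*' applies only to the final 'g': zero or more of them.
trailing = re.search("recipe-for-egg*", "recipe-for-eg")

# An unanchored search finds the text anywhere in the string, with or
# without '.*' on either side; the 'i' modifier corresponds to IGNORECASE.
plain = re.search("recipe-for-egg", subject, re.IGNORECASE)
wrapped = re.search(".*recipe-for-egg.*", subject, re.IGNORECASE)

# Without the case-insensitive flag the mixed-case subject is missed.
case_sensitive = re.search("recipe-for-egg", subject)

print(leading_star_ok, bool(trailing), bool(plain), bool(wrapped),
      bool(case_sensitive))
```

With a search-style matcher the '.*' on each side changes how much text
is captured, not whether the subject matches.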

There are plenty of regex tutorials on the Web, but when you search do
be aware that there are different types of regex:

https://en.wikipedia.org/wiki/Regular_expression#Syntax

They are mostly somewhat similar, with enough differences to make life
interesting if you use more than one kind of them.  The kind used by
rspamd is called "Perl Compatible Regular Expressions" (usually
abbreviated to PCRE).
Perl's regexes are IMNSHO the best to use for more or less anything.
Avoid POSIX regexes if you can - I use them a lot and I wish I didn't
have to.

Even though rspamd uses PCRE, it has extended the syntax for its own
purposes.  The rspamd extensions let you specify exactly where to look
in the message for the match so it's much easier to avoid accidentally
matching something that you didn't mean to match - and it's also a lot
more efficient in terms of computing resources, since you could be
searching just a single line instead of a huge image.  Look at

https://rspamd.com/doc/modules/regexp.html#regular-expressions

which shows you how you can identify with very good granularity the
part or parts of the message which you want to search.  For example,
to search the 'Subject' header you could use

Subject=/egg/i{header}

which looks *only* in the Subject header for the string ('egg' or 'Egg' or
'EGG' or 'eGG' or...).

Header field names are case insensitive according to the RFCs.
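
The idea of searching only one header can be sketched outside rspamd
with Python's standard 'email' and 're' modules.  This is only an
illustration of the principle, not how rspamd does it internally; the
two messages and the helper name 'subject_matches' are invented for the
example.

```python
import re
from email import message_from_string

# Case-insensitive pattern, the equivalent of /egg/i.
PATTERN = re.compile("egg", re.IGNORECASE)


def subject_matches(raw_message):
    """Search only the Subject header, never the body (hypothetical helper)."""
    msg = message_from_string(raw_message)
    # Header lookup ignores the case of the field name.
    subject = msg.get("Subject", "")
    return PATTERN.search(subject) is not None


# Header name in upper case, match text in upper case.
spam = (
    "From: someone@example.com\n"
    "SUBJECT: Fresh EGG recipes\n"
    "\n"
    "Hello there.\n"
)

# The word appears only in the body, so a header-only search ignores it.
ham = (
    "From: someone@example.com\n"
    "Subject: Meeting notes\n"
    "\n"
    "We discussed egg prices.\n"
)

spam_hit = subject_matches(spam)
ham_hit = subject_matches(ham)
print(spam_hit, ham_hit)
```

The second message contains the word in its body, but because only the
Subject header is searched it does not match.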

> However, emails with “info prescription-free pharmacy” are still
> allowed through.

Maybe I'm missing something here but I think you need to show us more
of your config and generally give more details.  Which part of the
message are you searching for the text?  Is the search case sensitive?

This link might help:

https://jeffknerr.github.io/rspamd/regex/multimap/2021/03/02/rspamd-multimap-regex-examples.html

it was just a random result from my search using 'startpage.com' (the
Google front end which I prefer) for

"rspamd regex examples"

It looked like it made sense and the guy had made it work for him.  I
can't vouch for it; with more searching time I'm sure you could do a
lot better.

> I would also be interested in the syntax
> “map = “file:///etc/rspamd/local.d/maps.d/banned_subjects.map”;”
> must be or whether
> “map = “/etc/rspamd/local.d/maps.d/banned_subjects.map”;”
> correct is?

The use of both is shown at

https://rspamd.com/doc/modules/multimap.html#principles-of-work

Using a URI instead of using a file path changes the way in which the
content is accessed.  If there's no compelling reason to use a URI, I
would always use the simpler file path.

Have you used

rspamadm configtest

to check your configuration?

-- 

73,
Ged.

