[Rspamd-Users] block foreign charsets

G.W. Haywood rspamd at jubileegroup.co.uk
Tue May 16 11:52:47 UTC 2023


Hi there,

On Tue, 16 May 2023, Katharina Knuth via Users wrote:

> Hello, how can I block foreign charsets in rspamd?
> Like
>
> /Content-Type:.charset\s=\s*\“(koi8-r|big5|euc-kr|gb2312|ksc5601-1987|iso-2022-jp|windows-1251)\“/
>
> I do this at present with pcre via Postfix main.cf

Matches like this should always be case insensitive.  The dot after
the colon in your regex seems to me to serve no purpose.  Unless the
RFCs require them, whitespace and quotes should be optional, yet your
regex requires whitespace before the '=' and also requires quotes.
Quotes are only needed if the quoted content includes what the RFCs
call 'tspecials', and the hyphen character is not listed amongst the
tspecials.  For performance reasons avoid unlimited match lengths,
although your use of the asterisk above will probably not be a
performance problem.
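
As a rough sketch of the points above (in Python rather than PCRE, so
the syntax will need adapting), a more tolerant pattern might look like
the following.  The names BLOCKED_CHARSETS, CHARSET_RE and
blocked_charset are my own invention, the repetition limits of 200 and
8 are arbitrary, and the code assumes it is handed a single, already
unfolded header line.

```python
import re
from typing import Optional

# Charsets taken from the regex quoted above.
BLOCKED_CHARSETS = (
    "koi8-r", "big5", "euc-kr", "gb2312",
    "ksc5601-1987", "iso-2022-jp", "windows-1251",
)

# Hypothetical pattern: case insensitive, bounded repetition instead of
# unlimited '*', optional whitespace around '=' and optional quotes.
CHARSET_RE = re.compile(
    r"^Content-Type:.{0,200}?[;\s]charset\s{0,8}=\s{0,8}"
    r'"?(' + "|".join(re.escape(c) for c in BLOCKED_CHARSETS) + r')"?'
    r"(?=[\s;]|$)",
    re.IGNORECASE,
)


def blocked_charset(header: str) -> Optional[str]:
    """Return the blocked charset named in one unfolded header line, else None."""
    m = CHARSET_RE.match(header)
    return m.group(1).lower() if m else None
```

The trailing lookahead is there so that a listed name only matches when
it is the whole parameter value, not merely the start of a longer one.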

Many Perl modules exist on CPAN which do things with message headers;
you might want to look at some of them for guidance.  Things like
LinearWhiteSPace and(comments)can interfere with the scanning process.
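
To illustrate letting a parser do the work instead of a regex, here is
a minimal sketch using Python's standard email package (not one of the
CPAN modules, just a stand-in I happen to be sure of).  The function
name declared_charset is hypothetical.  The parser copes with folded
header lines, whitespace around the '=' and quoted values; I would not
rely on it to cope with (comments).

```python
from email import message_from_string
from typing import Optional


def declared_charset(raw_message: str) -> Optional[str]:
    """Return the lower-cased charset parameter of the top-level
    Content-Type header, or None if no charset is declared."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset()


# A header folded across two lines, with a quoted upper-case value.
sample = 'Content-Type: text/plain;\n\tcharset="KOI8-R"\n\nbody text\n'
charset = declared_charset(sample)
```

The result can then be compared against whatever list of unwanted
character sets you maintain.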

You might also find published Yara rules which help you, although Yara
now uses its own regex engine so care may be needed in translation.

For a good grounding see the RFCs

https://www.rfc-editor.org/rfc/rfc2045 (and 2046, 2047)

but be prepared for the need to become familiar with BNF-speak.  See
section 5.1 of RFC2045 for tspecials.

In their efforts to sidestep your scanners, spammers may deliberately
play fast and loose with the MIME specifications.  You might want to
(if I too may play fast and loose) "be liberal in what you reject"...

Personally I find relatively little benefit from matching on these
character sets.  In my experience they appear somewhat infrequently in
mail, and other indicators like ASN generally, er, envelop them; but I
agree that they may be a good indication of unwantedness.  Of course
there are quite a few others, which we see here even less frequently.

-- 

73,
Ged.

