[Rspamd-Users] New Spamhaus zone and updates to the rules

Thu Apr 30 09:26:44 UTC 2020

On 30/04/2020 09:56, Riccardo Alfieri wrote:
> Hello,
> 
> I'm happy to announce to the Rspamd community that Spamhaus has released
> an updated version of our rules that solves minor issues and, more
> importantly, adds support for a new dataset we just released.
> 
> The new zone is called HBL (Hash BlockList) and deals with three
> different email scenarios previously not covered by the plugin:
> 
> - Dropbox emails: emails - mostly on freemail providers - used in
> 419-like scams, sextortions and the like
> - Cryptowallets: malicious crypto addresses used mainly in extortion
> scams. Currently supports BTC,BCH,LTC,XRP,XMR and ETH
> - Filehash: hashes of suspicious or confirmed malicious attachments
> 
> All the relevant technical information is available at
> https://docs.spamhaustech.com/10-data-type-documentation/datasets/030-datasets.html#hbl
> 
> 
> HBL is a zone available only to paid-for DQS users, but we do offer a
> free trial; just follow the instructions at
> https://github.com/spamhaus/rspamd-dqs
> 
> Even if you are not planning to use HBL, we strongly suggest you to
> update the rules to the latest release for general security.
> 
> We'd love some feedback and I'm always open for suggestions or
> discussion. Thank you!
> 

Unfortunately, you have not considered my suggestions, and I have had no
time to fix and test these zones myself. Hence, I cannot recommend using
these rules in any production environment, as they suffer from
performance and support issues.

I'll start with the most important ones:

1) This should be a part of the RBL module! With the careless usage of
`r:resolve_a({ task = task, name = lookup , callback = dns_cb, forced =
true })` it is trivial to create a message that would kill your DNS
infrastructure after a few emails, since `forced = true` overrides the
normal limit on DNS requests.
So the only way to do it properly is to reuse the bells and whistles of
the RBL module (that would probably require patching) and the selectors
framework. All other ways are not good from an architectural point of
view and suffer from lots of mistakes.

2) `local re =
rspamd_re.create_cached('^(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$')`
should not be used for matching: this regular expression is not
optimized by hyperscan and you can easily find yourself running in a
semi-endless backtracking loop with PCRE. So it is a big no-go for
production usage.
Furthermore, Rspamd has a `BITCOIN_ADDR` symbol that can detect and
validate some of the popular bitcoin formats (it uses the same approach
as your rule, but this needs to be changed tbh). Notable exceptions are
ETH and Monero wallets: these are not yet implemented. However, BTC and
BTC cache (segwit) addresses are fully supported.
So there is no need to call this expensive RE multiple times, I suppose:
it is better to refactor and improve the `BITCOIN_ADDR` symbol and use it
in the selectors-based RBL.

3) Rspamd now supports the RFC version of base32 out of the box, so
there is no need to use Lua for string transformation, which is not a
good idea in general.
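
To illustrate what "RFC version" means here: RFC 4648 base32 uses the
alphabet A-Z and 2-7 with `=` padding. A minimal sketch in Python (this
is not Rspamd's Lua API; the function name is made up for illustration):

```python
import base64


def rfc4648_base32(data: bytes) -> str:
    """Encode bytes with the RFC 4648 base32 alphabet (A-Z, 2-7, '=' padding)."""
    return base64.b32encode(data).decode("ascii")


if __name__ == "__main__":
    # Test vector from RFC 4648, section 10.
    print(rfc4648_base32(b"foobar"))  # MZXW6YTBOI======
```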

4) Sha256 is quite a bad choice when it comes to Rspamd. Ideally, there
would be a blake2b zone of hashes alongside the sha256 one for other
appliances. The reason for this, aside from the overall speed of blake2b
compared to sha256, is that the blake2b hash is already calculated for
all content parts. Not sure if that's doable.
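
For reference, here is a small sketch of the two digests side by side,
again in Python rather than Rspamd's Lua. The function name is
hypothetical, and the blake2b digest size (hashlib's default of 64
bytes) is an assumption, not something a future zone is known to use:

```python
import hashlib


def attachment_digests(content: bytes) -> dict:
    """Return hex digests of the same content under both hash functions."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        # Assumption: default 64-byte blake2b digest, unkeyed.
        "blake2b": hashlib.blake2b(content).hexdigest(),
    }


if __name__ == "__main__":
    for name, digest in attachment_digests(b"abc").items():
        print(name, digest)
```

If a scanner already holds the blake2b digest of each part, a
blake2b-keyed zone could be queried without hashing the content again.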

I'm afraid I'm not sure I will have time to fix these issues in the
short term. However, the current rule set does not seem to be ready for
production usage.