[Rspamd-Users] Bayes questions and observations

Sat Mar 16 16:00:28 UTC 2024

Hi there,

On Sat, 16 Mar 2024, christian via Users wrote:

> ... e.g. an email that has already undergone several checks in RspamD:
>
> X-Spamd-Result: default: False [20.03 / 30.00];
> PH_SURBL_MULTI(7.50)[dennisberrien.com:url];
> NEURAL_SPAM_SHORT(3.00)[1,000];
> HFILTER_HOSTNAME_UNKNOWN(2.50)[];
> MISSING_MID(2.50)[];
> IP_REPUTATION_SPAM(1.39)[asn: 47674(0.23), country: MO(0.01), ip: 185.236.231.93(0.00)];
> R_BAD_CTE_7BIT(1.05)[7bit,utf8];
> R_NO_SPACE_IN_FROM(1.00)[];
> MV_CASE(0.50)[];
> FORGED_SENDER(0.30)[no-reply at ehtakoskelo.fi,return at ehtakoskelo.fi];
> MIME_HTML_ONLY(0.20)[];
> ONCE_RECEIVED(0.10)[];
> MX_GOOD(-0.01)[];
> BAYES_SPAM(-5.00)[99.99%];
>
> But I have already learned such emails using rspamc learn_spam ...

I really do think that you're making it difficult for yourself.

AFAICT nothing good ever came out of AS47674.  The average DNSBL score
recorded in our database for connections from this ASN is 8.14.  That
means that on average, every connecting IP is on at least three of the
DNSBLs we use (the maximum weight for any single DNSBL here is 3.0).
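
To spell out that arithmetic: if no single list can contribute more
than 3.0, a score of 8.14 cannot be reached with fewer than three
listings.  A quick check (the function name is only for illustration):

```python
import math

def min_listings(avg_score: float, max_weight: float) -> int:
    """Smallest number of DNSBL hits that can add up to avg_score
    when no single list contributes more than max_weight."""
    return math.ceil(avg_score / max_weight)

print(min_listings(8.14, 3.0))  # prints 3
```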

There's really no point messing about with Bayes for ASNs like this
one, just drop everything from them.

If you wish I can easily provide a list of ASNs with scores greater
than whatever value you desire, which you then could drop with very
good confidence that nobody except the spammers would notice.

The 'score' for an individual IP is the weighted count of the DNSBLs -
in our list of chosen BLs - on which it is found; the per-ASN figures
below are averages of those scores.  Spamhaus 'Zen' for example has a
weight of 3; most of the others have weights
of only 1 or 2.  In my capacity as spam-hater-in-chief, I decide the
weights which we apply to the individual DNSBLs.  It works well now,
but I'm sure there's a lot of room for refinement.
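
A minimal sketch of that kind of scoring, assuming the per-IP score is
simply the sum of the weights of the lists the IP appears on, and the
per-ASN figure is the plain mean of those scores.  Only the Zen weight
of 3 comes from the above; the other list names and weights are
placeholders, and the function names are made up for the example.

```python
# Hypothetical weights: only Zen's weight of 3 is taken from the text.
DNSBL_WEIGHTS = {
    "zen.spamhaus.org": 3.0,
    "bl.example-one.invalid": 2.0,   # placeholder list
    "bl.example-two.invalid": 1.0,   # placeholder list
}

def ip_score(listed_on):
    """Sum the weights of the DNSBLs on which an IP is listed."""
    return sum(DNSBL_WEIGHTS.get(name, 0.0) for name in listed_on)

def asn_average(ip_scores):
    """Mean of the per-connection scores recorded for one ASN."""
    scores = list(ip_scores)
    return sum(scores) / len(scores) if scores else 0.0

# An IP on Zen plus one weight-1 list scores 4.0.
print(ip_score(["zen.spamhaus.org", "bl.example-two.invalid"]))
```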

score    count of
avg. >=  ASNs
------------------
  0.0    7283
  1.0    6406
  2.0    6197
  3.0    5985
  4.0    5705
  5.0    5417
  6.0    5047
  7.0    4647
  8.0    4252
  9.0    3797
 10.0    3275
 11.0    2804
 12.0    2330
 13.0    1826
 14.0    1296
 15.0     719
 16.0     325
 17.0     155
 18.0      46
 19.0      11
 20.0       1
------------------

As you can see, of the 7283 ASNs from which IPs have connected to us,
about 6000 (5985) have an average score of three or more from the
DNSBLs we use.  The vast majority of those send absolutely nothing but
spam.
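
For anyone wanting to build the same kind of cumulative table from
their own logs, something along these lines would do.  It's only a
sketch: the mapping of ASN to average score, the sample values and the
function name are all made up for illustration.

```python
def cumulative_counts(asn_averages, thresholds):
    """For each threshold, count the ASNs whose average score is >= it."""
    return {
        t: sum(1 for avg in asn_averages.values() if avg >= t)
        for t in thresholds
    }

# Made-up sample data, keyed by documentation-range ASN numbers.
sample = {64500: 0.4, 64501: 3.2, 64502: 8.14, 64503: 12.0}

for threshold, count in cumulative_counts(sample, [0.0, 3.0, 8.0]).items():
    print(f"{threshold:5.1f}  {count:5d}")
```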

We tempfail at a score > 1.5.  At 4.0 and above, if the spam rules
find a hit in the individual message, we autoreport.  In the past five
years this has produced one false report (a Microsoft server, which
managed to get itself listed by a couple of well-regarded blacklists
and which sent us a DMARC failure report from a mailing list).
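
Expressed as code, those two thresholds look roughly like this.  It's
a sketch only: the function and action names are invented, and it
assumes the score being tested is the one recorded for the connecting
IP.

```python
TEMPFAIL_ABOVE = 1.5    # from the text: tempfail at a score > 1.5
AUTOREPORT_FROM = 4.0   # from the text: autoreport at 4.0 and above

def decide(score, spam_rule_hit):
    """Return the list of actions for one connection."""
    actions = []
    if score > TEMPFAIL_ABOVE:
        actions.append("tempfail")
    # Autoreport needs both a high score and a spam rule hit
    # in the individual message.
    if score >= AUTOREPORT_FROM and spam_rule_hit:
        actions.append("autoreport")
    return actions
```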

As of this afternoon there are 5383 ASNs with average BL *counts* > 3
(that is, ASNs whose IPs, on connecting to us, are typically listed
on more than three of the DNSBLs which we use).

For most of the ASNs it's fairly pointless doing the scoring exercise
every time and I'd suggest that, unless you have other priorities than
running an efficient mail service, you just drop the connection like a
hot potato as soon as it comes in.
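
The logic for that is trivial.  A sketch, assuming you already have a
mapping from ASN to average score and some way to look up the ASN of a
connecting IP (neither is shown here, and the names are invented):

```python
def build_droplist(asn_averages, min_score):
    """Return the set of ASNs whose average score exceeds min_score."""
    return {asn for asn, avg in asn_averages.items() if avg > min_score}

def should_drop(asn, droplist):
    """True if a connection from this ASN should be dropped on sight."""
    return asn in droplist

# AS47674 and its 8.14 average are from this thread; 64500 is a
# documentation-range ASN with a made-up score.
droplist = build_droplist({47674: 8.14, 64500: 0.4}, min_score=3.0)
print(should_drop(47674, droplist))
```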

-- 

73,
Ged.