[Rspamd-Users] Questions regarding how to increase rspamd's coverage on abused legitimate services/"living off trusted services" (LOTS)

Sat Mar 23 12:07:00 UTC 2024

Hello fellow rspamd users,

first of all, apologies if the questions/suggestions below have already been discussed
here before (in which case, please point me to the relevant thread, as I was unable
to find one). This e-mail is something between rspamd-users and rspamd-development,
as I hope to implement the ideas below soon, if they strike you all as sensible.

Triggered by https://lots-project.com/, I was thinking of ways to increase rspamd's
coverage on phishing or malspam campaigns that rely on the abuse of legitimate services
("living off trusted services" [LOTS], similar to the "living off the land" TTPs in the
malware ecosystem). Somewhat related, it seems like IPFS has recently gained momentum
again in spam campaigns, sometimes through URL redirectors and the like.

My ideas are as follows:

- Currently, to the best of my understanding, rspamd does attempt to dereference
  shortened URLs, and checks the FQDNs against configured DNSBLs (correct me if this
  is wrong).

  However, regexp-based checks such as for IPFS gateway URLs, etc. are not performed
  on URLs dereferenced from a shortener URL in a message. Fixing this would probably
  reduce false negatives for known bad URLs if they are being disguised by a shortener.

- rspamd maintains a list of redirectors, but not of abused legitimate services
  (such as those mentioned by https://lots-project.com/). Unless a DNSBL lists
  either the involved FQDN (appears to happen rarely due to false positives), or a
  hash of the involved URL, rspamd misses that a message contains a link involving
  an abused legitimate service.

  Maybe introducing a map of such services, including a score of "how bad" the
  situation is, would make sense - similar to attachment types. For example, while
  a OneDrive link (1drv[.]ms et al.) could be legitimate in an e-mail, there is
  very little legitimate use for distributing a *.workers[.]dev link (yet another service
  dumped on the world by Cloudflare without any apparent abuse prevention whatsoever :-/ )
  directly via e-mail.

  While blocking messages outright based on the presence of such LOTS links is probably
  not feasible, a map like this would at least allow for some scoring, and for machine
  learning to pick up such characteristics in spam messages.

- rspamd currently checks the file suffixes and MIME types of attachments. But it
  does not attempt to figure out whether a URL in a message would lead to the
  download of a file with a "bad" suffix (.lnk, etc.).

  Although this is not a silver bullet, adding checks that determine the file suffix
  from a URL in a message could increase coverage of spam mails containing malicious
  links that are not already flagged by DNSBLs and the like.
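
To make the three ideas above a bit more concrete, here is a rough sketch in Python
(not rspamd code, and not using any rspamd API). It assumes the URL has already been
dereferenced from a shortener, if any. The map entries, scores, suffix list and the
IPFS patterns are all placeholders I made up for illustration; check_url() is a
hypothetical helper name.

```python
import re
from posixpath import splitext
from urllib.parse import urlsplit

# Hypothetical map of abused legitimate services: domain -> score.
# Entries and weights are illustrative only.
LOTS_SCORES = {
    "workers.dev": 4.0,
    "1drv.ms": 0.5,
}

# Illustrative set of file suffixes considered risky when linked from an e-mail.
BAD_SUFFIXES = {".lnk", ".iso", ".js", ".vbs"}

# Deliberately loose patterns for IPFS gateway URLs: a long alphanumeric
# identifier either after "/ipfs/" in the path or as a label before ".ipfs.".
IPFS_PATH_RE = re.compile(r"/ipfs/[A-Za-z0-9]{46,}")
IPFS_HOST_RE = re.compile(r"^[a-z0-9]{46,}\.ipfs\.")


def check_url(url: str) -> dict:
    """Return local findings for a single (already dereferenced) URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path
    findings = {}

    # 1. Regexp check, applied to the final URL rather than the shortener URL.
    if IPFS_PATH_RE.search(path) or IPFS_HOST_RE.match(host):
        findings["ipfs_gateway"] = True

    # 2. Look up the host and each of its parent domains in the LOTS map.
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in LOTS_SCORES:
            findings["lots_service"] = (candidate, LOTS_SCORES[candidate])
            break

    # 3. Derive the file suffix from the URL path only (query string ignored).
    suffix = splitext(path)[1].lower()
    if suffix in BAD_SUFFIXES:
        findings["bad_suffix"] = suffix

    return findings
```

Returning findings instead of a verdict is intentional: each finding would only
contribute to the score, in line with the point above that blocking on any single
one of them is probably not feasible.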

Somewhat related are two other occasions for further tuning:

- In contrast to SpamAssassin, rspamd currently does not by default resolve IP addresses
  for links in messages and check the reputation of these IPs against DNSBLs.

  I get that enabling this by default has a performance impact, as there can be
  dozens of links in a message, and slow DNS response times may cause a DoS against
  rspamd. But from my experience, enabling this picks up a decent amount of badness,
  pushing more messages over the edge to "spam message rejected".

  I therefore wonder if this is something that can be enabled by default again, if
  additional safeguards are in place to prevent excessive performance degradation.

- As attachment policies are increasingly tightened, PDF abuse has increased. Sometimes,
  PDFs disseminated in spam campaigns include a blurred image of the lure, overlaid
  by an IPFS gateway link. Sometimes, they directly contain JavaScript exploits, and
  so on.

  I wonder if rspamd could extract URLs from PDF attachments, and check these against
  local rules, such as regexp patterns looking for IPFS gateway URLs. Checking all
  these links against DNSBLs, however, is probably out of the question, given that there
  can be hundreds in a single PDF file.
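
As a rough illustration of both points above, again in Python and not based on any
rspamd API: the sketch below pulls link targets out of a PDF with a simple byte-level
regexp, caps how many are kept, and separately caps the number of distinct hosts that
would be handed to DNS. It assumes the link actions sit in uncompressed objects as
"/URI (...)" and ignores escaped parentheses, so it will miss links in compressed
object streams. The function names and the limits are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches "/URI (http...)" link actions as found in uncompressed PDF objects.
URI_RE = re.compile(rb"/URI\s*\((https?://[^)\s]+)\)")

# Loose local pattern for IPFS gateway paths (illustrative only).
IPFS_PATH_RE = re.compile(r"/ipfs/[A-Za-z0-9]{46,}")


def extract_pdf_urls(pdf_bytes: bytes, max_urls: int = 50) -> list[str]:
    """Collect up to max_urls distinct link targets from raw PDF bytes."""
    urls = []
    seen = set()
    for match in URI_RE.finditer(pdf_bytes):
        url = match.group(1).decode("latin-1")
        if url in seen:
            continue
        seen.add(url)
        urls.append(url)
        if len(urls) >= max_urls:
            break
    return urls


def hosts_to_resolve(urls, max_hosts: int = 10) -> list[str]:
    """Deduplicate hostnames and cap how many would be sent to DNS."""
    hosts = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host and host not in hosts:
            hosts.append(host)
        if len(hosts) >= max_hosts:
            break
    return hosts


def local_ipfs_hits(urls) -> list[str]:
    """Apply the local regexp only; no DNS lookups involved."""
    return [u for u in urls if IPFS_PATH_RE.search(urlsplit(u).path)]
```

The idea is that the cheap local checks run on every extracted URL, while anything
involving DNS only ever sees a small, deduplicated set of hosts.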

What do you think? Any additional improvement potential I forgot (which is very likely)?

Cheers,
Tobias