[Rspamd-Users] regex matching only in "Scan/Learn" UI

Tue Oct 22 21:21:15 UTC 2024

> what could be the reason that this spam mail body part:
> 
> --0000000000001835a10625032efb
> Content-Type: text/plain; charset="UTF-8"
> 
> https://t.co/QOvzySU9xJ
> 
> --0000000000001835a10625032efb
> Content-Type: text/html; charset="UTF-8"
> 
> <div dir="ltr"><a href="https://t.co/QOvzySU9xJ">https://t.co/QOvzySU9xJ</a><br></div>
> 
> --0000000000001835a10625032efb--
> 
> 
> successfully matches my regex:
> 
> /Content-Type: text\/plain; charset="UTF-8"
> 
> https:\/\/\w+\.\w+\/\S{5,15}
> 
> /m
> 
> while testing in „Scan/Learn“ tab of the web interface, but not during the scan of the incoming mail?

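This might be due to line endings. Your regex contains literal line breaks, which match only a bare \n, but mail received via SMTP uses \r\n (CRLF). Text pasted into the web UI probably does not.
Try using \r?\n in your regex instead of the literal line breaks, so it matches both.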
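A small sketch of the suspected difference, using Python's re module rather than Rspamd's own regexp engine. It assumes that the Scan/Learn tab receives the pasted message with bare LF line endings and that the SMTP copy has CRLF. The two sample strings are hypothetical stand-ins built from the quoted body part.

```python
import re

# The regex from the question, with its literal line breaks written as "\n".
# re.MULTILINE mirrors the /m flag; it has no effect here because the
# pattern uses no ^ or $ anchors.
LF_ONLY = re.compile(
    r'Content-Type: text/plain; charset="UTF-8"\n'
    r'\n'
    r'https://\w+\.\w+/\S{5,15}\n'
    r'\n',
    re.MULTILINE,
)

# The same pattern with every line break written as \r?\n,
# so it matches both LF and CRLF line endings.
LF_OR_CRLF = re.compile(
    r'Content-Type: text/plain; charset="UTF-8"\r?\n'
    r'\r?\n'
    r'https://\w+\.\w+/\S{5,15}\r?\n'
    r'\r?\n',
    re.MULTILINE,
)

PART_LINES = [
    'Content-Type: text/plain; charset="UTF-8"',
    '',
    'https://t.co/QOvzySU9xJ',
    '',
    '--0000000000001835a10625032efb',
]

# Hypothetical stand-ins: text pasted into the web UI (LF only)
# versus the same part as delivered over SMTP (CRLF).
pasted = "\n".join(PART_LINES)
smtp = "\r\n".join(PART_LINES)

for name, text in (("LF", pasted), ("CRLF", smtp)):
    print(name, bool(LF_ONLY.search(text)), bool(LF_OR_CRLF.search(text)))
```

The LF-only pattern fails on the CRLF copy at its first line break, because a \r sits where it expects \n. The \r?\n variant matches both copies.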

> OTOH, with the HFILTER_URL_ONLY symbol it’s vice versa: shows up there, but not during the „Scan/Learn“ thing. 

You could add a multimap with your own list of URL domains, e.g. this map entry:

t.co  MY_MESSENGER_DOMAINS

and combine that symbol with HFILTER_URL_ONLY in a composite:

https://rspamd.com/doc/configuration/composites.html

MESSENGER_URL_ONLY_COMPOSITE {
    expression = "MY_MESSENGER_DOMAINS and HFILTER_URL_ONLY";
    score = 5.0;
}

Best regards,
Gerald