[Rspamd-Users] URL Parsing error(s)

Steve Sturges (ststurge) ststurge at cisco.com
Thu Mar 24 17:19:01 UTC 2022

Hi all—

In a test with rspamd 3.1, I think I’ve identified a parsing error that occurs when a URL with a malformed hostname is extracted from an email message body.

First, consider a few simple URLs (which may be trying to fake a domain, such as linkedin.com), where the hostname is actually URL encoded:


When task:get_urls() is invoked from a Lua callback, it unexpectedly returns both URLs with the %3D decoded as =.

url list {[1] = http://www.li=kedin.com, [2] = http://www.linke=in.com}

However, when the %3D is replaced with a literal = sign,


the first URL is not even parsed and the URL list just includes this:

url list {[1] = http://www.li}

That second URL is expected, and what appears after the = is just treated as part of the text of the message.

I see two potential errors here:

1) URL decoding should not be applied to the host name portion of a URL, only to the parts of the URL that are expected to be URL encoded.
2) In the second example, the first URL should, in theory, be parsed as http://www.linke.
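
To illustrate point 1, here is a minimal sketch in Python (not rspamd code) of treating the host portion as opaque instead of URL decoding it. The sample URL is a hypothetical input chosen to match the output reported above, and the helper names and the simplified hostname pattern are my own assumptions, not anything taken from rspamd.

```python
import re
from urllib.parse import urlsplit, unquote

# Simplified assumption: a plain hostname contains only letters,
# digits, hyphens and dots. Real hostname rules are stricter.
PLAIN_HOST = re.compile(r"[A-Za-z0-9.-]+")


def raw_host(url):
    """Return the host portion exactly as written, without URL decoding."""
    return urlsplit(url).netloc


def is_plain_hostname(host):
    """True if the host contains only simple hostname characters."""
    return PLAIN_HOST.fullmatch(host) is not None


# Hypothetical sample input, chosen to match the reported output.
sample = "http://www.li%3Dkedin.com"

host = raw_host(sample)
print(host)                     # host as written in the URL
print(unquote(host))            # what URL decoding the host would produce
print(is_plain_hostname(host))  # the '%' marks the host as malformed
```

With a check like this, a host containing % would be flagged as malformed before any decoding, so a = would never appear in the reported hostname.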

I will try to look through the Lua plugin code next week to see if there is an obvious fix, unless someone beats me to it.

