[Rspamd-Users] URL Parsing error(s)

Steve Sturges (ststurge) ststurge at cisco.com
Thu Mar 24 17:19:01 UTC 2022


Hi all—

In a test with rspamd 3.1, I think I’ve identified a parsing error that occurs when a URL with a malformed hostname is extracted from an email message body.

First, consider a few simple URLs (which may be trying to fake a domain, such as linkedin.com), where part of the hostname is actually URL encoded:

http://www.linke%3Din.com
http://www.li%3Dkedin.com

From a Lua callback, task:get_urls() returns both URLs but, unexpectedly, with the %3D decoded to =:

url list {[1] = http://www.li=kedin.com, [2] = http://www.linke=in.com}
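
For anyone who wants to reproduce this, a callback along the following lines should be enough to dump what rspamd extracted. This is only a minimal sketch: the symbol name DEBUG_URL_DUMP and the helper collect_urls are made up for illustration, and I am assuming the usual rspamd Lua API calls (task:get_urls(), url:get_host(), url:get_text(), rspamd_logger.infox).

```lua
-- Collect the host and full text of every URL rspamd extracted from a task.
-- task:get_urls(), url:get_host() and url:get_text() are rspamd Lua API calls.
local function collect_urls(task)
  local out = {}
  for _, u in ipairs(task:get_urls() or {}) do
    out[#out + 1] = { host = u:get_host(), text = u:get_text() }
  end
  return out
end

-- rspamd_config only exists inside rspamd; the guard lets the file load elsewhere.
if rspamd_config then
  local rspamd_logger = require "rspamd_logger"
  -- DEBUG_URL_DUMP is a hypothetical symbol name chosen for this sketch.
  rspamd_config.DEBUG_URL_DUMP = {
    callback = function(task)
      for _, e in ipairs(collect_urls(task)) do
        rspamd_logger.infox(task, 'url text=%s host=%s', e.text, e.host)
      end
      return false -- never insert the symbol; this callback only logs
    end,
    score = 0.0,
    description = 'Log hosts of extracted URLs (debugging aid)',
  }
end
```

Logging the host separately from the full text makes it easier to see whether the decoding happens in the hostname itself or only in the string form of the URL.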

However, when the %3D is replaced with a literal = sign,

http://www.linke=in.com
http://www.li=kedin.com

the first URL is not parsed at all, and the URL list includes only this:

url list {[1] = http://www.li}

That result for the second URL is expected: whatever appears after the = is just treated as part of the text of the message.

I see two potential errors here:

1) URL decoding should not be applied to the hostname portion of a URL, only to the parts that are meant to be URL encoded.
2) In the second example, the first URL should, in theory, be extracted as http://www.linke, just as the second one is extracted as http://www.li.
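
To make the two points above concrete, here is a small standalone Lua sketch of the behaviour I would expect. It is not rspamd's actual parser. The helper names is_plain_hostname and leading_host are hypothetical, and it assumes a deliberately simplified hostname rule (ASCII letters, digits, hyphens and dots only, so no IDN, IPv6 literals, ports or userinfo).

```lua
-- Simplified assumption: a hostname is a non-empty run of ASCII letters,
-- digits, hyphens and dots. No percent-decoding is applied first.
local HOST_CHARS = 'A-Za-z0-9%-%.'

-- True if the string consists only of characters allowed by the rule above.
local function is_plain_hostname(host)
  return host:match('^[' .. HOST_CHARS .. ']+$') ~= nil
end

-- Return the leading run of hostname characters after "http://" or
-- "https://", or nil if the input does not start with such a scheme
-- or no hostname character follows it.
local function leading_host(url)
  local host = url:match('^https?://([' .. HOST_CHARS .. ']*)')
  if host == nil or host == '' then
    return nil
  end
  return host
end

-- Point 1: an undecoded host containing %3D is not a plain hostname.
print(is_plain_hostname('www.linke%3Din.com'))
-- Point 2: both URLs with a literal = are cut at the = in the same way.
print(leading_host('http://www.linke=in.com'))
print(leading_host('http://www.li=kedin.com'))
```

With this rule both of the = examples are truncated consistently, and the %3D variants are flagged as malformed rather than silently decoded. Whether rspamd should reject or truncate the encoded form is a separate question.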

I will try to look through the Lua plugin code next week to see if there is an obvious fix, unless someone beats me to it.

Cheers
-steve

