[Rspamd-Users] URL Parsing error(s)
Steve Sturges (ststurge)
ststurge at cisco.com
Thu Mar 24 17:19:01 UTC 2022
Hi all—
In a test with rspamd 3.1, I think I’ve identified a parsing error that occurs when a URL with a malformed hostname is extracted from an email message body.
First, consider a few simple URLs (which may be trying to fake a domain, such as linkedin.com), where part of the hostname is actually URL encoded:
http://www.linke%3Din.com
http://www.li%3Dkedin.com
When task:get_urls() is invoked from a Lua callback, it unexpectedly returns both URLs with the %3D decoded as =.
url list {[1] = http://www.li=kedin.com, [2] = http://www.linke=in.com}
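For illustration outside rspamd, the effect of percent-decoding the host can be reproduced with a short Python sketch. This is not rspamd's parser, only a demonstration of the decoding step; the helper name decode_host is hypothetical.

```python
from urllib.parse import urlsplit, unquote

def decode_host(url):
    """Return (raw host, percent-decoded host) for a URL string."""
    raw = urlsplit(url).netloc   # host exactly as written in the URL
    return raw, unquote(raw)     # unquote turns %3D into '='

for u in ("http://www.linke%3Din.com", "http://www.li%3Dkedin.com"):
    raw, decoded = decode_host(u)
    print(raw, "->", decoded)
```

The decoded hosts contain =, which is not a legal hostname character, so they match the unexpected entries in the URL list above.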
However, when the %3D is replaced with a literal = sign,
http://www.linke=in.com
http://www.li=kedin.com
the first URL is not parsed at all, and the URL list includes only this:
url list {[1] = http://www.li}
That result for the second URL is expected: what appears after the = is just treated as part of the text of the message.
I see two potential errors here:
1) URL decoding should not be applied to the hostname portion of a URL, only to the parts that may legitimately be URL encoded
2) In the second example, the first URL should, in theory, be parsed as http://www.linke.
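The behaviour expected in both points can be sketched in Python. This is an assumption about what a stricter parser might do, not rspamd's actual code: hostname labels are limited to letters, digits and hyphens, the host is never percent-decoded, and scanning stops at the first character that cannot belong to a hostname. The names LABEL, HOST_RUN, is_valid_host and extract_host are hypothetical.

```python
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")
# Longest run of characters that may appear in a hostname.
HOST_RUN = re.compile(r"[A-Za-z0-9.-]+")

def is_valid_host(host):
    """True if every dot-separated label is a plain hostname label."""
    return all(LABEL.match(label) for label in host.split("."))

def extract_host(text):
    """Return the host after 'http://', cut at the first non-hostname character."""
    prefix = "http://"
    if not text.startswith(prefix):
        return None
    m = HOST_RUN.match(text, len(prefix))
    return m.group(0) if m else None

print(is_valid_host("www.linke%3Din.com"))
print(extract_host("http://www.linke=in.com"))
print(extract_host("http://www.li=kedin.com"))
```

Under these assumptions, is_valid_host rejects www.linke%3Din.com as written (no decoding is attempted), and extract_host yields www.linke and www.li for the two URLs containing a literal =.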
I will try to look through the Lua plugin code next week to see if there is an obvious fix, unless someone beats me to it.
Cheers
-steve