[Rspamd-Users] Unexpected URLs

Steve Sturges (ststurge) ststurge at cisco.com
Thu Jul 21 19:45:28 UTC 2022


Hi all—

While testing something with rspamd 3.2, I have a multipart email message, one part of which is text/html:

--_000_6be055295eab48a5af7ad4022f33e2d0_
Content-Type: text/html; charset="utf-8"

<html><body>
<a href="http://somewhere.example.net">https://somewhereelse.otherexample.com</a>
</html>
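To make the two candidate URLs concrete, here is a rough pure-Lua sketch that pulls the href target and the display text out of the single anchor in the sample above. It is not rspamd code and not a real HTML parser. `anchor_urls` is a hypothetical helper that only handles this simple form.

```lua
-- Hypothetical helper (not part of rspamd): return the href target and the
-- display text of the first <a href="...">...</a> in a string.
-- This is a Lua pattern match, not an HTML parser.
local function anchor_urls(html)
  local href, text = html:match('<a%s+href="([^"]*)"%s*>(.-)</a>')
  return href, text
end

local sample = '<html><body>\n'
  .. '<a href="http://somewhere.example.net">https://somewhereelse.otherexample.com</a>\n'
  .. '</html>'

local href, text = anchor_urls(sample)
print(href)  -- http://somewhere.example.net
print(text)  -- https://somewhereelse.otherexample.com
```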

In a Lua plugin I'm building, I call task:get_parts() and then part:get_urls() on each text part. This returns two URLs, both equal to the href target (http://somewhere.example.net); neither corresponds to the displayed link text (https://somewhereelse.otherexample.com).

local rspamd_logger = require "rspamd_logger"

-- Collect the URLs found in every text part of the message
local function url_test(task)
  local all_urls = {}
  local parts = task:get_parts()
  if not parts then
    return nil
  end
  for _, part in ipairs(parts) do
    if part:is_text() then
      -- URLs that rspamd extracted from this text part
      local urls = part:get_urls()
      rspamd_logger.debugx("task:get_parts -> part:get_urls: %1", urls)
      for _, url in ipairs(urls) do
        table.insert(all_urls, url)
      end
    end
  end
  return all_urls
end

The output from the above code:

task:get_parts -> part:get_urls: {[1] = http://somewhere.example.net, [2] = http://somewhere.example.net}

I can see a reason to return two URLs, since the link text differs from the target. However, the actual result is unexpected: I would expect either a single URL (the href target) or two different URLs (the href target and the URL shown as the display text), not the href target twice.
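If the immediate goal is just to avoid processing the same link twice, one option is to deduplicate the collected URLs by their string form. This is a minimal sketch, assuming each collected URL can be turned into a string with tostring() (an assumption about rspamd's URL objects; plain strings stand in for them here). `unique_urls` is a hypothetical helper, and it only hides the duplicate; it does not recover the missing display-text URL.

```lua
-- Hypothetical helper: return the URLs with duplicates removed, keeping
-- first-seen order. URLs are compared by their tostring() form, which is an
-- assumption about how rspamd URL objects stringify.
local function unique_urls(urls)
  local seen, result = {}, {}
  for _, url in ipairs(urls) do
    local key = tostring(url)
    if not seen[key] then
      seen[key] = true
      table.insert(result, url)
    end
  end
  return result
end

-- Plain strings standing in for the two URL objects from the output above
local collected = { "http://somewhere.example.net", "http://somewhere.example.net" }
local deduped = unique_urls(collected)
print(#deduped, deduped[1])  -- prints 1 and http://somewhere.example.net
```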

Any ideas before I dig into the C++ side of rspamd?

Cheers
-steve


