[Rspamd-Users] UTF-8 versus raw

Steve Sturges (ststurge) ststurge at cisco.com
Thu Sep 17 14:40:31 UTC 2020


I have a question about how rspamd handles UTF-8 versus raw encoding, and how that affects regex matches. Let's say there is an email with an encoded subject, e.g.:

Subject: =?gb2312?b?tPq/qreixrExMzUzNzUzODQ0MrPC?=
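To see which bytes the rules end up matching against, the encoded-word can be decoded outside rspamd. The following is a minimal sketch that uses Python's standard email.header module as a stand-in for rspamd's own MIME decoder (an assumption: rspamd's decoder is expected to produce the same text). It decodes the GB2312 subject and re-encodes it as UTF-8, then checks for the byte sequence the rules below look for:

```python
from email.header import decode_header

# The encoded Subject value from the message above (RFC 2047 encoded-word).
ENCODED_SUBJECT = "=?gb2312?b?tPq/qreixrExMzUzNzUzODQ0MrPC?="

# decode_header() returns (raw_bytes, charset) pairs, one per encoded-word.
raw_bytes, charset = decode_header(ENCODED_SUBJECT)[0]

# Convert from the declared charset (gb2312) to Unicode, then to UTF-8 bytes.
text = raw_bytes.decode(charset)
utf8_bytes = text.encode("utf-8")

# Byte sequence targeted by the rules: 开 (E5 BC 80), 发 (E5 8F 91), 票 (E7 A5 A8).
TARGET = b"\xe5\xbc\x80\xe5\x8f\x91\xe7\xa5\xa8"

print(text)
print(utf8_bytes.hex(" "))
print(TARGET in utf8_bytes)  # prints True
```

In the original GB2312 bytes the same characters are encoded as BF AA, B7 A2 and C6 B1, so a byte pattern written in UTF-8 can only match after the header has been decoded and converted to UTF-8.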

The issue I'm encountering is that rspamd seems to always convert to UTF-8, ignoring the flags specified either in the regex itself or in the global rspamd configuration. The following rules demonstrate it. Consider these three regexes, which differ only in the modifier at the end of the expression (none, u for UTF-8, r for raw):

-- Modifier: none (H = match the Subject header, no extra flag)
reconf['TEST_SUBJ1_H'] = {
   re = 'Subject=/\\xE5\\xBC\\x80.{0,3}\\xE5\\x8F\\x91.{0,3}\\xE7\\xA5\\xA8/H',
   policy = 'leave',
   one_shot = true,
}

-- Modifier: u (UTF-8)
reconf['TEST_SUBJ1_Hu'] = {
   re = 'Subject=/\\xE5\\xBC\\x80.{0,3}\\xE5\\x8F\\x91.{0,3}\\xE7\\xA5\\xA8/Hu',
   policy = 'leave',
   one_shot = true,
}

-- Modifier: r (raw)
reconf['TEST_SUBJ1_Hr'] = {
   re = 'Subject=/\\xE5\\xBC\\x80.{0,3}\\xE5\\x8F\\x91.{0,3}\\xE7\\xA5\\xA8/Hr',
   policy = 'leave',
   one_shot = true,
}

The rules look for '开.{0,3}发.{0,3}票', i.e. the UTF-8 byte sequence 'E5BC80 .{0,3} E58F91 .{0,3} E7A5A8', which is present in the UTF-8 encoded data. In options.inc, regardless of whether I set raw mode to true (raw_mode = true;) or false (raw_mode = false;), the symbols that match are TEST_SUBJ1_H and TEST_SUBJ1_Hr.
It appears that the decoding is always to UTF-8, and that the raw options (on the regex itself or in the overall configuration) are being ignored.
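One possible factor, offered only as an assumption to check against the PCRE and rspamd documentation: in a byte-oriented (non-UTF-8) regex, the escape \xE5 matches the single byte 0xE5, whereas in UTF-8 mode \xE5 denotes the code point U+00E5 ('å'), which never occurs in the decoded subject. The sketch below uses Python's re module purely as a stand-in for PCRE: a bytes pattern models the no-flag and r variants (assuming rspamd compiles those in byte mode) and a str pattern models the u variant. It does not exercise rspamd itself.

```python
import re
from email.header import decode_header

# Decode the GB2312 subject to Unicode text, and also keep its UTF-8 bytes.
raw_bytes, charset = decode_header("=?gb2312?b?tPq/qreixrExMzUzNzUzODQ0MrPC?=")[0]
text = raw_bytes.decode(charset)
utf8_bytes = text.encode("utf-8")

# Same pattern body as the rules; the regex engine interprets the \xHH escapes.
PATTERN = r"\xE5\xBC\x80.{0,3}\xE5\x8F\x91.{0,3}\xE7\xA5\xA8"

# Byte-mode model (no flag / r): each \xHH matches one byte of the UTF-8 data.
byte_match = re.search(PATTERN.encode("ascii"), utf8_bytes) is not None

# Unicode-mode model (u): each \xHH matches code point U+00HH instead,
# so the pattern looks for characters like 'å' that are not in the text.
unicode_match = re.search(PATTERN, text) is not None

# Written as characters rather than byte escapes, the pattern matches in Unicode mode.
char_match = re.search("开.{0,3}发.{0,3}票", text) is not None

print(byte_match, unicode_match, char_match)  # prints True False True
```

Under this model the byte-escape pattern matches only in byte mode, which lines up with the observed TEST_SUBJ1_H and TEST_SUBJ1_Hr hits; whether rspamd's u and r flags map onto PCRE/Hyperscan exactly this way is the assumption to verify.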

Based on the documentation and descriptions of the modifiers, the expectation is as follows:

raw_mode = true;

-> The only match will be TEST_SUBJ1_Hu

raw_mode = false;

-> Two matches: TEST_SUBJ1_H and TEST_SUBJ1_Hu

Does rspamd always convert to UTF-8 and then handle raw matching through just-in-time decoding? Or vice versa, depending on the configuration?
Is something else missing from the configuration, or is there something in how rspamd is built that could cause this behavior? Or are my expectations incorrect?
