commit e5328bd: [Fix] Fix emoji joiner FP

Vsevolod Stakhov vsevolod at rspamd.com
Mon Oct 3 22:21:04 UTC 2022


Author: Vsevolod Stakhov
Date: 2022-10-03 23:16:33 +0100
URL: https://github.com/rspamd/rspamd/commit/e5328bd63e30aba25e20fb94a21927a5eef61e50 (HEAD -> master)

[Fix] Fix emoji joiner FP
Issue: #4290

---
 src/libutil/cxx/utf8_util.cxx | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/libutil/cxx/utf8_util.cxx b/src/libutil/cxx/utf8_util.cxx
index 8c727e9ad..0e7cd39d7 100644
--- a/src/libutil/cxx/utf8_util.cxx
+++ b/src/libutil/cxx/utf8_util.cxx
@@ -85,8 +85,10 @@ rspamd_normalise_unicode_inplace(char *start, size_t *len)
 	if (!zw_spaces.isFrozen()) {
 		/* Add zw spaces to the set */
 		zw_spaces.add(0x200B);
+		/* TODO: ZW non joiner, it might be used for ligatures, so it should possibly be excluded as well */
 		zw_spaces.add(0x200C);
-		zw_spaces.add(0x200D);
+		/* See github issue #4290 for explanation. It seems that the ZWJ has many legit use cases */
+		//zw_spaces.add(0x200D);
 		zw_spaces.add(0xFEF);
 		zw_spaces.add(0x00AD);
 		zw_spaces.freeze();
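
For readers without the context of issue #4290, here is a minimal standalone sketch of why stripping U+200D (ZWJ) is harmful. It does not use ICU or any rspamd code: `strip_code_points`, `family`, `with_zwj` and `without_zwj` are hypothetical names introduced only for this illustration, and the two sets mirror only part of `zw_spaces` as it looked before and after this commit.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper (not rspamd's API): drop every code point found in `strip`.
static std::u32string strip_code_points(const std::u32string &in,
										const std::unordered_set<char32_t> &strip)
{
	std::u32string out;
	out.reserve(in.size());
	for (char32_t cp : in) {
		if (strip.find(cp) == strip.end()) {
			out.push_back(cp);
		}
	}
	return out;
}

// U+1F468 MAN, ZWJ, U+1F469 WOMAN, ZWJ, U+1F467 GIRL: an emoji ZWJ sequence
// that is meant to be rendered as a single "family" glyph.
static const std::u32string family = U"\U0001F468\u200D\U0001F469\u200D\U0001F467";

// Illustrative sets: ZWJ included (old behaviour) and left out (new behaviour).
static const std::unordered_set<char32_t> with_zwj = {0x200B, 0x200C, 0x200D, 0x00AD};
static const std::unordered_set<char32_t> without_zwj = {0x200B, 0x200C, 0x00AD};

// Usage check for the two sets above.
static void demo()
{
	// Old set: both joiners are removed, leaving three unrelated emoji.
	assert(strip_code_points(family, with_zwj).size() == 3);
	// New set: the sequence passes through unchanged.
	assert(strip_code_points(family, without_zwj) == family);
}
```

With the old set the normalised text differs from the input for any message containing such a sequence, which is presumably what produced the false positive; with ZWJ left out, the input is returned as is while U+200B is still removed.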