[Rspamd-Users] Rspamd 2.6 has been released

Vsevolod Stakhov vsevolod at rspamd.com
Wed Sep 30 19:34:44 UTC 2020

We have released Rspamd 2.6 today.

This release contains several major projects: various improvements to
the neural network plugin, better bitcoin scam detection, conditional
regular expressions, and other reworks of the code, such as support for
shadow results. Numerous bug fixes, including some critical ones, have
also been applied during this release cycle.

Here is a list of the major projects and serious bugfixes where applicable.

### Neural network plugin rework

Rspamd now includes the
[PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) method
to reduce the input space dimensionality in heavily customised
environments with many rules. This method transforms the whole rule set
into a fixed number of neural network inputs using a linear
transformation. Other improvements to the neural network plugin in this
release include the following:

- Probabilistic learn method where spam and ham samples may be
unbalanced (useful for the cases where spam/ham amounts are significantly
different)
- A maximum number of inputs can now be set for the ANN (via PCA prefiltering)
- Reworked the internal structure of the ANN (more hidden layers and fixed
the output function)
- Low level tensors library for speeding up matrix operations
- BLIS algebra library support
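
The linear transformation mentioned above can be illustrated with a minimal
plain-Lua sketch. It is not the plugin's code: the `project` helper, the
matrix `W` and the input vector are made up for illustration. In real PCA
the rows of `W` would be derived from the data (for example, as the leading
eigenvectors of the scatter matrix) rather than written by hand.

```lua
-- Project an n-dimensional input vector x onto the k rows of W (a k x n
-- matrix), producing a fixed-size vector of k values.
local function project(W, x)
  local y = {}
  for i, row in ipairs(W) do
    local acc = 0
    for j, w in ipairs(row) do
      acc = acc + w * x[j]
    end
    y[i] = acc
  end
  return y
end

-- Hypothetical example: 4 rule inputs reduced to 2 network inputs.
local W = {
  { 0.5, 0.5, 0.0, 0.0 },
  { 0.0, 0.0, 0.5, 0.5 },
}
local x = { 1, 0, 1, 1 } -- 1 means the rule has fired (assumed encoding)
local y = project(W, x)

print(#y, y[1], y[2])
```

Whatever the number of rules `n` is, the network only ever sees `k` inputs,
which is what allows the input size to stay fixed as rules are added.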

### Reworked bitcoin detection library

Rspamd now supports Lua filters for regular expressions. The idea is to
allow a fast pre-filter with regular expressions followed by slow Lua
postprocessing for the cases where this processing is needed. Here is
how it's used in the bitcoin library:

config.regexp['RE_POSTPROCESS'] = {
  description = 'Example of postprocessing for regular expressions',
  re = string.format('(%s) || (%s)', re1, re2),
  re_conditions = {
    -- Called only when re1 matches; s and e are the match offsets in txt
    [re1] = function(task, txt, s, e)
      if e - s <= 2 then
        return false -- match is too short to be worth checking
      end

      if check_re1(task, txt:sub(s + 1, e)) then
        return true
      end
    end,
    -- Same logic for re2, with its own slow check
    [re2] = function(task, txt, s, e)
      if e - s <= 2 then
        return false
      end

      if check_re2(task, txt:sub(s + 1, e)) then
        return true
      end
    end,
  },
}

This allows adding accelerated rules that are enabled only if some
relatively rare regular expression matches. In this particular case this
feature is used to do BTC wallet verification and validation.
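
The same two-stage idea can be shown outside of Rspamd with a small
plain-Lua sketch. Everything here is an assumption made for illustration:
`match_with_condition` and `looks_like_base58` are hypothetical helpers, Lua
patterns stand in for real regular expressions, and the slow stage only
checks the Base58 alphabet, whereas real wallet validation would also verify
the address checksum.

```lua
-- Legacy BTC addresses use the Base58 alphabet (no 0, O, I or l).
local BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

-- Slow stage (hypothetical): every character must belong to BASE58.
local function looks_like_base58(candidate)
  for i = 1, #candidate do
    if not BASE58:find(candidate:sub(i, i), 1, true) then
      return false
    end
  end
  return true
end

-- Fast stage: a cheap pattern picks candidates, and the condition is
-- evaluated only for those candidates.
local function match_with_condition(text, pattern, condition)
  local accepted = {}
  for candidate in text:gmatch(pattern) do
    if condition(candidate) then
      accepted[#accepted + 1] = candidate
    end
  end
  return accepted
end

local text = 'pay to 1BoatSLRHtKNngkdXEeobR76b53LETtpyT or 1O0Il0O0Il0O0Il0O0Il0O0Il0O0'
local hits = match_with_condition(text, '[13]%w+', looks_like_base58)

for _, h in ipairs(hits) do
  print(h)
end
```

The second token passes the cheap pattern but is rejected by the condition,
so the expensive check runs only on the few strings that already look like
candidates.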

### IDNA bugs are fixed

Dr. Hajime Shimada and Mr. Shirakura from Nagoya University have
found that it is possible to bypass Rspamd URL detection by using
special Unicode characters. We have changed this behaviour, so full
IDNA validation/normalisation is now performed. I would like to
thank the researchers for sharing that with us.

### Fuzzy module telemetry

Rspamd will now send more data when checking for fuzzy hashes: it will
send the **source IP** address of the email being scanned and the **domain
name** of the sender. This data is end-to-end encrypted between you and
the Rspamd public fuzzy storage, and I plan to use it for better spam
detection. If you don't want this data to be shared, then please stop
using the public fuzzy storage or set the `no_share` flag to true.

### Other major improvements

- Use google-ced instead of libicu character detection
- Rework and refactor forged recipients plugin
- Added `SO_REUSEPORT` support for UDP sockets on Linux
- Better Spamhaus DQS service support (e.g. hashbl)
- Added secretbox Lua API for symmetric encryption (AEAD)
- More bitcoin addresses support (Bitcoincash, new BTC addresses etc)
- Timeouts for PDF processing
- Many improvements to the tests and build systems

### Critical/important fixes

* Arc: Fix ARC validation for chains of signatures
* Fix IDNA dots parsing
* Fix usage of crypto_sign it should be crypto_sign_detached!

Here is the list of the important changes:

* [Conf] Add missing symbols
* [Conf] Add missing symbols
* [Conf] Fix fat-fingers typo
* [Conf] Fix wrong comment in options.inc
* [Conf] Neural: Fix the default name for max_trains
* [Conf] Register a known symbol
* [Conf] Spf: Add R_SPF_PERMFAIL symbol
* [CritFix] Arc: Fix ARC validation for chains of signatures
* [CritFix] Distinguish socketpairs between different fuzzy workers
* [CritFix] Fix IDNA dots parsing
* [CritFix] Fix test assertion method
* [CritFix] Fix usage of crypto_sign it should be crypto_sign_detached!
* [Feature] Add BOUNCE rule
* [Feature] Add controller plugins support and selectors plugin
* [Feature] Add maps query method
* [Feature] Add minimal delay to fuzzy storage
* [Feature] Add multiple base32 alphabets for decoding
* [Feature] Add preliminary support of BCH addresses
* [Feature] Add query_specific endpoint
* [Feature] Allow multiple base32 encodings in Lua API
* [Feature] Allow to specify nonces manually
* [Feature] Controller: Allow to pass query arguments to the lua webui
* [Feature] Fuzzy_check: Add gen_hashes command
* [Feature] Fuzzy_check: Add weight_threshold option for fuzzy rules
* [Feature] Implement address retry on connection failure
* [Feature] Improve limits in pdf scanning
* [Feature] Initial support of subscribe command in lua_redis
* [Feature] Lua_cryptobox: Add secretbox API
* [Feature] Lua_text: Add encoding methods
* [Feature] Milter_headers: Allow to activate routines via users settings
* [Feature] PDF: Add timeouts for expensive operations
* [Feature] Preliminary maps addon for controller
* [Feature] Split pdf processing object and output object to allow GC
* [Feature] Support BLIS blas library
* [Feature] Support input vectorisation by recvmmsg call
* [Feature] Support multiple base32 alphabets
* [Feature] add queueid, uid, messageid and specific symbols to
selectors [Minor] use only selectors to fill vars in force_actions message
* [Feature] allow variables in force_actions messages
* [Feature] extend lua api
* [Fix] #3249
* [Fix] Allow to adjust neurons in the hidden layer
* [Fix] Another try to fix email names parsing
* [Fix] Arc: Allow to reuse authentication results when doing
multi-stage signing
* [Fix] Arc: Fix bug with arc chains verification where i>1
* [Fix] Arc: Sort headers by their i= value
* [Fix] Change neural plugin's loss function
* [Fix] Deal with double eqsigns when decoding headers
* [Fix] Default ANN names in clickhouse
* [Fix] Disable reuseport for TCP sockets as it causes too many troubles
* [Fix] Disable text detection heuristics for encrypted parts
* [Fix] Distinguish DKIM keys by md5
* [Fix] Distinguish type from flags in register_symbol
* [Fix] Dmarc: Unbreak reporting after
* [Fix] Do not flag pre-result of virus scanners as least if action is
* [Fix] Do not use GC64 workaround on 32bit platforms, omg
* [Fix] Exclude damaged urls from html parser
* [Fix] Fix FWD_GOOGLE rule (#1815)
* [Fix] Fix adding of the empty archive file for gzip
* [Fix] Fix aliases in forged recipients and limit number of iterations
* [Fix] Fix authentication results insertion
* [Fix] Fix calling of methods in selectors
* [Fix] Fix clen length for hiredis...
* [Fix] Fix endless loop if broken arc chain has been found
* [Fix] Fix false - operation
* [Fix] Fix get_urls table invocation
* [Fix] Fix group based composites
* [Fix] Fix headers passing in rspamd_proxy
* [Fix] Fix incomplete utf8 sequences handling
* [Fix] Fix lua_next invocation
* [Fix] Fix lua_parse_symbol_type function logic
* [Fix] Fix multiple listen configuration
* [Fix] Fix occasional encryption of the cached data
* [Fix] Fix parsing boundaries with spaces
* [Fix] Fix passing of methods arguments
* [Fix] Fix poor man allocator algorithm
* [Fix] Fix regexp selector and add flattening
* [Fix] Fix rfc base32 encode ordering (skip inverse bits)
* [Fix] Fix rfc based base32 decoding
* [Fix] Fix sockets leak in the client
* [Fix] Fix storing of the original smtp from
* [Fix] Fix types check and types usage in lua_cryptobox
* [Fix] Fix unused results
* [Fix] Fuzzy_check: Disable shingles for short texts (really)
* [Fix] Ical: Fix indentation grammar
* [Fix] Improve part:is_attachment logic
* [Fix] Mmap return value must be checked versus MAP_FAILED
* [Fix] One more fix to skip images that are not urls
* [Fix] Pdf: Support some weird objects with no newline before endobj
* [Fix] Rbl: Fix ignore_defaults in conjunction with ignore_whitelists
* [Fix] Restore support for `for` and `id` parts in received headers
* [Fix] Segmentation fault in contrib/lua-lpeg/lpvm.c on ppc64el
* [Fix] Skip spaces at the boundary end
* [Fix] Slashing fix: fix captures matching API
* [Fix] Spamassassin: Rework metas processing
* [Fix] Store reference of upstream list in upstreams objects
* [Fix] Understand utf8 in content-disposition parser
* [Fix] Unify selectors digest functions
* [Fix] Use `abs` value when checking composites
* [Fix] Use strict IDNA for utf8 DNS names + add sanity checks for DNS names
* [Fix] Use unsigned char and better support of utf8 in ragel parser
* [Fix] add missing selector_cache declaration
* [Project] Add `L` flag for regexps to save start of the match in Hyperscan
* [Project] Add `lower` method to lua_text
* [Project] Add a simple matrix Lua library
* [Project] Add implicit bitcoincash prefix
* [Project] Add linalg ffi library for prototyping
* [Project] Add methods to append data to fuzzy requests
* [Project] Add routine to call a generic lua function
* [Project] Add ssyev method interface
* [Project] Add tensors index method
* [Project] Add text:sub method
* [Project] Allow rspamd_text based selectors
* [Project] Allow to specify re_conditions for regular expressions
* [Project] Attach extensions to the binary fuzzy commands
* [Project] Bitcoin: BTC cash addresses needs some checksum validation
* [Project] Cleanup the redis script
* [Project] Convert bitcoin rules to the new regexp conditions feature
* [Project] Detect memrchr in systems that supports it
* [Project] Do not listen sockets in the main process
* [Project] Implement 'probabilistic' learn mode for ANN
* [Project] Implement BTC polymod in C as it requires 64 bit ops
* [Project] Implement bitcoin cash validation in a proper way
* [Project] Implement extensions logic for fuzzy storage
* [Project] Implement symbols insertion in multiple results mode
* [Project] Lua_text: Add method memchr
* [Project] Neural: Add PCA loading logic
* [Project] Neural: Fix PCA based learning
* [Project] Neural: Fix matrix gemm
* [Project] Neural: Further PCA fixes
* [Project] Neural: Implement PCA in learning
* [Project] Neural: Implement PCA learning
* [Project] Neural: Implement PCA on ANN forward
* [Project] Neural: Implement PCA serialisation
* [Project] Neural: Start PCA implementation
* [Project] Neural: Use C version of scatter matrix producing
* [Project] Preliminary support of lua conditions for regexps
* [Project] Preliminary usage of the reuseport
* [Project] Process composites separately for each shadow result
* [Project] Remove old code
* [Project] Rework scan result functions to support shadow results
* [Project] Rework some more functions to work with shadow results
* [Project] Some more fixes
* [Project] Start results chain implementation
* [Project] Support fun iterators on rspamd_text objects
* [Project] Support multiply, minus and divide operators in expressions
* [Project] Tensor: Move scatter matrix calculation to C
* [Rework] Allow to specify exact metric result when adding a symbol
* [Rework] Change and improve openblas detection and usage
* [Rework] Close listen sockets in main after fork
* [Rework] Further rework of lua urls extraction API
* [Rework] Lua_cryptobox: Allow to store output of the hash function
* [Rework] Lua_task: Add more methods to deal with shadow results
* [Rework] Modernize logging for expressions
* [Rework] Remove empty prefilters feature - we are not prepared...
* [Rework] Remove old FindLua module, disable lua fallback when LuaJIT
is enabled
* [Rework] Rework and refactor forged recipients plugin
* [Rework] Rework expressions processing
* [Rework] Rework fuzzy commands processing
* [Rework] Rework url flags handling API
* [Rework] Rework urls extraction
* [Rework] Split operations processing and add more debug logs
* [Rework] Update zstd to 1.4.5
* [Rework] Use google-ced instead of libicu chardet as the former sucks
* [Rework] add alias util:parse_addr for util:parse_mail_address
* [Rework] get rid of util:parse_addr duplicating the
util:parse_mail_address, replace where used
* [Rules] Allow prefix for bitcoin cash addresses
* [Rules] More fixes for bitcoin cash addresses decoding
* [Rules] Refactor bleach32 addresses handling
