[Rspamd-Users] Redis huge database

Wed Nov 23 19:23:09 UTC 2022

Quoting azurit at pobox.sk:

> Quoting Alexander Moisseev via Users <users at lists.rspamd.com>:
>
>> On 23.11.2022 10:31, azurit at pobox.sk wrote:
>>>
>>> Quoting Alexander Moisseev via Users <users at lists.rspamd.com>:
>>>
>>>> On 22.11.2022 22:45, azurit at pobox.sk wrote:
>>>>> I'm having problems with the Redis database: it's huge and keeps
>>>>> growing, no matter what I do. Redis is taking more and more
>>>>> memory. If I look at the keys, I see about 500 000 keys with names
>>>>> similar to 'RS_1028729928929298385'. Is this normal?
>>>>>
>>>>
>>>> AFAIK no one has ever researched the correlation between the number
>>>> of statistical tokens and the quality of classification, but I guess
>>>> 0.5M keys may not be enough. Just for reference, I have about
>>>> 4.2M RS_* keys in the bayes database (per-user classifier is not
>>>> enabled), used_memory_human:554.79M.
>>>
>>>
>>> My Redis database is almost 1 GB and Redis needs 4 GB of memory.
>>> With a lower maxmemory value, I'm getting this error from rspamd:
>>> Nov 17 00:13:56 server00 rspamd[4086]: <177d52>; lua;  
>>> history_redis.lua:132: got error OOM command not allowed when used  
>>> memory > 'maxmemory'. when writing history row: no value
>>>
>> The on-disk .rdb file is compressed, so 1GB is a relatively large database.
>>
>>> rspamd is the only service using this Redis instance.
>>>
>> As you store everything related to Rspamd in a single Redis
>> instance, there is no easy way to determine how much database space
>> each module consumes, I'm afraid. You probably need to count the
>> number of keys matching patterns with redis-cli.
>> Also, some excessive numbers (like stored fuzzy hashes, history
>> nrows, etc.) can indirectly indicate the source of the problem.
>
>
> Ok, here are the numbers:
>
> keys beginning with "RS_": 1007930
> keys beginning with "RR:": 27141
> keys beginning with "rs_first_": 895
> everything else: 37983
>
> I don't see any pattern in the other keys; they seem random, for example
> rrxc1p8tu5skays4a5gu63 (but lots of them begin with 'rr', like the
> one in the example).
>
> Other data:
>
> 127.0.0.1:6379> debug object BAYES_HAM
> Value at:0x7f836d192ab0 refcount:1 encoding:hashtable  
> serializedlength:762292662 lru:8248068 lru_seconds_idle:43
>
> 127.0.0.1:6379> debug object BAYES_SPAM
> Value at:0x7f838bae6450 refcount:1 encoding:hashtable  
> serializedlength:138423713 lru:8248136 lru_seconds_idle:11
>
> Configuration in /etc/rspamd/local.d/statistic.conf:
> classifier "bayes" {
>     expire = 100d;
>     new_schema = true;
>     tokenizer {
>         name = "osb";
>     }
>
>     # Minimum number of words required for statistics processing
>     min_tokens = 11;
>     # Minimum learn count for both spam and ham classes to perform classification
>     min_learns = 200;
>
>     backend = "redis";
>     autolearn = [-4, 10];
>     statfile {
>         symbol = "BAYES_HAM";
>         spam = false;
>     }
>     statfile {
>         symbol = "BAYES_SPAM";
>         spam = true;
>     }
> }
>
>
>
>
> Btw:
> rspamd 3.4
> Redis 5.0.14

Any hints? It looks like the problem is with the BAYES_HAM tokens, which
seem to take 700+ MB.
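
For anyone who wants to reproduce per-prefix key counts like the ones quoted above, here is a minimal sketch. It is not necessarily how those numbers were obtained. The function name tally_by_prefix and the sample key names are made up for illustration; the prefixes are the ones listed above. It only needs an iterable of key names, so it can be fed from a dump of key names or, assuming the redis-py client is installed, from its scan_iter() method, which iterates with SCAN instead of blocking the server with KEYS.

```python
from collections import Counter

# Prefixes taken from the counts quoted above; anything else is "other".
PREFIXES = ("RS_", "RR:", "rs_first_")


def tally_by_prefix(keys, prefixes=PREFIXES):
    """Count key names per known prefix; unmatched keys go to 'other'."""
    counts = Counter()
    for key in keys:
        # Redis clients may return bytes, so normalise to str first.
        name = key.decode("utf-8", "replace") if isinstance(key, bytes) else key
        for prefix in prefixes:
            # str.startswith is case-sensitive, so "rs_first_" never matches "RS_".
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; with redis-py (an assumption) one could pass
    # redis.Redis().scan_iter(count=1000) here instead.
    sample = ["RS_1028729928929298385", b"RR:abc", "rs_first_x",
              "rrxc1p8tu5skays4a5gu63"]
    for prefix, n in tally_by_prefix(sample).items():
        print(prefix, n)
```

Key counts alone do not say how much memory each group uses, so this only narrows down where to look.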