[Rspamd-Users] Redis huge database
azurit at pobox.sk
Wed Nov 23 19:23:09 UTC 2022
Quoting azurit at pobox.sk:
> Quoting Alexander Moisseev via Users <users at lists.rspamd.com>:
>
>> On 23.11.2022 10:31, azurit at pobox.sk wrote:
>>>
>>> Quoting Alexander Moisseev via Users <users at lists.rspamd.com>:
>>>
>>>> On 22.11.2022 22:45, azurit at pobox.sk wrote:
>>>>> i'm having problems with Redis database - it's huge and getting
>>>>> bigger, no matter what i do. Redis is taking more and more
>>>>> memory. If i look at the keys i see about 500 000 keys with name
>>>>> similar to 'RS_1028729928929298385'. Is this normal?
>>>>>
>>>>
>>>> AFAIK no one has ever researched the correlation of number of
>>>> statistical tokens and quality of classification, but I guess
>>>> 0.5M keys may not be enough. Just for reference, I have about
>>>> 4.2M RS_* keys in the bayes database (per user classifier is not
>>>> enabled), used_memory_human:554.79M.
>>>
>>>
>>> My Redis database has almost 1 GB and Redis needs 4 GB of memory.
>>> With lower values, i'm getting this error from rspamd:
>>> Nov 17 00:13:56 server00 rspamd[4086]: <177d52>; lua;
>>> history_redis.lua:132: got error OOM command not allowed when used
>>> memory > 'maxmemory'. when writing history row: no value
>>>
>> The on-disk .rdb file is compressed, so 1GB is a relatively large database.
>>
>>> rspamd is the only service using this Redis instance.
>>>
>> As you store everything related to Rspamd in the single Redis
>> instance there is no easy way to determine how much database space
>> each module consumes, I'm afraid. Probably you need to count the
>> number of keys matching patterns with redis-cli.
>> Also some excessive numbers (like stored fuzzy hashes, history
>> nrows, etc.) can indirectly indicate the source of the problem.
>
>
> Ok, here are the numbers:
>
> keys beginning with "RS_": 1007930
> keys beginning with "RR:": 27141
> keys beginning with "rs_first_": 895
> everything else: 37983
>
> I don't see any pattern in the other keys, they seem random, for
> example rrxc1p8tu5skays4a5gu63 (but lots of them begin with 'rr'
> like the one in the example).
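
For anyone who wants to reproduce a per-prefix breakdown like the one above, here is a minimal sketch in Python. It assumes the key names arrive one per line (for example from `redis-cli --scan`, which iterates with SCAN instead of blocking the server the way KEYS does). The function name `count_by_prefix` and the "other" bucket are made up for illustration; the prefixes are just the ones reported in this thread.

```python
from collections import Counter

# Prefixes taken from the counts reported in this thread; adjust as needed.
PREFIXES = ("RS_", "RR:", "rs_first_")


def count_by_prefix(keys, prefixes=PREFIXES):
    """Count key names per prefix; anything unmatched goes into 'other'."""
    counts = Counter()
    for key in keys:
        key = key.rstrip("\n")
        if not key:
            continue  # skip blank lines
        for prefix in prefixes:
            if key.startswith(prefix):  # match is case-sensitive
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

One possible way to use it: save the function in a script that ends with `print(count_by_prefix(sys.stdin))` (plus `import sys`) and feed it with `redis-cli --scan | python3 script.py`. On a database with a million keys this takes a while but should not stall Redis.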
>
> Other data:
>
> 127.0.0.1:6379> debug object BAYES_HAM
> Value at:0x7f836d192ab0 refcount:1 encoding:hashtable
> serializedlength:762292662 lru:8248068 lru_seconds_idle:43
>
> 127.0.0.1:6379> debug object BAYES_SPAM
> Value at:0x7f838bae6450 refcount:1 encoding:hashtable
> serializedlength:138423713 lru:8248136 lru_seconds_idle:11
>
> Configuration in /etc/rspamd/local.d/statistic.conf:
> classifier "bayes" {
>   expire = 100d;
>   new_schema = true;
>   tokenizer {
>     name = "osb";
>   }
>
>   # Minimum number of words required for statistics processing
>   min_tokens = 11;
>   # Minimum learn count for both spam and ham classes to perform classification
>   min_learns = 200;
>
>   backend = "redis";
>   autolearn = [-4, 10];
>   statfile {
>     symbol = "BAYES_HAM";
>     spam = false;
>   }
>   statfile {
>     symbol = "BAYES_SPAM";
>     spam = true;
>   }
> }
>
>
>
>
> Btw:
> rspamd 3.4
> Redis 5.0.14
Any hints? It looks like the problem is with the BAYES_HAM tokens, which
seem to take 700+ MB.
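
For reference, the arithmetic behind that estimate, as a small Python sketch. It only converts the `serializedlength` field from the `debug object` output quoted above into MiB; the helper name `serialized_mib` is made up. Note that `serializedlength` is the size of the serialized value, not necessarily what the key occupies in RAM; `MEMORY USAGE <key>` (available since Redis 4.0) would give an in-memory estimate instead.

```python
import re


def serialized_mib(debug_output):
    """Extract serializedlength (bytes) from DEBUG OBJECT output, return MiB."""
    match = re.search(r"serializedlength:(\d+)", debug_output)
    if match is None:
        raise ValueError("no serializedlength field found")
    return int(match.group(1)) / 2**20


# Values copied from the debug object output in this thread.
ham = "encoding:hashtable serializedlength:762292662 lru:8248068"
spam = "encoding:hashtable serializedlength:138423713 lru:8248136"

print(f"BAYES_HAM:  {serialized_mib(ham):.0f} MiB")   # prints 727 MiB
print(f"BAYES_SPAM: {serialized_mib(spam):.0f} MiB")  # prints 132 MiB
```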