[Rspamd-Users] Fwd: About Bayes & Fuzzy training & distribution?

pettai+rspamd at sunet.se pettai+rspamd at sunet.se
Fri Nov 12 14:47:21 UTC 2021

Hi list,

I’m wondering how much Bayes ham/spam data rspamd needs before it starts using it, and at what point it becomes more/most “effective”?
And what about the distribution of ham/spam: is there a need to aim for as close to 50% / 50% as possible?
At what distribution does it become skewed / unhealthy? (And is there built-in logic to prevent this?)
Some guesstimate numbers will do, I just want to understand how much/little effort is needed for proper training…

(Btw, it seems to do quite well without any Bayes training, but I’m guessing that’s partly because it uses a few RBLs, including the rspamd.com fuzzy storage, with the default shipped configuration...)

Another thing I can’t seem to understand is whether the Bayes training data is per recipient, global, or both.
There is a user counter that only shows 1 user, even though a few recipients have pushed in a few emails as training data.


And about fuzzy training, I guess the ham/spam distribution doesn’t matter much there, as the fuzzy “logic” is about finding similarities?

(Any pointers to the documentation are welcome, if I missed it)

