[Rspamd-Users] Fwd: About Bayes & Fuzzy training & distribution?

pettai+rspamd at sunet.se pettai+rspamd at sunet.se
Fri Nov 12 14:47:21 UTC 2021


Hi list,

I’m wondering how much Bayes ham/spam data rspamd needs before it starts using it, and at what point it becomes most “effective”?
And what about the distribution of ham/spam: is there a need to aim for as close to 50% / 50% as possible?
At what distribution does it become skewed / unhealthy? (And is there built-in logic to prevent this?)
Some guesstimate numbers will do; I just want to understand how much (or how little) effort is needed for proper training…

(Btw, it seems to do quite well without any Bayes training, but I’m guessing that’s partly because it uses a few RBLs, including the rspamd.com fuzzy storage, with the default shipped configuration...)

Another thing I can’t seem to understand is whether the Bayes training data is per recipient, or global, or both?
The user counter is only showing 1 user, even though a few recipients have pushed in a few emails as training data:

[…]
"total_learns":127,
"statfiles":[
{"revision":127,"used":0,"total":0,"size":0,"symbol":"BAYES_SPAM","type":"redis","languages":0,"users":1},
{"revision":0,"used":0,"total":0,"size":0,"symbol":"BAYES_HAM","type":"redis","languages":0,"users":0}],
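
To make the ham/spam split in that output easier to read, here is a minimal Python sketch. It assumes that the "revision" field of each statfile reflects the number of messages learned for that class (an assumption, though it is consistent with "total_learns" being 127 here). The helper name learn_split and the trimmed-down JSON are illustrative only and are not part of rspamd.

```python
import json

# Statfile entries copied from the stat output above, trimmed to the
# fields used here and wrapped in braces to form valid JSON.
STAT_JSON = """
{"total_learns": 127,
 "statfiles": [
   {"revision": 127, "symbol": "BAYES_SPAM", "type": "redis", "users": 1},
   {"revision": 0, "symbol": "BAYES_HAM", "type": "redis", "users": 0}
 ]}
"""


def learn_split(stat):
    """Return {symbol: (learns, share)}.

    Assumption: 'revision' counts the messages learned for that statfile.
    """
    learns = {s["symbol"]: s["revision"] for s in stat["statfiles"]}
    total = sum(learns.values())
    # Guard against division by zero when nothing has been learned yet.
    return {sym: (n, n / total if total else 0.0) for sym, n in learns.items()}


split = learn_split(json.loads(STAT_JSON))
for symbol, (count, share) in split.items():
    print(f"{symbol}: {count} learns ({share:.0%})")
```

Under that assumption, all 127 learns shown above are on the spam side and none are on the ham side.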

And about fuzzy training, I guess the ham/spam distribution doesn’t matter much, as the fuzzy “logic” is about finding similarities?

(Any pointers to the documentation are welcome, if I missed it.)

Thx,
/P



