On Fri, 9 Mar 2012 16:38:49 +0100
Matus UHLAR - fantomas wrote:
> You can of course configure the mailer to train automatically on
> anything received/delivered. However, this would apparently cause much
> higher FP and FN rates than letting users train only on those that
> misfire.
The use of the word "apparently" never inspires much confidence. I'm
guessing that you don't have any real evidence.
> >If you're going to train on error then train on the right error, not
> >a rarer, correlated error.
>
> The only error that really matters is the one that causes misfiring.
No, it isn't. Bayes is a statistical filter; it needs to learn a lot of
diverse spam and ham to reach its optimum accuracy. It's been
demonstrated on Bogofilter that "train-on-everything" outperforms
"train-on-error" on the same corpora. They both end up with similar
accuracy, but "train-on-everything" gets there very much faster.
Bogofilter is almost identical to BAYES; they just differ in the
details of the tokenizer and the Robinson parameters.
Training only on SA misclassifications is going to be glacially slow.
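
To make the difference between the two regimes concrete, here is a
minimal sketch in Python. It is a toy token-count classifier, not
Bogofilter's or SA's actual code; the class ToyBayes, the two training
functions and the six sample messages are all made up for illustration.
It only shows how many messages each regime ends up learning from, and
says nothing about real-world accuracy.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy classifier (hypothetical); NOT the SA or Bogofilter algorithm."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.trained = {"spam": 0, "ham": 0}

    def learn(self, tokens, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(tokens))
        self.trained[label] += 1

    def classify(self, tokens):
        # Sum of Laplace-smoothed log-odds; a score of 0 or less means ham.
        score = 0.0
        for tok in set(tokens):
            p_spam = (self.counts["spam"][tok] + 1) / (self.trained["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.trained["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return "spam" if score > 0 else "ham"


def train_on_everything(stream):
    """Learn every message, whether or not it was classified correctly."""
    f = ToyBayes()
    for tokens, label in stream:
        f.learn(tokens, label)
    return f


def train_on_error(stream):
    """Learn a message only when the current filter gets it wrong."""
    f = ToyBayes()
    for tokens, label in stream:
        if f.classify(tokens) != label:
            f.learn(tokens, label)
    return f


# Made-up sample messages, already tokenized.
stream = [
    (["cheap", "pills", "now"], "spam"),
    (["meeting", "agenda", "monday"], "ham"),
    (["cheap", "watches", "now"], "spam"),
    (["lunch", "monday"], "ham"),
    (["pills", "discount"], "spam"),
    (["agenda", "review"], "ham"),
]

everything = train_on_everything(stream)
on_error = train_on_error(stream)
print("train-on-everything learned:", sum(everything.trained.values()))
print("train-on-error learned:     ", sum(on_error.trained.values()))
```

In this sketch the train-on-error filter sees the same stream but adds a
message to its database only after a mistake, so its token counts grow
far more slowly than those of the filter that learns everything.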