spamassassin-users March 2012 archive
spamassassin-users: Re: Allowing IMAP users to train spam/ham

From: RW <rwmaillists_at_nospam>
Date: Sat Mar 10 2012 - 00:07:03 GMT
To: users@spamassassin.apache.org

On Fri, 9 Mar 2012 16:38:49 +0100
Matus UHLAR - fantomas wrote:

> You can of course configure mailer to train automatically on anything
> received/delivered. However this would apparently cause much more
> FP's and FN's rate than letting user train only those that misfire.

The use of the word "apparently" never inspires much confidence. I'm
guessing that you don't have any real evidence.

> >If you're going to train on error then train on the right error, not
> >a rarer, correlated error.
>
> The only error that really matters is the one that causes misfiring.

No, it isn't. Bayes is a statistical filter; it needs to learn a lot of
diverse spam and ham to reach its optimum accuracy. It's been
demonstrated on Bogofilter that "train-on-everything" outperforms
"train-on-error" on the same corpora. They both end up with similar
accuracy, but "train-on-everything" gets there very much faster.
Bogofilter is almost identical to BAYES; they just differ in the
details of the tokenizer and the Robinson parameters.

Training on SA misclassifications is going to be glacially slow.
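
To make the two regimes concrete, here is a rough Python sketch. It is
not SpamAssassin's or Bogofilter's code: the ToyBayes class, the two
training functions and the four sample messages are all made up for
illustration, and the scoring is plain add-one-smoothed naive Bayes
rather than the Robinson method. It only shows how much data each
regime feeds into the database, not what accuracy you end up with.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny token-count naive Bayes classifier (hypothetical, illustrative only)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Count each distinct token once per learned message.
        self.messages[label] += 1
        self.tokens[label].update(set(text.lower().split()))

    def classify(self, text):
        # Sum of per-token log-odds with add-one smoothing; ties go to ham.
        score = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return "spam" if score > 0 else "ham"


def train_on_everything(clf, stream):
    """Learn every labelled message; return how many were learned."""
    learned = 0
    for text, label in stream:
        clf.learn(text, label)
        learned += 1
    return learned


def train_on_error(clf, stream):
    """Learn only messages the classifier currently gets wrong."""
    learned = 0
    for text, label in stream:
        if clf.classify(text) != label:
            clf.learn(text, label)
            learned += 1
    return learned


# Made-up sample mail, in arrival order.
stream = [
    ("meeting agenda attached", "ham"),
    ("cheap pills online", "spam"),
    ("lunch meeting tomorrow", "ham"),
    ("cheap pills discount", "spam"),
]

everything = ToyBayes()
on_error = ToyBayes()
n_all = train_on_everything(everything, stream)
n_err = train_on_error(on_error, stream)
print(n_all, n_err)
```

On this toy stream the train-on-error database learns a single spam
and no ham at all, because the empty classifier already calls
everything ham. Training only on what SA as a whole gets wrong would
feed Bayes even less than that.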