spamassassin-users October 2010 archive

Re: Seeking advice re: SA score discrepancies

From: Karsten Bräckelmann <guenther_at_nospam>
Date: Mon Oct 18 2010 - 01:28:20 GMT
To: users@spamassassin.apache.org

On Sun, 2010-10-17 at 17:05 -0700, Jerry Pape wrote:
> At some time in the not too distant past, my otherwise reliable SA
> system has broken in an odd way.
>
> This example is characteristic of the problem:

Can't follow. Is it broken just because SA itself reports something
different from an unrelated, third-party website?

If not, please feel free to explain what changed without pointing to
that source.

> x-spam-status reads: No, score=3.8 required=4.0
> tests=BAYES_40,HTML_IMAGE_RATIO_02,
> HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RDNS_NONE,URIBL_BLACK autolearn=no version=3.2.5
>
> Assessment of this header at
> http://www.futurequest.net/docs/SA/decode/ yields:

> BAYES_40 0.000 Bayesian spam probability is 20 to 40%
> HTML_IMAGE_RATIO_02 0.550 HTML has a low ratio of text to image area

That site uses SA 3.2.x, score set 1, network tests enabled, Bayes
disabled, as evidenced by the above two scores and confirmed by the
other scores. You clearly use score set 3, with both network tests and
Bayes enabled.
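For readers unfamiliar with score sets: a `score` line in SpamAssassin's configuration may carry either one value, used everywhere, or four values, one per combination of network tests and Bayes, listed in set order 0 to 3. A minimal Python sketch of that selection follows. The function names and the example scores are hypothetical, not taken from SA's code or its shipped rule files.

```python
def score_set(network_tests: bool, bayes: bool) -> int:
    """Index of the active score set:
    0 = neither, 1 = network tests only, 2 = Bayes only, 3 = both."""
    return (1 if network_tests else 0) + (2 if bayes else 0)


def rule_score(scores: list[float], network_tests: bool, bayes: bool) -> float:
    """Score a rule contributes under the active score set.

    A single value applies to all four sets; four values are
    listed in set order 0..3."""
    if len(scores) == 1:
        return scores[0]
    return scores[score_set(network_tests, bayes)]


# Hypothetical scores for a BAYES_xx-style rule: zero in the non-Bayes sets.
example = [0.0, 0.0, -0.5, -0.5]
print(rule_score(example, network_tests=True, bayes=False))  # → 0.0 (the decode site's set 1)
print(rule_score(example, network_tests=True, bayes=True))   # → -0.5 (the poster's set 3)
```

This is why the same rule hit can contribute differently depending on which set the evaluator assumes.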

Given there *is* a BAYES_xx rule in there, the site is broken and does
not evaluate correctly. No excuse for the site in this case. (It would
be different if there were no network test hits: without the scores,
that is indistinguishable from network tests being disabled.)
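The reasoning above can be sketched in Python: a hit on a Bayes or network rule proves that feature was enabled, while the absence of such hits proves nothing. This is only an illustration. The prefix lists used to classify rules are an assumption and far from complete, and the helper names are hypothetical.

```python
import re

# Assumed, incomplete prefixes: a hit on a rule starting with one of these
# implies the corresponding feature was enabled when the message was scanned.
BAYES_PREFIXES = ("BAYES_",)
NETWORK_PREFIXES = ("URIBL_", "RCVD_IN_", "RAZOR2_", "PYZOR_", "DCC_")


def parse_tests(status: str) -> list[str]:
    """Rule names from an X-Spam-Status value, with header folding undone."""
    unfolded = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S+)", unfolded)
    if not match or match.group(1) == "none":
        return []
    return match.group(1).split(",")


def lowest_possible_score_set(tests: list[str]) -> int:
    """Smallest score set consistent with the hits. Hits prove a feature
    was on; missing hits do not prove it was off."""
    bayes = any(t.startswith(BAYES_PREFIXES) for t in tests)
    network = any(t.startswith(NETWORK_PREFIXES) for t in tests)
    return (1 if network else 0) + (2 if bayes else 0)


status = ("No, score=3.8 required=4.0 tests=BAYES_40,HTML_IMAGE_RATIO_02,\n"
          " HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RDNS_NONE,"
          "URIBL_BLACK autolearn=no version=3.2.5")
print(lowest_possible_score_set(parse_tests(status)))  # → 3
```

With BAYES_40 and URIBL_BLACK both present in the quoted header, nothing below score set 3 fits.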

> Clearly 5.336 does not equal 3.8.

Clearly, that site neither knows nor correctly detects the score set
you use.

> My SA is 3.2.5 in a default config except that I have set global score
> required to 4.0 with latest updates.

Yup, with Bayes enabled, the exact total score is 3.808.
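The header only shows the total rounded to one decimal. The verdict comes from comparing the exact total against `required`. A minimal Python sketch with hypothetical helper names, assuming a total at or above `required` counts as spam, using the two totals from this thread:

```python
import re


def parse_status(status: str) -> tuple[float, float]:
    """Return (score, required) from an X-Spam-Status header value."""
    score = float(re.search(r"\bscore=(-?\d+(?:\.\d+)?)", status).group(1))
    required = float(re.search(r"\brequired=(-?\d+(?:\.\d+)?)", status).group(1))
    return score, required


def is_spam(total: float, required: float) -> bool:
    # Assumed rule: tagged as spam once the total reaches the threshold.
    return total >= required


score, required = parse_status("No, score=3.8 required=4.0 tests=BAYES_40")
print(score, required)           # → 3.8 4.0
print(is_spam(3.808, required))  # → False: the poster's actual total
print(is_spam(5.336, required))  # → True: the decode site's total
```

So the site's miscalculated 5.336 would have crossed the 4.0 threshold, while the real total of 3.808 falls just short.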

What's offsetting all this is that the Bayes classifier, based on its
training, believes the mail to be hammy-ish, almost neutral, while after
appropriate training it should classify it as spammy, raising the
overall score.
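How the Bayes result becomes a rule hit: the classifier's probability falls into one of the BAYES_xx buckets, and each bucket rule carries its own score. The Python sketch below assumes the bucket boundaries of the stock 3.2.x rules (only BAYES_40 = 20 to 40% is confirmed by the decode quoted above) and simplifies edge handling at the boundaries. The table and function name are hypothetical.

```python
# Assumed upper bounds of the stock BAYES_xx buckets; edge handling simplified.
BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
    (1.00, "BAYES_99"),
]


def bayes_rule(probability: float) -> str:
    """Name of the BAYES_xx rule a spam probability would hit."""
    for upper, name in BUCKETS:
        if probability < upper:
            return name
    return "BAYES_99"  # probability == 1.0


print(bayes_rule(0.30))   # → BAYES_40: roughly this message today
print(bayes_rule(0.995))  # → BAYES_99: after training on similar spam
```

Training the classifier on more spam like this pushes the probability up, swapping the near-neutral BAYES_40 hit for a higher bucket that adds to the total.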

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}