spamassassin-users October 2010 archive

Re: Seeking advice re: SA score discrepancies

From: Jerry Pape <jpape_at_nospam>
Date: Mon Oct 18 2010 - 05:37:23 GMT
To: users@spamassassin.apache.org

  Wow, I am grateful for the prompt answers, but I must say they have
confused me.

Bayes should not be on in my config, and a subsequent check of the GUI says
it's not, though this may be wrong.

Further, what are the "scoreset" indexes?

I don't use Bayes because all of my clients use POP mail, and they are
neither smart nor committed enough to mail back ham/spam to train the system.

Additionally, when I used Bayes way back when (without manual
population) and simply allowed auto-population to occur, I ended up with
enormous .spamassassin sub-files that rapidly eclipsed 50% of the client's
disk quota.

I am certain that I am missing critical configuration understanding
and optimizations, but until you kindly educate me, it is what it is,
and my initial dilemma remains unresolved.

JP

On 10/17/10 7:01 PM, John Hardin wrote:
> On Sun, 17 Oct 2010, Jerry Pape wrote:
>
>> [Not sure if this is the right place to send this--please correct me
>> if I am in error]
>
> This is the place.
>
>> Assessment of this header at
>> http://www.futurequest.net/docs/SA/decode/ yields:
>>
>> Test                   Score  Description
>> BAYES_40               0.000  Bayesian spam probability is 20 to 40%
>> HTML_IMAGE_RATIO_02    0.550  HTML has a low ratio of text to image area
>> HTML_MESSAGE           0.001  HTML included in message
>> HTML_MIME_NO_HTML_TAG  1.052  HTML-only message, but there is no HTML tag
>> MIME_HTML_ONLY         1.672  Message only has text/html MIME parts
>> RDNS_NONE              0.100  Delivered to trusted network by a host with no rDNS
>> URIBL_BLACK            1.961  Contains an URL listed in the URIBL blacklist
>> Total:                 5.336
>>
>> Clearly 5.336 does not equal 3.8.
>
> There are four score sets to choose from based on what options you
> have enabled. The above is for scoreset 1, no BAYES + net tests.
> Scoreset 3, BAYES + net tests, gives:
>
> HTML_MIME_NO_HTML_TAG 0.097
> MIME_HTML_ONLY_MULTI 0.001
> HTML_IMAGE_RATIO_02 0.383
> HTML_MESSAGE 0.001
> MIME_HTML_ONLY 1.457
> BAYES_40 -0.185
> URIBL_BLACK 1.955
> RDNS_NONE 0.1
> -------
> 3.809
>
> These are all of the default scores, and match what you're seeing.
>
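To make the score set arithmetic concrete, here is a minimal Python sketch. It assumes the indexing described in the SpamAssassin configuration documentation (0 = neither Bayes nor network tests, 1 = network tests only, 2 = Bayes only, 3 = both), so the first listing, with no Bayes and network tests on, corresponds to index 1. The function and variable names are illustrative only and are not part of SpamAssassin; the scores are copied from the two listings in this thread.

```python
from decimal import Decimal


def scoreset_index(use_bayes: bool, use_net: bool) -> int:
    """Map the two options to a score set index (0-3).

    Assumed layout: Bayes contributes 2, network tests contribute 1.
    """
    return (2 if use_bayes else 0) + (1 if use_net else 0)


# Scores from the decode page quoted above (no Bayes, network tests on).
NO_BAYES_NET = {
    "BAYES_40": "0.000",
    "HTML_IMAGE_RATIO_02": "0.550",
    "HTML_MESSAGE": "0.001",
    "HTML_MIME_NO_HTML_TAG": "1.052",
    "MIME_HTML_ONLY": "1.672",
    "RDNS_NONE": "0.100",
    "URIBL_BLACK": "1.961",
}

# Scores from the reply quoted above (Bayes on, network tests on).
BAYES_NET = {
    "HTML_MIME_NO_HTML_TAG": "0.097",
    "MIME_HTML_ONLY_MULTI": "0.001",
    "HTML_IMAGE_RATIO_02": "0.383",
    "HTML_MESSAGE": "0.001",
    "MIME_HTML_ONLY": "1.457",
    "BAYES_40": "-0.185",
    "URIBL_BLACK": "1.955",
    "RDNS_NONE": "0.1",
}


def total(scores: dict) -> Decimal:
    """Sum rule scores exactly, avoiding binary floating-point drift."""
    return sum(Decimal(value) for value in scores.values())


print(scoreset_index(False, True), total(NO_BAYES_NET))  # 1 5.336
print(scoreset_index(True, True), total(BAYES_NET))      # 3 3.809
```

The same rules hit in both cases; only the per-rule scores differ, which is why the decode page and the mail header disagree.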
>> I have no idea how to regress and resolve this problem.
>
> First off, you need to review your Bayes training. An obviously spammy
> message shouldn't be hitting BAYES_40. Properly-trained Bayes, hitting
> BAYES_99, would have scored 7.494 on that message.
>
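The 7.494 figure can be checked by backing the BAYES_40 hit out of the 3.809 total. The short Python sketch below uses only numbers from this thread; the BAYES_99 score it prints is inferred from those numbers rather than read from a rules file, so treat it as a derived value.

```python
from decimal import Decimal

total_with_bayes_40 = Decimal("3.809")   # total reported in the reply above
bayes_40_score = Decimal("-0.185")       # BAYES_40 contribution to that total
projected_total = Decimal("7.494")       # total quoted for a BAYES_99 hit

# Remove the BAYES_40 contribution to get the total of all non-Bayes rules.
non_bayes_total = total_with_bayes_40 - bayes_40_score

# Whatever remains of the projected total must come from BAYES_99.
implied_bayes_99 = projected_total - non_bayes_total

print(non_bayes_total)    # 3.994
print(implied_bayes_99)   # 3.500
```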
> For analysis in general...
>
> This will put the individual rule scores into the headers:
>
> add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES_ autolearn=_AUTOLEARN_ version=_VERSION_"
>
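Once the per-rule scores are in the header, they can be pulled out and summed with a few lines of Python. This is a rough sketch, not a SpamAssassin tool: it assumes the `tests=` field holds comma-separated `NAME=score` pairs, possibly folded across lines, and the sample header value is made up from the scores in this thread (the `required=5.0` and `autolearn=no` fields are placeholders). The function name `parse_tests_scores` is hypothetical.

```python
import re
from decimal import Decimal


def parse_tests_scores(header_value: str) -> dict:
    """Extract {rule_name: score} from an X-Spam-Status style value."""
    # Folded headers put whitespace after a comma; undo that first.
    unfolded = re.sub(r",\s+", ",", header_value)
    match = re.search(r"tests=(\S+)", unfolded)
    if match is None:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, _, value = item.partition("=")
        if name and value:  # skips entries such as "none"
            scores[name] = Decimal(value)
    return scores


# Hypothetical header value assembled from the scores quoted in this thread.
sample = (
    "No, score=3.8 required=5.0 "
    "tests=BAYES_40=-0.185,HTML_IMAGE_RATIO_02=0.383,\n"
    "\tHTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.097,\n"
    "\tMIME_HTML_ONLY=1.457,MIME_HTML_ONLY_MULTI=0.001,\n"
    "\tRDNS_NONE=0.1,URIBL_BLACK=1.955 autolearn=no"
)

scores = parse_tests_scores(sample)
header_total = sum(scores.values())
print(len(scores), header_total)  # 8 3.809
```

Comparing this sum against the `score=` field is a quick way to confirm which score set was actually in effect when the message was scanned.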
> "spamassassin --debug rules < test_msg_file" is often helpful.
>
> However:
>
> The nature of spam changes over time. 3.2, which is only getting
> critical bug fixes now, will become steadily less effective the more
> time passes and the spammers evolve new tricks. It's getting to the
> point that you should really consider upgrading to the latest 3.3
> release.
>