spamassassin-users October 2010 archive

Re: Seeking advice re: SA score discrepancies

From: Jerry Pape <jpape_at_nospam>
Date: Mon Oct 18 2010 - 05:43:23 GMT
To: users@spamassassin.apache.org

  Oops, further investigation indicates that Bayes is "on"--I thought the
default was "off" for my config. I would be inclined to turn it off, as I
have no decent way of teaching it beyond mass-config into the
future--please advise.

JP

On 10/17/10 10:37 PM, Jerry Pape wrote:
> Wow, I am grateful for the prompt answers, but I must say they have
> confused me.
>
> Bayes should not be on in my config, and a subsequent check of the GUI
> says it's not--this may be wrong.
>
> Further, what are the "scoreset" indexes?
>
> I don't use Bayes because all of my clients are POP mail and they are
> neither smart nor committed enough to mail back ham/spam to educate the
> system.
>
> Additionally, when I used Bayes way back when (without manual
> population) and simply allowed auto-population to occur, I ended up
> with enormous .spamassassin sub-files that rapidly eclipsed 50% of the
> client's disk quota.
>
> I am certain that I am missing critical configurational understanding
> and optimizations, but until your lot kindly educates me--it is what
> it is and my initial dilemma remains unresolved.
>
> JP
>
> On 10/17/10 7:01 PM, John Hardin wrote:
>> On Sun, 17 Oct 2010, Jerry Pape wrote:
>>
>>> [Not sure if this is the right place to send this--please correct me
>>> if I am in error]
>>
>> This is the place.
>>
>>> Assessment of this header at
>>> http://www.futurequest.net/docs/SA/decode/ yields:
>>>
>>> Test                   Score  Description
>>> BAYES_40               0.000  Bayesian spam probability is 20 to 40%
>>> HTML_IMAGE_RATIO_02    0.550  HTML has a low ratio of text to image area
>>> HTML_MESSAGE           0.001  HTML included in message
>>> HTML_MIME_NO_HTML_TAG  1.052  HTML-only message, but there is no HTML tag
>>> MIME_HTML_ONLY         1.672  Message only has text/html MIME parts
>>> RDNS_NONE              0.100  Delivered to trusted network by a host with no rDNS
>>> URIBL_BLACK            1.961  Contains an URL listed in the URIBL blacklist
>>> Total:                 5.336
>>>
>>> Clearly 5.336 does not equal 3.8.
>>
>> There are four score sets (numbered 0 to 3) to choose from based on
>> what options you have enabled. The above is for scoreset 1, no BAYES
>> + net tests. Scoreset 3, BAYES + net tests, gives:
>>
>> HTML_MIME_NO_HTML_TAG   0.097
>> MIME_HTML_ONLY_MULTI    0.001
>> HTML_IMAGE_RATIO_02     0.383
>> HTML_MESSAGE            0.001
>> MIME_HTML_ONLY          1.457
>> BAYES_40               -0.185
>> URIBL_BLACK             1.955
>> RDNS_NONE               0.100
>>                        -------
>>                         3.809
>>
>> These are all of the default scores, and match what you're seeing.
>>
>>> I have no idea how to regress and resolve this problem.
>>
>> First off, you need to review your Bayes training. An obviously
>> spammy message shouldn't be hitting BAYES_40. Properly-trained Bayes,
>> hitting BAYES_99, would have scored 7.494 on that message.
>>
>> For analysis in general...
>>
>> This will put the individual rule scores into the headers:
>>
>> add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES_ autolearn=_AUTOLEARN_ version=_VERSION_"
>>
>> "spamassassin --debug rules < test_msg_file" is often helpful.
>>
>> However:
>>
>> The nature of spam changes over time. 3.2, which is only getting
>> critical bug fixes now, will become steadily less effective the more
>> time passes and the spammers evolve new tricks. It's getting to the
>> point that you should really consider upgrading to the latest 3.3
>> release.
>>
>
>
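
The arithmetic in the two score tables quoted above can be checked with a short Python sketch. The rule names and scores are copied from the thread. The names scoreset_index, NO_BAYES_NET and BAYES_NET are made up for illustration, and the index formula reflects my reading of the "score" entry in the Mail::SpamAssassin::Conf documentation (bit 0 = network tests, bit 1 = Bayes), so treat it as an assumption rather than SpamAssassin's actual code.

```python
# Hypothetical helper: map the enabled options to a score set number (0-3).
# Assumption: bit 0 = network tests enabled, bit 1 = Bayes enabled.
def scoreset_index(use_bayes: bool, use_net: bool) -> int:
    return (2 if use_bayes else 0) + (1 if use_net else 0)

# Scores from the decoder output (no Bayes, network tests on).
NO_BAYES_NET = {
    "BAYES_40": 0.000,
    "HTML_IMAGE_RATIO_02": 0.550,
    "HTML_MESSAGE": 0.001,
    "HTML_MIME_NO_HTML_TAG": 1.052,
    "MIME_HTML_ONLY": 1.672,
    "RDNS_NONE": 0.100,
    "URIBL_BLACK": 1.961,
}

# Scores from John Hardin's list (Bayes on, network tests on).
BAYES_NET = {
    "HTML_MIME_NO_HTML_TAG": 0.097,
    "MIME_HTML_ONLY_MULTI": 0.001,
    "HTML_IMAGE_RATIO_02": 0.383,
    "HTML_MESSAGE": 0.001,
    "MIME_HTML_ONLY": 1.457,
    "BAYES_40": -0.185,
    "URIBL_BLACK": 1.955,
    "RDNS_NONE": 0.100,
}

def total(scores: dict) -> float:
    # Round to three decimals, the precision used in the thread.
    return round(sum(scores.values()), 3)

# Score of the BAYES_99 rule implied by the thread's figures:
# 7.494 is the quoted total had BAYES_99 hit instead of BAYES_40.
without_bayes = total(BAYES_NET) - BAYES_NET["BAYES_40"]
implied_bayes_99 = round(7.494 - without_bayes, 3)

print(scoreset_index(False, True), total(NO_BAYES_NET))
print(scoreset_index(True, True), total(BAYES_NET))
print(implied_bayes_99)
```

The two totals reproduce the 5.336 and 3.809 figures from the thread, which is consistent with the header having been scored with Bayes enabled while the decoder page assumed it was off.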