spamassassin-users October 2011 archive
spamassassin-users: Re: New Bayes like paradigm

From: Marc Perkel <support_at_nospam>
Date: Thu Oct 13 2011 - 14:34:28 GMT

On 10/10/2011 9:16 AM, wrote:
> On 10/10, Marc Perkel wrote:
>> On 9/28/2011 8:02 AM, wrote:
>>> On 09/28, Marc Perkel wrote:
>>>> You would only have to test the rule combinations that the message
>>>> actually triggered. So if it hit 10 rules then it would be 1024
>>>> combinations. Seems not to be unreasonable to me.
>>> You definitely have a good point that it would only be necessary to track
>>> the combinations that actually show up in emails, however 1024 is only
>>> the possible combinations from one set of 10 rules. The number of
>>> combinations in the actual corpora would be much higher. I'll try to
>>> get you a number.
>> You wouldn't have to store all combinations. You could just do up to
>> 3 levels and only the combinations that actually occur and use a
>> hash to look up the combinations.
> I never said storage would be a problem. I agree you could just store a
> relatively small number that were most useful.
> The problems are:
> 1) The many years it would take to find useful rule combinations by trying
> one possibility per masscheck run.
> 2) The hundreds of times as much (masscheck) data we'd need to get an
> accurate re-score using all rule combinations existing in the corpora.
> There is still the possibility of doing an analysis of what combinations of
> rules hit false-negatives significantly more often than they hit non-spam.
> (Or false-positives vs. spam.)
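
To make the "up to 3 levels, only the combinations that actually occur, looked up in a hash" idea concrete, here is a minimal sketch. It is not SpamAssassin code: the function names and the RULE_n rule names are made up for illustration, and each message is assumed to be represented simply as the list of rule names it hit. For a message that hits 10 rules, limiting to 3 levels gives 10 + 45 + 120 = 175 combinations rather than the 1024 subsets of the full set.

```python
from collections import Counter
from itertools import combinations

MAX_LEVEL = 3  # "up to 3 levels", as suggested in the thread


def rule_combinations(hit_rules, max_level=MAX_LEVEL):
    """Yield every combination of 1..max_level rules that one message hit."""
    # Sort so that each combination has exactly one canonical hash key.
    rules = sorted(set(hit_rules))
    for size in range(1, max_level + 1):
        yield from combinations(rules, size)


def count_combinations(messages, max_level=MAX_LEVEL):
    """Count combinations across messages; only ones that occur get stored."""
    counts = Counter()  # dict-like hash: tuple of rule names -> message count
    for hit_rules in messages:
        counts.update(rule_combinations(hit_rules, max_level))
    return counts


# Hypothetical message that hit 10 rules.
ten_rules = [f"RULE_{i}" for i in range(10)]
n_limited = sum(1 for _ in rule_combinations(ten_rules))
print(n_limited, "combinations at up to 3 levels, versus", 2 ** 10, "subsets")
```

The table only grows with combinations that are actually seen, so storage is bounded by the corpus rather than by the number of possible rule sets.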

It seems to me that there should be some automated way to find useful
rule combinations.
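
One rough sketch of what such an automated pass might look like: count each rule combination separately in spam and in non-spam, then rank combinations by how much more often they hit one class than the other. This is only an illustration under stated assumptions, not the masscheck or re-scoring process. The function names, the min_hits cutoff, and the add-one smoothing are all my own choices, and each message is assumed to be just a list of the rule names it hit.

```python
from collections import Counter
from itertools import combinations


def combos(hit_rules, max_level=3):
    """Yield combinations of 2..max_level rules hit by one message."""
    rules = sorted(set(hit_rules))  # canonical order for hashing
    for size in range(2, max_level + 1):
        yield from combinations(rules, size)


def rank_combinations(spam_msgs, ham_msgs, min_hits=2, max_level=3):
    """Rank rule combinations by smoothed spam-rate / ham-rate ratio."""
    spam_counts, ham_counts = Counter(), Counter()
    for hits in spam_msgs:
        spam_counts.update(combos(hits, max_level))
    for hits in ham_msgs:
        ham_counts.update(combos(hits, max_level))

    n_spam, n_ham = len(spam_msgs), len(ham_msgs)
    ranked = []
    for combo in set(spam_counts) | set(ham_counts):
        s, h = spam_counts[combo], ham_counts[combo]
        if s + h < min_hits:
            continue  # too rare to say anything (assumed cutoff)
        # Add-one smoothing avoids dividing by zero when h == 0.
        spam_rate = (s + 1) / (n_spam + 2)
        ham_rate = (h + 1) / (n_ham + 2)
        ranked.append((spam_rate / ham_rate, combo, s, h))
    # Most spam-leaning first; ties broken by combination name.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return ranked


# Tiny made-up corpus: rules A and B together appear only in spam.
spam = [["A", "B"], ["A", "B", "C"], ["A", "B"]]
ham = [["A"], ["B", "C"], ["C"]]
ranked = rank_combinations(spam, ham)
for ratio, combo, s, h in ranked:
    print(combo, "spam hits:", s, "ham hits:", h)
```

A ranking like this does not remove the objection about needing far more corpus data; the min_hits cutoff only keeps the rarest combinations from dominating the list.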

-- Marc Perkel - Sales/Support Junk Email Filter dot com 415-992-3400