spamassassin-dev September 2011 archive

[Bug 6667] New: bayes_use_hapaxes and dubious claim about database size

From: <bugzilla-daemon_at_nospam>
Date: Fri Sep 30 2011 - 15:02:57 GMT
To: dev@spamassassin.apache.org

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6667

             Bug #: 6667
           Summary: bayes_use_hapaxes and dubious claim about database
                    size
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: rwmaillists@googlemail.com
    Classification: Unclassified

The Mail::SpamAssassin::Conf documentation currently says:

"bayes_use_hapaxes (default: 1)

Should the Bayesian classifier use hapaxes (words/tokens that occur only once)
when classifying? This produces significantly better hit-rates, but increases
database size by about a factor of 8 to 10."

Unless someone can give a good reason why the claim about database size is
true, I suggest removing it.
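The quoted text says hapaxes are used "when classifying". The toy Python model below illustrates that reading. It is a hypothetical sketch, not SpamAssassin's Bayes code: `train`, `hapaxes` and `scoring_tokens` are invented names, and the assumption is that the flag only filters which stored tokens are consulted at scoring time. Under that assumption, the stored token table is identical whichever way the flag is set, so the flag cannot change database size at all. Whether SpamAssassin's real storage and expiry code behaves this way is exactly what would need checking.

```python
from collections import Counter


def train(messages):
    """Toy training: count how many messages each token appeared in."""
    db = Counter()
    for msg in messages:
        db.update(set(msg.lower().split()))
    return db


def hapaxes(db):
    """Tokens seen in exactly one training message."""
    return {tok for tok, n in db.items() if n == 1}


def scoring_tokens(db, message, use_hapaxes):
    """Tokens from `message` that a classifier would consult.

    Hypothetical model: when use_hapaxes is False, tokens seen only once
    are skipped at classification time; the stored db is never modified.
    """
    toks = set(message.lower().split())
    return {t for t in toks if t in db and (use_hapaxes or db[t] > 1)}


db = train(["cheap pills now", "cheap meeting now", "lunch meeting"])
print(sorted(hapaxes(db)))                              # ['lunch', 'pills']
print(sorted(scoring_tokens(db, "Cheap pills", True)))  # ['cheap', 'pills']
print(sorted(scoring_tokens(db, "Cheap pills", False))) # ['cheap']
print(len(db))                                          # 5 in both cases
```

In this model, turning hapaxes off only narrows the set of tokens used for scoring. The five stored entries stay the same, which is why a claimed 8 to 10 times size difference would need some other mechanism to explain it.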
