spamassassin-users June 2011 archive

Re: High Performance Bayes Database Configuration?

From: Marc Perkel <support_at_nospam>
Date: Tue Jun 21 2011 - 14:30:51 GMT
To: "David F. Skoll" <dfs@roaringpenguin.com>

On 6/21/2011 7:23 AM, David F. Skoll wrote:
> On Tue, 21 Jun 2011 07:06:11 -0700
> Marc Perkel <support@junkemailfilter.com> wrote:
>
>> Trying to get MySQL Bayes working in a high-volume environment.
>> Dedicated MySQL server with SSD drives. Can someone send me a sample
>> my.cnf file and make other suggestions to keep it running without
>> database corruption and other MySQL "features"? Or should I be
>> using some other DB?
> We've tried various ways of storing Bayes data (we have our own Bayes
> implementation, so this discussion may not correspond exactly with the
> SA implementation.) After trying Berkeley DB files and PostgreSQL---we
> would never use MySQL for any data we care about---we finally settled
> on Dan Bernstein's CDB format. It has by far the best performance.
> See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
> Take a look at the "Random Reads" timings. CDB is 6 times faster than
> Berkeley DB!
>
> CDB is read-only, which means when you want to do Bayes training, you
> have to rewrite the entire database. This is not an issue for our
> system because of how we do Bayes training, but it may be an issue
> with the standard sa-learn.
>
> Regards,
>
> David.
>
>

Thanks, David, but I need real-time updating, and the Bayes data is
shared across multiple servers, so I need PostgreSQL or MySQL.
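
For illustration only, here is a minimal Python sketch of the
rebuild-and-swap pattern that a read-only format such as CDB forces on
training: every update writes a complete new snapshot and renames it over
the old one. This is not SpamAssassin's or Roaring Penguin's code. A JSON
file stands in for a real CDB file, and the function names
(rebuild_token_db, load_token_db, train) and the (spam, ham) count layout
are assumptions made up for the example.

```python
import json
import os
import tempfile


def rebuild_token_db(path, token_counts):
    """Write a complete new snapshot, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so that os.replace()
    # is a rename on the same filesystem rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(token_counts, tmp)  # stand-in for writing a CDB file
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old snapshot or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_token_db(path):
    """Read the current snapshot (read-only from the scanner's side)."""
    with open(path) as f:
        return json.load(f)


def train(path, new_tokens):
    """Merge new (spam, ham) counts and rewrite the whole snapshot."""
    counts = load_token_db(path) if os.path.exists(path) else {}
    for token, (spam, ham) in new_tokens.items():
        old_spam, old_ham = counts.get(token, (0, 0))
        counts[token] = [old_spam + spam, old_ham + ham]
    rebuild_token_db(path, counts)
```

Each call to train() costs a full rewrite and only updates the local
file, which is why this approach does not fit real-time updates shared
across several servers.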

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400