spamassassin-users April 2011 archive
spamassassin-users: Re: One thing about bug 6558

From: Kris Deugau <kdeugau_at_nospam>
Date: Fri Apr 01 2011 - 20:08:02 GMT
To: spamassassin-users <users@spamassassin.apache.org>

Mark Martinec wrote:
> When we started using SpamAssassin years back our bayes and awl
> databases were on a Berkeley DB. This worked reasonably well (sharing
> your opinion on being 'occasionally flaky'), but the auto-expiration
> run times started to grow from minutes to hours. Initially this was
> solved by turning off opportunistic auto-expiry and running it
> explicitly periodically. A long auto-expiry run could bog down mail
> processing for a good part of an hour or more, collecting a large
> backlog in a mail queue.

I found BDB worked pretty well with a midnight-ish expiry up to ~600
accounts; bayes_expiry_max_db_size was set to 600k and expiry was
dropping ~60k tokens daily. I don't recall what the expiry time was but
I'm sure it wasn't more than 10-15 minutes at most. Around that time
that server became a legacy machine and customer cancellations slowly
dropped the number of accounts back down.
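
For illustration, a scheduled expiry of this kind could be sketched roughly as below. This is a minimal sketch, not the actual setup described above: it assumes opportunistic expiry has been turned off with "bayes_auto_expire 0" in local.cf and that the script is started from cron around midnight. The function names and the one-hour timeout are hypothetical; "sa-learn --force-expire" is the stock way to request an expiry run.

```python
import subprocess
import time


def expiry_command(sa_learn="sa-learn"):
    """Build the command line for an explicit Bayes expiry run."""
    # --force-expire asks sa-learn to attempt an expiry pass right away
    return [sa_learn, "--force-expire"]


def run_expiry(cmd, timeout_s=3600):
    """Run the expiry command; return (exit code, elapsed seconds).

    timeout_s is a hypothetical safety limit so a runaway expiry
    cannot hold up the machine indefinitely.
    """
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout_s)
    return result.returncode, time.monotonic() - start
```

Calling run_expiry(expiry_command()) from a nightly cron job and logging the elapsed time would make it easy to notice if expiry starts creeping from minutes toward hours.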

> So we finally gave up on using a Berkeley DB for bayes and
> switched to MySQL - and what a relief that was! Opportunistic
> auto-expire could be used again in real time, and the whole mail
> system could breathe again. Well - occasionally the MyISAM-type
> database would enter into an unusable state, where SpamAssassin
> would still appear to be running normally, but bayes would not be
> returning sensible results. The solution was to run an occasional
> database repair, which would make things right again for a week
> or two.

I haven't seen corruption beyond anything that can just be attributed to
mis(auto)learning; can you expand on what you saw and what you did here?

> Life was beautiful - until somehow this SQL solution started to become
> slow, and tweaking and cleaning a database did not help. I'm not
> sure what exactly happened, but even starting with a new scratch
> database soon lost its speed. Not seeing any obvious solution, we tried
> switching to PostgreSQL - and stayed there ever since, never looking back!
> The switch was made somewhere in the 8.3 version of a PostgreSQL
> server, but now we are running a 9.0.3 (on a FreeBSD) and are very happy
> with it (along with a SpamAssassin from trunk - to become a 3.4).

We switched from MyISAM on physical disk to MyISAM on RAMdisk. Since
we're running a global Bayes setup, ~2G of "disk" is plenty. A little
tweaking of the MySQL init script makes sure the tmpfs is mounted, and
then loads a backup dump of the database on startup. The effort
involved in setting up a physical-disk storage system that can handle
the I/O load is too high at the moment.
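
As a rough sketch of those two startup steps (not the actual init script changes): the mount point, dump path, database name, and tmpfs size below are all hypothetical, and the mysql client is assumed to pick up credentials from something like ~/.my.cnf. ensure_tmpfs() would need to run before mysqld starts, and load_dump() after it is accepting connections.

```python
import subprocess


def is_tmpfs_mounted(mounts_text, mount_point):
    """Return True if mount_point appears as a tmpfs in /proc/mounts text."""
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device mountpoint fstype options dump pass
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "tmpfs":
            return True
    return False


def ensure_tmpfs(mount_point, size="2g", mounts_path="/proc/mounts"):
    """Mount a tmpfs at mount_point unless one is already there."""
    with open(mounts_path) as f:
        mounts_text = f.read()
    if not is_tmpfs_mounted(mounts_text, mount_point):
        subprocess.run(
            ["mount", "-t", "tmpfs", "-o", f"size={size}", "tmpfs", mount_point],
            check=True,
        )


def load_dump(dump_path, database):
    """Feed a SQL dump file to the mysql client on stdin."""
    with open(dump_path, "rb") as dump:
        subprocess.run(["mysql", database], stdin=dump, check=True)
```

Since everything on the RAMdisk is lost on reboot, a setup like this also depends on the backup dump being refreshed regularly.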

Are you running per-user Bayes DBs?

> To put things into perspective, our user base is about 1000 users, so
> my experience does not necessarily translate to large ISPs, or to SOHO.

*nod* We're filtering ~13k of ~50k users, but we're also filtering all
outbound mail.

On my personal server, I'm still using per-user BDB Bayes; I've seen no
reason to switch. There are all of about 3 live users, after all...

-kgd