spamassassin-users October 2010 archive

Re: Collecting IP reputation data from many people

From: David F. Skoll <dfs_at_nospam>
Date: Thu Oct 28 2010 - 16:06:11 GMT
To: users@spamassassin.apache.org

On Thu, 28 Oct 2010 11:19:50 -0400
Darxus@ChaosReigns.com wrote:

> Having nothing to prevent someone from registering millions of
> accounts and spewing data from a single IP is not acceptable to me.

Umm...

Perhaps you have heard of a recent phenomenon called "a botnet"? Just
what security do you think TCP really buys you?

And what kind of account registration do you envision that lets you
easily register "millions" of accounts?

> Sure, I'll post there. Although detecting malicious data is clearly
> a much lower priority for them than reducing bandwidth used. Which
> makes sense, since they're limiting account creation by charging
> money, and I wouldn't.

We have some users who report to us whom we do not charge. These are
MIMEDefang users that we know and trust, and who use our Perl client
library to report back.

Just what exactly is a "reputation" system anyway? When you want to find
out the reputation of something, you ask people you know and trust. You
don't stop random people in the street and ask them.

That's why I think it's folly to accept IP reputation submissions from people
with whom you have no trust relationship. They could be feeding you utter
garbage and you'd never know.

Hence, we restrict reports to people we know and trust and to our
customers. (We may not know and trust all of our CanIt customers, but
we have a reasonable level of trust in the reporting software. It
would take a fair bit of effort for one of our customers to try to
game the system.)
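
A minimal sketch of that policy, offered only as an illustration and not as the actual CanIt or MIMEDefang reporting code: a collector that tallies per-IP reports only from reporters on an allow-list and drops everything else. The names `TRUSTED_REPORTERS` and `ReputationCollector`, and the reporter IDs, are hypothetical.

```python
from collections import Counter

# Hypothetical allow-list of reporter IDs with whom a trust relationship exists.
TRUSTED_REPORTERS = {"mimedefang-alice", "canit-customer-42"}


class ReputationCollector:
    """Tally spam/ham reports per IP, counting only trusted reporters."""

    def __init__(self, trusted):
        self.trusted = set(trusted)
        self.spam = Counter()   # accepted spam reports per IP
        self.ham = Counter()    # accepted non-spam reports per IP
        self.dropped = 0        # reports rejected for lack of trust

    def report(self, reporter_id, ip, is_spam):
        """Record one report; return True if it was accepted."""
        if reporter_id not in self.trusted:
            self.dropped += 1
            return False
        (self.spam if is_spam else self.ham)[ip] += 1
        return True

    def spam_ratio(self, ip):
        """Fraction of accepted reports for ip that were spam, or None."""
        total = self.spam[ip] + self.ham[ip]
        return self.spam[ip] / total if total else None


collector = ReputationCollector(TRUSTED_REPORTERS)
collector.report("mimedefang-alice", "192.0.2.1", True)
collector.report("canit-customer-42", "192.0.2.1", False)
collector.report("random-stranger", "192.0.2.1", True)  # untrusted: ignored
```

However many reports an unknown reporter sends, they never move the tally; only the two trusted reports count toward the ratio for 192.0.2.1.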

> And I'm not convinced there's a reason to conform to the rest of the
> RFC.

Apart from the fact that our system has been running in production for
many months, has collected billions of reports, collects >1000
reports/second on commodity hardware with practically no CPU overhead,
has been used to build DNSBL lists of 8 million+ machines, and has a
peer-reviewed RFC incorporating many suggestions from knowledgeable
experts in the field, no, I can't really think of a reason.

Regards,

David.