spamassassin-users October 2010 archive

Re: Collecting IP reputation data from many people

From: <Darxus_at_nospam>
Date: Thu Oct 28 2010 - 16:43:51 GMT
To: users@spamassassin.apache.org

On 10/28, David F. Skoll wrote:
> Perhaps you have heard of a recent phenomenon called "a botnet"? Just
> what security do you think TCP really buys you?

It requires them to actually use the botnet, instead of forging source
addresses from a single machine.

> And what kind of account registration do you envision that lets you
> easily register "millions" of accounts?

Free. Unrestricted.

> We have some users who report to us whom we do not charge. These are
> MIMEDefang users that we know and trust, and who use our Perl client
> library to report back.

Right, so you're doing something quite different.

> That's why I think it's folly to accept IP reputation submissions from people
> with whom you have no trust relationship. They could be feeding you utter
> garbage and you'd never know.

Yeah, that's the primary problem with what I was talking about, as I said.
It's the reason I posted about it. I think it might be possible to get
useful data out of it, but it would probably be challenging.

Which is precisely why I feel it is absolutely necessary to prevent the
sender IP forging which UDP allows.
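
To illustrate why the unforgeable sender address matters, here is a rough
Python sketch of one way untrusted reports might be tallied: each reporter,
identified by the peer address of its TCP connection, gets at most one vote
per reported IP. The class and method names are hypothetical, and this is
only an assumption about how such data could be made harder to stuff, not a
description of any existing system.

```python
from collections import defaultdict


class ReputationTally:
    """Count distinct reporters per reported IP (hypothetical design)."""

    def __init__(self):
        # reported IP -> set of reporter addresses that called it spam / ham
        self._spam = defaultdict(set)
        self._ham = defaultdict(set)

    def record(self, reporter_addr, reported_ip, is_spam):
        """Store one vote; a reporter's latest verdict replaces its earlier one.

        reporter_addr is assumed to be the peer address of an established
        TCP connection, which cannot be set to an arbitrary value the way
        a UDP source address can.
        """
        if is_spam:
            self._ham[reported_ip].discard(reporter_addr)
            self._spam[reported_ip].add(reporter_addr)
        else:
            self._spam[reported_ip].discard(reporter_addr)
            self._ham[reported_ip].add(reporter_addr)

    def counts(self, reported_ip):
        """Return (distinct spam reporters, distinct ham reporters)."""
        spam = len(self._spam.get(reported_ip, ()))
        ham = len(self._ham.get(reported_ip, ()))
        return spam, ham
```

With this, one machine repeating the same report a million times still
counts once, so flooding the tally takes many real source addresses.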

> Hence, we restrict reports to people we know and trust and to our
> customers. (We may not know and trust all of our CanIt customers, but
> we have a reasonable level of trust in the reporting software. It
> would take a fair bit of effort for one of our customers to try to
> game the system.)

And that's great for you, but not for people who aren't paying you.

> Apart from the fact that our system has been running in production for
> many months, has collected billions of reports, collects >1000
> reports/second on commodity hardware with practically no CPU overhead,
> has been used to build DNSBL lists of 8 million+ machines, and has a
> peer-reviewed RFC incorporating many suggestions from knowledgeable
> experts in the field, no, I can't really think of a reason.

So if I just open a socket and send over the IP, whether it's ham or spam,
and maybe a protocol version, it just won't work, huh?
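
Something like the following Python sketch is what I mean. The one-line
wire format, the version number, and the function names are all made up for
illustration; none of it comes from the RFC or from any existing reporting
system.

```python
import ipaddress
import socket

PROTOCOL_VERSION = 1  # hypothetical version number


def encode_report(ip, is_spam, version=PROTOCOL_VERSION):
    """Build one report line: b'<version> <ip> <spam|ham>\\n'."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed IP
    verdict = "spam" if is_spam else "ham"
    return f"{version} {addr} {verdict}\n".encode("ascii")


def decode_report(line):
    """Parse a report line back into (version, ip, is_spam)."""
    version_s, ip_s, verdict = line.decode("ascii").split()
    if verdict not in ("spam", "ham"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    return int(version_s), str(ipaddress.ip_address(ip_s)), verdict == "spam"


def send_report(host, port, ip, is_spam, timeout=5.0):
    """Open a TCP connection, send a single report, and close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_report(ip, is_spam))
```

The receiving side would read the line, call decode_report(), and take the
reporter's address from the accepted connection instead of from anything
inside the message.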

That RFC is a great checklist. But I really don't see a reason to conform
to it.

On 10/28, David F. Skoll wrote:
> On a somewhat less sarcastic note: One reason we didn't use TCP is that
> it simply doesn't scale. If you have clients that open a TCP connection,
> do a report, and then close the TCP connection, there's a huge bandwidth
> penalty. On the other hand, if your clients maintain persistent TCP
> connections, your server is going to run out of sockets rather quickly.

I expect scaling to be much more of an issue with your reputation system
than with the free system I've been talking about. And if I'm wrong, I hope
others will donate server resources, as has happened with similar
projects.

Also, sender IP forging.

> Remember, our system is designed to scale to tens or hundreds of thousands
> of reporting systems sending tens or hundreds of thousands of reports
> per second.

That's great. And not what I expect to do.

On 10/28, Lawrence @ Rogers wrote:
> What reporting system do you use? and how does one avail of the data
> it provides?

http://www.roaringpenguin.com/products/canit-reputation-rbl

Pay them.

-- 
"Government is not reason, it is not eloquence, it is force; like fire, a
troublesome servant and a fearful master. Never for a moment should it be
left to irresponsible action." - George Washington
http://www.ChaosReigns.com