spamassassin-dev September 2011 archive

Re: High ham rate in darxus corpora for URIBL_WS_SURBL Re: ham scores

From: Kris Deugau <kdeugau_at_nospam>
Date: Tue Sep 20 2011 - 14:45:28 GMT
To: dev@spamassassin.apache.org

darxus@chaosreigns.com wrote:
> On 09/20, Axb wrote:
>> from what I'm seeing:
>>
>> livejournal.com is in 20_aux_tlds.cf
>>
>> util_rb_2tld livejournal.com
>
> I saw that, but didn't think it was relevant. How is it relevant? It also
> doesn't seem like it makes sense. "2TLDs include things like co.uk,
> fed.us, etc." Livejournal.com isn't one of those.

util_rb_2tld is intended to let SA treat a domain as a more general case
of the strict 2tlds you mention: while livejournal.com itself shouldn't
be listed, it's quite possible that some spammer has set up a
LiveJournal account and therefore links to <spammer>.livejournal.com,
which *should* (and does) get listed.
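
To make the effect of a util_rb_2tld entry concrete, here is a minimal Python sketch of the domain trimming as I understand it. It is not SA's actual code: the function name lookup_domain and the three-entry TWO_LEVEL_TLDS set are made up for illustration, and the real lists are far longer.

```python
# Hypothetical, simplified stand-in for SA's domain trimming (not SA's code).
# TWO_LEVEL_TLDS mimics entries declared with util_rb_2tld.
TWO_LEVEL_TLDS = {"co.uk", "fed.us", "livejournal.com"}


def lookup_domain(hostname: str) -> str:
    """Return the part of hostname that would be sent to a URI blacklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # If the last two labels form a listed 2tld, keep one more label to
    # the left, so each customer subdomain is looked up separately.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_TLDS:
        return ".".join(labels[-3:])
    # Otherwise only the last two labels are looked up.
    return ".".join(labels[-2:])
```

With these assumptions, www.example.com is trimmed to example.com, while spammer.livejournal.com is kept whole because livejournal.com is treated like co.uk.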

As for uridnsbl_skip_domain, as far as I can tell, it only blocks
lookups on an exact match, not on *.example.com. Since none of LJ's
subdomains are listed in a uridnsbl_skip_domain entry, and because
livejournal.com is listed as a 2tld, those subdomains are looked up in
all available URI blacklists.
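
A rough Python sketch of the exact-match behaviour described above, again not SA's real code: SKIP_DOMAINS and should_query are hypothetical names, and putting livejournal.com in the skip list is an assumption made only for the example.

```python
# Hypothetical skip list standing in for uridnsbl_skip_domain entries.
# Assumption for illustration: livejournal.com itself is skipped.
SKIP_DOMAINS = {"livejournal.com"}


def should_query(domain: str) -> bool:
    """Exact-match check only: subdomains of a skipped domain still pass."""
    return domain.lower() not in SKIP_DOMAINS
```

Under that assumption, livejournal.com is never looked up, but spammer.livejournal.com is not an exact match for any entry and so still goes to every URI blacklist.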

If you're getting hits on those legitimate subdomains, then either the
blacklist briefly listed them incorrectly, your upstream DNS resolver is
playing games with DNS responses, or you (or that resolver) may have
exceeded the blacklist's query limits and are getting "everything is
listed" responses back.

-kgd