spamassassin-users October 2010 archive

Re: List of urls

From: Karsten Bräckelmann <guenther_at_nospam>
Date: Tue Oct 26 2010 - 16:47:16 GMT
To: users@spamassassin.apache.org

On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
> For your question, why don't you regexp it?
>
> uri url_1 /www.domain(1|2|3|4).com/
>
> The exact regexp naturally depends on the domains, but you don't need a
> separate check for each.

One way to consolidate them, yes -- depending on the nature of the
strings to match it can be very intuitive and natural.

Another technique you can use is meta rules, together with non-scoring
sub-rules (names starting with a double underscore) to prevent the
individual parts from scoring on their own (a rule defaults to a score
of 1, if not set explicitly).

  uri __MY_BL_001 /example.(com|net)/
  uri __MY_BL_002 /example.org/

  meta MY_BL __MY_BL_001 || __MY_BL_002
  score MY_BL 10.0

Note, though, that the above uri matches are not sufficiently strict
(similar to the OP's example) and might result in FPs.

The dot in an RE matches any char, and must be escaped to match a
literal dot. Also, the REs should be anchored, either at the left or
right end, to prevent possibly matching innocent bystanders. Since
parsed URIs are guaranteed to have a protocol (prepended by SA, if
none), a rule like this is much safer than the simple examples above.

  uri __MY_BL_000 m~^https?://(www\.)?example\.org(/|$)~

It is anchored at the beginning of the URI, allows an optional "www."
host name prefix, and is anchored at the end of the host name to further
prevent FPs. Oh, and it also uses m// with an alternative delimiter (~),
so I don't have to escape the slashes in the RE.
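
To make the difference concrete, here is a minimal sketch in Python. It is not SpamAssassin itself: Python's re module stands in for Perl REs, which behave the same for these simple patterns. The test URIs are made up for illustration.

```python
import re

# Loose pattern, as in the first example: unescaped dot, no anchors.
loose = re.compile(r"example.org")
# Strict pattern, same RE as the anchored uri rule, in Python syntax.
strict = re.compile(r"^https?://(www\.)?example\.org(/|$)")

# Hypothetical URIs, already carrying a protocol as SA would provide.
uris = [
    "http://example.org/",           # intended target
    "https://www.example.org/path",  # intended target
    "http://example.org",            # intended target, no trailing slash
    "http://examplexorg.test/",      # "." matches an arbitrary character
    "http://notexample.org/",        # not anchored on the left
    "http://example.org.test/",      # not anchored on the right
]

def matches(pattern, uri):
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None

# Map each URI to (loose result, strict result).
results = {uri: (matches(loose, uri), matches(strict, uri)) for uri in uris}

for uri, (hit_loose, hit_strict) in results.items():
    print(f"{uri:32} loose={hit_loose!s:5} strict={hit_strict}")
```

The loose pattern hits all six URIs, including the three bystanders; the strict one hits only the three intended targets.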

How strict you want your uri rule REs depends on your level of paranoia
and the domains to match.

> The best way to handle domains is to put them in a small RBL, or get
> them added to an existing RBL.

Well, it certainly depends on the amount of URIs, and how frequently the
list may change. SA config is not suitable for frequent changes, but
would be way easier to set up than a local RBL, if the list isn't too
large and mostly static.
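
For a small, mostly static list, the rules can be generated rather than typed by hand. The following is only a sketch under assumptions: build_rules, its parameters, and the MY_BL prefix are hypothetical names, and the output follows the sub-rule plus meta rule pattern shown earlier. The generated text would still need to be saved into the SA config and linted before use.

```python
import re

def build_rules(domains, prefix="MY_BL", score=10.0):
    """Return uri sub-rules plus one meta rule as config text.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    lines = []
    sub_names = []
    for i, domain in enumerate(domains, start=1):
        # Double-underscore names are non-scoring sub-rules.
        name = f"__{prefix}_{i:03d}"
        sub_names.append(name)
        # re.escape turns "example.org" into "example\.org" (literal dot).
        host = re.escape(domain)
        lines.append(f"uri {name} m~^https?://(www\\.)?{host}(/|$)~")
    lines.append("")
    # The meta rule fires if any sub-rule matched, and carries the score.
    lines.append(f"meta {prefix} " + " || ".join(sub_names))
    lines.append(f"score {prefix} {score}")
    return "\n".join(lines)

rules_text = build_rules(["example.org", "example.net"])
print(rules_text)
```

Escaping the domain programmatically avoids the unescaped-dot mistake, and keeping the list in one place makes the occasional change less error-prone.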

Adding to existing URI DNSBLs isn't always an option, btw. URL
shorteners may have a place in severely size-constrained messages of
sorts, but have no business in mail. They won't be blacklisted by the
major players out there, though. ;)

-- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}