spamassassin-users October 2010 archive

Re: New plugin: DecodeShortURLs

From: Brent Gardner <brent.gardner_at_nospam>
Date: Tue Oct 05 2010 - 20:35:19 GMT
To: users@spamassassin.apache.org

Steve Freegard wrote:
> Hi All,
>
> On 17/09/10 14:11, Steve Freegard wrote:
>> Hi All,
>>
>> Recently I've been getting a bit of filter-bleed from a bunch of spams
>> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
>> that upon closer inspection would have been rejected with a high score
>> if the real URL had been used.
>>
>> To that end - it annoyed me enough to write a plug-in that decodes the
>> shortened URL using an HTTP HEAD request to extract the location header
>> sent by the shortening service and to put this into the list of
>> extracted URIs for other plug-ins to find (such as URIDNSBL).
>>
>> On the messages I tested it with - it raised the scores from <5 to >10
>> based on URIDNSBL hits which is just what I wanted.
>>
>> Hopefully it will be useful to others; you can grab it from:
>>
>> http://www.fsl.com/support/DecodeShortURLs.pm
>> http://www.fsl.com/support/DecodeShortURLs.cf
>>
>
> I've just put up a new version at the above URLs (v0.3) which adds the
> following new features:
>
> - Now follows 'chained' short URLs (e.g. shortURL -> shortURL -> real)
>
> When chained URLs are detected the rule 'SHORT_URL_CHAINED' is fired.
> If a chained loop is detected the rule 'SHORT_URL_LOOP' is fired.
> If more than 10 chained URLs are found 'SHORT_URL_MAXCHAIN' is fired
> and no further redirections are checked.
>
> - If the shortener returns 404 (i.e. not found) for the short URL then
> 'SHORT_URL_404' is fired.
>
> - Prevent amavis from dying on eval block tests by adding "local
> $SIG{'__DIE__'}" to each block.
>
> - Added option to allow logging to syslog (mail.info).
>
> Kind regards,
> Steve.
>
I've been testing version 0.5 of this plugin with SpamAssassin v3.2.5 on
32-bit CentOS v5.5 (Perl v5.8.8). My test is a single message in which I
swap out the URLs it contains.

Using URLs like these:

http://goo.gl/foo
http://bit.ly/foo
http://2chap.it/foo

I consistently hit on these rules:

HAS_SHORT_URL
SHORT_URL_404
SHORT_URL_CHAINED
SHORT_URL_LOOP
SHORT_URL_MAXCHAIN

I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why is
-every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP, SHORT_URL_MAXCHAIN?
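
For reference, here is how the behaviour described in the v0.3 notes above
reads when written out. This is only a minimal Python sketch of that
description, not the plugin's actual Perl code. The names resolve, is_short,
SHORTENERS and the head callback are hypothetical; head stands in for the
HTTP HEAD request and returns a (status, Location) pair. The exact point at
which SHORT_URL_MAXCHAIN fires (after 10 followed redirects) is an assumption.

```python
from typing import Callable, List, Optional, Set, Tuple
from urllib.parse import urlsplit

MAX_CHAIN = 10  # assumption: stop once 10 redirects have been followed

# Hypothetical shortener list; the real plugin takes its list from its .cf file.
SHORTENERS = {"bit.ly", "goo.gl", "2chap.it"}

# head(url) -> (HTTP status, Location header or None); hypothetical callback
HeadFn = Callable[[str], Tuple[int, Optional[str]]]


def is_short(url: str) -> bool:
    """True if the URL's host is one of the known shorteners."""
    return urlsplit(url).hostname in SHORTENERS


def resolve(url: str, head: HeadFn) -> Tuple[Set[str], List[str]]:
    """Follow a short URL; return (rules fired, URLs discovered)."""
    rules: Set[str] = set()
    found: List[str] = []
    if not is_short(url):
        return rules, found
    rules.add("HAS_SHORT_URL")

    seen = {url}   # short URLs already visited, for loop detection
    hops = 0       # redirects followed so far
    current = url
    while is_short(current):
        if hops >= MAX_CHAIN:
            rules.add("SHORT_URL_MAXCHAIN")
            break
        status, location = head(current)
        if status == 404:
            rules.add("SHORT_URL_404")
            break
        if status not in (301, 302) or not location:
            break  # nothing to follow
        hops += 1
        found.append(location)
        if is_short(location):
            # short URL pointing at another short URL
            rules.add("SHORT_URL_CHAINED")
            if location in seen:
                rules.add("SHORT_URL_LOOP")
                break
            seen.add(location)
        current = location
    return rules, found


# Usage: a shortener that answers 404 for every lookup
rules, found = resolve("http://bit.ly/foo", lambda u: (404, None))
```

Under this reading, a short URL that returns 404 on the first lookup would
fire only HAS_SHORT_URL and SHORT_URL_404, and the chain, loop and maxchain
rules would need at least one successful redirect to another short URL.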

Brent Gardner