spamassassin-users October 2010 archive

Re: New plugin: DecodeShortURLs

From: Yet Another Ninja <sa-list_at_nospam>
Date: Tue Oct 05 2010 - 20:42:51 GMT
To: users@spamassassin.apache.org

On 2010-10-05 22:35, Brent Gardner wrote:
> Steve Freegard wrote:
>> Hi All,
>>
>> On 17/09/10 14:11, Steve Freegard wrote:
>>> Hi All,
>>>
>>> Recently I've been getting a bit of filter-bleed from a bunch of spams
>>> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
>>> that upon closer inspection would have been rejected with a high score
>>> if the real URL had been used.
>>>
>>> To that end - it annoyed me enough to write a plug-in that decodes the
>>> shortened URL using an HTTP HEAD request to extract the location header
>>> sent by the shortening service and to put this into the list of
>>> extracted URIs for other plug-ins to find (such as URIDNSBL).
>>>
>>> On the messages I tested it with - it raised the scores from <5 to >10
>>> based on URIDNSBL hits which is just what I wanted.
>>>
>>> Hopefully it will be useful to others; you can grab it from:
>>>
>>> http://www.fsl.com/support/DecodeShortURLs.pm
>>> http://www.fsl.com/support/DecodeShortURLs.cf
>>>
>>
>> I've just put up a new version at the above URLs (v0.3) which adds the
>> following new features:
>>
>> - Now follows 'chained' short URLs (e.g. shortURL -> shortURL -> real)
>>
>> When chained URLs are detected the rule 'SHORT_URL_CHAINED' is fired.
>> If a chained loop is detected the rule 'SHORT_URL_LOOP' is fired.
>> If more than 10 chained URLs are found 'SHORT_URL_MAXCHAIN' is fired
>> and no further redirections are checked.
>>
>> - If the shortener returns 404 (i.e. not found) for the short URL, then
>> 'SHORT_URL_404' is fired.
>>
>> - Prevent amavis from dying on eval block tests by adding "local
>> $SIG{'__DIE__'}" to each block.
>>
>> - Added option to allow logging to syslog (mail.info).
>>
>> Kind regards,
>> Steve.
>>
> I've been testing this plugin, version 0.5. I'm running SpamAssassin
> v3.2.5 on CentOS v5.5 32-bit, Perl v5.8.8. I've been testing using a
> test message and changing out the URLs it contains.
>
> Using URLs like these:
>
> http://goo.gl/foo
> http://bit.ly/foo
> http://2chap.it/foo
>
> I consistently hit on these rules:
>
> HAS_SHORT_URL
> SHORT_URL_404
> SHORT_URL_CHAINED
> SHORT_URL_LOOP
> SHORT_URL_MAXCHAIN
>
>
> I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why is
> -every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP, SHORT_URL_MAXCHAIN?

I bet *none* of the /foo targets exist.
Could that be confusing the plugin when /foo redirects back to the
"home" page, Steve?
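
To make the question above concrete, here is a minimal sketch of the redirect walk described in the v0.3 notes: send a HEAD request, read the Location header, follow short-URL-to-short-URL hops, and stop on a 404, a loop, or too many hops. This is not Steve's Perl code. The function name, the SHORTENER_HOSTS set, the injected head() callable and the reading of "more than 10" as "10 hops" are all assumptions for illustration, and the real plugin may classify hops differently. Because head() is passed in, the logic can be exercised against a fake shortener without any network access.

```python
from typing import Callable, List, Optional, Set, Tuple
from urllib.parse import urljoin, urlsplit

# Hypothetical shortener list; the real plugin keeps its own list in DecodeShortURLs.cf.
SHORTENER_HOSTS = {"bit.ly", "goo.gl", "2chap.it"}
MAX_CHAIN = 10  # assumed reading of "more than 10 chained URLs"
REDIRECT_CODES = {301, 302, 303, 307, 308}

# head(url) -> (HTTP status, Location header or None); injected so no network is needed.
HeadFn = Callable[[str], Tuple[int, Optional[str]]]


def is_short(url: str) -> bool:
    """True if the URL's host is one of the (hypothetical) shortener hosts."""
    return (urlsplit(url).hostname or "") in SHORTENER_HOSTS


def follow_short_url(url: str, head: HeadFn) -> Tuple[List[str], Set[str]]:
    """Follow one short URL hop by hop; return the URLs visited and the rule names fired."""
    rules = {"HAS_SHORT_URL"}
    chain = [url]
    current = url
    while True:
        status, location = head(current)
        if status == 404:
            rules.add("SHORT_URL_404")
            break
        if status not in REDIRECT_CODES or not location:
            break  # no further redirect: current is the final URL
        target = urljoin(current, location)  # Location may be relative
        if target in chain:
            rules.add("SHORT_URL_LOOP")
            break
        chain.append(target)
        if not is_short(target):
            break  # reached a non-shortener URL for URIDNSBL and friends
        rules.add("SHORT_URL_CHAINED")  # a short URL pointed at another short URL
        if len(chain) - 1 >= MAX_CHAIN:
            rules.add("SHORT_URL_MAXCHAIN")
            break
        current = target
    return chain, rules


# Hypothetical shortener that sends unknown codes to its own home page, which answers 200.
def fake_head(url: str) -> Tuple[int, Optional[str]]:
    if url == "http://bit.ly/foo":
        return 301, "http://bit.ly/"
    return 200, None


chain, rules = follow_short_url("http://bit.ly/foo", fake_head)
print(chain, sorted(rules))
```

With that fake shortener, the walk treats the home page http://bit.ly/ as a second short URL, so SHORT_URL_CHAINED fires even though no real chain exists. In this sketch SHORT_URL_404, SHORT_URL_LOOP and SHORT_URL_MAXCHAIN each end the walk, so a single URL can trip at most one of them.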