spamassassin-users October 2010 archive

Re: New plugin: DecodeShortURLs

From: Brent Gardner <brent.gardner_at_nospam>
Date: Tue Oct 05 2010 - 21:08:05 GMT
To: users@spamassassin.apache.org

René Berber wrote:
> On 10/5/2010 3:42 PM, Yet Another Ninja wrote:
>> On 2010-10-05 22:35, Brent Gardner wrote:
>
> [snip]
>
>>> Using URLs like these:
>>>
>>> http://goo.gl/foo
>>> http://bit.ly/foo
>>> http://2chap.it/foo
>>>
>>> I consistently hit on these rules:
>>>
>>> HAS_SHORT_URL
>>> SHORT_URL_404
>>> SHORT_URL_CHAINED
>>> SHORT_URL_LOOP
>>> SHORT_URL_MAXCHAIN
>>>
>>>
>>> I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why
>>> is -every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP,
>>> SHORT_URL_MAXCHAIN?
>>>
>> I bet *none* of the /foo targets exist.
>> Could that be confusing the plugin when /foo redirects back to "home"
>> Steve?
>>
>
> Brent can see in /tmp/DecodeShortURLs.txt if that was the case (i.e. the
> file shows the mapping found between the short link and the long one).
> Of course this is only if he didn't change the original .cf's
> url_shortener_log setting.
>
Here's the contents of /tmp/DecodeShortURLs.txt so far:

[1286308657] http://2chap.it/foo => http://2chap.it
[1286308914] http://bit.ly/foo => sdadsa
[1286309776] http://goo.gl/l6MS => http://googleblog.blogspot.com/2009/12/making-urls-shorter-for-google-toolbar.html
[1286309866] http://tinyurl.com/2vw3t8j => http://www.google.com/search?q=android+url+shortener&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
[1286309979] http://bit.ly/3hDSUb => http://www.example.com/
[1286310537] http://bit.ly/3hDSUb => http://www.example.com/
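
In case it helps anyone triage a log like this, here is a minimal Python
sketch. It is not part of the plugin. It assumes the log format is exactly
what the lines above show ("[epoch] short-url => target"), and it tolerates
a target that has wrapped onto the next line. The names parse_log and
classify, and the three labels, are my own invention for illustration.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlparse

# Assumed line format, inferred from the log excerpt: "[epoch] short => target"
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+=>\s*(.*)$")


def parse_log(text):
    """Return a list of (utc_datetime, short_url, target) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LOG_LINE.match(line)
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            entries.append([when, match.group(2), match.group(3).strip()])
        elif entries and not entries[-1][2]:
            # Target was wrapped onto its own line by the mail client.
            entries[-1][2] = line
    return [tuple(entry) for entry in entries]


def classify(short_url, target):
    """Label a mapping (hypothetical labels, not plugin rule names)."""
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "not-a-url"   # e.g. a target such as "sdadsa"
    if parsed.netloc.lower() == urlparse(short_url).netloc.lower():
        return "same-host"   # redirected back to the shortener itself
    return "resolved"        # points at a different host


if __name__ == "__main__":
    with open("/tmp/DecodeShortURLs.txt") as handle:
        for when, short_url, target in parse_log(handle.read()):
            print(when.isoformat(), classify(short_url, target), short_url, target)
```

A "same-host" result only says the shortener sent the request back to its
own site, which would fit the "redirects back to home" guess quoted above.
It says nothing about why the chain and loop rules fire.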

Of course, I didn't expect the /foo URLs to exist, but I didn't have any
live data to test with. I found the other URLs in the log by googling.
They all behave the same way, hitting all 5 rules listed above.

Brent Gardner