spamassassin-users March 2012 archive
spamassassin-users: RE: Help with blocking Chinese Spam

From: Jenny Lee <bodycare_5_at_nospam>
Date: Thu Mar 15 2012 - 22:31:29 GMT
To: <users@spamassassin.apache.org>

Well, it is not easy to quote properly from Hotmail. Excuse my mess-up and top-posting.

Bottom line is... I got rid of this Chinese crap.

Thank you all for the help, SA users.
 
Jenny

---------
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:40:16 +0000
> Jenny Lee <bodycare_5@live.com> wrote:
>
> > Will give this a go. What I don't understand is that... Why is this
> > not catching this 'utf' which is on the subject?
>
> You need the :raw tag to see the raw, undecoded header. The meta-rule:
>
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>
> attempts to limit matches on UTF-8 subjects to Chinese characters
> because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
> ideographs. It's not a perfect filter, but blocking all UTF-8-encoded
> subjects would yield way too many FPs for us.
>
> Regards,
>
> David.
>
> PS: I haven't looked at SA's Bayes implementation. Can it handle
> words in non-western character sets properly?
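
David's point about the lead bytes e4-e9 can be checked with a short Python sketch. It is only an illustration of the UTF-8 arithmetic, not anything SpamAssassin runs; the helper name lead_byte is made up for the example, and the range U+4E00..U+9FFF is the CJK Unified Ideographs block.

```python
# Sketch: which UTF-8 lead bytes does the CJK Unified Ideographs block use?
# lead_byte is a hypothetical helper written for this example only.

def lead_byte(code_point: int) -> int:
    """Return the first byte of the UTF-8 encoding of a code point."""
    return chr(code_point).encode("utf-8")[0]

# Collect the lead byte of every code point in U+4E00..U+9FFF.
cjk_lead_bytes = {lead_byte(cp) for cp in range(0x4E00, 0xA000)}
print(sorted(hex(b) for b in cjk_lead_bytes))

# Scripts that fall outside the e4-e9 range, which is part of why the
# rule is "not a perfect filter":
print(hex(lead_byte(0x3042)))  # Hiragana letter A
print(hex(lead_byte(0xAC00)))  # first Hangul syllable
print(hex(lead_byte(0x00E9)))  # Latin small e with acute (two-byte sequence)
```

The rule therefore keys on Chinese-style ideographs and leaves accented Latin text alone, but it also misses Japanese kana and Korean Hangul.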

Thank you David, Jared and Jari.

Adding either of:
Subject:raw =~ /=\?utf-8\?B/i
Subject =~ /[\xe4-\xe9]/

caused this crap to get caught. Both work, so I will follow David's advice.
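
For anyone comparing the two patterns, here is a rough Python sketch of what each one looks at. It is an approximation under stated assumptions: the sample subjects are invented, the function names are hypothetical, and Python's email.header stands in for SpamAssassin's own header decoding, which may behave differently depending on configuration.

```python
import base64
import re
from email.header import decode_header

# The two patterns from the rules above, applied to bytes.
RAW_UTF8_B64 = re.compile(rb"=\?utf-8\?B", re.IGNORECASE)  # raw header check
CJK_LEAD = re.compile(rb"[\xe4-\xe9]")                      # decoded header check


def encode_subject(text: str) -> str:
    """Build an RFC 2047 base64 encoded-word for a UTF-8 subject (sample data)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


def decoded_utf8(raw_subject: str) -> bytes:
    """Decode any encoded-words and return the subject as UTF-8 bytes."""
    chunks = []
    for part, charset in decode_header(raw_subject):
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii", errors="replace")
        chunks.append(part)
    return "".join(chunks).encode("utf-8")


def check_subject(raw_subject: str) -> tuple[bool, bool]:
    """Return (raw pattern matched, lead-byte pattern matched)."""
    raw_hit = bool(RAW_UTF8_B64.search(raw_subject.encode("ascii")))
    cjk_hit = bool(CJK_LEAD.search(decoded_utf8(raw_subject)))
    return raw_hit, cjk_hit


chinese = encode_subject("你好")   # Chinese greeting
french = encode_subject("Café")    # accented Latin text, also sent as UTF-8
for subject in (chinese, french, "Hello"):
    print(subject, check_subject(subject))
```

The raw pattern fires on any base64-encoded UTF-8 subject, including the French one, while the lead-byte pattern fires only on the Chinese one. That is the false-positive trade-off David described.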

So I think I will just remove the TextCat plugin, which does not identify it properly.

This is a great list. Thanks again to everyone. All help is appreciated.

Jenny