spamassassin-users March 2012 archive

Re: Help with blocking Chinese Spam

From: David F. Skoll <dfs_at_nospam>
Date: Tue Mar 13 2012 - 13:14:10 GMT
To: <users@spamassassin.apache.org>

On Tue, 13 Mar 2012 12:40:16 +0000
Jenny Lee <bodycare_5@live.com> wrote:

> Will give this a go. What I don't understand is that... Why is this
> not catching this 'utf' which is on the subject?

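You need the :raw tag to see the raw, undecoded header; without it,
the rule is matched against the decoded Subject, which no longer
contains the 'utf' charset label. The sub-rule in the meta-rule: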

    header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/

attempts to limit matches on UTF-8 subjects to Chinese characters
because the lead bytes e4-e9 in UTF-8 (mostly) cover the CJK unified
ideographs, U+4E00 through U+9FFF. It's not a perfect filter, but
blocking all UTF-8-encoded subjects would yield way too many false
positives for us.
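
To make the byte-range argument concrete, here is a minimal Python
sketch (not SpamAssassin code) that applies the same byte class to a
few sample strings. The helper names (lead_byte, encoded_word, hits)
and the sample subjects are made up for illustration, and it assumes
the rule sees the decoded subject as UTF-8 bytes:

```python
import base64
import re

# Same byte class as the rule's regex, applied here to Python bytes.
CJK_LEAD = re.compile(rb"[\xe4-\xe9]")


def lead_byte(ch: str) -> int:
    """Return the first byte of the UTF-8 encoding of one character."""
    return ch.encode("utf-8")[0]


def encoded_word(text: str) -> bytes:
    """Build a base64 MIME encoded-word, roughly as it looks in a raw header."""
    return b"=?UTF-8?B?" + base64.b64encode(text.encode("utf-8")) + b"?="


def hits(data: bytes) -> bool:
    """True if any byte of data falls in the range 0xE4-0xE9."""
    return CJK_LEAD.search(data) is not None


# First and last code points of the CJK Unified Ideographs block.
print(hex(lead_byte("\u4e00")), hex(lead_byte("\u9fff")))

subject = "中文"  # hypothetical sample subject
print(hits(encoded_word(subject)))         # raw form is pure ASCII: no hit
print(hits(subject.encode("utf-8")))       # decoded UTF-8 bytes: hit
print(hits("café".encode("utf-8")))        # accented Latin, lead byte 0xC3: no hit
print(hits("こんにちは".encode("utf-8")))  # hiragana, lead byte 0xE3: no hit
```

The hiragana case is one reason for the "mostly": kana-only Japanese
subjects slip past, while anything containing kanji is caught. In the
other direction, lead byte e4 also covers U+4000 through U+4DFF, which
lies outside the main ideograph block.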

Regards,

David.

PS: I haven't looked at SA's Bayes implementation. Can it handle
words in non-western character sets properly?