spamassassin-users October 2010 archive
spamassassin-users: Re: rule to catch subject spamming

From: Lawrence _at_nospam <_at_nospam>
Date: Sat Oct 23 2010 - 18:06:06 GMT
To: users@spamassassin.apache.org

On 23/10/2010 2:28 PM, Lawrence @ Rogers wrote:
> Hello all,
>
> I noticed recently that our users are getting spam with subjects
> similar to the following:
>
> SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet
>
> SpamAssassin seems to be having a hard time determining whether it is
> spam or not because it appears as one long word.
>
> In all cases, the subject contains no spaces (to prevent detection I
> would think) and is longer than 62 characters (not sure why they do
> this, but it is true in every sample I've seen so far).
>
> I would like to create a rule to pick up on this, but I am having a bit
> of difficulty with the regex for the rule. This is what I've come up
> with so far:
>
> header CR_SUBJECT_SPAMMY Subject =~ /.{62}/
> describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces)
> score CR_SUBJECT_SPAMMY 2.5
>
> I just need to modify the regex to check that the Subject contains no
> spaces.
>
> I've done some research, and the longest non-coined word in a major
> dictionary is 30 characters long, meaning that if it was used twice in
> a subject, the total length would still only be 60 characters. There
> may be some FPs if the sender used formatting like commas and such,
> but the possibility of them using that word twice, with formatting
> and no spacing, would probably be extremely remote.
>
> Any assistance or advice would be greatly appreciated.
>
> Regards,
>
> Lawrence Williams
> LCWSoft
>
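
As a quick illustration of why the first attempt above is not enough on its own, here is a minimal Python sketch. It is not SpamAssassin itself, only Python's re module standing in for the Perl regex engine, and both subjects are hypothetical examples made up for the test. It shows that /.{62}/ fires on any Subject of 62 or more characters, whether or not it contains spaces.

```python
import re

# First attempt from the quoted message: any 62 characters in a row.
first_attempt = re.compile(r".{62}")

# Hypothetical run-together subject (74 characters, no spaces).
run_together = (
    "CheapReplicaWatchesAndDesignerHandbags"
    "ShippedWorldwideOrderTodayAndSaveBig"
)

# Hypothetical legitimate subject (74 characters, ordinary spaced words).
spaced = (
    "Minutes from the quarterly planning meeting "
    "and action items for next week"
)

# The length-only pattern cannot tell the two apart: it matches both.
hits_run_together = first_attempt.search(run_together) is not None
hits_spaced = first_attempt.search(spaced) is not None
print(hits_run_together, hits_spaced)
```

Because the dot also matches a space, the pattern needs a restricted character class (or an explicit "no whitespace" condition) before it says anything about spacing.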

This is the rule I've come up with now:

# Matches a new technique used by spammers in the Subject line
# Running a bunch of pornographic words together (with no spaces) to evade
# spam filters
# This rule tests for a Subject made up only of digits, letters, or
# common punctuation (comma, period, plus sign).
# The string must be at least 42 characters long and contain no spaces.

header CR_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+]{42,}$/
describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces)
score CR_SUBJECT_SPAMMY 3.5
tflags CR_SUBJECT_SPAMMY noautolearn
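
For anyone who wants to try the pattern before deploying it, the following is a rough Python sketch of the same regex. It assumes Python's re module behaves like Perl for these simple constructs (anchors, a character class, and a {42,} quantifier); the helper name looks_run_together and all test subjects are hypothetical and not part of SpamAssassin.

```python
import re

# Same pattern as the CR_SUBJECT_SPAMMY rule above.
final_rule = re.compile(r"^[0-9a-zA-Z,.+]{42,}$")


def looks_run_together(subject: str) -> bool:
    """Return True if the whole subject is 42+ allowed characters, no spaces."""
    return final_rule.search(subject) is not None


# Hypothetical subjects used only to exercise the pattern.
samples = {
    "run_together": "CheapReplicaWatchesAndDesignerHandbags"
                    "ShippedWorldwideOrderTodayAndSaveBig",
    "spaced": "Minutes from the quarterly planning meeting "
              "and action items for next week",
    "short": "Hello",
    "just_under": "a" * 41,
    "at_threshold": "a" * 42,
    "with_hyphen": "a" * 50 + "-",
}

results = {name: looks_run_together(text) for name, text in samples.items()}
for name, matched in results.items():
    print(name, matched)
```

One thing the sketch makes visible: any character outside the class, such as a hyphen or underscore, stops the match, so a spammer who inserts one would slip past this rule as written.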