spamassassin-dev March 2011 archive
spamassassin-dev: Re: [SA-dev] Fwd: Re: Reproducing Bug 6559

From: Adam Katz <antispam_at_nospam>
Date: Mon Mar 21 2011 - 19:43:09 GMT
To: dev@spamassassin.apache.org

On 03/20/2011 08:44 PM, Karsten Bräckelmann forwarded From: Matt Elson
> I have no idea why, but it seems:
> \s preceded by three or more characters and tflags multiple
> regularly hits the problem for me.

I don't have much experience with non-production re2c; how do I properly
reproduce (and therefore test) this bug on svn trunk?

I would want to try this, which should be a faster regex anyway:

/free\s[ptc](?:ill|ablet|ap(?:sule|let))s/i

I also want to try a leading word boundary ("\b") in front of the regex,
though I don't know how many spams that would miss.
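
A quick way to see what the factored form and a leading "\b" change is
to run them against a few strings outside SpamAssassin. The sketch below
is only an illustration: it uses Python's re module rather than Perl
(the constructs involved behave the same for these ASCII samples), the
factored regex is written with its parentheses balanced, and the sample
strings and the helper name hits() are made up.

```python
import re

# Rule _3 as quoted in this thread, the factored variant, and the
# factored variant with a leading word boundary.
ORIGINAL = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
FACTORED = re.compile(r"free\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)
BOUNDED = re.compile(r"\bfree\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)

# Hypothetical sample strings, not taken from any spam corpus.
SAMPLES = [
    "Get FREE pills today",
    "free tablets and free caplets",
    "free capsules",
    "free tills",      # the factored form also accepts this
    "carefree pills",  # a leading \b rejects this
    "free pill",       # no trailing "s", so nothing hits
]


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in SAMPLES:
        print(f"{text!r:35} original={hits(ORIGINAL, text)} "
              f"factored={hits(FACTORED, text)} "
              f"bounded={hits(BOUNDED, text)}")
```

Factoring out the first letter makes the pattern slightly looser (it
also accepts strings such as "free tills"), which is probably harmless
here but worth knowing before comparing hit counts.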

While looking at the PILL_PRICE rules, I noticed this one:

body __PILL_PRICE_1
m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i

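What is the point of leading with an optional piece? Since the regex is
unanchored, neither the optional "\$?" nor the upper bound in {3,8}
changes whether it hits, so it hits the same text as this simpler one: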

m;[\d\s.]{3}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
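
The claim can be spot-checked with a short script. This is a sketch
under assumptions: it uses Python's re module instead of Perl, the
sample strings are invented, and same_verdict() is a hypothetical helper
name. It compares only whether each regex hits, not what text it
captures.

```python
import re

# __PILL_PRICE_1 as quoted above, and the simplified form.
ORIGINAL = re.compile(
    r"\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))", re.I)
SIMPLER = re.compile(
    r"[\d\s.]{3}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))", re.I)

# Hypothetical sample strings, not taken from any spam corpus.
SAMPLES = [
    "only $1.50 per pill",
    "0.99/tablet",
    "12345678 each capsule",
    "$9/pill",         # fewer than three digit/space/dot characters
    "price per pill",  # no price at all
]


def same_verdict(text):
    """Return True if both regexes agree on hit versus no hit."""
    return bool(ORIGINAL.search(text)) == bool(SIMPLER.search(text))


if __name__ == "__main__":
    for text in SAMPLES:
        a = ORIGINAL.search(text)
        b = SIMPLER.search(text)
        print(f"{text!r:28} agree={same_verdict(text)} "
              f"original={a.group() if a else None!r} "
              f"simpler={b.group() if b else None!r}")
```

The two forms agree on hit versus no hit, but the matched text differs
(the simpler one starts later), which could matter if anything
downstream depends on the matched span.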

Another point: what if we merge _1 and _3 from

_1 m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
_2 /(?:pill|tablet|cap(?:sule|let))s\s\$?[\d\s.]{3,8}/i
_3 /free\s(?:pill|tablet|cap(?:sule|let))s/i

into the following? (Note the removal of _1's optional lead; the merge
also makes _3's "\s" optional and drops its trailing "s".)

m;(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let));i

Matt already showed that disabling _1 and _2 didn't prevent the problem
with _3, so this isn't as much of a potential remedy as it initially
seems, but it should be slightly more efficient and might avoid the re2c
bug.
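
For anyone who wants to compare the merged regex against _1 and _3
before trying it, here is a rough sketch. It assumes Python's re module
as a stand-in for Perl, uses invented sample strings, and the helper
name verdicts() is hypothetical. It says nothing about the re2c bug
itself; it only shows which strings each pattern hits.

```python
import re

# Rules _1 and _3 as quoted in this thread, and the proposed merge.
RULE_1 = re.compile(
    r"\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))", re.I)
RULE_3 = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
MERGED = re.compile(
    r"(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let))",
    re.I)

# Hypothetical sample strings, not taken from any spam corpus.
SAMPLES = [
    "1.99 per pill",  # _1 and merged
    "free caplets",   # _3 and merged
    "free pill",      # merged only: no trailing "s" required
    "freetablet",     # merged only: the whitespace is optional
    "cheap pills",    # none
]


def verdicts(text):
    """Return (hit by _1 or _3, hit by the merged regex) for the text."""
    separate = bool(RULE_1.search(text) or RULE_3.search(text))
    return separate, bool(MERGED.search(text))


if __name__ == "__main__":
    for text in SAMPLES:
        separate, merged = verdicts(text)
        print(f"{text!r:18} _1-or-_3={separate} merged={merged}")
```

On these samples the merged regex hits everything _1 or _3 hits, plus a
few strings that _3 alone would skip, so any score tied to _3 may need
a second look.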