spamassassin-users October 2011 archive

Re: (Non-) Capturing REs

From: Adam Katz <antispam_at_nospam>
Date: Mon Oct 24 2011 - 20:58:54 GMT

On 10/23/2011 06:44 PM, Karsten Bräckelmann wrote:
> [...] as I read it, the warning is referring to the usage of the
> special $&, $` and $' match capturing variables, resulting in a
> substantial performance penalty -- and mentions the non-capturing
> extended regex in this *context*, since it uses the same mechanism
> for the $n matches. If these special vars are used.

Using special variables like those you mentioned is particularly bad,
especially with some of the older versions of perl (I seem to recall
some of them getting big performance boosts in more recent perl
revisions). That said, the extra memory consumed by an unnecessary
capturing group still impacts performance.
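
A minimal sketch of the capturing vs. non-capturing distinction. It is
in Python rather than Perl purely for illustration, and the patterns,
the sample text and the hits() helper are all made up. Python's re
engine is not the one SA uses, so this shows only what each form
stores, not how much it costs:

```python
import re

# Hypothetical rule body written two ways: with capturing groups and
# with non-capturing (?:...) groups. Neither is taken from a real SA rule.
CAPTURING = re.compile(r"(foo|bar)\W+(baz)")
NON_CAPTURING = re.compile(r"(?:foo|bar)\W+(?:baz)")


def hits(pattern, text):
    """Return True if the pattern matches anywhere in text (rule-style test)."""
    return pattern.search(text) is not None


sample = "xx bar -- baz yy"

# Both forms give the same yes/no answer, which is all a rule needs.
same_verdict = hits(CAPTURING, sample) == hits(NON_CAPTURING, sample)

# Only the capturing form keeps the submatches around.
captured = CAPTURING.search(sample).groups()
uncaptured = NON_CAPTURING.search(sample).groups()

print(same_verdict, captured, uncaptured)
```

When a rule only asks "did it match?", the stored submatches in the
first form are never read, so they are pure overhead.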

> Now, I just grepped the entire SA source code, and NONE of these
> special vars are used. Yay! (I did not grep all external SA
> dependencies, mind you.)

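I'm guessing I'm not the only person who periodically looks through the
rules for such things, including frivolous trailing portions like the
glob in /foo.*/ or the range in /bar\W{2,30}/, and wipes them out so
they become e.g. /foo/ and /bar\W{2}/. For a rule that only tests
whether the pattern matches, the shorter forms hit on exactly the same
messages.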
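
A quick way to sanity-check that kind of simplification, sketched in
Python for illustration only. The sample strings and the verdicts()
helper are invented, and a handful of samples is a spot check, not a
proof:

```python
import re

# (original, simplified) pairs from the examples above.
PAIRS = [
    (r"foo.*", r"foo"),
    (r"bar\W{2,30}", r"bar\W{2}"),
]

# Made-up sample strings; real checking would use a mail corpus.
SAMPLES = [
    "foo",
    "xfoo bar",
    "bar!!",
    "bar!",
    "bar!!!!baz",
    "nothing here",
    "foo\nbar--",
]


def verdicts(pattern):
    """Return the match / no-match result of pattern for every sample."""
    rx = re.compile(pattern)
    return [rx.search(s) is not None for s in SAMPLES]


# True if every simplified pattern hits exactly the same samples
# as the pattern it replaces.
all_equivalent = all(verdicts(a) == verdicts(b) for a, b in PAIRS)
print(all_equivalent)
```

This equivalence holds only because the dropped part sits at the end of
the pattern and nothing reads the matched text afterwards. In the
middle of a pattern, /foo.*bar/ and /foobar/ are different rules.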

> So, does this "substantial performance penalty" using capturing
> groups even apply to SA?
> Is it really worth it, religiously using non-capturing grouping?

From the profiling I've seen, yes it is. (I don't have data to share
though, sorry).