spamassassin-dev September 2011 archive

[Bug 6649] sa-compile fails on SOUGHT rule with re2c: error: line 207, column 2: unterminated string constant (missing ")

From: <bugzilla-daemon_at_nospam>
Date: Thu Sep 22 2011 - 20:19:37 GMT
To: dev@spamassassin.apache.org

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6649

Mark Martinec <Mark.Martinec@ijs.si> changed:

              What |Removed   |Added
----------------------------------------------------------------------------
  Target Milestone |Undefined |3.4.0

--- Comment #16 from Mark Martinec <Mark.Martinec@ijs.si> 2011-09-22 20:19:37 UTC ---
I'm beginning to understand what may have happened.

> \x{bf}\x{01}d\x{e6}...
> See where the original rule says z\x{bf}\x{01} the scanner2.re file
> should say z<BF><01>, but it says z<BF><newline>

See \x{01}d - note that \xd (0x0D) is the code for a carriage return,
which shows up as a line break.

> note that \x{e4}2 was transformed into a "

I cannot yet fully explain this, but note that \x22 is the code for a
double quote (").
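
As a quick check of the character codes mentioned above, here is a small
Python sketch (Python is used only for illustration; the helper name
char_for is hypothetical). It prints which characters the hex codes 0x0A,
0x0D and 0x22 stand for.

```python
def char_for(code):
    """Return the character that a given hex code stands for."""
    return chr(code)

# 0x0A is line feed, 0x0D is carriage return, 0x22 is a double quote.
for code in (0x0A, 0x0D, 0x22):
    print(f"\\x{code:02x} -> {char_for(code)!r}")
```

Either a stray line break or a stray double quote inside a quoted string
would be enough to produce an "unterminated string constant" complaint
from a tool that reads that string.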

Seems like the trouble originates from how the output
of a "perl -Mre=debug" command is being parsed by
Mail::SpamAssassin::Plugin::BodyRuleBaseExtractor::extract_hints().

The extract_hints() has to deal with truncated regexp debug output
(lines with "..." at their end). The RE debug output of perl
changed subtly between 5.8 and 5.10:

  perl 5.8: <xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...>(18)
  perl 5.10, 5.12, 5.14: <xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>... (18)

I'm not sure this is still being properly handled.
A side effect may be that a \x{dd} escape in a long string gets
chopped at an arbitrary point, and the dangling \x{ later joins with
some text that follows. I do not fully understand what happens next
and how to properly fix it.
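
To make the two layouts concrete, here is a minimal Python sketch of a
parser that accepts both. It is not the Perl code in extract_hints(),
whose parsing logic is not shown in this message; the names OPERAND,
parse_operand and has_dangling_escape are hypothetical, and the sketch
assumes the string body itself contains no ">" character.

```python
import re

# Matches a string operand in either layout:
#   perl 5.8:    <text...>(18)
#   perl 5.10+:  <text>... (18)
OPERAND = re.compile(
    r"<(?P<body>.*?)(?P<dots_in>\.\.\.)?>"   # body, optional "..." inside
    r"(?P<dots_out>\.\.\.)?\s*"              # optional "..." outside
    r"\((?P<next>\d+)\)"                     # number in parentheses
)

def parse_operand(line):
    """Return (body, truncated, number) or None if the line does not match."""
    m = OPERAND.search(line)
    if m is None:
        return None
    truncated = bool(m.group("dots_in") or m.group("dots_out"))
    return m.group("body"), truncated, int(m.group("next"))

def has_dangling_escape(body):
    r"""True if body ends in an incomplete \x or \x{... escape."""
    return re.search(r"\\x(?:\{[0-9a-fA-F]*)?$", body) is not None

print(parse_operand("<xxxx...>(18)"))
print(parse_operand("<xxxx>... (18)"))
print(has_dangling_escape("z\\x{bf}\\x{0"))
```

Note that in the 5.8 layout a string that really ends in three dots
cannot be told apart from a truncated one by this pattern alone. A check
like has_dangling_escape would let a caller discard a truncated string
that ends mid-escape instead of joining it with later text.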

As a small hardening measure, I added escaping of NL and NUL
characters, following a suggestion in Comment 4. It does not solve
the origin of the trouble, but at least it limits the damage.
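
The idea behind that hardening can be sketched in a few lines of Python.
This is only an illustration, not the committed Perl change: the function
name protect_special_chars is hypothetical, and the escape spellings
below are an assumption, since the message does not say which ones the
real code emits.

```python
# Assumed escape spellings (the committed code may use different ones).
# A three-digit octal escape is chosen for NUL so that a digit following
# it cannot be read as part of the escape.
ESCAPES = {
    "\x00": "\\000",  # NUL character
    "\n": "\\n",      # newline (NL)
}

def protect_special_chars(text):
    """Replace raw NUL and NL characters with printable escapes."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)

sample = "abc\ndef\x00ghi"
print(protect_special_chars(sample))
```

After this step a raw line break can no longer split a quoted string
across two lines of the generated scanner source, which is what limits
the damage even when the upstream parsing goes wrong.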

I'm also not sure that re2c knows how to handle null characters
(escaped or raw) in regexp strings in a scanner*.re source.
There are several of these in current SOUGHT rules.

trunk:
  Bug 6649: sa-compile fails on SOUGHT rule with re2c: unterminated
  string constant - protect special characters, some debuggings aids,
  perl -Mre=debug changed its output format with perl 5.10
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending lib/Mail/SpamAssassin/Util.pm
Sending sa-compile.raw
Committed revision 1174349.
