I have a regex that works perfectly with pcregrep:

pcregrep -M '([a-zA-Z0-9_&*]+)(\(+)([a-zA-Z0-9_ &\*]+)(\)+)(\n)(\{)'

When I use this regex in my C++ code (with the necessary escapes added), it does not match:

char const *regex = "([a-zA-Z0-9_&*]+)\\(+([a-zA-Z0-9_ &\\*]+)\\)+(?>\n+)\\{+";
re = pcre_compile(regex, PCRE_MULTILINE, &error, &erroffset, 0);

I'm trying to find function bodies like this (the line break is 0a in hex):

my_function(char *str)
{

Why does it work with pcregrep and not within the C++ code?

JohnnyFromBF
  • Which compiler version are you using? Full regex support is available for e.g. GCC 4.9 or newer. – πάντα ῥεῖ Aug 10 '15 at 14:15
  • g++ 4.9.2-10ubuntu13 – JohnnyFromBF Aug 10 '15 at 14:17
  • I've never seen `(?>\n+)` in a regular expression. What does `?>` mean? Why do you use it in your C++ code but not in pcregrep? – cpplearner Aug 10 '15 at 14:46
  • [The (?> is called an atomic group, which basically means that PCRE will not backtrack if it fails to match it.](https://nikic.github.io/2011/12/10/PCRE-and-newlines.html) I could use (\r+) as well. But (\n+) does not work, even though the line break is `0a`. I guess there is some problem with multiline handling. – JohnnyFromBF Aug 10 '15 at 14:51

1 Answer

Your first regex:

 ( [a-zA-Z0-9_&*]+ )           # (1)
 ( \(+ )                       # (2)
 ( [a-zA-Z0-9_ &\*]+ )         # (3)
 ( \)+ )                       # (4)
 ( \n )                        # (5)
 ( \{ )                        # (6)

Your second regex:

 ( [a-zA-Z0-9_&*]+ )           # (1)
 \(+
 ( [a-zA-Z0-9_ &\*]+ )         # (2)
 \)+
 (?> \n+ )
 \{+

Other than the different capture groups and an unnecessary atomic group (?>),
there is one obvious difference:

The last newline and the curly brace in the second regex have + quantifiers.
But + means 1 or more, so anything the first regex matches, the second should match as well.

The less obvious difference is that it is unknown whether the files were opened in translated (text) mode or not, so the line breaks the regex actually sees may not be a bare \n.

You can usually cover all cases with \r?\n in place of \n
(or even (?:\r?\n|\r) ).

So, if you want to quantify the line break, it would be (?:\r?\n)+ or (?:\r?\n|\r)+.
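
To check the line-break-tolerant form in isolation, here is a minimal sketch. It swaps PCRE for the standard library's std::regex (ECMAScript grammar) so that it builds without extra libraries; that is my substitution, not what the question uses. The helper name has_function_header is hypothetical, and the atomic group is dropped because, as far as I know, the ECMAScript grammar does not support it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the question): returns true if `text`
// contains something shaped like a function header, followed by one or
// more line breaks (LF or CRLF) and an opening brace.
bool has_function_header(const std::string& text) {
    // Same structure as the regex in the question, with \n replaced by
    // (?:\r?\n)+ so both Unix and Windows line endings are accepted.
    static const std::regex pattern(
        R"(([a-zA-Z0-9_&*]+)\(+([a-zA-Z0-9_ &*]+)\)+(?:\r?\n)+\{)");
    return std::regex_search(text, pattern);
}
```

If this matches your input with \r?\n but not with a plain \n, the data reaching the regex most likely contains carriage returns.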

The other option might be to try the generic line break construct (I think it's \R)
instead (available in the newer versions of PCRE).

If that doesn't work, it's something else.