
I am trying to get a function's content (body) if the function's name matches a defined pattern.

What I tried so far:

(Step 1) Use recursion to get all function bodies in a given C file: {(?:[^{}]+|(?R))*+}

(Step 2) Find all matches of the wanted function's name.

(Step 3) Combine both steps. This is where I am struggling.

Input

TASK(arg1)
{
    if (cond)
    {
      /* Comment */
      function_call();
      if(condIsTrue)
      {
         DoSomethingelse();
      }
    }
    if (cond1)
    {
      /* Comment */
      function_call1();
    }
}


void FunctionIDoNotWant(void)
{
    if (cond)
    {
      /* Comment */
      function_call();
    }
    if (cond1)
    {
      /* Comment */
      function_call1();
    }
}

I am looking for the function TASK. When I put a regex that matches TASK in front of "{(?:[^{}]+|(?R))*+}", it no longer works:

(TASK\s*\(.*?\)\s)({((?>[^{}]+|(?R))*)})

Desired Output

Group1:
   TASK(arg1)
Group2:
    if (cond)
    {
      /* Comment */
      function_call();
      if(condIsTrue)
      {
         DoSomethingelse();
      }
    }
    if (cond1)
    {
      /* Comment */
      function_call1();
    }
Jared Smith
HOrst

3 Answers


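You are recursing the whole pattern with (?R), which is the same as (?0), whereas you want to recurse the second group with (?2). Group one contains your (TASK...) part, so recursing the whole pattern would require TASK to appear again in front of every nested brace.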

See this demo at regex101

(TASK\s*\(.*?\)\s)({((?>[^{}]+|(?2))*)})
                  ^ here starts the second group -> recursion with (?2)
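
For readers working in Python, whose built-in re module has no recursion support, the same idea can be tried with the third-party regex package. The following is only a minimal sketch under that assumption: SAMPLE and find_task are illustrative names that do not appear in the question, the pattern is the shortened two-group form, and the braces are escaped purely for readability.

```python
try:
    import regex  # third-party package (assumed installed: pip install regex)
except ImportError:
    regex = None  # the built-in re module cannot run this pattern

# Group 1 is the header, group 2 the braced body; (?2) recurses into
# group 2 only, so nested blocks do not need to start with TASK.
PATTERN = r"(TASK\s*\(.*?\)\s)(\{(?>[^{}]+|(?2))*\})"

# Hypothetical, shortened input in the spirit of the question.
SAMPLE = """TASK(arg1)
{
    if (cond)
    {
        function_call();
    }
}

void FunctionIDoNotWant(void)
{
    function_call1();
}
"""


def find_task(source):
    """Return (header, body) of the first TASK function, or None."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' package is required")
    m = regex.search(PATTERN, source)
    return (m.group(1), m.group(2)) if m else None


if regex is not None:
    header, body = find_task(SAMPLE)
    print(header.strip())
    print(body)
```

Replacing (?2) with (?R) in PATTERN should reproduce the failure described in the question, because every nested block would then have to begin with its own TASK(...) header.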
bobble bubble
  • Exactly what I was looking for. Thanks for the hint. I slightly corrected a bit to (TASK\s*\(.*?\)\s)({(?>[^{}]+|(?2))*}). Worked :). Thanks – HOrst Jun 08 '19 at 16:59
  • You're welcome @HOrst. Your change looks fine too if group 3 unneeded. – bobble bubble Jun 08 '19 at 17:07

This problem is a bit complicated and depends on the input. It might be solved partly with regular expressions and partly with scripting. For instance, we could start with an expression that also matches across newlines, such as one of these equivalent variants:

(TASK.+)\s*({[\s\S]*})\s*void
(TASK.+)\s*({[\w\W]*})\s*void
(TASK.+)\s*({[\d\D]*})\s*void

Here we have a start boundary, which is also our first desired output:

(TASK.+)

and a left and a right boundary around our second desired output:

\s*({[\s\S]*})\s*void

The right boundary would likely have to change for other inputs:

\s*void
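
How much this expression depends on the right boundary can be checked with a small experiment. The sketch below uses a hypothetical input (SOURCE, with two void functions after TASK, which the question's input does not have) and compares the greedy expression with a lazy variant, [\s\S]*?, which is an assumption of this sketch and not part of the expressions above.

```python
import re

# Hypothetical input: TASK followed by two void functions.
SOURCE = (
    "TASK(arg1)\n{\n    work();\n}\n\n"
    "void first(void)\n{\n    a();\n}\n\n"
    "void second(void)\n{\n    b();\n}\n"
)

GREEDY = r"(TASK.+)\s*({[\s\S]*})\s*void"
LAZY = r"(TASK.+)\s*({[\s\S]*?})\s*void"  # lazy variant (assumption)

greedy = re.search(GREEDY, SOURCE)
lazy = re.search(LAZY, SOURCE)

# The greedy form runs up to the last "}" that is followed by "void",
# so it also swallows the body of first().
print("greedy captured first():", "a();" in greedy.group(2))

# The lazy form stops at the first "}" that is followed by "void".
print("lazy captured first():", "a();" in lazy.group(2))
```

Both forms still need a void function to follow TASK, so neither would match if TASK were the last function in the file or were followed by a function with another return type.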

Demo

RegEx

If this expression wasn't desired and you wish to modify it, please visit this link at regex101.com.


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(TASK.+)\s*({[\s\S]*})\s*void"

test_str = ("TASK(arg1)\n"
    "{\n"
    "    if (cond)\n"
    "    {\n"
    "      /* Comment */\n"
    "      function_call();\n"
    "      if(condIsTrue)\n"
    "      {\n"
    "         DoSomethingelse();\n"
    "      }\n"
    "    }\n"
    "    if (cond1)\n"
    "    {\n"
    "      /* Comment */\n"
    "      function_call1();\n"
    "    }\n"
    "}\n\n\n"
    "void FunctionIDoNotWant(void)\n"
    "{\n"
    "    if (cond)\n"
    "    {\n"
    "      /* Comment */\n"
    "      function_call();\n"
    "    }\n"
    "    if (cond1)\n"
    "    {\n"
    "      /* Comment */\n"
    "      function_call1();\n"
    "    }\n"
    "}")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Capture groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Emma

This can't be done using a plain regex alone: a regex can't count opening and closing braces ({ }), at least not without extensions such as recursion.

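Try this code (assuming test_str holds the source and start is the index of the opening { that follows the function header you're looking for):

import re

i = start + 1              # continue scanning just past the opening brace
c = 1                      # nesting depth; the opening brace is already counted
r = re.compile('[{]|[}]')
while c > 0:
    m = r.search(test_str, i)
    if not m:
        break              # ran out of input: the braces are unbalanced
    if m.group(0) == '{':
        c += 1
    else:
        c -= 1
    i = m.end(0)           # m.end() already points one past the matched brace
if c == 0:
    print(test_str[start:i])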

It iterates over your source code, starting right after the function header you are looking for, and counts opening ({) and closing (}) braces. Be careful: a macro can introduce braces as well. In that case you would probably have to make the compiler emit the source code after macro substitution, and how to do that depends on the compiler.
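
The brace counting can also be combined with the header search in one helper. This is a minimal sketch, not a C parser: extract_function, SAMPLE, and the header pattern are illustrative names and values, and it assumes the first { after the matched header opens the body and that no braces occur inside comments, string literals, or macros.

```python
import re

# Hypothetical, shortened input in the spirit of the question.
SAMPLE = (
    "TASK(arg1)\n{\n    if (cond)\n    {\n      function_call();\n    }\n}\n\n"
    "void FunctionIDoNotWant(void)\n{\n    function_call1();\n}\n"
)


def extract_function(source, header_pattern):
    """Return (header, body) for the first matching function, else None.

    The body is the text between the outermost braces.
    """
    header = re.search(header_pattern, source)
    if header is None:
        return None
    # Assumption: the next "{" after the header opens the function body.
    start = source.find("{", header.end())
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return header.group(0), source[start + 1:i]
    return None  # unbalanced braces


result = extract_function(SAMPLE, r"TASK\s*\([^)]*\)")
if result is not None:
    header, body = result
    print(header)
    print(body)
```

Scanning character by character keeps the depth bookkeeping in one place; for very large files, jumping from brace to brace with a compiled regex, as in the loop above, avoids looking at every character.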

Radosław Cybulski