
I am trying to split a C program into its function blocks. I tried using the regex library to split on `(){`, but to no avail, and I am not sure where to begin.

For example, given:

string = """
int first(){
    if () { 

    }
}

customtype second(){
    if () { 

    }
    for(){

    }
}
fdfndfndfnlkfe
    """

I want the result to be a list with each function block as an element: ['int first(){ ... }', 'customtype second(){ ... }']

I tried the following, but I am getting `None`:

import regex
import re

reg = r"""^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}"""

print(regex.match(reg, string))
NewCoder

2 Answers


Parsing source code is a difficult task. Tools like Bison generate source code parsers in C, C++, and Java (the generated C code can be used from Python), but you are unlikely to write a regex that solves this problem, at least not easily.
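
As a rough illustration of what even the simplest hand-written alternative looks like, here is a minimal sketch in Python that counts brace depth instead of using a regex. It is not a parser and not what Bison would generate. The function name `split_top_level_blocks` and the sample text are made up for this example, and the sketch assumes that braces never appear in comments or string literals.

```python
def split_top_level_blocks(source):
    """Return every top-level `header { ... }` chunk, found by counting braces."""
    blocks = []
    depth = 0   # current brace nesting level
    start = 0   # index where the current top-level chunk begins
    for i, ch in enumerate(source):
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # A top-level block just closed: keep header and body together.
                blocks.append(source[start:i + 1].strip())
                start = i + 1
    return blocks


# Hypothetical sample, modelled on the text in the question.
sample = """
int first(){
    if () {
    }
}

customtype second(){
    for(){
    }
}
fdfndfndfnlkfe
"""

blocks = split_top_level_blocks(sample)
for block in blocks:
    print(block)
    print("-----")
```

Anything that ends with a balanced top-level `{ ... }` is treated as one block, so a `struct` definition would be returned as well. Telling those cases apart is the part that needs a real parser.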

Levi Lutz

First of all: don't. Use a parser instead.
Second, if you insist, and to see why you should use a parser instead, have a look at this recursive approach (which only works with the third-party regex module, not the built-in re):

^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}

See a demo on regex101.com. Note that this breaks on comments or string literals that contain curly braces.


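In Python this would be:

import regex as re  # third-party module (pip install regex); the built-in re has no (?R)

# VERBOSE lets the pattern span several lines; MULTILINE makes ^ match at every line start.
reg = re.compile(r"""^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}""", re.VERBOSE | re.MULTILINE)

# `string` is the sample text from the question.
for function in reg.finditer(string):
    print(function.group(0))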
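
Because the pattern trips over curly braces in comments, one possible workaround is to blank out those braces before matching. The following is only a sketch using the standard-library `re` module. The names `_TOKEN` and `mask_braces` are invented for this example, and the pattern assumes plain C comments and literals without trigraphs or other preprocessor tricks.

```python
import re

# Matches C comments and string/character literals. Literals are matched too,
# so that comment markers inside a string are not mistaken for a comment.
_TOKEN = re.compile(
    r"""
    //[^\n]*                 # line comment
  | /\*.*?\*/                # block comment
  | "(?:\\.|[^"\\\n])*"      # string literal
  | '(?:\\.|[^'\\\n])*'      # character literal
    """,
    re.VERBOSE | re.DOTALL,
)


def mask_braces(source):
    """Replace braces inside comments and literals with spaces, keeping the length."""
    def blank(match):
        return match.group(0).replace("{", " ").replace("}", " ")
    return _TOKEN.sub(blank, source)


# Hypothetical input with braces hidden in a comment and in a string.
src = 'int f(){ /* } */ char *s = "{"; // {\n return 0; }'
masked = mask_braces(src)
print(masked)
```

The masked text has the same length as the original, so the positions reported by `match.span()` on the masked text can be used to slice the untouched source.
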
Jan
  • This would work great. I tried it with Python, but I'm getting `None` as the output. I updated the code. Could you kindly see what I am doing wrong? – NewCoder Jul 26 '19 at 09:32
  • @NewCoder: Added the modified snippet, you need the `verbose` flag. – Jan Jul 26 '19 at 10:18
  • It shows `re.error: unknown extension ?R at position 41 (line 3, column 16)`. May I know why? – NewCoder Jul 26 '19 at 15:45
  • @NewCoder: Have you used the exact code from above, including `pip install regex`? – Jan Jul 26 '19 at 17:16
  • With regex, it does not produce anything. The output is empty, i.e. it does not even enter the for loop. – NewCoder Jul 26 '19 at 17:24
  • @NewCoder: My fault, you need to add the `multiline` flag as well (`re.VERBOSE | re.MULTILINE`) – Jan Jul 26 '19 at 18:19