1

I'm working on software for the formal verification of programs, where the user supplies an algorithm written in C++ to be verified. Without going too far into the details of the subject matter, I will try to explain as clearly as possible what I need and my ideas about it.

If the user enters something of the form:

int foo ( [arg1,...,argN] ) {
    if ( T_CONDITION ) {
        T_EXEC;
    }
    else {
        T_EXEC';
    }
}

Then I want to extract T_CONDITION and both T_EXEC and T_EXEC', in the form Parts = [ COND => T_CONDITION, EXEC => [ T_EXEC, T_EXEC' ] ], where T_CONDITION is the entire condition, T_EXEC is the statements the program executes if the condition is true, and T_EXEC' is the statements it executes if it enters the else branch. I think this is called a "tokenizer" and that it's the job of a parser, but I'm not sure. The problem is that I don't know anything about parsers: I don't know where the condition and the two blocks begin or end, so I can't handle this with plain string operations.

Once I have T_CONDITION, I need to break it down in such a way to get several atomic logical formulas. Something like:

T_CONDITION = ( ( A OR N ) OR ( B AND C ) OR ( D AND ( E  OR F ) ) )

Then I want to get CONDITION_PARTS = [ [ A ], [ N ], [ B , C ], [ D, [ [ E ], [ F ] ] ] ]. That is: if I get A OR B, then I need PART = [[A],[B]], and if I get A AND B, then PART = [A,B]. But how can I recognize which part of the condition each closing parenthesis belongs to?

Is this possible? What tools should I use to do it? Do you know of any guides on this?

  • 7
    Don't re-invent the wheel: [Clang tooling](http://clang.llvm.org/docs/Tooling.html). – Angew is no longer proud of SO Jan 06 '15 at 18:19
  • This isn't a request for a specific off-site resource (which would be off-topic). This is asking how the **category** of tools is named, which **does** have an objective answer. You can see this because the question already contains a potential (but incorrect) answer, "tokenizer". – MSalters Jan 07 '15 at 14:08

2 Answers

4

Clang is the only sane way to go here. It is a C++ compiler you can invoke as a library. You can use their existing C++ lexer, analyzer, and parser to discover the contents of the file.

Even if you were a parser expert, only an insane man would roll his own C++ parser: parsing C++ is Turing-complete, because deciding how to parse some constructs requires instantiating templates.

Puppy
1

Depends on how general you need your parser to be. If you want to handle full C++ syntax, you should look at the lexers in g++ and other open-source frontends.

If you can guarantee a relatively simple syntax, you can get away with rolling your own parser.
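As a rough illustration of what "rolling your own" could look like for the if/else shape in the question, here is a minimal sketch that finds where the condition and the two blocks end by counting nesting depth. The names `IfParts`, `match_close`, `take_group` and `split_if_else` are hypothetical, and the sketch assumes the first occurrence of `if` in the text is the keyword and that no comments, string literals or macros contain parentheses or braces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: the three pieces the question asks for.
struct IfParts {
    std::string cond;       // text between the parentheses after `if`
    std::string then_body;  // text between the braces of the `if` branch
    std::string else_body;  // text between the braces of the `else` branch (may be empty)
};

// Return the index of the delimiter that closes the one at `open_pos`.
// Every nested opener raises the depth, every closer lowers it; the
// closer that brings the depth back to zero is the matching one.
std::size_t match_close(const std::string& s, std::size_t open_pos, char open, char close) {
    assert(open_pos < s.size() && s[open_pos] == open);
    int depth = 0;
    for (std::size_t i = open_pos; i < s.size(); ++i) {
        if (s[i] == open) {
            ++depth;
        } else if (s[i] == close && --depth == 0) {
            return i;
        }
    }
    throw std::runtime_error("unbalanced delimiter");
}

// Return the text strictly inside the first open/close pair found at or
// after `from`, and move `from` just past the closing delimiter.
std::string take_group(const std::string& s, std::size_t& from, char open, char close) {
    const std::size_t begin = s.find(open, from);
    if (begin == std::string::npos) throw std::runtime_error("missing opening delimiter");
    const std::size_t end = match_close(s, begin, open, close);
    from = end + 1;
    return s.substr(begin + 1, end - begin - 1);
}

// Split `if (COND) { A } else { B }` into its three parts.
// Assumption: the first "if" in `src` is the keyword, not part of a name.
IfParts split_if_else(const std::string& src) {
    std::size_t pos = src.find("if");
    if (pos == std::string::npos) throw std::runtime_error("no if statement found");
    IfParts parts;
    parts.cond = take_group(src, pos, '(', ')');
    parts.then_body = take_group(src, pos, '{', '}');
    // Skip whitespace, then accept an optional `else { ... }`.
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    if (src.compare(pos, 4, "else") == 0) {
        parts.else_body = take_group(src, pos, '{', '}');
    }
    return parts;
}
```

Calling `split_if_else` on the text of a function like `foo` above fills `cond`, `then_body` and `else_body` with the raw text of each part, nested parentheses and braces included. The assumptions listed are exactly where this approach stops working on real code.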

But parsing C++ is very hard -- think about all the things that you need to know (template definitions, #define'd constructs, etc.), so if you are hoping to do formal verification in the general case, you will be much better off adapting an existing C++ lexer/parser instead of trying to write your own.
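For the restricted case only, where a condition is built from plain names, the words AND and OR, and parentheses, the usual hand-written technique is a recursive-descent parser. The sketch below is an illustration under those assumptions, not a C++ parser. The names `Node`, `Parser` and `to_parts` are hypothetical, AND is assumed to bind tighter than OR (as `&&` does relative to `||`), and the output format follows the bracket notation used in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Node {
    enum Kind { Atom, And, Or } kind;
    std::string name;                         // used by Atom only
    std::vector<std::unique_ptr<Node>> kids;  // used by And / Or only
};
using NodePtr = std::unique_ptr<Node>;

class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}
    NodePtr parse() {
        NodePtr root = parse_or();
        if (!next_token().empty()) throw std::runtime_error("trailing input");
        return root;
    }
private:
    std::string src_;
    std::size_t pos_ = 0;

    // Tokens are "(", ")", or a run of letters, digits and underscores.
    std::string next_token() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (pos_ == src_.size()) return "";
        if (src_[pos_] == '(' || src_[pos_] == ')') return std::string(1, src_[pos_++]);
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_')) ++pos_;
        if (start == pos_) throw std::runtime_error("unexpected character");
        return src_.substr(start, pos_ - start);
    }
    std::string peek() {
        const std::size_t save = pos_;
        std::string t = next_token();
        pos_ = save;
        return t;
    }
    // Parse `operand (keyword operand)*`; nested nodes of the same kind are
    // merged, so (A OR N) OR B becomes one OR with three alternatives.
    NodePtr parse_chain(Node::Kind kind, const std::string& keyword, NodePtr (Parser::*operand)()) {
        std::vector<NodePtr> kids;
        for (;;) {
            NodePtr k = (this->*operand)();
            if (k->kind == kind) {
                for (auto& g : k->kids) kids.push_back(std::move(g));
            } else {
                kids.push_back(std::move(k));
            }
            if (peek() != keyword) break;
            next_token();  // consume the keyword
        }
        if (kids.size() == 1) return std::move(kids.front());
        auto n = std::make_unique<Node>();
        n->kind = kind;
        n->kids = std::move(kids);
        return n;
    }
    NodePtr parse_or()  { return parse_chain(Node::Or,  "OR",  &Parser::parse_and); }
    NodePtr parse_and() { return parse_chain(Node::And, "AND", &Parser::parse_primary); }
    NodePtr parse_primary() {
        const std::string t = next_token();
        if (t == "(") {
            NodePtr inner = parse_or();  // the recursion pairs each ")" with its "("
            if (next_token() != ")") throw std::runtime_error("expected ')'");
            return inner;
        }
        if (t.empty() || t == ")" || t == "AND" || t == "OR") throw std::runtime_error("expected operand");
        auto n = std::make_unique<Node>();
        n->kind = Node::Atom;
        n->name = t;
        return n;
    }
};

// Render in the question's notation: A AND B -> [A, B]; A OR B -> [[A], [B]].
std::string to_parts(const Node& n) {
    if (n.kind == Node::Atom) return n.name;
    std::string out = "[";
    for (std::size_t i = 0; i < n.kids.size(); ++i) {
        const Node& k = *n.kids[i];
        if (i > 0) out += ", ";
        // Each OR alternative gets its own brackets; an AND already has them.
        out += (n.kind == Node::Or && k.kind != Node::And) ? "[" + to_parts(k) + "]" : to_parts(k);
    }
    return out + "]";
}
```

With the condition from the question, `to_parts(*Parser("( ( A OR N ) OR ( B AND C ) OR ( D AND ( E  OR F ) ) )").parse())` returns `[[A], [N], [B, C], [D, [[E], [F]]]]`. No explicit parenthesis matching is needed: each call to `parse_primary` that consumes a `(` is the same call that consumes its `)`. Real C++ conditions contain arbitrary expressions, which is where an existing front end becomes the better choice.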

Sam Mikes