2

I would like to generate some wrapper code based on C++ types. I basically would like to parse some C++ headers, get the types, classes and their fields defined in the headers, and generate some code based on them.

What would be the easiest way to parse C++ and get type information? I thought about using the Clang C++ parser, but I couldn't make a working hello world in a couple of hours, so I gave up for the time being.
Could you advise any other way to parse C++, or if Clang is the easiest solution, could you point me to a simple getting started guide to be able to parse C++ types with it?

(basically any technology would be ok, C++, Java, C#, etc., this would be part of a command line tool)

Mark Vincze
  • Parsing C++ is anything but simple... SWIG or CLANG might be a start. – DarkWanderer Mar 28 '14 at 10:08
  • That's true, but my scenario is a bit specific: I only want to get all the types with all their fields, and nothing else, that's why I thought there might be a limited but simple solution. – Mark Vincze Mar 28 '14 at 10:10
  • [GCC-XML](http://gccxml.github.io/HTML/Index.html) perhaps? doxygen can produce such XML output too, albeit somewhat more heuristic and unreliable. Or search for Ira Baxter on this site... I think every one of her posts ends up mentioning her company's products ;-). – Tony Delroy Mar 28 '14 at 10:10
  • @Mark, while you only need the types and their fields, understanding what is a type and what is not, may require full parsing (since the C++ grammar is not context-free). Your problem is by no means simple. – utnapistim Mar 28 '14 at 10:11
  • But if you can make certain assumptions about the declarations to parse -- namely that they are "simple" in some definable way -- that will take much of the difficulty out. – Peter - Reinstate Monica Mar 28 '14 at 10:13
  • Types are by far the most complex part of the C++ syntax, so you won't get a simple solution here. You have to use a real parser. – SK-logic Mar 28 '14 at 10:13
  • Have a look at Rose http://en.wikipedia.org/wiki/ROSE_(compiler_framework) (http://www.rosecompiler.org/ seems to be off-line right now) – High Performance Mark Mar 28 '14 at 10:17
  • Peter's right - you can do something simple with a few lines of regular expressions - but be aware that over time whoever maintains the headers you're wrapping is likely to come along and make various changes, such as deciding everything belongs in a namespace, or a template should switch from a typedef to a `using` statement, or a parameter should be selected with metaprogramming etc. - they'll be unimpressed when the hack breaks. – Tony Delroy Mar 28 '14 at 10:18
  • @PeterSchneider: I would consider this pretty simple: `template struct A { typename foo m_f; };` but there is no way to ever parse that without at least a class memory layout. – PlasmaHH Mar 28 '14 at 10:40
  • @Mark: `template <typename T1, typename T2> auto f(T1 a, T2 b) -> decltype(a+b) { return a+b; } auto x = f(3, 2.0);`. This is a simple example of C++11 code which requires full parsing to get the variable type. – DarkWanderer Mar 28 '14 at 11:32
  • @PlasmaHH: I wouldn't ;-). – Peter - Reinstate Monica Mar 28 '14 at 11:44
  • A "simple header" may very well include `` at which point it's really not simple anymore. – MSalters Mar 28 '14 at 13:03
  • @Tony D: Get your facts straight. – Ira Baxter Mar 29 '14 at 03:18
  • I'm surprised at OP. He spent *a couple of hours* trying to make machinery that works with C++, didn't succeed, and gave up? He shouldn't expect that dealing with C++ should be an easy topic. The language is enormously complex, and thus tools that deal with it are enormously complex. – Ira Baxter Mar 29 '14 at 03:20
  • @IraBaxter: hey - thought I'd thrown some business your way. Anyway, if you're going to tell me to get my facts straight, mind telling me which fact/facts was/were bent? – Tony Delroy Mar 29 '14 at 03:28
  • http://www.semdesigns.com/Company/People/idbaxter/ is one. – Ira Baxter Mar 29 '14 at 03:37
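
The "simple declarations" idea from the comments can be sketched with a couple of regular expressions. This is only an illustration in Python, assuming headers made of flat `struct`/`class` definitions with plain data members; the names `RECORD_RE`, `ACCESS_RE`, `FIELD_RE` and `parse_simple_records` are made up for the example. It silently skips anything it does not understand (templates, arrays, methods, nested types, inheritance), which is exactly the fragility the comments warn about.

```python
import re

# "struct Name { ... };" or "class Name { ... };" with no nested braces
# and no base classes.
RECORD_RE = re.compile(r"\b(?:struct|class)\s+(\w+)\s*\{([^{}]*)\}\s*;")
# Access specifiers such as "public:" are dropped before looking at members.
ACCESS_RE = re.compile(r"\b(?:public|protected|private)\s*:")
# A plain data member: type words, optional * or &, then the member name.
FIELD_RE = re.compile(r"^([\w:]+(?:\s+[\w:]+)*[\s*&]+)(\w+)$")


def parse_simple_records(source):
    """Return {record name: [(type, field name), ...]} for flat declarations."""
    records = {}
    for name, body in RECORD_RE.findall(source):
        body = ACCESS_RE.sub("", body)
        fields = []
        for declaration in body.split(";"):
            match = FIELD_RE.match(declaration.strip())
            if match:  # anything more complex is skipped, not reported
                fields.append((match.group(1).strip(), match.group(2)))
        records[name] = fields
    return records
```

A member such as `std::vector<int> scores;` does not match `FIELD_RE` and is dropped without warning, so a sketch like this is only workable while you control the shape of the headers.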

3 Answers

4

Clang is definitely the easiest option. Consider using the cindex Python bindings; they are pretty straightforward. Alternatively, you could get an older version of Clang that still features an XML backend.

EDIT: the link above seems to be down, so here is a link to the google cache of it.

Another link suggested in the comments: http://www.altdevblogaday.com/2014/03/05/implementing-a-code-generator-with-libclang/
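
As a starting point for the cindex route, here is a minimal sketch in Python. It assumes the `clang` Python bindings are installed and can locate a matching libclang shared library; the function names `collect_records` and `render_records`, the default compiler arguments, and the plain-text output format are all invented for illustration. It only looks at non-template `struct`/`class` definitions and their data members.

```python
def collect_records(header_path, clang_args=("-x", "c++", "-std=c++11")):
    """Parse a header with libclang; return {record name: [(type, field), ...]}."""
    # Imported lazily so render_records() below works without libclang installed.
    from clang.cindex import CursorKind, Index

    index = Index.create()
    tu = index.parse(header_path, args=list(clang_args))
    records = {}
    for cursor in tu.cursor.walk_preorder():
        if cursor.kind not in (CursorKind.STRUCT_DECL, CursorKind.CLASS_DECL):
            continue
        if not cursor.is_definition():
            continue  # skip forward declarations
        # Simple path comparison to ignore types pulled in from other headers;
        # real code may need to normalise both paths first.
        source_file = cursor.location.file
        if source_file is None or source_file.name != header_path:
            continue
        records[cursor.spelling] = [
            (child.type.spelling, child.spelling)
            for child in cursor.get_children()
            if child.kind == CursorKind.FIELD_DECL
        ]
    return records


def render_records(records):
    """Render the collected records as plain text (a stand-in for wrapper code)."""
    lines = []
    for name, fields in records.items():
        lines.append("record " + name)
        for type_name, field_name in fields:
            lines.append("  " + type_name + " " + field_name)
    return "\n".join(lines)
```

Calling `print(render_records(collect_records("point.h")))` on a header of your own (`point.h` is just a placeholder name) should list each record with its fields. If the bindings cannot find the shared library, `clang.cindex.Config.set_library_file` can point them at it.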

SK-logic
  • Thanks for the tips, I tried to make this sample work, at the moment I'm stuck at using libclang from Python, here is the followup question: http://stackoverflow.com/questions/22730935/why-cant-this-python-script-find-the-libclang-dll :) – Mark Vincze Mar 29 '14 at 12:12
2

Unless your objective is to verify correctness, or the code involves advanced template stuff, consider using the XML output of Doxygen or GCC-XML. Alternatively, consider Clang, even if that's what you found too complex. Note that for Clang it might be best to work in *nix-land.

Cheers and hth. - Alf
  • gcc-xml (since based on gcc 4.2 being somewhat limited anyways) doesn't support templates, only their instantiations in a limited way. Also both do not support things like `foo` which I would not consider as "advanced template stuff". – PlasmaHH Mar 28 '14 at 10:22
0

If your generation tool is in Java, consider using the parser from the Eclipse CDT. My set of dependencies is:

  • com.ibm.icu_4.4.2.v20110823.jar
  • org.eclipse.cdt.core_5.3.2.201202111925.jar
  • org.eclipse.equinox.common_3.6.0.v20110523.jar

(These are from an old Eclipse version, because I depend on old Java class versions, but taking them from the latest CDT will do.)

Parsing involves:

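    // fullPath, definedSymbols, includePaths and log come from the surrounding code.
    FileContent reader = FileContent.createForExternalFileLocation(fullPath);
    IScannerInfo info = new ScannerInfo(definedSymbols, includePaths);
    // The null argument is the index (none used); 0 means no parser options.
    return GPPLanguage.getDefault().getASTTranslationUnit(
            reader, info, FilesProvider.getInstance(), null, 0, log);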

This returns an IASTTranslationUnit that can be accessed through a Visitor pattern (ASTVisitor).

I cannot comment on the accuracy of the parsing in corner scenarios, because so far I've been generating code based on simple C++ structure definitions.

jsantander