
Is there a gcc-xml equivalent, or some similar tool, for the Visual C++ compiler that can reflect the internal structure of C++ source code?

My goal is to take a native C++ source or header file as input and generate output such that:

  1. all preprocessor directives are processed (this can already be achieved with the /P, /E or /EP compiler switches);
  2. all typedefs are expanded to their base types;
  3. a list of the names and signatures of all variables, functions, classes, and members can be obtained;
  4. optionally, a list of all class template instantiations can be obtained.

As an example of typedef expansion, this code:

```cpp
#include <string>

typedef std::string my_type;
my_type s1;
```

should be expanded to:

```cpp
std::basic_string<char, std::char_traits<char>, std::allocator<char> > s1;
```

or something that can get me to this.

I know this could be achieved with gcc-xml (using gcc as the intermediate compiler) or a number of other non-MSVC options. However, the key requirement is that "all the compilation be done by VC++".

Any solution or workaround that provides at least one of the features above, or that guides me toward the goal, is appreciated.

Masood Khaari
  • The Visual C++ compiler does not expose its internal data structures via any public interface. – James McNellis Feb 20 '13 at 06:44
  • The `/Zg` switch of cl.exe provides a very useful list of expanded function prototypes, but unfortunately it works on C source files only, so classes and member functions are left out. I couldn't find any similar switch that generates such lists for classes and/or `typedef`s. – Masood Khaari Feb 20 '13 at 13:29

1 Answer


Semantic Designs (my company) provides the DMS Software Reengineering Toolkit, with C++ parsers for a variety of C++ dialects, including that of MS Visual Studio.

DMS isn't designed specifically to produce the data you want, but it will produce it rather easily. By design, DMS is a customizable tool that requires a bit of configuration to get custom answers. In your case, pretty much everything you want is available in DMS's C++ symbol table, so the customization would be "walk the symbol table, format and extract what you want." There's a complete set of APIs to support doing just this (as well as many other useful program analysis/manipulation tasks).

Ira Baxter
  • It's OK to use a symbol table or metadata to generate the needed info. But I guess your DMS uses its own internal compiler. Am I right? – Masood Khaari Feb 20 '13 at 11:06
  • The idea is that if VC++ follows its own specific syntax, is incompatible with other C++ compilers, or even doesn't follow the standard, that's OK. The processing must follow the same scheme: e.g. if C++ source code is non-standard but compiles under VC++, it's an accepted source file, and if the source follows the standard but fails to compile under VC++, it will be rejected. – Masood Khaari Feb 20 '13 at 11:06
  • Now suppose the user has switched to a different version of MS VC++ (older or newer) that behaves slightly differently. We must also follow the new compiler's rules. Also consider future releases of VC++ with possibly broader C++11 coverage. – Masood Khaari Feb 20 '13 at 11:09
  • 1) DMS has a C++ front end that we built specifically for it; it isn't using the MS compiler. 2) We understand very well the notion of specific dialects of C++, including the different variations produced by MS over the years (managed vs. native, going back to Visual C++ 6), and our front end can be configured to handle any of these. 3) These front ends may accept somewhat more than the MS compiler does, but they determine the same semantic consequences for the same input. 4) Yes, we handle full C++11, and to the extent we understand how MS has implemented it differently, we've tracked that. – Ira Baxter Feb 20 '13 at 14:43
  • I'll note that the C++ standard is simply humongous and convoluted, and MS is not very good about documenting what they have done. Consequently, our front end matches MS to the extent of what we know about (via documentation we understand :) or have encountered in practice. As a consequence, I'd expect one could still find dark-corner differences. We work hard to adjust things when a difference is encountered. I know of no other alternatives that make any attempt to track MS. – Ira Baxter Feb 20 '13 at 14:48
  • I'd prefer to use the MS compiler directly instead of having to constantly track what MS VC++ does (by myself or whoever is interested). After hours of searching, however, it seems impossible to do in a documented and standard way. Anyway, I'll give your toolkit a try, although I was looking for a free, open-source tool or source code. – Masood Khaari Feb 20 '13 at 14:55
  • I know MS VC++'s tedious complexity and hence appreciate your hard work :) – Masood Khaari Feb 20 '13 at 14:58
  • The reason we build front ends is a) the vendors of most language tools won't provide us access to theirs [MS C++ is a perfect example], b) we want to do things to code that their compilers are not designed, and cannot be bent, to do, such as mechanically transforming the code in a reliable way, and c) DMS has an architecture that makes many interesting things possible, and the front ends need to be compatible with that architecture. – Ira Baxter Feb 20 '13 at 15:03
  • But gcc and many other C++ compilers and front ends (like Clang, MinGW, Cygwin, ...), thanks to being open source, can in principle provide a lot of functionality to their users. gcc-xml, for example, transforms C++ source code into XML in a reliable way. Note that gcc-xml is also free and open source. Undoubtedly your toolkit is great at meeting users' needs in this area, but it's not a solution to my specific requirement. Also, we develop open-source software that cannot rely on proprietary tools. – Masood Khaari Feb 20 '13 at 15:26
  • Lots of people object that our stuff isn't open source, but don't have the interest or energy to make their own open-source version. This stuff is hard; we couldn't afford to build it for free. Good luck finding another solution; I don't believe it's out there. You're welcome to try to bend GCC and Clang to the job. – Ira Baxter Feb 20 '13 at 15:35
  • Downvoter: Care to explain your objection? The answer directly addresses OP's request. – Ira Baxter Feb 20 '13 at 20:04
  • I didn't downvote your answer. Unfortunately, it doesn't satisfy my requirement, but it's close to what I want. – Masood Khaari Feb 21 '13 at 07:05
  • But about your comment, I think there's a misunderstanding. I knew at first sight that your toolkit is very comprehensive and probably worth paying for. I'm not the kind of person who disparages all proprietary software. I like many Microsoft products, for instance, although they are proprietary and not free. I've already expressed my appreciation for your hard work, because I can imagine how hard it is. – Masood Khaari Feb 21 '13 at 07:05
  • However, it's just not what I want. I don't want a comprehensive re-engineering toolkit that acts as a front end for a wide range of languages (not just C++) and provides [a wide range of](http://www.semdesigns.com/Products/DMS/DMSToolkit.html) analysis and transformation functionality. What I want (enumerating the types/classes/functions of a C++ source file) is just a tiny subset of DMS's capabilities which, I believe, is simple enough to be found, if it exists at all, as a single C++ source file or as a relatively light-weight, open-source program like gcc-xml. – Masood Khaari Feb 21 '13 at 07:07
  • However, it seems there's no such thing, because unfortunately MSVC++ (as @James McNellis stated) doesn't expose any reliable interface for it, documented or not. That just means my feasibility study has failed and I have to give up and move on to my other work. – Masood Khaari Feb 21 '13 at 07:08
  • Thanks for the compliment. I think your understanding of your imagined solution is a little weak, though. You seem to think a "tiny subset ..." is adequate for this task. It isn't. You want precise type information about C++; that means you need a precise parser, and the machinery to support it. You get that in one of two ways: 1) a tool that has a full C++ parser built in somehow, or 2) something like DMS. You offer gcc-xml as "light-weight". You'll discover it basically includes the GNU C++ compiler machinery (or at least its C++ front end), so it isn't "a tiny subset". – Ira Baxter Feb 21 '13 at 10:18
  • Since (for most languages!) you can't get away with a tiny subset, and compilers usually don't offer APIs at all [as you have seen], let alone APIs that happen to do what you want, something like DMS is necessary to satisfy the diverse needs of the custom-tools community. If you don't want to deal with the (as Einstein said) necessary complexity, then yes, it isn't a solution for you. – Ira Baxter Feb 21 '13 at 10:22
  • It's not about complexity, necessary or not. It's about the strict requirements I stated above: that "all the compilation be done by VC++", and, if a tool is used, that it preferably be open source. Yes, I want type information about C++; but that doesn't mean I "need a precise parser, and the machinery to support it", since I wanted VC++ to do all the hard work for me. And I want neither of the two ways you mentioned, because they are exactly the opposite of my requirement. – Masood Khaari Feb 21 '13 at 11:34
  • Usually when one's requirements lead one to an impossible place, the requirements get relaxed. Otherwise you never get a solution. – Ira Baxter Feb 21 '13 at 11:35
  • Compilers usually don't offer APIs, but they do offer a handful of useful switches. VC++ has the switch `/P`, which outputs the preprocessed input file; `/Zg`, which exposes the list of expanded C function prototypes (exactly what I want for classes/members); and an undocumented switch, `/d1reportAllClassLayout`, which reports the memory layout of classes. I hoped there were other switch combinations that could help me, or that I could get the needed output by reading compiler-generated intermediate files. I therefore call my need (not the one you suggest) "tiny" in comparison to implementing a full C++ parser. – Masood Khaari Feb 21 '13 at 11:35
  • And I called gcc-xml light-weight in comparison to comprehensive frameworks like DMS. Note that a large part of gcc-xml is the gcc compiler itself. If VC++ were also open source, it wouldn't be too hard to adapt it to generate the needed output. So, again, I call this "tiny" relative to a full C++ parser. – Masood Khaari Feb 21 '13 at 11:37
  • Changing requirements is an entirely different issue, one for management, and it doesn't apply to this post and its intended functionality. And yes, sometimes people never get a solution, because lowering the requirements would call the entire project's philosophy into question. – Masood Khaari Feb 21 '13 at 11:39
  • I do not understand you. "If VC++ was open source... "? this is fantasy. "...(wouldn't be) too hard to adapt it ... tiny relative to a full C++ parser" Huh? If VC++ was available to you, you'd discover it contained *exactly* a full C++ parser. I do understand your point: MS might have made your answer easy to get. (They might have made everybody's special need available with a switch). They did not. – Ira Baxter Feb 21 '13 at 11:40
  • Exactly, yes. MSVC++ contains a full parser, which is indeed what I want: to leverage it in order not to reinvent it. And yes, they didn't provide such a switch or anything similar. So I wonder what this discussion is for. By "If VC++ was open source..." I meant to point out how much less work gcc-xml has to do compared to DMS, which is why it is "relatively" light-weight. – Masood Khaari Feb 21 '13 at 11:52
  • GCC-XML has to do the same kind of thing you would do with DMS. It invokes a full [GCC] C++ parser, and then picks through the symbol table. There's no way to avoid the parser. There's no practical way to avoid building the symbol table for your goal (in fact, GCC can't parse without doing this). So the work is the same; it isn't "lighter weight". (From one perspective it is: GCC has man-millennia invested in it; DMS only about a man-century, because our general foundations make doing hard language tasks a lot easier.) – Ira Baxter Feb 21 '13 at 12:00
  • I don't think what I mean is hard to understand :). When did I say I wanted my results without parsing C++ code?! I said I only want the exact MSVC++ parser to do the job for me, just as gcc-xml uses the exact gcc parser instead of mimicking its behavior. gcc-xml therefore does not do the parsing itself and doesn't do the really hard job. – Masood Khaari Feb 21 '13 at 12:17
  • Ah. You said "leverage it in order not to reinvent it". Yes, GCCXML wouldn't be reinventing it. Yes, DMS wouldn't be reinventing it. I thought the original requirement that it be *precisely* the original version was a soft requirement; for you it's a hard requirement. OK, then, no solution. – Ira Baxter Feb 21 '13 at 15:34
  • I continue to be astonished by drive-by flaggers who object to my answers. This answer addressed OP's original question, with its specific request for "any solution", by virtue of being the *only* solution to his problem that I am presently aware of (and I try to keep track of this kind of thing). I interpret the absence of other answers as general confirmation that others are not aware of any alternatives. So why the flags? – Ira Baxter Apr 26 '13 at 10:32
  • I personally consider your answer acceptable, as it doesn't violate the original question's requirements and may be useful to someone with a similar problem. But in general, the absence of an alternative doesn't (and shouldn't) necessarily force one to stick with an **unsatisfactory** solution. – Masood Khaari Apr 28 '13 at 04:53
  • FYI, we have decided to write a small parser that presumes the given C++ source code is syntactically correct. Note that this parser isn't fully complete, but this assumption makes it really simple, since it does not have to check for syntax errors; it leaves that to the compiler itself. For example, it accepts `#pragma once`. If the compiler accepts this too, nothing goes wrong (and we move on to the parsing phase). If not, the compiler issues an error. This way we don't have to consider each and every compiler dialect. – Masood Khaari Apr 28 '13 at 04:56
  • The parser itself is (being) implemented in one or several C++ source files, which is far, far better than resorting to a heavyweight, commercial [golden hammer](http://sourcemaking.com/antipatterns/golden-hammer): “I have a hammer and everything else is a nail.” (Also check [this](http://c2.com/cgi/wiki?GoldenHammer).) – Masood Khaari Apr 28 '13 at 05:00
  • @MassoodKhaari: I'm not sure I understand what you are trying to say. You don't like my solution, because it isn't the "original compiler with a command line switch". Yet you seem to want precise name and type information. Your "small parser" won't ever come close; many other people have tried this. C++ is a tough language to parse, C++11 is much, much tougher to get right at the name and type level, and you want it for MS, too. I kind of object to your implication that DMS is a (pejorative) hammer. It isn't useful for little tasks, but I think it is pretty good for driving really big nails. – Ira Baxter Apr 28 '13 at 05:20
  • I don't know which part of my comments led you to infer that I "don't like your solution, because it isn't the original compiler with a command line switch"! I suggest you review the comments again so that I don't have to rephrase my notes. But to summarize the reasons: 1) Your solution doesn't use the VC++ compiler for parsing/compiling source code, although it closely mimics it, which is against my original requirement. I don't want users to have to update the DMS front end for every future release of VC++. – Masood Khaari May 11 '13 at 08:16
  • 2) It's not open source, or even free. Can you imagine a free and open-source library that depends on commercial software? :) Of course, this is not listed in my original requirements. I hadn't even updated my post to reflect it, because I thought your solution might still be of use to others with a similar problem. But if I used DMS, the foundation my library is built on would disappear, so I would have to either re-evaluate my design or abandon the project entirely. – Masood Khaari May 11 '13 at 08:16
  • **"Your small parser won't ever come close."** Just consider [CxxTest](http://cxxtest.com/) as a counterexample! It has an internal light-weight C++ parser written in Python, and the whole library is less than 300 KB. Writing a similar parser in C++, if Python is a concern, is not an impossible task. Please note that my requirements have been eased, as the compiler does the C++ syntax checking, and my library comes in only after a successful compilation. I can even require library users NOT to apply the library's syntax extensions to lines containing future C++ syntax. – Masood Khaari May 11 '13 at 08:17
  • There's probably no doubt that DMS "is pretty good for driving really big nails", although I haven't tried it on real projects. I have already acknowledged its value. (Refer to the previous comments to avoid duplicate notes.) But that doesn't prevent a tool from becoming a candidate golden hammer. Visual Studio, say, is a really great tool that I use for my everyday work. Eclipse is another great one. C# and Java are awesome languages. But when I use each of these for every problem I encounter, then... well, their name is golden hammer. – Masood Khaari May 11 '13 at 08:17
  • Every single problem has a unique solution based on its context, which in turn requires a unique IDE/tool/language/design etc. In fact, the real job of an engineer is to provide solutions that address the specific problem; otherwise any fool could apply the same bunch of languages/techniques/tools everywhere. DMS is a great tool too. But please consider the context of each problem in isolation before proposing DMS, or objecting to anyone's implication. – Masood Khaari May 11 '13 at 08:18