7

Is there a free diff utility that can compare two C++ files using their ASTs instead of the text?

What I was thinking of is something like:

  • convert both files to AST
  • Render AST back as C++ code (this canonicalizes indentation)
  • Do normal diff between these two
  • Also try to detect simple refactorings that were done (add/del/rename member for example)
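A minimal sketch of the first three steps, with a loud caveat: this is not a real AST. A crude regex tokenizer stands in for the C++ parser, so it only demonstrates that layout can be taken out of the comparison. The names `TOKEN_RE`, `tokens` and `token_diff` are made up for illustration, and the lexer ignores raw strings, digraphs and other corner cases.

```python
import difflib
import re

# Crude C++ lexer (an assumption, not a real parser): comments, literals,
# identifiers, loose numbers, some multi-character operators, then any
# other single non-space character.
TOKEN_RE = re.compile(r"""
    //[^\n]*                      # line comment
  | /\*.*?\*/                     # block comment
  | "(?:\\.|[^"\\\n])*"           # string literal
  | '(?:\\.|[^'\\\n])*'           # character literal
  | [A-Za-z_]\w*                  # identifier or keyword
  | \d[\w.]*                      # number (very loose)
  | ->|::|<<|>>|<=|>=|==|!=|&&|\|\||\+\+|--|[-+*/%&|^]=
  | \S                            # any other single character
""", re.VERBOSE | re.DOTALL)


def tokens(source):
    """Split source into tokens, discarding all whitespace and line breaks."""
    return TOKEN_RE.findall(source)


def token_diff(old, new):
    """Unified diff over token streams, one token per diff line."""
    return list(difflib.unified_diff(tokens(old), tokens(new),
                                     "old", "new", lineterm=""))
```

With this, `token_diff("int main()\n{\n}", "int main\n()\n{\n}")` is empty because both inputs produce the same token stream, while a renamed identifier shows up as one removed and one added token. Detecting refactorings (the last bullet) needs real semantic information and is out of scope for a sketch like this.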
genpfault
tohava
  • 5
    That would be slow, and I can't imagine why someone would want such a thing. – Mooing Duck Jan 09 '13 at 18:43
  • 3
    I imagine this is to catch cheating among students. – chrisaycock Jan 09 '13 at 18:44
  • 1
    This is so I can review changes made to my code by another programmer who hates my coding style. Since our group has no official rules for styling, this kind of tool would be very helpful. – tohava Jan 09 '13 at 18:52
  • 3
    @tohava your group should implement an official code style rule before it's too late to do so. – Mahmoud Al-Qudsi Jan 09 '13 at 18:59
  • 3
    If he reformats your code without making other changes (or you do the same to him) tell him to stop (or stop doing that). Also you should agree on a code style and enforce it with tools such as your IDE's settings or a pre-check-in code formatter. Whether you like it or not matters less than that everybody's code be pretty consistent. – bames53 Jan 09 '13 at 19:00
  • @MooingDuck: why do you think that would be slow? Having built such a tool, I can tell you it runs as fast as regular diff. – Ira Baxter Jul 01 '13 at 14:54
  • @IraBaxter: I was assuming such a tool has to parse C++ code into an AST, and then work effectively like a regular diff, but just realized it _doesn't_ have to work like a regular diff. So it would only be slightly slower if at all. – Mooing Duck Jul 01 '13 at 17:05
  • @MooingDuck: How it computes the diff is kind of irrelevant (actually ours uses a Levenshtein heuristic lifted to trees plus some other enhancements), but mostly it is limited by reading source from the disk. – Ira Baxter Jul 01 '13 at 17:50

4 Answers

5

[Asked by one of the other answerers to post the name of a commercial tool.]

Semantic Designs' SmartDifferencer tool will parse C++, and compute a difference based on ASTs; layout formatting simply doesn't matter. The parser is a full C++11 parser. It can parse most source files without expanding most preprocessor directives as long as they are "structured"; C++ preprocessor usage isn't usually as abusive as it is in C.

There are versions of the SmartDifferencer for other languages.

[Disclosure: I'm the CTO at Semantic Designs]

Ira Baxter
2

I can think of two alternative solutions to your problem:

  1. Discuss a coding style that will be used as a group and stick to it. You may have to find compromises between the team members' differing personal coding styles.

  2. If you are using source control, add hooks which remove all formatting on a commit and customize the code formatting on checkout. This takes some work but allows team members to use their own formatting style. Of course, this doesn't resolve differing opinions on variable naming and other non-formatting code style elements.

Code-Apprentice
2

There are tools like MOSS (http://theory.stanford.edu/~aiken/moss/) which might be of help.

A. K.
1

You could pass the code through AStyle to normalize the indentation/spacing/formatting before doing the diff. This will not do anything for refactorings, but honestly you would need a full preprocess/compile pass to do this properly AFAICS.

http://astyle.sourceforge.net/
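A rough sketch of that normalize-then-diff pipeline, assuming the formatter reads source on stdin and writes the result to stdout. The helper names `run_formatter` and `formatted_diff` are hypothetical, and the AStyle command line in the comment is an assumption to check against the AStyle documentation, not a tested invocation.

```python
import difflib
import subprocess


def run_formatter(source, command):
    """Pipe source through an external formatter (stdin to stdout)."""
    result = subprocess.run(command, input=source, capture_output=True,
                            text=True, check=True)
    return result.stdout


def formatted_diff(old, new, formatter):
    """Format both versions with the same callable, then do a line diff."""
    a = formatter(old).splitlines()
    b = formatter(new).splitlines()
    return list(difflib.unified_diff(a, b, "old", "new", lineterm=""))


# Possible usage (assumed command line, not verified here):
# formatted_diff(old_text, new_text,
#                lambda s: run_formatter(s, ["astyle", "--style=allman"]))
```

Passing the formatter in as a callable keeps the diff logic independent of which tool does the normalizing. The result is only as canonical as the formatter itself, so layouts the formatter does not normalize will still show up as differences.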

StilesCrisis
  • 2
    This tool doesn't force a CANONICAL form. Take for example "int main()\n{\n}" vs. "int main\n()\n{\n}". They do not yield the same output. A tool that actually compiles the code into an AST would be helpful in this case. – tohava Jan 09 '13 at 19:27
  • I don't think you understand what you're asking for. A full AST pass would require (at a minimum!) a full preprocessor pass, which means every #include would be expanded out, every #define would be expanded out, etc. In many cases, your code would bear little resemblance to what you started with. This isn't even taking into account the "detect refactorings" part, which seems just like a pipe dream. – StilesCrisis Jan 09 '13 at 22:07
  • Let's skip the detect refactorings. The #includes can easily be rolled back during pretty printing (we have #line directives or something similar for this). Macros are trickier, I think they may still be handled if the preprocessor is modified to create something like javascript line maps that the pretty printer can then use to roll macros back. Am I missing something here? – tohava Jan 09 '13 at 22:22
  • 2
    Well, you're asking a lot if you want a free pre-existing utility that does all of this for you. To answer your original question, I think if you want something that does all that you describe, you will need to write it. You are better off talking with your coworkers and getting on the same page, or getting a new job, IMO. – StilesCrisis Jan 10 '13 at 00:20
  • There are commercial tools that will parse C++ and compute a difference based on ASTs. The one I know about can parse source files without expanding most preprocessor directives (as long as they are "structured"; C++ preprocessor usage isn't usually as abusive as it is in C). – Ira Baxter Jan 10 '13 at 05:50
  • Maybe you should post the name of that tool as an answer, Ira? – StilesCrisis Jan 10 '13 at 15:10