
I have a single executable which consists of many .c source files spread across several directories. I need to run static analysis on the whole source code at once, not on each file separately.

I just found that gcc’s LTO (link-time optimisation) works by compressing GIMPLE, an intermediate representation which mirrors the preprocessed source.
Also, when the compiler crashes during the LTO linking phase, it asks you to send the preprocessed sources with the bug report.

By merging source files, I mean combining all the files used to create the executable into a single file. Compiling and linking that single file would create the executable, which amounts to doing manually what LTO does (but that’s not the aim here: static analysers don’t support things like IPO/LTO).
Doing this manually would definitely take hours…

So, is there a way to merge C source files automatically? Or at least to get the preprocessed sources that LTO uses? (It seems the `-save-temps` option does nothing interesting during linking with LTO.)

Brian Tompsett - 汤莱恩
user2284570
  • It's not clear what you mean by "merge C source files". Surely you can't mean taking all the C source files and putting it into a single source file. That would not make a lot of sense as the merged result is unlikely to be a compilable unit (e.g. individual source files could have conflicting file scope static definitions). – kaylum Oct 08 '15 at 21:44
  • @AlanAu: `the merged result is unlikely to be a compilable unit` The purpose of this question is to merge them in such a way that the result won’t trigger compiler errors, which is roughly what gcc’s LTO does. – user2284570 Oct 08 '15 at 21:53
  • @AlanAu : [see also](https://llvm.org/bugs/show_bug.cgi?id=25116). – user2284570 Oct 08 '15 at 22:36
  • You are trying to solve whatever problem you have in a completely bogus way, i.e. your static analyzer (which product is it?) very likely *can* deal with multiple C files. Any analyzer able to only process a single C file would not sell well, to say the least. BTW, what does "processing gimple" mean? – Jens Oct 09 '15 at 12:13
  • @Jens: It can deal with multiple .c files, but it won’t perform global analysis of how functions interact if they are split across different files. [It is the one officially used by osx](https://llvm.org/bugs/show_bug.cgi?id=25116). GIMPLE is an internal representation used by gcc. Its implementation mirrors the code and the compile flags. – user2284570 Oct 09 '15 at 12:18

2 Answers


CIL (C Intermediate Language) has a 'merger' feature which I've successfully used for some simple merge operations.

I've used it to merge moderately complicated programs - around a hundred files in different folders. Of course, if your codebase includes any C++ the CIL merger won't be able to merge that.

Mingye Wang
mjt
  • Across a lot of directories, the operation seems too difficult. Anyway, I learned that even with a single file there’s no magic. I have since earned several bounties by looking carefully. – user2284570 Jun 23 '16 at 20:54
  • CIL performs a semantic merge, so it will definitely work on files in multiple directories. – byako Jun 29 '16 at 14:06

No, because for example two files might have conflicting static declarations. There are other ways that moving source code into a single file might make it stop working, and diagnosing every possible one would require solving the Halting Problem. (Does an arbitrary program ever use the result of __FILE__ in such a way that it fails if two sections of code are in the same file?) File-scope declarations are the most likely to occur in the real world, though.
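To make the `__FILE__` hazard concrete, here is a minimal sketch of two fragments pasted into one file. The file names `foo.c` and `bar.c` and all function names are hypothetical. It relies on the standard `#line` directive, which resets the presumed file name and line number, much like the line markers in preprocessor output. Without those directives both functions would report the name of the merged file instead.

```c
#include <string.h>

/* ---- pasted from the hypothetical foo.c, which started at line 1 ---- */
#line 1 "foo.c"
const char *foo_file(void) { return __FILE__; }  /* "foo.c", not the merged file */
int foo_line(void) { return __LINE__; }          /* 2: second line of foo.c */

/* ---- pasted from the hypothetical bar.c ---- */
#line 1 "bar.c"
const char *bar_file(void) { return __FILE__; }  /* "bar.c" */

/* Code that branches on the file name keeps its original behaviour
 * only because of the #line directives above. */
int from_different_files(void)
{
    return strcmp(foo_file(), bar_file()) != 0;
}
```

This only covers `__FILE__` and `__LINE__`; it does nothing for clashing file-scope names, and it is an assumption that your analyser honours `#line` when reporting locations.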

That said, you can try just concatenating the files and seeing what error messages you get. Most headers should keep working if you #include them twice. A conflicting identifier name can be fixed by a search-and-replace in the original files.
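As a rough illustration of those two points (guarded headers and renamed identifiers), here is what a hand-merged file might look like. The names `foo.c`, `bar.c`, `shared.h` and every identifier are hypothetical, and the `foo_c_` / `bar_c_` prefix is just one possible renaming scheme, not something a particular tool produces.

```c
/* merged.c: hypothetical hand-merge of foo.c and bar.c (sketch only) */

/* ---- shared.h, pasted once per original #include ---- */
#ifndef SHARED_H
#define SHARED_H
typedef struct { int id; } item;
#endif
/* Second paste: the guard makes the preprocessor skip it, so the
 * typedef is not repeated (an unguarded copy would be a redefinition). */
#ifndef SHARED_H
#define SHARED_H
typedef struct { int id; } item;
#endif

/* ---- from foo.c: originally `static int counter` and `static int bump` ---- */
static int foo_c_counter = 0;
static int foo_c_bump(void) { return ++foo_c_counter; }
item foo_next(void)
{
    item it;
    it.id = foo_c_bump();
    return it;
}

/* ---- from bar.c: same static names in the original, separate state ---- */
static int bar_c_counter = 100;
static int bar_c_bump(void) { return ++bar_c_counter; }
item bar_next(void)
{
    item it;
    it.id = bar_c_bump();
    return it;
}
```

Only the file-scope `static` names are renamed; functions with external linkage such as `foo_next` keep their names, so the rest of the program still links against them. Each file keeps its own counter, as it did before the merge.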

Davislor
  • `No, because for example two files might have conflicting static declarations.` Yes, I know perfectly well that doing `cat **\*.c > git.c` doesn’t solve the problem. Finding an automatic way to overcome conflicting declarations is the point of this question. So basically I’m asking: `cat **\*.c > git.c` doesn’t work, how do I overcome this? You answer that `cat **\*.c > git.c` won’t work. – user2284570 Oct 08 '15 at 22:08
  • `That said, you can try just concatenating the files and seeing what error messages you get. Most headers should keep working if you #include them twice. A conflicting identifier name can be fixed by a search-and replace in the original files.` And in my question I wrote that this will just take ages and that there are already partial automatic solutions like gcc’s LTO. – user2284570 Oct 08 '15 at 22:12
  • I’m saying that no automated solution that works on every pathological example can exist. That said, you can get pretty far by renaming every static identifier `var` in `foo.c` to `foo_c_var` before you concatenate, making sure all your headers use the `#ifndef FOO_H` trick, and possibly preprocessing every `__FILE__` and `__LINE__` first too. – Davislor Oct 08 '15 at 22:36
  • `I’m saying that no automated solution that works on every pathological example can exist` They do exist, but not in the expected way. That’s pretty much what LTO does, but with preprocessed source. – user2284570 Oct 08 '15 at 22:49
  • So what happens when you pass that tool a header file that behaves differently when included a second time, and two source files that include it once? That’s considered bad style, but it’s legal. – Davislor Oct 08 '15 at 22:49
  • gcc will create an executable, and the compiler optimisations will run as if those two source files were a single one. – user2284570 Oct 08 '15 at 22:54
  • Another example: in ANSI C, standard library functions are explicitly allowed to be additionally defined as macros, and undefining such a macro gives you a function name that you can use as a function pointer. How does the automated tool handle two source files that both do this to the same function? Or one that defines a macro with the same name as a wrapper? – Davislor Oct 08 '15 at 22:55
  • Same thing as the previous comment. See https://gcc.gnu.org/projects/lto/lto.pdf. At this point, you might simply [try it out](https://gcc.gnu.org/wiki/LinkTimeOptimization). This option is automatically enabled after gcc 4.9. – user2284570 Oct 08 '15 at 23:04
  • @user2284570 Or consider what happens when you compile two source files with conflicting options, such as linking a file compiled with `-std=c11` and one compiled with `-std=c89`. You can do all this if your combined source is like an archive of all the files and you insert the optimize step right before linking, but you can’t if the combined file has to be legal C you could pass to another tool that doesn't know your extensions or mangling. In particular, if you have to mangle the syntax so much that a human can’t tell which line originally caused an error, the tool isn't much good. – Davislor Oct 08 '15 at 23:15
  • That’s why GIMPLE records command-line options, so the `lto1` program will take care of that kind of behaviour when linking. I recognize that this particular case can’t be done manually, but it’s outside the scope of my question *(where everything is in c89)*. – user2284570 Oct 08 '15 at 23:20
  • @user2284570 Would it be accurate to say, you aren’t asking whether it’s possible to translate multiple source files into a single source file that has the same behavior and is also legal, working C? If GCC supports any way for other tools to analyze the intermediate step of its whole-program optimize pass, I’m not aware of it, sorry. – Davislor Oct 08 '15 at 23:25
  • gcc probably isn’t [the only one](https://en.wikipedia.org/wiki/Interprocedural_optimization#Flags_and_implementation). As far as I know, gcc is the only one which can issue warnings about particular lines of your code during link time or invoke `as` on `ld` output. – user2284570 Oct 08 '15 at 23:37
  • @user2284570 Do we both agree that it is possible to “merge C source files” in the sense of linking them together in an intermediate format suitable for IPO, but not in the sense of creating a single new C source file which compiles to a program with the same behavior? You can do the latter for large subsets of C by renaming identifiers, though. – Davislor Oct 08 '15 at 23:49
  • Yes for most compilers, like icc. But I disagree for GIMPLE, because it mirrors the source code. More generally, I don’t believe a computer can’t do a very simple task. – user2284570 Oct 08 '15 at 23:52
  • @user2284570 I’m 99.94% sure there’s some pathological case that would break any given automated attempt to do this, for example, by using names that clash with the renaming scheme the merge tool would absolutely need to have. Maybe if you renamed everything in a way that absolutely still worked no matter what names or macros the source files declare. But consider the example of two source files that both use *every* legal identifier of the allowed length, in file scope. – Davislor Oct 09 '15 at 00:32
  • Just duplicate the header. If that kind of very simple task can be done manually there is no reason to not do the same automatically. – user2284570 Oct 09 '15 at 12:03
  • @user2284570 Consider what happens if a header defines a feature-test macro in one compilation unit but not others. Say it changes the prototype of a function from `unsigned long` to `uint32_t` where those are different, or `char*` to `void*`. You can’t rename the functions and still link to them. But now you have conflicting prototypes for the same function! – Davislor Oct 09 '15 at 18:16
  • OK, this needs the code to be refactored, but it still looks possible. – user2284570 Oct 09 '15 at 21:12
  • When you start getting into “need to refactor the code,” and the code does not have a very specific grammar with transformation rules, you should be very suspicious. No computer program can determine whether another arbitrary computer program gives an answer at all or runs forever, even in theory. It sounds simple, but it’s not. – Davislor Oct 09 '15 at 21:27
  • Anyway, this doesn’t concern my question, where most things are missing include guards. I recognize my question is a bit of an XY problem and that your answer [doesn’t solve it](https://llvm.org/bugs/show_bug.cgi?id=25116). – user2284570 Oct 09 '15 at 21:30
  • Right. We seem to have gotten sidetracked here. Again, the most practical advice I can give is to rename possibly-conflicting identifiers into different namespaces, e.g., `main_foo`, `frobulate_foo`. – Davislor Oct 09 '15 at 21:41
  • In that case, don’t use an analyser but do it manually… It will take fewer weeks. :( – user2284570 Oct 09 '15 at 21:42