I am curious to know: what is the current state of the art in memory management of the IR during interprocedural data-flow analysis? I want to know whether the IR for the complete code resides in memory during the analysis, or whether memory management techniques are applied to load and unload parts of the IR as needed. In the context of the LLVM/GCC infrastructure, how is it possible to scale an analysis to millions of lines of code?
1 Answer
You are correct that holding the IR for the entire program in memory is problematic. The current state of the art is the gold linker, which is responsible for enabling whole-program optimization in both GCC and LLVM. Its early whole-program optimization design draft is the best description I have found of how it works, though of course a lot has changed since 2007.
In general, it has three stages:
1. Each compilation unit is compiled and optimized separately into an object file*. The optimizations here can include interprocedural ones, but those don't cross compilation-unit boundaries.
2. The linker analyzes all the object files and builds a control-flow graph for the entire program. This is memory-intensive but manageable, because the full function code is not needed here. Decisions are then made about which transformations the linker should perform.
3. The linker performs the transformations decided in step (2). Each of these is localized and thus requires loading only a limited subset of the entire program code.
Steps (1) and (3) are composed of many tasks that can be performed in parallel.
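To make the memory argument concrete, here is a minimal Python sketch of the three stages. It is only an illustration under stated assumptions, not how gold, GCC, or LLVM are implemented: the `FunctionSummary` record, the `size` metric, the `inline_limit` threshold, and all function names are hypothetical. The point it shows is that stage 2 works on small per-function summaries only, and that each stage 3 decision names the few compilation units whose bodies would have to be loaded.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSummary:
    """Hypothetical per-function record kept in memory instead of the full body."""
    name: str
    unit: str                  # compilation unit that holds the full body
    size: int                  # made-up cost metric for the body
    callees: list = field(default_factory=list)


def local_stage(units):
    """Stage 1: each unit is processed on its own and emits only summaries."""
    summaries = {}
    for unit_name, functions in units.items():
        for fname, (size, callees) in functions.items():
            summaries[fname] = FunctionSummary(fname, unit_name, size, list(callees))
    return summaries


def whole_program_stage(summaries, entry="main", inline_limit=10):
    """Stage 2: decide on transformations using the summaries alone."""
    # Walk the call graph from the entry point to find reachable functions.
    reachable, stack = set(), [entry]
    while stack:
        fname = stack.pop()
        if fname in reachable or fname not in summaries:
            continue
        reachable.add(fname)
        stack.extend(summaries[fname].callees)

    decisions = []
    # Unreachable functions can be dropped without ever loading their bodies.
    for fname in sorted(summaries):
        if fname not in reachable:
            decisions.append(("remove", fname, None))
    # Small callees are marked for inlining (an assumed, simplistic heuristic).
    for caller in sorted(reachable):
        for callee in summaries[caller].callees:
            if (callee in summaries and callee != caller
                    and summaries[callee].size <= inline_limit):
                decisions.append(("inline", caller, callee))
    return decisions


def units_needed(decision, summaries):
    """Stage 3 helper: units whose bodies must be loaded to apply one decision."""
    _kind, first, second = decision
    names = [first] if second is None else [first, second]
    return {summaries[name].unit for name in names}


# Toy program: unit name -> {function name: (size, callees)}
units = {
    "a.o": {"main": (50, ["helper"]), "unused": (5, [])},
    "b.o": {"helper": (3, [])},
}
summaries = local_stage(units)
decisions = whole_program_stage(summaries)
for decision in decisions:
    print(decision, "loads", sorted(units_needed(decision, summaries)))
```

Because every decision touches only the units returned by `units_needed`, the work in stage 3 can be handed out to independent workers, which is the property that makes the parallelism in steps (1) and (3) possible.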
* Better optimizations are possible when the linker works with compiler IR rather than regular object files. In GCC this works by embedding the IR inside the object file; in LLVM it works by simply handing the linker an LLVM IR file in place of an object file. In both cases this is enabled through linker plugins.

Thanks for the answer. From my study of the subject, the current state of the art makes it possible to build a call graph for at least a million lines of code by clustering and partitioning. The question that remains unanswered is whether there is still a need for a framework that can perform more complex interprocedural data-flow analysis through a proper IR management algorithm, or whether such information already exists in the literature. – user3382041 Mar 06 '14 at 05:44