19

Is there a non-toy tool that can create a call graph of a whole application? I don't mean just getting a picture, or drawing the call graph by pointing at methods one by one.

I need a call graph that is accessible programmatically, i.e. the tool should write it to a file in a text format (e.g. XML) or build it in memory (which becomes problematic for large applications). A call graph built in a DB would be great.
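To make the requirement concrete, here is a minimal sketch of the kind of programmatic access meant here, assuming a plain adjacency-map representation. The class name `CallGraphSketch`, its methods, the sample method names, and the `caller -> callee` line format are all illustrative assumptions, not the output or API of any particular tool.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory call graph that can also be dumped as text.
final class CallGraphSketch {
    // caller -> callees; sorted collections keep the text export deterministic.
    private final Map<String, Set<String>> edges = new TreeMap<>();

    // Records one call arc; duplicate arcs are collapsed by the set.
    void addCall(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new TreeSet<>()).add(callee);
    }

    // Programmatic access: all known callees of a method (empty if none).
    Set<String> calleesOf(String caller) {
        return edges.getOrDefault(caller, Set.of());
    }

    // Text export: one "caller -> callee" line per arc.
    String toEdgeList() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Set<String>> entry : edges.entrySet()) {
            for (String callee : entry.getValue()) {
                out.append(entry.getKey()).append(" -> ").append(callee).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CallGraphSketch graph = new CallGraphSketch();
        graph.addCall("Main.main", "Game.start");
        graph.addCall("Game.start", "Board.load");
        System.out.print(graph.toEdgeList());
    }
}
```

The returned string could be written to a file or loaded into a database table with two columns (caller, callee). For a very large application the map itself may not fit in memory, which is exactly the concern raised in this question.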

Both static and dynamic call graphs are of interest. The static one is a little more interesting, and the fact that it is an overapproximation is acceptable.

I have tried Soot so far. However, it cannot handle even medium-sized projects like FreeCol (Java sources are available). Soot uses up 1.5 GB of memory on that project, and then the JVM crashes, as described here: http://www.sable.mcgill.ca/pipermail/soot-list/2008-July/001828.html

Could anyone suggest a tool that generates a call graph as described above? Java or .NET languages are OK.

Yu Hao
  • 4
    Take the hammer and use a 64-bit platform and assign some 6 GB or whatever to the JVM for Soot... ;) – Lucero Jan 29 '10 at 11:46
  • You want a call graph constructed for Java? For C? for ...? I'm guessing Java, but your reference to C# calls this assumption into doubt. – Ira Baxter Jan 31 '10 at 22:22
  • Lucero, thanks. BTW, is the JVM able to handle more than 2 GB of RAM? Anyway, although this solution may let me build a call graph for FreeCol, a large project (e.g. Alfresco) will require 100 GB of RAM, etc. That's not the proper way. –  Feb 01 '10 at 11:35
  • Ira Baxter, I would be happy to find a tool for Java or C# (in this order of preference). Indeed, I want to implement some analysis that takes a call graph as input. Even C++ is somewhat fine, but this language is harder to work with further, as there's no reflection in it. –  Feb 01 '10 at 11:35
  • @Sarge: The DMS solution stages the analysis. Call graph construction for a 35 million line C system happens in 4 GB of memory. However, the points-to analysis for that 35 million lines of code takes honest-to-god 95 GB of VM and several days of CPU, but it does complete! AFAIK, this is the largest points-to analysis done anywhere. – Ira Baxter Feb 01 '10 at 11:39
  • @Sarge: what has reflection got to do with anything? Most people attempting to do analysis of code by using reflection soon discover that reflection is a pretty weak way to get at the code details, as it leaves so much out. The point of a tool like DMS is that it provides complete access to the program code as compiler data structures. No reflection needed. – Ira Baxter Feb 01 '10 at 15:51
  • Which version of soot did you use? did you try using the `-no-bodies-for-excluded` option (in soot 2.5)? – Jus12 Jun 19 '12 at 05:16

4 Answers

7

Our DMS Software Reengineering Toolkit can construct global call graphs for C, Java, and COBOL. These are computed as an in-memory data structure, and can then be walked to collect arbitrary other facts. (You could export the graph to some other tool to walk over it, but for a big call graph the time and effort to export would dominate the time to just analyze it, so we tend not to export it. YMMV.)

It is relatively easy to extract call-graph information from a statement of the abstract form "CALL X(...)", because the target X is right there in the code at the call site. Indirect calls (virtual or method calls) are problematic in that the actual call targets are not trivially visible in the code at the call site; they are scattered around the entire system and, worse, controlled by runtime conditionals. In the absence of any additional information, a call graph constructor has to assume an indirect call can go to any target with a matching signature; this introduces lots of false-positive call arcs in the graph.

DMS uses a (conservative) global points-to analysis as part of the call-graph extraction process to determine where such indirect calls go, while minimizing false positives. See Flow analysis and call graphs for more examples of what DMS can extract, and a sample graph extracted from a system of 250,000 functions.
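As a rough illustration of the difference between the two strategies described above (this is not DMS code, and not how any real tool models a program), the sketch below resolves an indirect call first by signature alone and then with a points-to result. The class `IndirectCallSketch`, the `DECLARERS` table, and the class names in it are hypothetical, and inheritance is deliberately ignored to keep the example small.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IndirectCallSketch {
    // Hypothetical program model: method signature -> classes declaring it.
    static final Map<String, List<String>> DECLARERS = Map.of(
            "run()", List.of("Worker", "Timer", "Logger"),
            "close()", List.of("Logger"));

    // No extra information: the call may reach every method with that signature.
    static Set<String> conservativeTargets(String signature) {
        Set<String> targets = new LinkedHashSet<>();
        for (String owner : DECLARERS.getOrDefault(signature, List.of())) {
            targets.add(owner + "." + signature);
        }
        return targets;
    }

    // With a points-to result: keep only classes the receiver may refer to.
    static Set<String> refinedTargets(String signature, Set<String> mayPointTo) {
        Set<String> targets = new LinkedHashSet<>();
        for (String owner : DECLARERS.getOrDefault(signature, List.of())) {
            if (mayPointTo.contains(owner)) {
                targets.add(owner + "." + signature);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Three candidate arcs by signature alone, one after refinement.
        System.out.println(conservativeTargets("run()"));
        System.out.println(refinedTargets("run()", Set.of("Worker")));
    }
}
```

The refinement stays safe only if the `mayPointTo` set is itself conservative, i.e. it never omits a class the receiver could actually refer to at run time.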

Ira Baxter
  • Ira, the toolkit above would provide me with a control-flow graph, correct? Does it also provide a procedure dependence graph, in a similar manner to grammatech? (http://www.grammatech.com/research/papers/slicing/slicingWhitepaper.html) – Joeblackdev Jun 15 '11 at 13:55
  • It doesn't provide a PDG per se. It does provide use-def chains, a full control flow graph per method/function, and a global call graph. Your interest is apparently in slicing Java (determined from other SO interactions); there's enough there to slice Java (we use the same information to slice C and COBOL). – Ira Baxter Jun 15 '11 at 15:55
  • Is this tool freely available? Or is it a commercial product? – Joeblackdev Jul 26 '11 at 22:20
  • 1
    @Joeblackdev: DMS is a commercial product. – Ira Baxter Jul 26 '11 at 22:37
  • Is it available to trial? If so, I'm looking to generate a static call graph for a given Java class method. Would the aforementioned toolkit suffice? Thanks – Joeblackdev Jul 26 '11 at 22:39
  • @Joeblackdev: contact me offline. See my bio. – Ira Baxter Jul 26 '11 at 23:12
2

JProfiler is a decent Java profiler that generates a call graph and lets you export it in XML format.

I have not used Soot, so I cannot comment on how JProfiler compares to it, but expect JProfiler to require 2.5 to 3 times as much memory as the application itself.

Yu Hao
rajeshnair
  • 2
    A dynamic analyzer can only construct the part of the call graph that is actually traversed during execution. To get a semi-complete graph, you need to exercise the system pretty thoroughly, and given that most test suites only get to 70-80% coverage, there will be quite a number of possible calls that simply aren't listed. A dynamic analysis gives an "underestimate". A static analyzer (see my answer) figures out the call graph by inspecting the code. Because any safe analysis must be conservative, a static analyzer gives an "overestimate", but it doesn't miss any potential calls. – Ira Baxter Jan 31 '10 at 22:59
  • OP said he would also be interested in dynamic call graph generation tools, but your point is well made. – jbranchaud May 24 '12 at 20:47
1

Check out http://semmle.com/

I used their tool when it was in beta. It builds a database of program information that you can query programmatically. The company is a startup and the product is no longer in beta, though I cannot find anywhere on their site how to purchase it or how much it costs.

NDepend (http://www.ndepend.com/) is a similar tool for .NET that I have also used, but I am not sure whether it can be accessed programmatically. XDepend (http://www.xdepend.com/) is their tool for Java, which I have not used.

TheEvilPenguin
Faron
1

1.5 GB is not very much memory for a realistic call graph. I guess Soot just gives you what you are asking for. Call graphs produced by other tools may be smaller, but then they will likely be incomplete.

Eric
  • We've built reasonably precise call graphs (using pointer analysis for function pointers) for a system with 26,000 compilation units and 250,000 functions, in a 32-bit address space. That seems pretty realistic to me. The *points-to* analysis required 95 gigabytes of VM. (Yes, GB). – Ira Baxter Jun 15 '11 at 15:57