2

I have an idea for finding unused ('dead') methods in a large Java project but I need help deriving an implementation.

  1. Use AspectJ to add a 'before' aspect to ALL methods in project packages. The aspect will simply record (?) that the method has been executed.
  2. I compile a list of all classes/methods in project packages (probably using a service locator/reflection).
  3. The advised code is subjected to a full regression test. Ideally, I'd like to put this into production for a while too (if a suitably performant solution can be found).
  4. The lists of executed methods (Step 1) and available methods (Step 2) are compared, yielding a comprehensive list of all methods that were never called (i.e. dead code).

Since steps 2 and 4 can be conducted offline, I'm really only looking for help with Step 1.

Specifically, how can I record when a method is executed? I figure I'm going to encounter OutOfMemoryErrors pretty soon if I attempt any kind of in-memory storage. Likewise, if I store the data in a database/on the file-system, the volume of calls is likely to cause major performance issues. Has anyone ever done something similar? Any advice/suggestions appreciated.
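To make Step 1 concrete, here's roughly what I have in mind for the recording side (class and method names are just placeholders). Since each unique signature is stored only once, the set's size is bounded by the number of methods in the codebase rather than the number of calls, so perhaps memory is less of a problem than I fear:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a global, thread-safe registry of executed methods.
// Memory stays bounded because each unique method signature is stored
// once, no matter how many times the method is called.
public final class MethodCallRegistry {
    private static final Set<String> EXECUTED = ConcurrentHashMap.newKeySet();

    private MethodCallRegistry() {}

    // Called from the 'before' advice; a single set.add() per invocation.
    public static void record(String signature) {
        EXECUTED.add(signature);
    }

    // Read-only view for the offline comparison in Step 4.
    public static Set<String> snapshot() {
        return Set.copyOf(EXECUTED);
    }
}
```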

hoipolloi
  • 7,984
  • 2
  • 27
  • 28

2 Answers

1

Try checking out popular test coverage libraries like Cobertura or EMMA. They do exactly what you're talking about and then some, though not with AspectJ. Cobertura, at least, seems to have no problem storing invocation information down to the line in memory.

Ryan Stewart
  • 126,015
  • 21
  • 180
  • 199
  • I considered Cobertura but I'm concerned about the performance hit of instrumented code that records line-by-line analysis. As you can see, I'm not even sure analysis at the method level is feasible. However, if anyone has any evidence to the contrary I'd love to hear it. – hoipolloi Jul 22 '11 at 00:26
  • @hoipolloi: I just meant to look at it and see how it stores its collected data in memory while being gathered. Given the amount of data it collects, it might give you an idea of where you need to be. – Ryan Stewart Jul 22 '11 at 01:39
  • @hoipolloi: Also, are you talking about running this analysis long-term in production, or just in a testing environment, like a test coverage tool? – Ryan Stewart Jul 22 '11 at 02:03
  • Ah, good call - I'll take a look. I was thinking of long term (a few months) so we can be sure all elements of the application are used. – hoipolloi Jul 22 '11 at 02:29
  • 1
    @hoipolloi: If you're going long-term, then you might use what you find in cobertura/emma for collecting results in memory, and then periodically (hourly, daily, whatever) flush the collected results to a database and clear them from memory. You'd have to be careful of concurrency issues since you'd be removing the stuff from memory as other stuff was potentially being added. I don't know of any library or framework that would help much with that other than using a scheduler for running the flushes. – Ryan Stewart Jul 22 '11 at 02:43
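A rough sketch of the flush-and-clear pattern described in the comment above (the class name, flush interval, and sink are all illustrative; in practice the drained batch would go to a database or file):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: collect signatures in memory, flush periodically.
public final class FlushingRecorder {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void record(String signature) {
        pending.add(signature);
    }

    // Drain safely: snapshot first, then remove exactly what was
    // snapshotted, so signatures recorded *during* the flush survive
    // into the next cycle instead of being lost.
    public Set<String> drain() {
        Set<String> batch = new HashSet<>(pending);
        pending.removeAll(batch);
        return batch; // in practice: persist this batch, then discard it
    }

    public void start(long periodMinutes) {
        scheduler.scheduleAtFixedRate(
                this::drain, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}
```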
0

Well, you want a 'before' advice. In that advice you'd log the method being called; you'd probably want to keep a set of the methods already seen so you don't record duplicates. You can get the current method from thisJoinPoint (or, more cheaply, thisJoinPointStaticPart). I could give the AspectJ code for it but I think it's beside the point.
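For completeness, a sketch in native AspectJ syntax (the package pattern in the pointcut is a placeholder for your project's packages, and the aspect excludes itself to avoid recursion):

```aspectj
// Sketch only: record each executed method's signature once.
public aspect MethodExecutionTracker {
    private static final java.util.Set<String> EXECUTED =
            java.util.concurrent.ConcurrentHashMap.newKeySet();

    // Adjust com.example..* to match your project's packages.
    pointcut tracked(): execution(* *(..))
            && within(com.example..*)
            && !within(MethodExecutionTracker);

    before(): tracked() {
        // thisJoinPointStaticPart avoids per-call JoinPoint allocation,
        // so the overhead is essentially one set lookup/insert.
        EXECUTED.add(thisJoinPointStaticPart.getSignature().toLongString());
    }

    public static java.util.Set<String> executed() {
        return EXECUTED;
    }
}
```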

I think you'd be better off just using a tool to parse the binary .class files with BCEL or ASM, starting from methods you know get called, and building a call graph. This is similar to how the JVM's garbage collector finds live objects by tracing from the roots.

But really, you should ask yourself why you want this. Is it performance? Because the runtime impact of unused methods should be essentially none. If you want to reduce the size of the .class files, you'd be better off using something like ProGuard; it takes care of that and other issues, and it's already built.

That said, I think the best approach, if you really want to do this, is to instrument your code with Cobertura, do a run-through of your application, and then look at the coverage reports. It shouldn't take long for 90% of your used code to be painted. Any uncalled methods that you can delete without causing a compile error are dead code.

Sled
  • 18,541
  • 27
  • 119
  • 168
Static code analysis will fail as a lot of the code is executed at runtime via reflection. Significant manual effort is also not feasible given the size of the code base (>5m lines). Additionally, a lot of business knowledge has been lost over time, so there are areas of code we don't 'know'. Finally, it is the process of 'logging' the execution that I'm interested in. I need a way of doing it without bringing the application to its knees (performance-wise). – hoipolloi Jul 22 '11 at 02:38
@hoipolloi Well, if you know where reflection is coming from and generally where it goes, you can add those points to your list of base points. If there is lots of reflection going everywhere, there is probably something really wrong... But I wouldn't worry about performance, since you should be able to find the live code paths with a deployment to a test environment and a run through the basic functions of the app. – Sled Jul 22 '11 at 14:36
  • Agreed, in a decent environment a full regression test should cover all points. However, this is a legacy app (I suspect with a lot of obsolete code) and I have zero faith in our current regression test ability. Usual story I'm afraid; under funded & under resourced. A good solid step in bringing the code back under control would be to strip out the vast amounts of dead code so we don't waste time writing tests for code that is no longer used. – hoipolloi Jul 22 '11 at 22:58