Processing gcov data files for tracing purposes

Question

I'm trying to create a tool similar to TraceGL, but for C-type languages:

enter image description here

As you can see, the tool above highlights code flows that were not executed in red.

In terms of building this tool for Objective-C, for example, I know that gcov (and libprofile_rt in clang) output data files that can help determine how many times a given line of code has been executed. However, would the gcov data files be able to tell me when a given line of code occurred during a program's execution?

For example, if line X is called during code paths A and B, would I be able to ascertain from the gcov that code paths A and B called line X given line X alone?

score 0 · Answer 1 · answered Apr 19 '14 at 20:51

As far as I know, GCOV instrumentation data only tells that some point in the code was executed (and maybe how many times). But there is no relationship between the code points that are instrumented.

It sounds like what you want is to determine paths through the code. To do that, you either need to do static analysis of the code (requiring a full up C parser, name resolver, flow analyzer), or you need to couple the dynamic instrumentation points together in execution order.

The first requires you find machinery capable of processing C in all of its glory; you don't want to repeat that yourself. GCC, Clang, our DMS Toolkit are choices. I know the GCC and Clang do pretty serious analysis; I'm pretty sure you could find at least intraprocedural control flow analysis; I know that DMS can do this. You'd have to customize GCC and Clang to extract this data. You'd have to configure DMS to extract this data; configuration is easier than customization because it is a design property rather than a "custom" action. YMMV.

Then, using the GCOV data, you could determine the flows between the GCOV data points. It isn't clear to me that this buys you anything beyond what you already get with just the static control flow analysis, unless your goal is to exhibit execution traces.

To do this dynamically, what you could do is force each data collection point in the instrumented code to note that it is the most recent point encountered; before doing that, it would record the most recent point encountered before it was. This would produce in effect a chain of references between points which would match the control flow. This has two problems from your point of view, I think: a) you'd have to modify GCOV or some other tool to insert this different kind of instrumentation, b) you have to worry about what and how you record "predecessors" when a data collection point gets hit more than once.

score 0 · Answer 2 · answered Apr 21 '14 at 19:11

gcov (or lcov) is one option. It does produce most of the information you are looking for, though how often those files are updated depends on how often __gcov_flush() is called. It's not really intended to be real time, and does not include all of the information you are looking for (notably, the 'when'). There is a short summary of the gcov data format here and in the header file here. lcov data is described here.

For what you are looking for DTrace should be able to provide all of the information you need, and in real time. For Objective-C on Apple platforms there are dtrace probes for the runtime which allow you to trace pretty much anything. There are a number of useful guides and examples out there for learning about dtrace and how to write scripts. Brendan Gregg provides some really great examples. Big Nerd Ranch has done a series of articles on it.

Processing gcov data files for tracing purposes

2 Answers2