2

My goal is to profile the execution time of each function in a kernel module. Using the sample scripts I saw online, I came up with the following script to fulfill my need. But occasionally I get negative values for calculated latencies. Although, they happen rarely but I guess that indicates something is wrong with my script. Can anyone help me with that please?

probe module(@1).function(@2).call { 
     begin = gettimeofday_ns()
}

probe module(@1).function(@2).return {
  if (begin>0)
     stats <<< gettimeofday_ns() - begin
}

probe end {
    if (begin == 0) {
        printf("No samples observed so far.\n");

    } else {
        printf("Distribution of %s latencies (in nanoseconds) for %d samples\n", @2, @count(stats))
        printf("max/avg/min: %d/%d/%d\n", @max(stats), @avg(stats), @min(stats))
        print(@hist_log(stats))
    }
}


global begin, stats
soofyaan
  • 299
  • 3
  • 12

1 Answers1

1

The gettimeofday_*() functions can only approximate wallclock time. It is possible that across CPUs, or across a time adjustment moment, the values won't move monotonically the way you expect. get_cycles() is more monotonic on a given CPU, and a few other clock-related functions are available.

Also, your begin variable is a simple scalar. What if the same function is being called from multiple threads/cpus, or if recursion occurs? It'll get overwritten. This should be enough (and work correctly, from a nesting/concurrency point of view):

// no probe FOO.call
probe module(@1).function(@2).return {
  stats <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}
fche
  • 2,641
  • 20
  • 28
  • 1
    Thanks, that worked! I was suspecting that the problem is with "begin" variable being scalar, but since this is my experience with SystemTap, had no clue what to do. Anyways, just out of curiosity: If I wanted to keep the prob FOO.call and use a begin variable as before (basically not using @entry), how could I solve the scalar variable problem? – soofyaan Jul 17 '15 at 00:27
  • 1
    You could make it an array, and index it with tid(). (Other complications could arise, such as the function being recursive, in which case you need to index also with a nesting level; or the kernel might miss a .call or .return probe for whatever reason, in which case the script must tolerate the bad state.) – fche Jul 18 '15 at 10:59