
I'm trying to use stap to print out all the functions that a program calls. I did some research online and found this script (called para-callgraph.stp):

#! /usr/bin/env stap

function trace(entry_p, extra) {
  printf("%s%s%s %s\n",
         thread_indent (entry_p),
         (entry_p>0?"->":"<-"),
         ppfunc (),
         extra)
}

probe $1.call   { trace(1, $$parms) }
probe $1.return { trace(-1, $$return) }

Which is intended to be run like this:

sudo stap para-callgraph.stp 'process.function("*")' -c `pwd`/my-program

Now when I run this, I run into a problem. Everything works fine at first, but soon systemtap prints this to stderr, and exits:

ERROR: probe overhead exceeded threshold
WARNING: Number of errors: 1, skipped probes: 0
WARNING: There were 62469 transport failures.
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.

Doing some research online revealed to me that a stap heuristic is being triggered and shutting me off, and that I could turn it off by adding two flags -g and --suppress-time-limits. (This suggestion is backed up by man stap on my system.) However, that solution simply does not work and the command:

sudo stap -g --suppress-time-limits para-callgraph.stp 'process.function("*")' -c `pwd`/core-cpu1

Prints a very similar error message, and exits:

ERROR: probe overhead exceeded threshold
WARNING: Number of errors: 1, skipped probes: 0
WARNING: There were 67287 transport failures.
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.

Why isn't this flag an appropriate solution to my problem? And can this problem be solved some other way, or is systemtap simply not appropriate for this kind of use-case?

If it matters, I'm running this on a 32-bit Ubuntu VM.

N.B. I'm most interested in why systemtap fails here, not other ways to accomplish the same thing using other software. (Indeed, it turns out for my use case, the above code was an abuse of systemtap.)

Others
  • --suppress-time-limits ought to have worked in disabling the overload prevention machinery. Would you mind forwarding a fuller bug report to the mailing list (incl. systemtap version, and some idea of the nature of your core-cpu1 program) ? – fche Oct 31 '16 at 13:35

1 Answer


The transport buffer between the probes and the consumer is also limited, so if your probes print faster than the consumer can drain the buffer, you will see a "There were NN transport failures" message in SystemTap or a "DTrace drops on CPU X" error in DTrace.

The answer to that problem is simple: be less verbose, drain the buffer more frequently (regulated by the cleanrate tunable in DTrace), or increase the buffer size (the -b option and bufsize tunable in DTrace, the -s option in SystemTap).
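As a rough illustration of the "be less verbose" option, the sketch below counts calls per function in an aggregate instead of printing a line for every entry and return, so almost nothing crosses the transport buffer until the target exits. It is not the original para-callgraph.stp; the file name count-calls.stp, the global name calls, and the limit of 50 rows are assumptions made for the example. It also gives up the indented call tree in exchange for a summary.

```
#! /usr/bin/env stap

# Hypothetical count-calls.stp: per-function call counts, printed once at exit.
global calls

# $1 is the probe point passed on the command line,
# e.g. 'process.function("*")'.
probe $1.call {
  calls[ppfunc()] <<< 1        # record one hit; no output from the probe itself
}

probe end {
  # Walk the aggregate in descending order of count, top 50 entries only.
  foreach (fn in calls- limit 50)
    printf("%8d %s\n", @count(calls[fn]), fn)
}
```

It would be run the same way as the original script, for example sudo stap count-calls.stp 'process.function("*")' -c `pwd`/my-program. If per-call output is really needed, a larger buffer can be requested with -s, for example sudo stap -s 64 para-callgraph.stp ... (64 is an arbitrary size in megabytes, not a recommendation). Neither change is guaranteed to avoid the separate "probe overhead exceeded threshold" error, since probing every function is expensive regardless of how much is printed.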

Reference: http://myaut.github.io/dtrace-stap-book/dtrace-stap-book-ns.pdf

skytree