When I run perf record on my code, I see three choices for the --call-graph option: lbr (last branch record), dwarf, and fp.

What is the difference between these?

Zulan
The flash
  • @Zulan: heh, I was wondering what "pf" was. Didn't think of it being a typo for "frame pointer", but now this makes sense for stack-unwinding. – Peter Cordes Aug 09 '19 at 14:02
  • 3
    I believe this question is on-topic: It is about the usage of `perf`, an established tool for which we have a dedicated tag. While this particular topic is somewhat covered in the documentation, I believe this question and the answer add more value. – Zulan Aug 09 '19 at 16:39
  • https://lwn.net/Articles/680996/ – firo Apr 21 '23 at 08:52

1 Answer

The option --call-graph refers to the collection of call graphs / call chains, i.e. the function stack for a sample.

The default, fp, uses frame pointers. This is very efficient but can be unreliable, particularly for optimized code, where compilers often omit the frame pointer. Compiling your code with -fno-omit-frame-pointer ensures that frame pointers are available in it. Nevertheless, results for libraries may vary, since they may have been built without frame pointers.
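
As a minimal sketch of the fp workflow, assuming GCC and a hypothetical single-file program (`app.c` and `./app` are placeholder names, not from the question), the build and record steps might look like this. The script only runs the commands when the tools and the source file are present; otherwise it prints them.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
src=app.c
bin=./app

# Keep frame pointers even at -O2, then record with frame-pointer call chains.
build_cmd="gcc -O2 -g -fno-omit-frame-pointer -o $bin $src"
record_cmd="perf record --call-graph fp $bin"

# Run only when gcc, perf and the source file exist; otherwise just show the commands.
if command -v gcc >/dev/null 2>&1 && command -v perf >/dev/null 2>&1 && [ -f "$src" ]; then
    $build_cmd && $record_cmd
else
    printf '%s\n%s\n' "$build_cmd" "$record_cmd"
fi
```

Libraries the program links against are unaffected by this flag, so their frames may still be missing or wrong.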

With dwarf, perf collects and stores part of the stack memory with each sample and unwinds it in post-processing. This can be very resource-intensive, and the captured stack depth may be limited. The default stack memory chunk is 8 KiB, but this can be configured.
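
The stack dump size is passed after a comma in the option value. A small sketch, assuming a hypothetical binary `./app` and an arbitrarily chosen size of 16384 bytes (twice the default); the script only prints the resulting command:

```shell
#!/bin/sh
bin=./app           # hypothetical binary name
stack_bytes=16384   # illustrative value; the default dump size is 8192 bytes

# Larger dumps allow deeper stacks to be unwound but make perf.data grow faster.
record_cmd="perf record --call-graph dwarf,$stack_bytes $bin"
echo "$record_cmd"
```

Since every sample carries a copy of the stack, a lower sampling frequency may be needed to keep the overhead and file size manageable.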

lbr stands for last branch records. This is a hardware mechanism supported by Intel CPUs. It will probably offer the best performance, at the cost of portability. lbr is also limited to userspace functions.
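
Because lbr depends on the hardware, a script might pick the mode at run time. The sketch below is only a rough heuristic, assuming Linux with `/proc/cpuinfo` and a hypothetical binary `./app`: it checks the CPU vendor string, which does not prove that LBR call stacks are actually supported on that CPU.

```shell
#!/bin/sh
bin=./app   # hypothetical binary name
mode=fp     # portable default

# Crude check on the vendor string only; it says nothing about the CPU generation.
if grep -q GenuineIntel /proc/cpuinfo 2>/dev/null; then
    mode=lbr
fi

record_cmd="perf record --call-graph $mode $bin"
echo "$record_cmd"
```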

Zulan
  • 3
    LBR is not really recent, it's been there since the P6. – Hadi Brais Aug 09 '19 at 17:15
  • 3
    After digging deeper, it turns out that LBR call stack support requires a particular LBR feature (bit 9 of `MSR_LBR_SELECT`), which is only available starting with Haswell. If the user tells `perf record` to profile *only* kernel callchains, the tool falls back to `fp` mode. Also, if branch sampling is enabled, `perf record` falls back to `fp` mode, because both features use the same LBR registers and cannot be used together. When `perf record` falls back, it prints a warning. – Hadi Brais Aug 09 '19 at 21:50
  • 1
    See the [code](https://github.com/torvalds/linux/blob/a9815a4fa2fd297cab9fa7a12161b16657290293/tools/perf/util/evsel.c#L693). Although I'm not sure what happens when running on a processor that doesn't support bit 9 of `MSR_LBR_SELECT`. Maybe it will also fall back to `fp`, but I didn't find this check in the code. – Hadi Brais Aug 09 '19 at 21:51
  • When I use "--call-graph dwarf", I found that some function information was printed incorrectly (the position of the function in the flame graph is not quite correct). I suspect it is a threading problem, so I would like to ask what to watch out for when using perf to trace and print multithreaded information. – The flash Aug 14 '19 at 06:39
  • 1
    I don't know any general issues in that direction. This is best addressed in an additional question, but make sure to include a [mcve]. – Zulan Aug 14 '19 at 07:16
  • @Theflash I am seeing this problem now: for the same capture, dwarf produces an incorrect flame graph while lbr does not. Have you got any further findings in your case? – wick Mar 18 '21 at 14:59