What do the perf record choices of LBR vs DWARF vs fp do?

Question

When I use perf record on my code, I see three choices for the --call-graph option: lbr (last branch record), dwarf and fp.

What is the difference between these?

@Zulan: heh, I was wondering what "pf" was. Didn't think of it being a typo for "frame pointer", but now this makes sense for stack-unwinding. — Peter Cordes, Aug 09 '19 at 14:02
I believe this question is on-topic: It is about the usage of `perf`, an established tool for which we have a dedicated tag. While this particular topic is somewhat covered in the documentation, I believe this question and the answer add more value. — Zulan, Aug 09 '19 at 16:39

Zulan · Answer 1 · 2019-08-09T21:09:56.043

17

The option --call-graph refers to the collection of call graphs / call chains, i.e. the function stack for a sample.

The default, fp, uses frame pointers. This is very efficient but can be unreliable, particularly for optimized code, where compilers often omit the frame pointer. By compiling with -fno-omit-frame-pointer, you can ensure that frame pointers are available for your own code. Nevertheless, the results for libraries may vary.
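To illustrate why fp is cheap but fragile, here is a toy model of frame-pointer unwinding. It is only a sketch, not perf's actual implementation: the function name `unwind_fp`, the addresses, and the frame layout (saved caller frame pointer at `[fp]`, return address at `[fp + 8]`, as is conventional on x86-64) are all assumptions made for the example.

```python
# Toy model of frame-pointer unwinding; NOT perf's actual implementation.
# Stack memory is modelled as a dict mapping an address to the word stored there.
# Assumed frame layout: [fp] = caller's saved fp, [fp + 8] = return address.

def unwind_fp(memory, fp, max_depth=64):
    """Follow the chain of saved frame pointers and collect return addresses."""
    chain = []
    while fp != 0 and len(chain) < max_depth:
        if fp not in memory or fp + 8 not in memory:
            break  # broken chain, e.g. a function built without frame pointers
        chain.append(memory[fp + 8])
        fp = memory[fp]
    return chain

# Hypothetical call stack main -> parse -> lex, innermost frame first.
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,  # lex's frame: saved fp of parse, return into parse
    0x7020: 0x7040, 0x7028: 0x401100,  # parse's frame: saved fp of main, return into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame: chain ends, return into startup code
}

print([hex(addr) for addr in unwind_fp(memory, 0x7000)])
```

Each step is just two memory reads, which is why this mode is so efficient. If any function in the chain does not maintain the frame pointer, the walk stops early or follows garbage, which matches the unreliability described above.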

With dwarf, perf actually collects and stores a part of the stack memory itself with each sample and unwinds it during post-processing. This can be very resource-intensive, and the stack depth may be limited by the size of the captured chunk. The default stack memory chunk is 8 KiB, but this can be configured.

lbr stands for last branch records. This is a hardware mechanism supported by Intel CPUs. It will probably offer the best performance at the cost of portability. lbr is also limited to userspace functions.
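To make the three choices concrete, the following sketch builds the corresponding `perf record` command lines without running them. The helper name `perf_record_argv` and the binary `./myapp` are hypothetical; the `--call-graph` values and the `dwarf,<size>` form for changing the stack dump size are the ones described in the perf-record documentation.

```python
# Sketch: build (but do not execute) perf record command lines for each mode.
# perf_record_argv and ./myapp are hypothetical names used for illustration.

def perf_record_argv(mode, command, dwarf_stack_bytes=None):
    """Return an argv list for perf record with the given call-graph mode."""
    if mode not in ("fp", "dwarf", "lbr"):
        raise ValueError(f"unknown call-graph mode: {mode}")
    call_graph = mode
    if dwarf_stack_bytes is not None:
        if mode != "dwarf":
            raise ValueError("a stack dump size only applies to dwarf mode")
        # dwarf,<size> overrides the default stack dump size per sample
        call_graph = f"dwarf,{dwarf_stack_bytes}"
    return ["perf", "record", "--call-graph", call_graph, "--"] + list(command)

for mode in ("fp", "dwarf", "lbr"):
    print(" ".join(perf_record_argv(mode, ["./myapp"])))

# Larger stack dump for deep call chains, at the cost of bigger perf.data files.
print(" ".join(perf_record_argv("dwarf", ["./myapp"], dwarf_stack_bytes=16384)))
```

The resulting lists could be passed to `subprocess.run` on a machine where perf is installed; the recorded call chains can then be inspected with `perf report`.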

edited Aug 09 '19 at 21:09

answered Aug 09 '19 at 14:23

Zulan


3

LBR is not really recent, it's been there since the P6. – Hadi Brais Aug 09 '19 at 17:15
3

After digging deeper, it turns out that LBR call stack support requires a particular LBR feature (bit 9 of `MSR_LBR_SELECT`), which is only available starting with Haswell. If the user tells `perf record` to profile *only* kernel callchains, the tool will fall back to `fp` mode. Also, if branch sampling is enabled, `perf record` will fall back to `fp` mode because both features use the same LBR registers and therefore cannot be used together. When `perf record` falls back, a warning is printed to the user. – Hadi Brais Aug 09 '19 at 21:50
1

See the [code](https://github.com/torvalds/linux/blob/a9815a4fa2fd297cab9fa7a12161b16657290293/tools/perf/util/evsel.c#L693). Although I'm not sure what happens when running on a processor that doesn't support bit 9 of `MSR_LBR_SELECT`. Maybe it will also fall back to `fp`, but I didn't find this check in the code. – Hadi Brais Aug 09 '19 at 21:51
When I use "--call-graph dwarf", I found that some of the function information was printed incorrectly (the position of the function on the flame graph is not quite correct). I suspect it is a threading problem, so I would like to ask what I should pay attention to when using perf to trace and print multithreaded information. – The flash Aug 14 '19 at 06:39
1

I don't know any general issues in that direction. This is best addressed in an additional question, but make sure to include a [mcve]. – Zulan Aug 14 '19 at 07:16
@Theflash I am seeing this problem now: for the same capture, dwarf produces an incorrect flame graph while lbr does not. Have you got any further findings in your case? – wick Mar 18 '21 at 14:59
