Recent Intel processors provide a hardware feature, Precise Event-Based Sampling (PEBS), that gives access to precise information about the CPU state on some sampled CPU events (e.g., e). Here is an extract from the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3:
18.15.7 Processor Event-Based Sampling (PEBS)
The debug store (DS) mechanism in processors based on Intel NetBurst microarchitecture allow two types of information to be collected for use in debugging and tuning programs: PEBS records and BTS records.
Based on Chapter 17 of the same reference, the DS format for the x86-64 architecture is as follows:
The BTS Buffer records the last N executed branches (N depends on the microarchitecture), while the PEBS Buffer records the following registers:
IIUC, a counter is set and each occurrence of the event (e) increments its value. When the counter overflows, an entry is added to both of these buffers. Finally, when these buffers reach a certain size (BTS Absolute Maximum and PEBS Absolute Maximum), an interrupt is generated and the contents of the two buffers are dumped to disk. This happens periodically. It seems that the --call-graph dwarf backtrace data is also extracted in the same handler, right?
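To make my understanding concrete, here is a toy Python model of the mechanism as I described it above. It is only a sketch of my reading, not how the hardware or the kernel is implemented: the class ToyPebs and the names sample_period, threshold and on_event are made up for illustration, and the real counter is preloaded and overflows rather than counting up to a period.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPebs:
    """Toy model: count events, record on overflow, flush at a threshold."""
    sample_period: int          # hypothetical: events per recorded sample
    threshold: int              # hypothetical: records that trigger the interrupt
    counter: int = 0
    buffer: list = field(default_factory=list)
    flushed: list = field(default_factory=list)

    def on_event(self, cpu_state):
        # Each event occurrence increments the counter.
        self.counter += 1
        if self.counter < self.sample_period:
            return
        # "Overflow": reset the counter and append one record to the buffer.
        self.counter = 0
        self.buffer.append(cpu_state)
        # Buffer reached the threshold: model the interrupt that drains it.
        if len(self.buffer) >= self.threshold:
            self.flushed.append(list(self.buffer))
            self.buffer.clear()


sim = ToyPebs(sample_period=3, threshold=2)
for i in range(12):
    sim.on_event({"rip": i})    # fake CPU state, one field only
```

In this model the handler runs only once per threshold records, so anything captured inside the handler (rather than in the record itself) would describe the CPU state at interrupt time, not at the time of each sampled event.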
1) Does this mean that the LBR (--call-graph lbr) and PEBS states match each other perfectly?
2) How about the --call-graph dwarf output, which is not part of PEBS (as seems obvious from the above reference)? (Some RIP/RSP values do not match the backtrace.)
More precisely, here is an LKML thread in which Milian Wolff shows that the answer to the second question is NO, but I do not fully understand the reason.
The answer to the first question is also NO (as stated by Andi Kleen in the later messages of the thread), which I do not understand at all.
3) Does this mean that the whole DWARF call-graph information is completely corrupted?
The above thread does not show this, and in my experiments I do not see any RIP that fails to match the backtrace. In other words, can I trust the majority of the backtraces?
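For reference, this is roughly the kind of check I mean, as a minimal Python sketch. It assumes the samples have already been parsed into (ip, callchain) pairs; the Sample type, the mismatch_ratio helper and the addresses below are all hypothetical, and extracting the pairs from the perf output is not shown.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    ip: int            # sampled instruction pointer (RIP) of the record
    callchain: tuple   # unwound frames, innermost first; may be empty


def mismatch_ratio(samples):
    """Fraction of samples whose innermost frame differs from the sampled IP.

    Samples without a callchain are skipped, since there is nothing to compare.
    """
    checked = [s for s in samples if s.callchain]
    if not checked:
        return 0.0
    bad = sum(1 for s in checked if s.callchain[0] != s.ip)
    return bad / len(checked)


# Made-up addresses, for illustration only.
samples = [
    Sample(0x401000, (0x401000, 0x401200)),  # innermost frame matches the IP
    Sample(0x401010, (0x401018, 0x401200)),  # innermost frame is skewed
    Sample(0x401020, ()),                    # no callchain, skipped
    Sample(0x401030, (0x401030,)),           # matches
]
ratio = mismatch_ratio(samples)
```

A ratio close to zero would only say that the innermost frame agrees with the sampled RIP; it says nothing about whether the outer frames are correct.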
I would prefer not to use the LBR method, which may itself be imprecise. It is also limited in the size of the backtrace, although here is a patch to overcome the size issue. But the patch is recent and may be buggy.
UPDATE:
- How is it possible to force perf to store only a single record in the PEBS Buffer? Is it only possible to force this configuration indirectly, e.g., when call-graph information is required for a PEBS event?