I ran Perf
multiple times and there were many backtraces in all of them not reproducible in Pin
. Note that GDB
could reproduce those backtraces. Here, I will show one of them:
evince 13293 3713.930210: 100 mem_load_uops_retired.l3_miss:uppp: 7ffff5733440 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
7ffff54b0226 g_private_get_impl+0x6 (inlined)
7ffff54b0226 g_private_get+0x6 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff5487f23 thread_memory_from_self+0x93 (inlined)
7ffff5487f23 g_slice_free1+0x93 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff5466c72 g_source_callback_unref+0x22 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff54676eb g_source_destroy_internal+0x6b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff546a30f g_main_dispatch+0x1df (inlined)
7ffff546a30f g_main_context_dispatch+0x1df (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff546a64f g_main_context_iterate+0x1ff (inlined)
7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
The backtrace says that g_source_callback_unref()
should call g_slice_free1()
at the first point before offset 0x22
. But under Pin
, the immediate sequential execution of the call point and callee entry point does not happen, at all. Here is the call point disassembly (look at offset 33
):
Dump of assembler code for function g_source_callback_unref:
0x0000000000048c50 <+0>: lock subl $0x1,(%rdi)
0x0000000000048c54 <+4>: je 0x48c60 <g_source_callback_unref+16>
0x0000000000048c56 <+6>: repz retq
0x0000000000048c58 <+8>: nopl 0x0(%rax,%rax,1)
0x0000000000048c60 <+16>: push %rbx
0x0000000000048c61 <+17>: mov 0x18(%rdi),%rax
0x0000000000048c65 <+21>: mov %rdi,%rbx
0x0000000000048c68 <+24>: test %rax,%rax
0x0000000000048c6b <+27>: je 0x48c73 <g_source_callback_unref+35>
0x0000000000048c6d <+29>: mov 0x10(%rdi),%rdi
0x0000000000048c71 <+33>: callq *%rax
0x0000000000048c73 <+35>: mov %rbx,%rdi
0x0000000000048c76 <+38>: pop %rbx
0x0000000000048c77 <+39>: jmpq 0x51ba0 <g_free>
End of assembler dump.
In fact in all problematic cases that I checked, the calls were indirect. What is wrong with the Pin
execution?
UPDATE:
In Pin
, I used two analyses both of which failed to find the sequence:
Suppose that IP1
and IP2
are the caller (the callq
instruction above) and the called (the start address of g_slice_free1()
) instruction pointers.
1) (High overhead) A simple analysis routine that checks every single instruction and records the last and second-to-last instruction until these match IP1
and IP2
.
2) (Low overhead) An analysis routine that checks the target of the callq
instruction at IP1
and notifies if it matches IP2
. The instrumentation part is something like this:
...
if (INS_Address(ins) == IP1)
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)targetFinder, IARG_BRANCH_TARGET_ADDR, IARG_THREAD_ID, IARG_END)
...
UPDATE 2:
I did the high overhead analysis in DynamoRIO
and faced exactly the same problem. In other words, g_slice_free1()
is never called in the benchmark based on the analyses in both Pin
and DynamoRIO
. Therefore, it seems reasonable to doubt Perf
output. Is it possible for Perf
to report a backtrace that has never occured during program execution?
UPDATE 3:
It seems that in most of the cases the problem is with Perf
. Most of the backtraces are corrupted with repeated function calls. For example, here is one of them with two gmallocn()
s:
EvJobScheduler 10021 8653.926478: 100 mem_load_uops_retired.l3_miss:uppp: 7fffd1062a00 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
7ffff4b07207 tcache_get+0x197 (inlined)
7ffff4b07207 __GI___libc_malloc+0x197 (inlined)
7fffd9872fb9 gmalloc+0x59 (inlined)
7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9951e6f _ZN8TextLine8coalesceEP10UnicodeMap+0xff (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9952f82 _ZN9TextBlock8coalesceEP10UnicodeMapd+0x752 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd995bc37 _ZN8TextPage8coalesceEbdb+0x1507 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd995cb71 _ZN13TextOutputDev7endPageEv+0x31 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffe803c6d2 _ZL26poppler_page_get_text_pageP12_PopplerPage+0x92 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
7fffe803deb3 poppler_page_get_selection_region+0x63 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
7fffe82ab650 [unknown] (/opt/evince-3.28.4/lib/evince/4/backends/libpdfdocument.so)
7ffff795f165 ev_job_page_data_run+0x2f5 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
7ffff7961309 ev_job_thread+0xe9 (inlined)
7ffff7961309 ev_job_thread_proxy+0xe9 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
7ffff5492194 g_thread_proxy+0x54 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff4e686da start_thread+0xda (/lib/x86_64-linux-gnu/libpthread-2.27.so)
7ffff4b9188e __GI___clone+0x3e (inlined)
While offset 0x59
(i.e., 89
in decimal) does not show any recursion. It is justified below:
...
0x00000000000b2fb0 <+80>: je 0xb2fd0 <gmallocn(int, int)+112>
0x00000000000b2fb2 <+82>: mov %rax,%rdi
0x00000000000b2fb5 <+85>: callq 0xae880 <malloc@plt>
0x00000000000b2fba <+90>: test %rax,%rax
0x00000000000b2fbd <+93>: je 0xb2fd8 <gmallocn(int, int)+120>
...
It is a simple call to malloc()
.
I'm totally confused. Please share your opinions. Regards.