0

I ran Perf multiple times and there were many backtraces in all of them not reproducible in Pin. Note that GDB could reproduce those backtraces. Here, I will show one of them:

evince 13293  3713.930210:        100 mem_load_uops_retired.l3_miss:uppp:     7ffff5733440         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
    7ffff54b0226 g_private_get_impl+0x6 (inlined)
    7ffff54b0226 g_private_get+0x6 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff5487f23 thread_memory_from_self+0x93 (inlined)
    7ffff5487f23 g_slice_free1+0x93 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff5466c72 g_source_callback_unref+0x22 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff54676eb g_source_destroy_internal+0x6b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff546a30f g_main_dispatch+0x1df (inlined)
    7ffff546a30f g_main_context_dispatch+0x1df (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff546a64f g_main_context_iterate+0x1ff (inlined)
    7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
    555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
    7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
    555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)

The backtrace says that g_source_callback_unref() should call g_slice_free1() at the first point before offset 0x22. But under Pin, the immediate sequential execution of the call point and callee entry point does not happen, at all. Here is the call point disassembly (look at offset 33):

Dump of assembler code for function g_source_callback_unref:
   0x0000000000048c50 <+0>: lock subl $0x1,(%rdi)
   0x0000000000048c54 <+4>: je     0x48c60 <g_source_callback_unref+16>
   0x0000000000048c56 <+6>: repz retq 
   0x0000000000048c58 <+8>: nopl   0x0(%rax,%rax,1)
   0x0000000000048c60 <+16>: push   %rbx
   0x0000000000048c61 <+17>: mov    0x18(%rdi),%rax
   0x0000000000048c65 <+21>: mov    %rdi,%rbx
   0x0000000000048c68 <+24>: test   %rax,%rax
   0x0000000000048c6b <+27>: je     0x48c73 <g_source_callback_unref+35>
   0x0000000000048c6d <+29>: mov    0x10(%rdi),%rdi
   0x0000000000048c71 <+33>: callq  *%rax
   0x0000000000048c73 <+35>: mov    %rbx,%rdi
   0x0000000000048c76 <+38>: pop    %rbx
   0x0000000000048c77 <+39>: jmpq   0x51ba0 <g_free>
End of assembler dump.

In fact in all problematic cases that I checked, the calls were indirect. What is wrong with the Pin execution?

UPDATE:

In Pin, I used two analyses both of which failed to find the sequence:

Suppose that IP1 and IP2 are the caller (the callq instruction above) and the called (the start address of g_slice_free1()) instruction pointers.

1) (High overhead) A simple analysis routine that checks every single instruction and records the last and second-to-last instruction until these match IP1 and IP2.

2) (Low overhead) An analysis routine that checks the target of the callq instruction at IP1 and notifies if it matches IP2. The instrumentation part is something like this:

...
if (INS_Address(ins) == IP1)
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)targetFinder, IARG_BRANCH_TARGET_ADDR, IARG_THREAD_ID, IARG_END)
...

UPDATE 2:

I did the high overhead analysis in DynamoRIO and faced exactly the same problem. In other words, g_slice_free1() is never called in the benchmark based on the analyses in both Pin and DynamoRIO. Therefore, it seems reasonable to doubt Perf output. Is it possible for Perf to report a backtrace that has never occured during program execution?

UPDATE 3:

It seems that in most of the cases the problem is with Perf. Most of the backtraces are corrupted with repeated function calls. For example, here is one of them with two gmallocn()s:

EvJobScheduler 10021  8653.926478:        100 mem_load_uops_retired.l3_miss:uppp:     7fffd1062a00         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
   7ffff4b07207 tcache_get+0x197 (inlined)
   7ffff4b07207 __GI___libc_malloc+0x197 (inlined)
   7fffd9872fb9 gmalloc+0x59 (inlined)
   7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffd9951e6f _ZN8TextLine8coalesceEP10UnicodeMap+0xff (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffd9952f82 _ZN9TextBlock8coalesceEP10UnicodeMapd+0x752 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffd995bc37 _ZN8TextPage8coalesceEbdb+0x1507 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffd995cb71 _ZN13TextOutputDev7endPageEv+0x31 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
   7fffe803c6d2 _ZL26poppler_page_get_text_pageP12_PopplerPage+0x92 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
   7fffe803deb3 poppler_page_get_selection_region+0x63 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
   7fffe82ab650 [unknown] (/opt/evince-3.28.4/lib/evince/4/backends/libpdfdocument.so)
   7ffff795f165 ev_job_page_data_run+0x2f5 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
   7ffff7961309 ev_job_thread+0xe9 (inlined)
   7ffff7961309 ev_job_thread_proxy+0xe9 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
   7ffff5492194 g_thread_proxy+0x54 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
   7ffff4e686da start_thread+0xda (/lib/x86_64-linux-gnu/libpthread-2.27.so)
   7ffff4b9188e __GI___clone+0x3e (inlined)

While offset 0x59 (i.e., 89 in decimal) does not show any recursion. It is justified below:

...
0x00000000000b2fb0 <+80>:   je     0xb2fd0 <gmallocn(int, int)+112>
0x00000000000b2fb2 <+82>:   mov    %rax,%rdi
0x00000000000b2fb5 <+85>:   callq  0xae880 <malloc@plt>
0x00000000000b2fba <+90>:   test   %rax,%rax
0x00000000000b2fbd <+93>:   je     0xb2fd8 <gmallocn(int, int)+120>
...

It is a simple call to malloc().

I'm totally confused. Please share your opinions. Regards.

TheAhmad
  • 810
  • 1
  • 9
  • 21
  • TheAhmad, Hello, could you please check is offset 33 is decimal or hex offset? 0x48c50+0x22 = 0x48c72; 33 decimal is 0x21. Could you please also point where are Pin results in your question? – osgx Mar 04 '20 at 18:42
  • @osgx: Those `<+29>` / `<+33>` offsets from the function-start in GDB-style dumps are decimal offsets. `mov 0x10(%rdi),%rdi` is a 4-byte instruction, not 10. – Peter Cordes Mar 04 '20 at 18:51
  • Pin can need with the app memory layout, are ip1 and ip2 absolute addresses or relative ones? – nitzanms Mar 04 '20 at 20:45
  • They are absolute addresses. OK. At least, now, I know that it seems also strange to you. Perhaps I did something wrong. – TheAhmad Mar 05 '20 at 02:03

0 Answers0