I used the following command to sample backtraces for an ffmpeg benchmark:
sudo perf record -d --call-graph dwarf,65528 -c 1000000 -e mem_load_uops_retired.l3_miss:u ffmpeg -i /media/ahmad/DATA/Videos/video.mp4 -threads 1 -vf spp out.mp4
As can be seen, PEBS is not used, the stack dump size is set to the maximum (65528 bytes), and the sampling period is quite large. I also limited the thread count (-threads 1). Even so, this is the first part of the perf script --no-demangle output:
ffmpeg 11750 6670.061261: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeab68844 x264_pixel_avg_w16_avx2+0x4 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
ffmpeg 11750 6670.274835: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeab68844 x264_pixel_avg_w16_avx2+0x4 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
ffmpeg 11750 6670.496159: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeab8ef89 x264_pixel_sad_x4_16x16_avx2+0x49 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
ffmpeg 11750 6670.852598: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeaac97b3 pixel_memset+0x293 (inlined)
7fffeaac97b3 plane_expand_border+0x293 (inlined)
7fffeaac97b3 x264_frame_expand_border_filtered+0x293 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab463bc x264_fdec_filter_row+0x69c (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab49523 x264_slice_write+0x1873 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab85285 x264_stack_align+0x15 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab45bdb x264_slices_write+0xfb (/usr/lib/x86_64-linux-gnu/libx264.so.152)
5555561e3d87 [unknown] ([heap])
ffmpeg 11750 6671.110007: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeab6cdde x264_frame_init_lowres_core_avx2+0x8e (/usr/lib/x86_64-linux-gnu/libx264.so.152)
ffmpeg 11750 6671.463562: 1000000 mem_load_uops_retired.l3_miss:u: 0 5080021 N/A|SNP N/A|TLB N/A|LCK N/A
7fffeaabf806 x264_macroblock_load_pic_pointers+0x886 (inlined)
7fffeaabf806 x264_macroblock_cache_load+0x886 (inlined)
7fffeaabf806 x264_macroblock_cache_load_progressive+0x886 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab49204 x264_slice_write+0x1554 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab85285 x264_stack_align+0x15 (/usr/lib/x86_64-linux-gnu/libx264.so.152)
7fffeab45bdb x264_slices_write+0xfb (/usr/lib/x86_64-linux-gnu/libx264.so.152)
1c [unknown] ([unknown])
None of the backtraces are correct, because none of them go all the way back to _start or __GI___clone. I also tried LBR instead, but it is even more limited in stack depth and is therefore not suitable. Any suggestions on how to get around the problem?
UPDATE:
The problem occurs for every event I checked. With mem_load_uops_retired.l3_miss or LLC-load-misses, it is visible from the very first samples. With the cycles event, the backtraces are correct at first, but later the same problem appears.
Note also that the problem disappears when I sample only kernel-mode mem_load_uops_retired.l3_miss events.