
I am facing a strange issue: my program freezes when it calls the `PAPI_start_counters` function.

For example, when I take the code from here, compile it with `gcc -o papitest high_level.c -lpapi`, and run `./papitest`, I get the output `There are 11 counters in this system` and then nothing more. Pressing Ctrl+C has no effect, and neither does `kill -9`. My system is as follows:

Operating System: Debian GNU/Linux 8 (jessie)
Kernel: Linux 3.16.0-5-amd64
Architecture: x86-64
CPU: 32 cores of Intel(R) Xeon(R) CPU E5-4603 v2 @ 2.20GHz

I remember using PAPI in the past on the same server running an older kernel.

EDIT: here is what shows up when running dmesg:

[ 2039.025224] INFO: task papitest:2022 blocked for more than 120 seconds.
[ 2039.025284]       Tainted: G         C    3.16.0-5-amd64 #1
[ 2039.025335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2039.025421] papitest        D ffff88081daf0050     0  2022   2008 0x00000000
[ 2039.025427]  ffff88081fb180d0 0000000000000086 0000000000013b40 ffff88081e823fd8
[ 2039.025431]  0000000000013b40 ffff88081daf0050 ffff88103f4619f8 ffff88081e823de0
[ 2039.025435]  ffff88103f4619fc ffff88081daf0050 00000000ffffffff ffff88103f461a00
[ 2039.025439] Call Trace:
[ 2039.025447]  [<ffffffff815227c5>] ? schedule_preempt_disabled+0x25/0x70
[ 2039.025452]  [<ffffffff81524263>] ? __mutex_lock_slowpath+0xd3/0x1d0
[ 2039.025457]  [<ffffffff81133fc0>] ? remote_function+0x40/0x50
[ 2039.025461]  [<ffffffff8152437b>] ? mutex_lock+0x1b/0x2a
[ 2039.025466]  [<ffffffff81134800>] ? perf_event_read_value+0x30/0xd0
[ 2039.025470]  [<ffffffff811348cd>] ? __perf_read_group_add+0x2d/0x190
[ 2039.025475]  [<ffffffff81136e1a>] ? _perf_event_disable+0x5a/0xb0
[ 2039.025479]  [<ffffffff8113500f>] ? perf_read+0xbf/0x250
[ 2039.025483]  [<ffffffff811afca3>] ? vfs_read+0x93/0x170
[ 2039.025486]  [<ffffffff811b08d2>] ? SyS_read+0x42/0xa0
[ 2039.025492]  [<ffffffff81525c00>] ? system_call_fast_compare_end+0x10/0x15
dbilid
  • Have you used a debugger? What is the stack of the application while it freezes? – Zulan May 02 '18 at 13:49
  • I don't know how to do that, since I cannot interrupt the program in order to run, for instance, the `where` command in gdb. – dbilid May 02 '18 at 14:19
  • Just run the program with `gdb ./a.out` and interrupt it with ctrl+c. – Zulan May 02 '18 at 16:26
  • I cannot do that. The program does not respond to the interrupt, and if I kill the terminal, the process becomes defunct. – dbilid May 02 '18 at 17:18
  • Have you tried exactly what I suggested? Typically `gdb` gets the interrupt and is going to tell you about the program's stack. – Zulan May 02 '18 at 17:33
  • Yes, I tried but nothing happens. By the way, I added kernel messages in case it helps – dbilid May 02 '18 at 20:07
  • Looks like a deadlock in the kernel. Shouldn't jessie be at least on 3.16.56-1? – Zulan May 03 '18 at 07:38
  • Yes, it should. I had an issue with a package that was keeping my dist-upgrade back. I fixed that, but the problem still remains with kernel 3.16.56-1. On the other hand, upgrading to Debian 9 with kernel 4.9.88-1 fixes the problem. – dbilid May 04 '18 at 10:59
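
The `D` after `papitest` in the dmesg output is the task state for uninterruptible sleep. That is consistent with the symptoms in this thread: a task blocked inside the kernel in that state generally does not act on signals, including SIGKILL, until the kernel call returns. As a minimal sketch (not part of the original post), the state of a process can be read from `/proc/<pid>/stat` on Linux. The helper name `read_task_state` is hypothetical and introduced here only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name chosen for this sketch): returns the
 * one-letter task state from /proc/<pid>/stat, e.g. 'R' (running),
 * 'S' (interruptible sleep), 'D' (uninterruptible sleep), 'Z' (zombie).
 * `pid` is the process ID as a string, or "self" for the caller.
 * Returns '\0' if the file cannot be read or parsed. Linux only. */
char read_task_state(const char *pid)
{
    char path[64];
    char buf[512];

    snprintf(path, sizeof path, "/proc/%s/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';

    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is wrapped in parentheses and may itself contain
     * spaces or ')', so find the last ')' and take the field after it. */
    char *close_paren = strrchr(buf, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '\0';
    return close_paren[2];
}
```

Called from another shell or program while `papitest` hangs, `read_task_state("2022")` (the PID from the dmesg output above) would be expected to return `'D'`. A process in that state cannot be stopped from user space by a debugger either, which would explain why interrupting under gdb did nothing.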
