My runtime environment is CentOS 7.9 (kernel version 5.16.11) running in a VMBox virtual machine with 1 GB of memory and 8 CPU cores allocated.

[root@dev236 ~]# uname -a
Linux dev236 5.16.11-1.el7.elrepo.x86_64 #1 SMP PREEMPT Tue Feb 22 10:22:37 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

I ran a computation-intensive program that uses 8 threads to keep the CPU busy continuously.
After some time, the kernel reported soft lockups, like this:

[root@dev236 src]# ./server --p:command-threads-count=8

[31274.179023] rcu: INFO: rcu_preempt self-detected stall on CPU
[31274.179039] watchdog: BUG: soft lockup - CPU#3 stuck for 210S! [server:1356]
[31274.179042] watchdog: BUG: soft lockup - CPU#1 stuck for 210S! [server:1350]
[31274.179070] watchdog: BUG: soft lockup - CPU#7 stuck for 210S! [server:1355]
[31274.179214] rcu: 0-...!: (1 GPs behind) idle=52f/1/0x4000000000000000 softirq=10073864/10073865 fqs=0

Message from syslogd@dev236 at Jan 25 18:59:49 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 210S! [server:1356]

Message from syslogd@dev236 at Jan 25 18:59:49 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 210S! [server:1350]

Message from syslogd@dev236 at Jan 25 18:59:49 ...
 kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 210S! [server:1355]
 
^C
[root@dev236 src]#

Then I checked the program's log file. It was still being appended to, which indicates that my test program was still running.

Can I safely ignore this soft lockup warning?

Or do I have to do something?
For example:
    Reduce the computational intensity of the program?
    Give the CPU a break once in a while?
    Reduce the number of threads started in the program?
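
To illustrate the last option, here is a minimal Python sketch of capping the worker count below the number of CPUs the process may run on, so that one core stays free for everything else. The names `pick_thread_count` and `available_cpu_count` and the `reserve` parameter are hypothetical, and it is only an assumption that leaving headroom like this changes whether the soft lockup is reported.

```python
import os


def pick_thread_count(available_cpus: int, reserve: int = 1) -> int:
    """Return a worker count that leaves `reserve` CPUs free, never below 1."""
    return max(1, available_cpus - reserve)


def available_cpu_count() -> int:
    """CPUs this process may run on (Linux affinity mask), else the total count."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Prints the flag value to pass to the server for this machine.
    n = pick_thread_count(available_cpu_count())
    print(f"--p:command-threads-count={n}")
```

With the 8 cores allocated to this VM, `pick_thread_count(8)` gives 7 worker threads instead of 8.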

Thank you all

czg
  • 210 seconds is a really long time; that doesn't happen just from load; the kernel has timer interrupts to regain control of the CPU every 10 milliseconds or so. If your hardware is getting flaky under load, check the temperatures, and check the RAM. e.g. run a Prime95 stress test, and/or boot into memtest86+, and see if your computer is sometimes flipping bits incorrectly. – Peter Cordes Jan 26 '23 at 01:30
  • Maybe your host has trouble scheduling all those hypervisor threads in time and the guest kernel believes it's its own fault. – Margaret Bloom Jan 26 '23 at 09:03
  • I am interested in understanding how you are running a 5.x kernel when CentOS 7.9 ships with a 3.10.0 kernel. – Mikel F Mar 14 '23 at 21:37
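
One way to look into the hypervisor-scheduling suggestion from inside the guest is the `steal` counter on the aggregate `cpu` line of `/proc/stat`, which counts time the virtual CPUs were runnable while the host ran something else. The following is a minimal Python sketch. The function names are hypothetical, and it is an assumption that this hypervisor reports steal time to the guest at all; if it does not, the field stays at 0 and tells you nothing either way.

```python
import time
from typing import List


def parse_cpu_line(stat_text: str) -> List[int]:
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before: List[int], after: List[int]) -> float:
    """Fraction of elapsed CPU time counted as steal between two samples."""
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already included in user and nice, so only the
    # first eight fields are summed for the total.
    if len(before) < 8 or len(after) < 8:
        return 0.0
    total = sum(after[:8]) - sum(before[:8])
    steal = after[7] - before[7]
    return steal / total if total > 0 else 0.0


def measure_steal(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Sample /proc/stat twice, interval_s apart, and return the steal fraction."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return steal_fraction(before, after)
```

Calling `measure_steal()` in the guest while the 8-thread workload runs would show whether a noticeable share of CPU time is being taken by the host. A persistently high value would be consistent with the host struggling to schedule the virtual CPUs, but it would not prove that this causes the lockup reports.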
