
We have an old physical server that we can't afford to lose, so I have virtualised it on VMware. I have powered up the VM and am testing it. It froze earlier, and in the kernel logs I found the following:

Feb 20 08:22:10 mrtg kernel: Call Trace:
Feb 20 08:22:10 mrtg kernel:  [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8000cce1>] do_path_lookup+0x275/0x2f1
Feb 20 08:22:10 mrtg kernel:  [<ffffffff80064cb9>] .text.lock.mutex
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8003c9f9>] do_unlinkat+0x66/0x141
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8005e229>] tracesys+0x71/0xe0
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
Feb 20 08:22:10 mrtg kernel: INFO: task sendmail:7174 blocked for more than 120 seconds 
Feb 20 08:22:10 mrtg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
Feb 20 08:22:10 mrtg kernel: sendmail       D ffff81022ecc6af0      0  7174   3893                 5726 (NOTLB)
Feb 20 08:22:10 mrtg kernel: ffff8101c81fbc78 0000000000000082 0007810000000007 ffff8100348470c0
Feb 20 08:22:10 mrtg kernel: 0000000007f459c0 0000000000000007 ffff81026b899100 ffff8100348470c0
Feb 20 08:22:10 mrtg kernel: 0000384813cc7ab8 00000000000004b2 ffff810216b892e8 0000000107f458c0
Feb 20 08:22:10 mrtg kernel: Call Trace:
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8012ac81>] avc_has_perm+0x46/0x58
Feb 20 08:22:10 mrtg kernel:  [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
Feb 20 08:22:10 mrtg kernel:  [<ffffffff80064cb9>] .text.lock.mutex+0xf/0x14
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8000cef1>] do_lookup+0x90/0x1e6
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8000a23c>] __link_path_walk+0xa01/0xf42
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8000ea11>] link_path_walk+0x42/0xb2
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8000cce1>] do_path_lookup+0x275/0x2f1
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8001283d>] getname+0x15b/0x1c2
Feb 20 08:22:10 mrtg kernel:  [<ffffffff800238c3>] __user_walk_fd+0x37/0x4c
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8002889d>] vfs_stat_fd+0x37/0x4c
Feb 20 08:22:10 mrtg kernel:  [<ffffffff80067b88>] do_page_fault+0x1b/0x4a
Feb 20 08:22:10 mrtg kernel:  [<ffffffff800235f5>] sys_newstat+0x19/0x31
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8005e229>] tracesys+0x71/0xe0
Feb 20 08:22:10 mrtg kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Can anyone shed any light on the cause of this, and how to fix it, please?

Alexander Tolkachev
  • You could try `selinux=0` on the kernel boot line. Are there any dangling mounts, NFS mounts maybe? – Gerrit Feb 20 '20 at 16:33
  • I will try disabling selinux as you suggested. There are no NFS mounts. – John Seldon Feb 20 '20 at 16:52
  • Maybe I shouldn't have suggested that. It may only treat symptoms without addressing the cause. Did you use a VMware converter tool for the physical machine? It may be that the kernel of the machine is not fit for VMware. See https://kb.vmware.com/s/article/1002402 or this http://www.ezrahill.co.uk/2014/07/21/linux-p2v-centos-56-vmware-converter/ – Gerrit Feb 20 '20 at 17:01
  • I converted it manually, by creating a new VM and doing a minimal install of CentOS 5.5 on it, then tarring up the files and folders from the source server and untarring them onto the base machine. Everything seems fine except for these occasional kernel issues. – John Seldon Feb 21 '20 at 10:54
  • Did you exclude /lib/modules, /usr/src and /boot from your tars? – Gerrit Feb 21 '20 at 11:17
  • I excluded /proc, /sys, /mnt, /media, /tmp, /dev, /boot, /etc/fstab and /etc/sysconfig/network-scripts/ifcfg-eth*. I did not exclude /lib/modules and /usr/src. Do you think that might be causing an issue? – John Seldon Feb 21 '20 at 11:22
  • Well, it depends on whether there were duplicates in /lib/modules or /usr/src. If not, then you mainly added unused directories. Otherwise, yes, it can be problematic. But I am wondering, did you also include SELinux contexts, extended attributes (xattrs) and POSIX ACLs in your tar files? – Gerrit Feb 21 '20 at 11:50
  • This was the tar command: "sudo tar -cvpf /mnt/usb_drive/mrtg.backup.tar --exclude=/proc --exclude=/sys --exclude=/mnt --exclude=/media --exclude=/tmp --exclude=/dev --exclude=/boot --exclude=/etc --exclude=/tftpboot /". Then I separately tarred the /etc folder while excluding fstab and the ifcfg-eth files. – John Seldon Feb 21 '20 at 11:57
  • Then I suspect that the SELinux context was not copied, unless this is the default on CentOS. Also, the IDs of the system daemon users must correspond exactly (roughly the IDs below 1000 in /etc/passwd). Maybe it is not such a bad idea to try booting with `selinux=0`, but this compromises security. You should really tar with `--selinux --xattrs --acls`. Also, the mount options on both systems have to be similar. And you should exclude /var/run, /run, /lib/modules and /usr/src. And of course any important service such as a database, web server or mail server must be stopped at least 30 seconds before tarring. – Gerrit Feb 21 '20 at 12:32
  • OK thanks. I will try again with your suggestions and see how it goes. – John Seldon Feb 21 '20 at 14:54
  • Post notes: `--xattrs` alone is enough for the tar program to also include ACLs and SELinux contexts. Untarring should be done on a system that is as quiescent as possible. Offline mounts from a rescue ISO would obviously be better (but take care to mount them with extended attributes enabled). It probably doesn't matter if the original user IDs are a bit different, unless the programs are running while you unpack. – Gerrit Feb 21 '20 at 15:24
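
For reference, the suggestions in the comments above could be combined into one tar invocation roughly as follows. This is only a sketch: the `build_tar_cmd` helper and the `ARCHIVE` variable are hypothetical names introduced for illustration, the archive path and exclude list are taken from the thread, and whether the tar shipped on the source system accepts all three metadata flags is an assumption worth checking with `tar --help` first. The script prints the command rather than running it, so it can be reviewed before use.

```shell
#!/bin/sh
# Hypothetical helper: print the tar command suggested in the comments
# instead of executing it, so it can be inspected first.
ARCHIVE="${ARCHIVE:-/mnt/usb_drive/mrtg.backup.tar}"  # path used in the thread

build_tar_cmd() {
    # Metadata flags suggested in the comments. Assumption: the installed
    # tar supports them; verify with `tar --help` on the source server.
    cmd="tar -cvpf $ARCHIVE --selinux --xattrs --acls"
    # Excludes from the original command, plus the ones suggested later.
    # /etc is excluded here because it was archived separately.
    for dir in /proc /sys /mnt /media /tmp /dev /boot /etc /tftpboot \
               /var/run /run /lib/modules /usr/src; do
        cmd="$cmd --exclude=$dir"
    done
    # Archive the root filesystem.
    echo "$cmd /"
}

build_tar_cmd
```

Once the printed command looks right, it can be run with sudo after stopping the services mentioned above. The same metadata flags would presumably be needed when extracting on the target VM, otherwise the stored attributes would not be restored.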
