We have three servers running on the same ESX host; all of their virtual disks come from a remote SAN storage controller. All three servers hung and restarted several days ago, and the DB server did so again today. The strange thing is that there was no panic, crash, or error log entry when the problem occurred.
Server 1: web server
FreeBSD Meduna 8.1-RELEASE-p2 FreeBSD 8.1-RELEASE-p2 #2: Mon Feb 14 12:57:36 MYT 2011 hailang@Meduna:/usr/obj/usr/src/sys/Meduna amd64
Meduna# cat /var/log/messages | grep panic
Meduna# bzcat /var/log/messages.?.bz2 | grep panic
Meduna# cat /var/log/messages | grep error
Meduna# bzcat /var/log/messages.?.bz2 | grep error
May 28 16:05:04 Meduna kernel: /var: mount pending error: blocks 4 files 1
Server 2: DB server
FreeBSD Moncalvo 8.1-RELEASE-p2 FreeBSD 8.1-RELEASE-p2 #1: Mon Jan 10 13:02:48 MYT 2011 hailang@Moncalve:/usr/obj/usr/src/sys/Moncalve amd64
Moncalvo# cat /var/log/messages | grep panic
Moncalvo# bzcat /var/log/messages.?.bz2 | grep panic
Moncalvo# cat /var/log/messages | grep error
Moncalvo# bzcat /var/log/messages.?.bz2 | grep error
May 28 16:17:17 Moncalvo kernel: /var: mount pending error: blocks -32 files 0
Server 3: not in use
FreeBSD Mecure 8.1-RELEASE-p2 FreeBSD 8.1-RELEASE-p2 #0: Fri Feb 11 14:45:55 MYT 2011 hailang@ServerX:/usr/obj/usr/src/sys/Mecure amd64
Mecure# cat /var/log/messages | grep panic
Mecure# bzcat /var/log/messages.?.bz2 | grep panic
Mecure# bzcat /var/log/messages.?.bz2 | grep error
Mecure# cat /var/log/messages | grep error
May 28 15:42:41 Mecure kernel: g_vfs_done():da0s1d[WRITE(offset=3275046912, length=16384)]error = 5
May 28 15:42:41 Mecure kernel: g_vfs_done():da0s1d[READ(offset=4062199808, length=16384)]error = 5
May 28 15:42:41 Mecure kernel: g_vfs_done():da0s1d[WRITE(offset=3281371136, length=10240)]error = 5
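The searches above can be run in one pass over the live log and its compressed rotations. This is a minimal sketch, assuming the default FreeBSD layout of `/var/log/messages` plus `messages.N.bz2` rotations; the function name `search_logs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: search the live syslog and its bzip2-compressed
# rotations for an extended regex. LOGDIR defaults to /var/log.
search_logs() {
    pattern="$1"
    logdir="${2:-/var/log}"

    # Current, uncompressed log.
    if [ -r "$logdir/messages" ]; then
        grep -E -- "$pattern" "$logdir/messages"
    fi

    # Rotated logs (messages.0.bz2, messages.1.bz2, ...). If no rotated
    # file exists, the glob stays literal and the -r test skips it.
    for f in "$logdir"/messages.*.bz2; do
        [ -r "$f" ] || continue
        bzcat "$f" | grep -E -- "$pattern"
    done
    return 0
}

# Example: all of the searches from this post at once.
# search_logs 'panic|error|g_vfs_done'
```

Including `g_vfs_done` in the pattern also catches the GEOM I/O errors shown for Mecure, where `error = 5` is EIO.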
This is what /var/log/messages looks like around the time of the problem:
May 28 13:06:26 Meduna kernel: icmp redirect from 10.16.10.250: 113.23.142.94 => 10.16.10.18
May 28 13:07:01 Meduna kernel: icmp redirect from 10.16.10.250: 202.186.13.232 => 10.16.10.18
May 28 13:15:00 Meduna kernel: icmp redirect from 10.16.10.250: 113.23.142.94 => 10.16.10.18
May 28 13:15:35 Meduna kernel: icmp redirect from 10.16.10.250: 202.186.13.232 => 10.16.10.18
May 28 13:41:36 Meduna syslogd: kernel boot file is /boot/kernel/kernel
May 28 13:41:36 Meduna kernel: Copyright (c) 1992-2010 The FreeBSD Project.
May 28 13:41:36 Meduna kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
[!] It just hung for about half an hour and then restarted without logging any error.
May 28 13:13:14 Moncalvo kernel: icmp redirect from 10.16.10.250: 60.49.152.98 => 10.16.10.18
May 28 13:14:25 Moncalvo kernel: icmp redirect from 10.16.10.250: 210.48.150.200 => 10.16.10.18
May 28 13:16:58 Moncalvo kernel: icmp redirect from 10.16.10.250: 183.78.169.57 => 10.16.10.18
May 28 15:59:06 Moncalvo syslogd: kernel boot file is /boot/kernel/kernel
May 28 15:59:06 Moncalvo kernel: Copyright (c) 1992-2010 The FreeBSD Project.
May 28 15:59:06 Moncalvo kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
[!] And this server hung for more than two hours before restarting.
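Since nothing is logged during the hang, the only timing evidence is the gap between the last message before a reboot and the syslogd startup line. This is a minimal sketch, assuming each boot is marked by syslogd's `kernel boot file is` line as in the excerpts above; the function name `boot_gaps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each boot recorded in a syslog file, print the
# last line logged before the boot and the syslogd boot line itself.
# The gap between the two timestamps roughly bounds the hang.
# Assumption: boots are marked by "syslogd: kernel boot file is".
boot_gaps() {
    awk '
        /syslogd: kernel boot file is/ {
            if (prev != "") {
                print "last before boot: " prev
            }
            print "boot:             " $0
            print ""
        }
        { prev = $0 }
    ' "$1"
}

# Example:
# boot_gaps /var/log/messages
```

For rotated logs, pipe the decompressed file in and pass `-` as the file name, e.g. `bzcat /var/log/messages.0.bz2 | boot_gaps -`. On the excerpts above this pairs 13:15:35 with 13:41:36 on Meduna and 13:16:58 with 15:59:06 on Moncalvo.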
I suspect this might be a storage problem, but I have no proof of that. Could you please give me some advice on how to investigate and solve the issue? Any help is highly appreciated!
Best Regards,
Hai Lang