
I have a development box with Apache and mod_userdir enabled.

Sometimes the entire /home partition becomes inaccessible: Apache can't access scripts stored there, and I can't cd to /home or ls its contents in any way.

Otherwise everything works OK. Apache works (when not accessing /home), the database works, and browsing other partitions works, but /home is stuck.
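
A stuck filesystem makes `ls` or `cd` hang rather than fail, which is awkward to detect from a script (for example a cron job that records when the problem starts). One approach is to list the directory in a child process and give up after a timeout. This is a minimal sketch that assumes Python 3 is available on the box; `listing_responds` is a hypothetical helper name, not an existing tool:

```python
import subprocess
import sys


def listing_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.listdir(path) succeeds within `timeout` seconds.

    The listing runs in a separate Python process, so a hang inside the
    kernel cannot block the caller. On timeout the child is sent SIGKILL
    and deliberately not waited for: a process blocked on I/O may not
    exit until that I/O completes.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means the listing finished without an exception.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        child.kill()
        return False
```

For example, `listing_responds("/home")` returning False while `listing_responds("/var/www")` returns True would match the symptom described here. A False result means either a hang or an error (such as a missing directory), so it is a trigger for closer inspection rather than a diagnosis.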

Software RAID 5 is used.

I looked at every log I could find and checked the RAID with cat /proc/mdstat. Everything looks good, with no errors and nothing suspicious.

I don't know where else to look or which diagnostic command to run.
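
When `ls` or `cd` hangs on a filesystem, the blocked processes usually sit in uninterruptible sleep, shown as state `D` by `ps -eo pid,stat,comm`. Listing those processes shows whether Apache workers and shells are all stuck waiting on I/O. This is a sketch that reads `/proc/<pid>/stat` directly; `uninterruptible_processes` is a hypothetical name:

```python
import os
from typing import List, Tuple


def uninterruptible_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return sorted (pid, command name) pairs for processes in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # Format is "pid (comm) state ...". comm may itself contain spaces
        # or parentheses, so cut at the first "(" and the last ")".
        comm = stat[stat.find("(") + 1 : stat.rfind(")")]
        fields = stat[stat.rfind(")") + 1 :].split()
        if fields and fields[0] == "D":
            stuck.append((int(entry), comm))
    return sorted(stuck)
```

If the list stays non-empty while /home is stuck, `cat /proc/<pid>/wchan` for one of those PIDs may show the kernel function the task is waiting in.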

Edit: it ran for about 5 years without problems. The trouble started this morning when the server was turned on. No system update was done in the past few days, and no configuration or anything else was changed. My guess is a faulty HDD.

Any leads?

The box is running Gentoo Linux (kernel 2.6.34-r2).


df -h

Filesystem             Size  Used  Avail Use%  Mounted on
rootfs                 58G   47G   11G   81%   /
/dev/root              58G   47G   11G   81%   /
rc-svcdir             1,0M   76K  948K    8%   /lib/rc/init.d
udev                   10M  320K  9,7M    4%   /dev
none                 1007M     0 1007M    0%   /dev/shm
/dev/md5               29G   25G  4,5G   85%   /home
/dev/md6               58G  879M   57G    2%   /var/svn
/dev/md7              144G   12G  132G    9%   /var/www
/dev/md8              407G  406G  1,3G  100%   /var/company

mount -v /home

/dev/md5 on /home type reiserfs (rw,noatime,acl)

cat /etc/mtab

rootfs / rootfs rw 0 0
/dev/root / reiserfs rw,noatime 0 0
none /proc proc rw,relatime 0 0
rc-svcdir /lib/rc/init.d tmpfs rw,nosuid,nodev,noexec,relatime,size=1024k,mode=755 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
udev /dev tmpfs rw,nosuid,relatime,size=10240k,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620 0 0
none /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/md6 /var/svn reiserfs rw,noatime 0 0
/dev/md7 /var/www reiserfs rw,noatime,acl 0 0
/dev/md8 /var/esoft reiserfs rw,noatime,acl 0 0
usbfs /proc/bus/usb usbfs rw,noexec,nosuid,devmode=0664,devgid=85 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,noexec,nosuid,nodev 0 0
/dev/md5 /home reiserfs rw,noatime,acl 0 0

cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdd1[2] sdc1[3] sdb1[1] sda1[0]
  40064 blocks [4/4] [UUUU]

md2 : active raid5 sdd2[2] sdc2[3] sdb2[1] sda2[0]
  6024000 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : active raid5 sdd3[2] sdc3[3] sdb3[1] sda3[0]
  60026496 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md5 : active raid5 sdd5[2] sdc5[3] sdb5[1] sda5[0]
  30025152 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md6 : active raid5 sdd6[2] sdc6[3] sdb6[1] sda6[0]
  60026496 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md7 : active raid5 sdd7[2] sdc7[3] sdb7[1] sda7[0]
  150030720 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md8 : active raid5 sdd8[2] sdc8[3] sdb8[1] sda8[0]
  426332544 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
Peter
  • Looks like the Apache issue is just a symptom of a deeper problem. Can you add the output of `mount` and `df -h`? – SmallClanger Dec 10 '10 at 10:56
  • Did you check the file permissions? – Khaled Dec 10 '10 at 11:45
  • @SmallClanger - I updated the OP with some system info. – Peter Dec 10 '10 at 11:47
  • @Khaled - Yes, no problem with that. – Peter Dec 10 '10 at 11:48
  • That all looks intact. I'd suspect a faulty HDD myself, but if there's no I/O problems on the other partitions in that array, then that doesn't seem likely. Any likely messages in `/var/log/kern.log` or `dmesg` around the time the problem appears? Perhaps something is trying to mount itself at /home (or is dismounting the existing partition). Maybe a mis-firing backup script of some sort? – SmallClanger Dec 10 '10 at 14:10
  • @SmallClanger - I believe it's a faulty HDD. We found that it errors when certain files are requested from the server, but I will look at it some more. Thanks for your input. – Peter Dec 10 '10 at 15:02
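
Following the suggestion to check `/var/log/kern.log` and `dmesg`, a long kernel log can be narrowed to disk-related lines with a keyword filter. This is a sketch: the patterns are assumptions chosen for this box (md arrays, ATA ports, disks `sda` to `sdd`), are not an authoritative list of kernel error messages, and will also match harmless lines:

```python
import re
from typing import Iterable, List

# Assumed patterns: generic words plus the device names seen on this box.
SUSPICIOUS = re.compile(
    r"error|fail|timeout|reiserfs|\bmd\d+\b|\bata\d+|\bsd[a-d]\d*\b",
    re.IGNORECASE,
)


def suspicious_kernel_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that match one of the assumed patterns."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

It can be fed an open `/var/log/kern.log` or the output of `dmesg` split into lines. On Gentoo, whether kernel messages end up in `kern.log` or in `/var/log/messages` depends on the syslog configuration.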

2 Answers


Check whether one of the 4 drives that make up md5 is faulty.
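
The member disks of md5 can be read from the header line in `/proc/mdstat` (`sdd5[2] sdc5[3] sdb5[1] sda5[0]`), and each one can then be checked with `smartctl` from smartmontools (`sys-apps/smartmontools` on Gentoo). In the `smartctl -a` output, attributes such as `Reallocated_Sector_Ct` and `Current_Pending_Sector` are worth a look, and `smartctl -t long /dev/sdX` starts an extended self-test. This sketch assumes member names of the form `<disk><partition number>` as on this box; the function names are hypothetical:

```python
import re
from typing import List

# Matches "sdd5[2]" and captures "sdd"; also matches "sdb5[1](F)" for a failed member.
MEMBER_RE = re.compile(r"([a-z]+)\d+\[\d+\]")


def array_members(mdstat_text: str, array: str) -> List[str]:
    """Return the whole-disk names (e.g. 'sda') behind an md array."""
    for line in mdstat_text.splitlines():
        # The trailing " :" keeps "md5" from matching "md50".
        if line.startswith(array + " :"):
            return sorted(set(MEMBER_RE.findall(line)))
    return []


def smart_commands(disks: List[str]) -> List[str]:
    """Build one `smartctl -a` command per disk (printed, not executed)."""
    return [f"smartctl -a /dev/{disk}" for disk in disks]
```

For the mdstat output in the question, `array_members(text, "md5")` gives `["sda", "sdb", "sdc", "sdd"]`. Because the same four disks carry every array on this box, a SMART problem on one of them would not be specific to /home.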

During slow periods, when you notice the problem, run iostat and look at the reads and writes on md5.
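
`iostat` (from the sysstat package) takes its numbers from `/proc/diskstats`, so the same per-device rates can be sampled directly if sysstat is not installed. This sketch follows the kernel's 14-field diskstats layout, in which sector counts are always in 512-byte units. The in-flight count is most informative for the member disks; a value that stays non-zero while throughput is zero may indicate I/O stuck on that disk. Function names are hypothetical:

```python
import time
from typing import Dict, Iterable, Tuple

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def read_diskstats(path: str = "/proc/diskstats") -> Dict[str, Tuple[int, int, int]]:
    """Return {device: (sectors_read, sectors_written, ios_in_progress)}."""
    stats: Dict[str, Tuple[int, int, int]] = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 14:
                continue  # skip short or malformed lines
            # 0-based: 2 = name, 5 = sectors read, 9 = sectors written,
            # 11 = I/Os currently in progress
            stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[11]))
    return stats


def sample_rates(
    devices: Iterable[str], interval: float = 5.0, path: str = "/proc/diskstats"
) -> Dict[str, Tuple[float, float, int]]:
    """Return {device: (read KiB/s, write KiB/s, in-flight I/Os at the end)}."""
    before = read_diskstats(path)
    time.sleep(interval)
    after = read_diskstats(path)
    kib_per_sector = SECTOR_BYTES / 1024
    rates: Dict[str, Tuple[float, float, int]] = {}
    for dev in devices:
        if dev not in before or dev not in after:
            continue  # unknown device name
        read0, written0, _ = before[dev]
        read1, written1, in_flight = after[dev]
        rates[dev] = (
            (read1 - read0) * kib_per_sector / interval,
            (written1 - written0) * kib_per_sector / interval,
            in_flight,
        )
    return rates
```

For this box, `sample_rates(["md5", "sda", "sdb", "sdc", "sdd"])` would compare md5 with its four members over a 5-second window.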

If nothing shows up there, run lsof the next time it is slow and look at which files on /home Apache has open.
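
One caveat: `lsof +D /home` walks the directory tree and will itself hang if /home is stuck. Reading the `/proc/<pid>/fd` symlinks gives the same "who has what open under /home" answer and should not need to touch /home at all. This is a sketch; the function names are hypothetical, and the Apache process name is an assumption (probably `apache2` on Gentoo, `httpd` on many other systems):

```python
import os
from typing import List, Optional, Tuple


def _command_name(proc_root: str, pid: str) -> str:
    """Read the command name from /proc/<pid>/stat ("pid (comm) state ...")."""
    try:
        with open(os.path.join(proc_root, pid, "stat")) as f:
            stat = f.read()
    except OSError:
        return ""
    return stat[stat.find("(") + 1 : stat.rfind(")")]


def open_files_under(
    prefix: str, proc_root: str = "/proc", command: Optional[str] = None
) -> List[Tuple[int, str, str]]:
    """Return sorted (pid, command, path) for open fds at or below `prefix`."""
    root = prefix.rstrip("/")
    hits = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        comm = _command_name(proc_root, pid)
        if command is not None and comm != command:
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or not permitted (run as root to see all)
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # Match /home itself or anything below it, but not e.g. /homework.
            if target == root or target.startswith(root + "/"):
                hits.append((int(pid), comm, target))
    return sorted(hits)
```

For example, `open_files_under("/home", command="apache2")` lists Apache's open files under /home. Running it as root is needed to see other users' processes.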

A different thing you may try: move a directory from /home somewhere else, replace it with a symlink, and tell Apache to follow symlinks for that directory. If the problem does not appear, repeat with the next directory until something goes wrong. If nothing goes wrong, you have a bad disk in md5.
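
The move-and-symlink step can be scripted so it is repeated the same way for each directory. This is a sketch; `relocate_with_symlink` is a hypothetical helper, and nothing should write to the directory while it runs. When source and destination are on different filesystems (for example /home on md5 and /var/www on md7), `shutil.move` copies and then deletes, so every file is read from /home once, and a read error part-way through leaves the original in place with a partial copy at the destination:

```python
import os
import shutil


def relocate_with_symlink(src: str, dest_parent: str) -> str:
    """Move directory `src` into `dest_parent` and leave a symlink at `src`.

    Returns the new location of the directory.
    """
    src = os.path.abspath(src.rstrip("/"))
    if os.path.islink(src) or not os.path.isdir(src):
        raise ValueError(f"{src} is not a real directory")
    new_location = os.path.join(os.path.abspath(dest_parent), os.path.basename(src))
    if os.path.exists(new_location):
        raise FileExistsError(new_location)
    # Across filesystems this is copy-then-delete rather than a rename.
    shutil.move(src, new_location)
    # The old path now points at the new location.
    os.symlink(new_location, src)
    return new_location
```

On the Apache side, the `<Directory>` block for the user directories needs `Options FollowSymLinks` or `SymLinksIfOwnerMatch`. With `SymLinksIfOwnerMatch`, Apache only follows a link whose owner matches the target's owner, so a link created by root for a user's directory would need `os.lchown` (or to be created as that user).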

silviud
  • Sorry for the double answer - my phone reported loss of – silviud Dec 10 '10 at 13:44
  • Forgot to say: check whether there are any limits on the filesystem. – silviud Dec 10 '10 at 13:50
  • Thanks for your input. I'll try to look at what's going on when the problem shows itself. At the moment it runs fine, but I think it's a faulty HDD. The symlink is a good idea, so the devs can continue their work without interruption. – Peter Dec 10 '10 at 14:53
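
On "limits on the filesystem": the `df -h` output in the question shows /home at 85% of its blocks, so plain space is not exhausted, but block usage, inode usage and the read-only flag can be checked together with `os.statvfs`. This is a sketch; `filesystem_usage` is a hypothetical name, and the call itself will block if /home is hung, so run it while the mount responds:

```python
import os
from typing import Dict, Optional, Union


def _percent_used(total: int, free: int) -> Optional[float]:
    # Some filesystems report 0 for a counter they do not track. ReiserFS,
    # which allocates inodes dynamically, may report 0 total inodes.
    if total == 0:
        return None
    return 100.0 * (total - free) / total


def filesystem_usage(path: str) -> Dict[str, Union[float, bool, None]]:
    """Block and inode usage (percent of total) and the read-only flag."""
    st = os.statvfs(path)
    return {
        # Counted against the total, so it can differ slightly from df's Use%.
        "blocks_used_pct": _percent_used(st.f_blocks, st.f_bfree),
        "inodes_used_pct": _percent_used(st.f_files, st.f_ffree),
        "read_only": bool(st.f_flag & os.ST_RDONLY),
    }
```

A `read_only` value of True on /home would mean the kernel has the mount flagged read-only, which is worth comparing against the `rw` shown in `/etc/mtab`.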