I just rebooted my Debian server for maintenance reasons (I changed the kernel). However, the reboot did not seem to complete, so I logged in using remote KVM and found it hanging at a forced disk check. I (now) know how to avoid the forced disk check:

sudo tune2fs -c 0 -i 0 /dev/sdaX
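
To see what that command changes, it can help to look at the check schedule stored in the superblock before and after running it. Below is a minimal sketch, assuming an ext2/ext3/ext4 filesystem and the usual field labels printed by `tune2fs -l`; the helper name `fsck_schedule` is made up for illustration.

```shell
# fsck_schedule (hypothetical helper): keep only the lines of `tune2fs -l`
# output that describe the periodic-check schedule.
# It reads stdin, so it can also be tried on saved output.
fsck_schedule() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval|Next check after):'
}
```

Usage would look like `sudo tune2fs -l /dev/sdaX | fsck_schedule`. After `-c 0 -i 0`, the maximum mount count and the check interval should both show as disabled.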

However, I wonder what good practice is for production web servers regarding disk checks. Do you simply never run disk checks on your server systems? Do you occasionally take the hour or so of downtime to run a disk check, or is there even a way to run one during regular uptime?

3 Answers

Generally speaking, if your system is always shut down cleanly you should not need the forced (mount-count or interval based) filesystem consistency checks -- the question is basically "Do you trust your filesystem not to screw up if left to its own devices?", and a forced fsck is basically a "No".
On my systems (BSD/UFS), regular disk checks aren't part of the filesystem design and are not routinely run. If you want to run one on a mounted filesystem, that's possible (background fsck). There are some hacks that accomplish something similar with ext2/ext3 filesystems.

If the filesystem was not cleanly unmounted (e.g. due to a crash) I think you may be out of luck -- again on BSD systems the disk check can run in the background (albeit with substantial performance penalties), but I don't know if the background fsck hacks for Linux can be used at boot-time.

voretaq7

I'm not sure if you are referring to checks of the physical disk or file system checks, but in any case, here is what we do:

File system (fsck) checks are run only on an as-needed basis, that is, when we start seeing issues that indicate potential file system corruption.

Physical checks we never run unless we have disks indicating failure. That isn't to say we don't monitor our physical disks: System Center Operations Manager and Dell OpenManage do a great job of monitoring Dell servers and their hardware for failures and potential or impending failures.

Eli

Those who use LVM (LVM2) have a better option: take a snapshot of the volume and fsck the snapshot. If it comes back clean, remove the snapshot and postpone the scheduled fsck on the original filesystem.
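
The snapshot approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the filesystem is ext2/ext3/ext4 on an LVM logical volume, the volume group has free space for the snapshot, and the names (`vg0`, `root`, the `-fsck-snap` suffix, the 1G default size) are placeholders. The functions only print the commands unless `DRY_RUN=0` is set.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_fsck VG LV [SNAP_SIZE]
# VG, LV and SNAP_SIZE are placeholders for your own volume group, logical
# volume, and copy-on-write space for the snapshot.
snapshot_fsck() {
    vg=$1
    lv=$2
    size=${3:-1G}
    snap="${lv}-fsck-snap"

    # Create a temporary copy-on-write snapshot of the live volume.
    run lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv" || return 1

    # First pass lets e2fsck replay the journal on the throwaway snapshot,
    # second pass is a forced, read-only full check.
    if run e2fsck -p "/dev/$vg/$snap" && run e2fsck -f -n "/dev/$vg/$snap"; then
        # Snapshot is clean: reset mount count and last-checked time on the
        # original volume so the scheduled boot-time check is postponed.
        run tune2fs -C 0 -T now "/dev/$vg/$lv"
        status=0
    else
        echo "errors found on snapshot; plan an offline fsck of /dev/$vg/$lv" >&2
        status=1
    fi

    # Remove the snapshot whether or not the check passed.
    run lvremove -f "/dev/$vg/$snap"
    return $status
}
```

Calling `snapshot_fsck vg0 root` prints the five commands it would run; `DRY_RUN=0 snapshot_fsck vg0 root` as root would actually run them. A snapshot that fills up during the check becomes invalid, so the size should be chosen with the write load in mind.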

poige