I need to write a script that checks the disk every minute and reports if it is failing for any reason. The error could be anything from a complete disk failure to a bad sector.

First, I wonder whether a script that does this already exists, since it should be a standard procedure (I really do not want to reinvent the wheel).

Second, if I look for errors in /var/log/messages, is there a list of standard error strings for disks that I can use?

I have searched the net a lot; there is plenty of information out there, but nothing that actually answers this.

Any help will be much appreciated.

Thanks,

Amir
    Do the drives support SMART? If so, do you have access to the `smartctl` utility? If so, keep in mind that you might already have smartmontools installed, which *includes a daemon to do exactly what you're trying to do already*. – Charles Jan 17 '12 at 01:18

2 Answers

You could simply parse the output of dmesg, which usually reports fairly detailed information about drive errors. That's how I've collected stats on failing drives before.
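
The dmesg approach could be sketched roughly as below, in Python. The pattern list is an assumption for illustration: it contains a few message fragments that Linux kernels commonly emit for disk trouble, and it is neither exhaustive nor an official list. The names `DISK_ERROR_PATTERNS`, `find_disk_errors`, and `read_dmesg` are made up for this example.

```python
import re
import subprocess

# Illustrative patterns only (an assumption, not a complete or official list).
# Extend them with whatever your own kernel and drivers actually log.
DISK_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"Medium Error", re.IGNORECASE),
    re.compile(r"Unrecovered read error", re.IGNORECASE),
    re.compile(r"ata\d+(\.\d+)?: failed command", re.IGNORECASE),
]


def find_disk_errors(log_text):
    """Return the lines of log_text that match any of the patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in DISK_ERROR_PATTERNS)
    ]


def read_dmesg():
    """Return the current kernel ring buffer as text (may require root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling `find_disk_errors(read_dmesg())` from a job that runs every minute would print the same lines again on each run, so a real script would probably also need to remember which lines it has already reported.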

You might get better-documented information by using Parse::Syslog, or by reading lower-level kernel reporting directly.

Jeff Burdges
  • dmesg gives me boot up info. My servers do not boot that frequently. – Amir Jan 17 '12 at 01:22
  • dmesg does report kernel driver errors too, not just the kernel boot sequence. I've realized that dmesg output isn't nearly as standardized as syslog output, so syslog may be the better choice if you don't know what the errors look like. I've used dmesg when I was receiving drive errors and wanted to know more details. – Jeff Burdges Jan 17 '12 at 01:26
  • Do you know of any signature (or list of signatures) in syslog that indicates a disk error or failure is occurring? – Amir Jan 17 '12 at 19:56

Logwatch handles the /var/log/messages part (as well as any other log files you choose to add). You can either use it as is, or use its code to roll your own solution (it's all written in Perl).

If your hard drives support SMART, I suggest you use smartctl output for diagnostics, as it includes a lot of useful information that can be monitored over time to detect failure.
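
As a minimal sketch of the smartctl route, the Python below runs `smartctl -H` on a device and classifies the overall health verdict. It assumes the text output contains either the ATA-style line `SMART overall-health self-assessment test result: ...` or the SCSI-style line `SMART Health Status: ...`. The function names `smart_health` and `check_device` are made up for this example, and a device path such as `/dev/sda` is only a placeholder.

```python
import subprocess

# Marker strings assumed to appear in `smartctl -H` text output.
HEALTH_MARKERS = (
    "overall-health self-assessment test result",  # ATA/SATA drives
    "SMART Health Status",                         # SCSI/SAS drives
)


def smart_health(smartctl_output):
    """Classify `smartctl -H` output as 'ok', 'failing', or 'unknown'."""
    for line in smartctl_output.splitlines():
        if any(marker in line for marker in HEALTH_MARKERS):
            # The verdict is whatever follows the first colon on the line.
            _, _, verdict = line.partition(":")
            return "ok" if verdict.strip() in ("PASSED", "OK") else "failing"
    return "unknown"


def check_device(device):
    """Run `smartctl -H` on device (usually needs root) and classify it."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl exits non-zero when it detects problems
    )
    return smart_health(result.stdout)
```

A result of `'unknown'` covers drives without SMART support as well as output this sketch does not recognize, so it is worth reporting separately rather than treating it as healthy. For continuous monitoring, the smartd daemon from smartmontools mentioned in the comment above is likely a better fit than polling by hand.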

Jarmund