What's the best way to check for HDD errors and early signs of failure on CentOS?
How frequent should the checks be? Daily or weekly? – inac Jun 12 '10 at 04:19
6 Answers
I would recommend installing smartmontools (http://sourceforge.net/apps/trac/smartmontools/wiki) on your machine. It is software that can check the health of your disks. Otherwise, you are left checking /var/log/messages or /var/log/syslog for any mention of SCSI errors.

smartmon seems to be it, although its stats mention it would catch only 60% of failing drives. If I set smartmon to scan daily, would this actually make the HDD die faster? It's a Seagate 7200.10. – inac Jun 12 '10 at 04:18
@inac smartmon will make HDDs die faster? Where did you read this? Please add a URL. – 030 Feb 26 '15 at 12:19
dmesg
The kernel will log any diagnostic messages about I/O devices, so you can check those messages out with the dmesg command.
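A raw dmesg dump is noisy, so it can help to filter it down to lines that look like disk trouble. The following is only a rough sketch: the function name filter_disk_errors is hypothetical, and the pattern list is an assumption about what typical kernel disk errors look like, not a complete catalogue.

```shell
#!/bin/sh
# Print only the lines on stdin that look like disk or I/O trouble.
# The pattern list is an illustrative assumption, not an exhaustive one.
filter_disk_errors() {
    grep -Ei 'I/O error|medium error|bad sector|ata[0-9]+.*(failed|error)|sd[a-z]+.*(failed|error)'
}

# Typical use (reading the kernel ring buffer may require root):
#   dmesg | filter_disk_errors
#   filter_disk_errors < /var/log/messages
```

Because the function only reads standard input, the same filter can be pointed at dmesg output or at a saved log file.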

Either. You could create a script to dump it with "dmesg > dmesg.dump.txt" and run that daily with cron. – Banjer Jun 14 '10 at 19:25
SMART monitoring is a good way. As root, run smartctl -a /dev/hda, where hda is the drive you want to check (it could be hdb, sda, etc.). I also recommend setting your email address in /etc/aliases as the recipient of root's mail.
That's a very vague answer though. If you have a server made by any of the big manufacturers (Dell, HP, etc), chances are there are better monitoring capabilities available.
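To make the smartctl check something a script can act on, one option is to look at the overall health verdict that smartctl -H prints. This is a minimal sketch: the function name smart_health_ok is hypothetical, and the device name, the mail subject, and the availability of a mail command are assumptions.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin; succeed only if the drive reports healthy.
# ATA drives print "... test result: PASSED"; SCSI drives print "SMART Health Status: OK".
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Typical use as root (device name and mail subject are placeholders):
#   smartctl -H /dev/sda | smart_health_ok ||
#       echo "SMART health check failed on /dev/sda" | mail -s "disk warning" root
```

Mailing root ties in with the /etc/aliases suggestion: whoever is aliased to root receives the warning. A passing verdict does not guarantee a healthy drive, so the full smartctl -a output is still worth reading from time to time.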

You can run a full check of partition /dev/sda1 (for example) with
fsck -f /dev/sda1
or try a full non-destructive read-write test of the partition with
badblocks -vn /dev/sda1
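Both commands are meant to be run on an unmounted partition: fsck can damage a mounted filesystem, and the badblocks read-write mode should not be used on one either. A small guard like the one below can check that first. It is only a sketch: the function name is_mounted is hypothetical, and it does a plain exact match on the device name, so it will miss partitions that are mounted under another name (a /dev/mapper path, for instance).

```shell
#!/bin/sh
# Succeed if device $1 appears as a mount source in the mounts table $2
# (defaults to /proc/mounts on Linux). Exact string match only.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Typical use as root (/dev/sda1 is the example partition from above):
#   if is_mounted /dev/sda1; then
#       echo "/dev/sda1 is mounted; unmount it first" >&2
#   else
#       fsck -f /dev/sda1
#       badblocks -vn /dev/sda1
#   fi
```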
