Check for hard disk errors / signs of failure on CentOS Server

Question

What's the best way to check for HDD errors and early signs of failure on CentOS?

How frequent should the checks be? Daily or weekly? – inac Jun 12 '10 at 04:19

score 3 · Accepted Answer · answered Jun 11 '10 at 15:03

I would recommend installing smartmon (smartmontools, http://sourceforge.net/apps/trac/smartmontools/wiki) on your machine. It is software that can check the health of your disks. Otherwise, you are left checking /var/log/messages or /var/log/syslog for any mentions of SCSI errors.
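As a rough illustration of the log-checking fallback, the sketch below prints lines from a log file that look like disk errors. The function name `scan_disk_errors` is hypothetical and the pattern list is an assumption chosen for illustration, so expect to adjust it for your kernel and controller.

```shell
#!/bin/sh
# Hypothetical helper: print lines from a log file that look like disk errors.
# The pattern list is an assumption for illustration, not a complete set.
scan_disk_errors() {
    # $1: path to a log file such as /var/log/messages
    grep -Ei 'I/O error|medium error|scsi.*error' "$1"
}
```

It could be called as `scan_disk_errors /var/log/messages`; an empty result only means none of the assumed patterns matched, not that the disk is healthy.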

Paul

smartmon seems to be it, although its stats mention it would catch only 60% of failing drives. If I set smartmon to scan daily, would this actually make the HDD die faster? It's a Seagate 7200.10. – inac Jun 12 '10 at 04:18
@inac smartmon will help HDDs to die faster? Where did you read this? Please add a URL. – 030 Feb 26 '15 at 12:19

score 3 · Answer 2 · answered Jun 11 '10 at 15:12

dmesg

The kernel will log any diagnostic messages about I/O devices, so you can check those messages out with the dmesg command.

Banjer

But would you have to run this manually, or set up a cron job that dumps dmesg to a file to read in vi? – inac Jun 12 '10 at 04:17
Either. You could create a script to dump it with "dmesg > dmesg.dump.txt" and run that daily with cron. – Banjer Jun 14 '10 at 19:25
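A minimal sketch of such a dump script follows. The function name `save_dump`, the dated file naming, and the paths in the comments are all assumptions made for illustration. The function writes whatever arrives on standard input to a dated file, so each day's dump is kept rather than overwritten.

```shell
#!/bin/sh
# Hypothetical helper: save whatever arrives on stdin to a dated file
# and print the path of the file that was written.
save_dump() {
    # $1: directory for the dumps (assumed to be writable)
    dump_dir="$1"
    dump_file="$dump_dir/dmesg.$(date +%Y-%m-%d).txt"
    mkdir -p "$dump_dir" || return 1
    cat > "$dump_file" || return 1
    printf '%s\n' "$dump_file"
}

# Intended use, in a script called from cron:
#   dmesg | save_dump /var/log/dmesg-dumps
# Example crontab entry (script path is hypothetical), daily at 06:00:
#   0 6 * * * /usr/local/bin/dmesg-dump.sh
```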

score 2 · Answer 3 · answered Jun 11 '10 at 15:39

SMART monitoring is a good way to do this. As root, run smartctl -a /dev/hda, where hda is the drive you want to check; it could be hdb, sda, etc. I also recommend setting your email address in /etc/aliases as the recipient of root's mail.
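The smartctl output can also be checked from a script. The sketch below is one possible approach, not the only one: the function names `smart_health_ok` and `smart_raw_value` are hypothetical, the health check assumes the usual "PASSED" (ATA) or "OK" (SCSI) wording of `smartctl -H`, and the attribute lookup assumes the common ten-column table printed by `smartctl -A`, where the raw value is the last column.

```shell
#!/bin/sh
# Hypothetical helpers that read smartctl output on stdin.

# Succeeds if the text contains a passing overall health result.
# Assumes the ATA ("PASSED") or SCSI ("OK") wording of `smartctl -H`.
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Prints the raw value (10th column) of a named attribute from `smartctl -A`.
smart_raw_value() {
    # $1: attribute name, e.g. Reallocated_Sector_Ct
    awk -v name="$1" '$2 == name { print $10 }'
}

# Intended use (as root; /dev/sda is an example device, and a `mail`
# command is assumed to be installed):
#   smartctl -H /dev/sda | smart_health_ok ||
#       echo "SMART health check failed on /dev/sda" | mail -s "Disk warning" root
#   smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
```

Mailing root ties in with the /etc/aliases suggestion: once root's mail is forwarded to you, a cron job running a check like this reaches your inbox.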

That's a very vague answer, though. If you have a server made by any of the big manufacturers (Dell, HP, etc.), chances are there are better monitoring capabilities available.

score 1 · Answer 4 · answered Jun 11 '10 at 15:04

You can run fsck on the device to check for errors.

cdated

score 0 · Answer 5 · answered Jun 11 '10 at 15:26

As Paul says, the SMART logs are a good place to check.

I'd also recommend running badblocks. If you've got a RAID card, you might have to use the card's own monitoring instead.

Dentrasi

score 0 · Answer 6 · answered Jul 30 '13 at 15:58

You can try a full check of a partition (/dev/sda1, for example) with

fsck -f /dev/sda1

or try a full non-destructive read-write test of the given partition with

badblocks -vn /dev/sda1
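Both commands are meant for an unmounted filesystem: badblocks refuses to run its read-write test on a mounted partition, and fsck on a mounted filesystem risks damaging it. A small guard along these lines can check first. The function name `is_mounted` is hypothetical, and the sketch assumes the device appears in /proc/mounts under the same path you pass in, which is not always true for the root filesystem.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a device appears in the mount table.
is_mounted() {
    # $1: device such as /dev/sda1
    # $2: mount table to read; defaults to /proc/mounts
    #     (the override exists only to make the function easy to test)
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Intended use before a check (as root; /dev/sda1 is an example):
#   if is_mounted /dev/sda1; then
#       echo "/dev/sda1 is mounted; unmount it first" >&2
#   else
#       fsck -f /dev/sda1
#   fi
```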

Liibo

`/dev/sda1 is mounted; it's not safe to run badblocks!` – 030 Feb 26 '15 at 11:58
`e2fsck: Cannot continue, aborting.` – 030 Feb 26 '15 at 11:58
@030 Drop to a runlevel where the main disk is not mounted. – awiebe Aug 10 '18 at 10:00
