
At our shop we have Nagios checks for the SMART status of hard disks in our Linux servers, but they haven't been very useful so far: by the time we get a SMART alarm, the system has already had problems, so we already know about it :)

We then developed a practice of routinely running SMART background self-tests on disks (smartctl -t long, during off-peak hours) and tracking the data by hand. We log the disk model and serial number, the date of the last test, the number of reallocated sectors (we usually try to replace every disk with more than 0 reallocated sectors), and the Power On Hours accumulated by the disk, so we can tell at a glance which of our disks are older.

Since the number of systems (and thus disks) is increasing, we'd like to automate the task of running tests and collecting results. Before reinventing the wheel I looked for existing solutions, but I had no luck.

Is there any software to automate SMART self-tests and collect the resulting data under Linux, or maybe to integrate that into some hardware inventory management system?

Luke404
  • You are not the only one to notice that SMART data is not a reliable failure prediction indicator. Google has published a [research paper](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf) on that topic. – the-wabbit Dec 10 '11 at 23:22
  • SMART may be used as an indicator that there are problems but it most certainly cannot be relied upon as an indicator that all is well. – John Gardeniers Dec 11 '11 at 22:36

2 Answers


Are you already polling these servers via SNMP? If so, and the agent is based on net-snmp, you could use its "extend" functionality (via NET-SNMP-EXTEND-MIB) to publish the output of arbitrary scripts under OIDs of your choice.
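As a rough illustration of the kind of script such an "extend" entry could call, here is a minimal Python sketch that pulls the two values mentioned in the question out of smartctl -A output. It assumes the usual ATA attribute table layout (ten whitespace-separated columns with the raw value last) and the attribute names Reallocated_Sector_Ct and Power_On_Hours; both vary by vendor and drive type, so treat the names and the parse_attributes and read_disk helpers as hypothetical, not as anything net-snmp or smartmontools provides.

```python
import re
import subprocess
import sys

# Attribute names as commonly printed by `smartctl -A` for ATA disks.
# Assumption: other vendors or SAS/NVMe devices may use different names.
WANTED = ("Reallocated_Sector_Ct", "Power_On_Hours")


def parse_attributes(text, wanted=WANTED):
    """Return {attribute_name: raw_value} for the wanted attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        # Raw values can carry suffixes (e.g. "8423h+12m"); keep the leading integer.
        match = re.match(r"\d+", raw)
        if name in wanted and match:
            found[name] = int(match.group())
    return found


def read_disk(device):
    """Run `smartctl -A` on one device and parse its attribute table."""
    # smartctl uses its exit status as a bitmask, so a non-zero code is not fatal here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_attributes(result.stdout)


if __name__ == "__main__":
    # One "device attribute value" line per result, easy to expose via snmpd.
    for dev in sys.argv[1:]:
        for name, value in sorted(read_disk(dev).items()):
            print(f"{dev} {name} {value}")
```

Called with one or more device paths as arguments, the script prints one line per attribute, which the SNMP agent can then return as the extend output.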

Centreon has a nice howto on their wiki for using net-snmp to monitor SMART data.

If you're not already collecting and storing SNMP data, Cricket is an open-source, lightweight solution for the server side, and the net-snmp agent is supported on most Unix-likes.

Royce Williams

smartd from the smartmontools package can run self-tests on a schedule, send e-mail alerts when something serious happens, and run specified programs on specified changes. It also logs SMART attribute changes to syslog, which logwatch includes in its daily reports (the reports aren't machine-friendly, though). See Comparison of S.M.A.R.T. tools for this and other options.
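To get something more machine-friendly than the logwatch report, one option is to let smartd run a small collector through its -M exec directive in smartd.conf. As far as I know from the smartd.conf man page, smartd passes the details to that program in environment variables such as SMARTD_DEVICE, SMARTD_FAILTYPE and SMARTD_MESSAGE. The sketch below appends them to a CSV file; the column layout and the build_record and append_record helpers are my own assumptions, so adapt them to whatever inventory you keep.

```python
import csv
import os
import sys
from datetime import datetime, timezone

# Hypothetical column layout for the collected events.
FIELDS = ("timestamp", "device", "failtype", "message")


def build_record(env, now=None):
    """Build one CSV row from the environment variables set by smartd."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "device": env.get("SMARTD_DEVICE", ""),
        "failtype": env.get("SMARTD_FAILTYPE", ""),
        "message": env.get("SMARTD_MESSAGE", ""),
    }


def append_record(path, record):
    """Append a row to the CSV file, writing the header if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(record)


# The output path is taken from the first argument; nothing happens without one.
if __name__ == "__main__" and len(sys.argv) > 1:
    append_record(sys.argv[1], build_record(os.environ))
```

Since smartd calls the program without arguments of your choosing, a one-line wrapper script that passes the CSV path is probably the simplest way to hook this in.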

ivan_pozdeev