On my servers, which use HDDs or SSDs, I have a cron job that periodically runs:
/usr/sbin/smartctl --test=short/long /dev/sd1
(for each disk)
While the test runs, the job just checks the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:
[0-9]+% of test remaining.
It then checks whether the test completed without errors:
( 0) The previous self-test routine completed
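For reference, the polling logic described above could be sketched roughly as follows. This is only an illustration of the approach for ATA disks, not the actual cron script: the SMARTCTL and POLL_INTERVAL variables and the function names are made up for the example, and the patterns assume the "Self-test execution status" wording that smartctl -c prints.

```shell
#!/bin/bash
# Sketch only: SMARTCTL, POLL_INTERVAL and the function names are
# illustrative assumptions, not taken from the original cron job.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# Succeeds while the "smartctl -c" text in $1 still reports a running test.
test_in_progress() {
    printf '%s\n' "$1" | grep -Eq '[0-9]+% of test remaining'
}

# Succeeds if the text in $1 shows self-test execution status 0.
test_passed() {
    printf '%s\n' "$1" | grep -Eq '\( *0\)[[:space:]]+The previous self-test routine completed'
}

# Start a "short" or "long" test on one device, then poll until it ends.
run_selftest() {
    local type="$1" dev="$2" out rc
    "$SMARTCTL" --test="$type" "$dev" >/dev/null
    rc=$?
    # Exit-status bits 0-2 mean the command itself failed (bad arguments,
    # device open failed, or a command to the disk failed).
    [ $((rc & 7)) -eq 0 ] || return 2
    while out=$("$SMARTCTL" -c "$dev"); test_in_progress "$out"; do
        sleep "${POLL_INTERVAL:-60}"
    done
    test_passed "$out"
}
```

It would be called once per disk, e.g. run_selftest short /dev/sda, and returns non-zero if the test could not be started or did not finish with status 0.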
However, it appears that smartctl doesn't yet support self-tests on NVMe devices as of version 7.0, as per: https://www.smartmontools.org/wiki/NVMe_Support
It does say that:
The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)
But what actually runs the tests?
I'm not sure whether the output of -H and -l is updated at all unless short/long tests are run.
I also read about nvme-cli, but I can't find a way to run health tests on disks with it.
Any ideas?
Using CentOS 7 here.