
On my servers, whether they use HDDs or SSDs, I have a cron job that periodically runs:

/usr/sbin/smartctl --test=short/long /dev/sd1

(for each disk)

While the test runs, the job polls the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:

[0-9]+% of test remaining.

It then checks that the test completed without errors:

(   0)  The previous self-test routine completed
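The polling described above can be sketched roughly as follows. This is a minimal illustration, not my exact cron script: the function name `wait_for_selftest` and the `SMARTCTL` and `POLL_SECONDS` variables are made up for the example, and the two patterns are the ones quoted above.

```shell
#!/bin/sh
# Sketch: wait for a running SMART self-test to finish, then check the result.
# SMARTCTL and POLL_SECONDS are hypothetical knobs introduced for this example.
SMARTCTL="${SMARTCTL:-/usr/sbin/smartctl}"
POLL_SECONDS="${POLL_SECONDS:-60}"

wait_for_selftest() {
    dev="$1"
    # Keep polling while the capabilities output reports a test in progress.
    while "$SMARTCTL" -c "$dev" | grep -Eq '[0-9]+% of test remaining\.'; do
        sleep "$POLL_SECONDS"
    done
    # Succeed only if status code 0 ("completed") is reported.
    "$SMARTCTL" -c "$dev" |
        grep -Eq '\( *0\)[[:space:]]+The previous self-test routine completed'
}
```

After starting a test with `smartctl --test=short`, the cron job would call `wait_for_selftest /dev/sd1` and raise an alert if it returns non-zero.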

However, it appears that smartctl doesn't yet support testing of NVMe, as of version 7.0, and as per: https://www.smartmontools.org/wiki/NVMe_Support

It does say that

The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)

but what actually runs the tests? I'm not sure whether the output of -H and -l error is updated unless short/long tests are run.

I also read about nvme-cli, but I can't find a way to run health tests on disks with it.

Any ideas?

Using CentOS 7 here.

Marcus Müller
Nuno
  • I don't *know*, but I would be surprised if running any explicit test would have a very large knowledge advantage for SSDs – these things are in a perfect position to track their own health, since wear leveling literally knows how often each memory segment has been used, *and* due to the comprehensive error-correction code inherent to NVMe devices, you get a very good picture of device aging simply from day-to-day usage. – Marcus Müller Nov 24 '21 at 13:16

2 Answers


SMART self-tests were conceived for mechanical disks. SATA SSDs almost completely mirror the earlier HDD interface-level behavior: they support such self-tests, but do not actually do very much when you run one. NVMe drives dropped such SMART self-test routines entirely.

For flash-based disks, one should really track cell wear, spare block count and reallocated sectors rather than rely on old self-test routines, which are not supported on NVMe drives.

shodanshok
  • Thank you very much. Makes sense. Do you know whether, if I just leave `smartd` running, it will let me know of any NVMe disk problems through syslog messages? All I want is to rest assured that I'm covered, and not negligent :-) – Nuno Nov 24 '21 at 20:08
  • As far as I know, `smartd` should be capable of monitoring NVMe SSD health as well, and of alerting you when the drive itself reports a non-healthy status. – shodanshok Nov 24 '21 at 20:28
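
For illustration, a `smartd.conf` entry along these lines would enable the checks quoted in the question for one NVMe device. It is only a sketch: the device path, the temperature thresholds and the mail recipient are placeholders, and the file's location depends on the distribution.

```
# smartd.conf (sketch; device, thresholds and recipient are placeholders)
# -H           check the drive's health status
# -l error     report increases in the error log count
# -W 5,50,60   track temperature: report changes of 5 degrees or more,
#              log at 50 degrees, log as critical at 60 degrees
# -m root      also mail root when a problem is reported
/dev/nvme0 -H -l error -W 5,50,60 -m root
```

smartd writes its reports to syslog by default, so `-m` is only needed if mail notifications are wanted as well.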

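Install the NVMe command-line client, nvme-cli (shown here for Debian/Ubuntu; on CentOS 7, install the package of the same name with yum):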

sudo apt install nvme-cli

Find the drive you want to check and read its SMART log:

nvme list
sudo nvme smart-log /dev/nvme0n1

nvme-cli also has self-test commands. I believe these provide the equivalent of the short and long tests that smartctl runs. Here `-s 1` starts a short test (`-s 2` would start an extended one), and the second command shows the results:

nvme device-self-test /dev/nvme0 -n 1 -s 1
nvme self-test-log /dev/nvme0n1
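
For a cron job, the `smart-log` output can be checked with a small wrapper. The following is a rough sketch, not a tested monitoring script: the function names and the `NVME` variable are made up here, and it assumes that `nvme smart-log` prints `name : value` lines with fields called `critical_warning` and `media_errors`, which may differ between nvme-cli versions.

```shell
#!/bin/sh
# Sketch: flag an NVMe drive whose SMART log reports a problem.
# Assumption: "nvme smart-log" prints "name : value" lines that include
# critical_warning and media_errors (layout may vary by nvme-cli version).
NVME="${NVME:-nvme}"

# Print the value of one field from smart-log text read on stdin.
smart_field() {
    awk -F: -v key="$1" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == key { val = $2; gsub(/[ \t]/, "", val); print val; exit }
    '
}

# Return 0 only when critical_warning and media_errors are both 0.
# A missing field also counts as a failure; return 2 if nvme itself fails.
nvme_is_healthy() {
    log=$("$NVME" smart-log "$1") || return 2
    warn=$(printf '%s\n' "$log" | smart_field critical_warning)
    errs=$(printf '%s\n' "$log" | smart_field media_errors)
    [ "$warn" = "0" ] && [ "$errs" = "0" ]
}
```

A cron entry could then run something like `nvme_is_healthy /dev/nvme0n1 || echo "NVMe problem on nvme0n1"` and let cron mail the output.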
jamboNum5