
* Update * As it turns out, the human-readable (normalized) portion of the SMART attributes is pretty useless for UDMA CRC errors; you just have to track the RAW value. After cycling through a dozen or so hard drives, I never saw the readable portion change, only the RAW value. This matches what else I've read: manufacturers have never adopted a cohesive standard for these values.
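To illustrate tracking the RAW value, here is a minimal Python sketch that pulls the raw count of attribute 199 (usually named UDMA_CRC_Error_Count) out of the text printed by `smartctl -A`. It assumes the standard ten-column attribute table of smartmontools; the function name and the sample rows are made up for illustration, and some vendors may use a different attribute ID or name.

```python
def crc_raw_value(smartctl_output, attr_id=199):
    """Return the RAW_VALUE of a SMART attribute from `smartctl -A` text.

    Assumes the usual column layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Returns None if the attribute row is not found.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attr_id):
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Hypothetical sample output, not taken from a real drive.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       37
"""
print(crc_raw_value(sample))
```

Note that the VALUE column stays at 200 in this sample while the raw count is what actually carries the information.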

* Original Post * I have a RAID setup consisting of a RAID card, multiple cables, backplanes and a multiplexer. The drives were originally all in a RAID array, and there is an upstream failure generating SMART CRC errors. I need a better way to track these errors than the SMART reporting.

What I'm doing is diagnosing the root cause (e.g. cable, backplane, etc.), and I do not have an issue doing this. However, the only way I'm aware of to monitor for this failure is through SMART CRC reporting, and all the drives have reached the reporting limit, in my case 200 reports.

Is there a software-level means of checking this? For example, if I run a stress test on the individual drives, I could see which drive accumulates more of these reports (even if they don't show up as a SMART error), and then swap out the bad part by elimination.
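The elimination idea can be sketched as a before/after comparison: record each drive's raw CRC count, run the stress test, record the counts again, and rank the drives by how much the count grew. The Python below is only a sketch under that assumption; the slot names and numbers are invented, and how the counts are collected (smartctl, a vendor tool, etc.) is left open.

```python
def rank_crc_deltas(before, after):
    """Return (drive, increase) pairs, largest increase first.

    `before` and `after` map a drive label to its raw CRC error count.
    Drives missing from either snapshot are skipped.
    """
    deltas = {d: after[d] - before[d] for d in before if d in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical snapshots taken before and after a stress run.
before = {"slot0": 1200, "slot1": 1198, "slot2": 40}
after = {"slot0": 1200, "slot1": 1460, "slot2": 41}

for drive, delta in rank_crc_deltas(before, after):
    print(f"{drive}: +{delta}")
```

The drive (or rather the path to it) with the largest increase is the first candidate for a cable or backplane swap; the absolute counts matter less than the growth during the test.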

Either Linux or Windows is fine. I'm just not aware whether the SMART CRC reports can be counted elsewhere on a system, or whether there is an alternative. Since it's CRC, I'm assuming the RAID controller is involved as well. The RAID software is pretty basic and does not provide any details in its logs or any SMART data. I've been able to reproduce the issue with another set of drives, but this is exhausting, as you can imagine.

Notes:
- I'm not here for hardware help, so I don't need responses asking what my setup looks like, etc.
- If you do not know what CRC errors are: they are failures upstream of the drive, not in the drive itself.

Whyudodis

1 Answer


This seems to be vendor specific; several approaches to resetting the counters might work. HTR (HDD Repair Tool), aka HDD.exe/HDD48.exe, sounds promising.

https://forum.hddguru.com/viewtopic.php?f=1&t=36754

http://www.hddoracle.com/viewforum.php?f=30

http://www.hddoracle.com/viewtopic.php?f=22&t=1765


Also, the serial approach sounds solid for Seagate drives:

https://askubuntu.com/a/687455


After resetting the counters, smartd from smartmontools can be used to monitor the disks during the tests.

https://linux.die.net/man/8/smartd

It can be configured to poll SMART data at a regular interval and send a notification on errors.
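smartd is configured through its own smartd.conf, which is not shown here. As a rough illustration of the same polling idea, the Python sketch below is not smartd; it calls `smartctl -A -j` (JSON output, available in smartmontools 7.0 and later) in a loop and reports whenever the raw value of attribute 199 increases. The device paths, interval, and function names are assumptions, and drives behind a RAID controller often need an extra `-d` device-type option that is omitted here.

```python
import json
import subprocess
import time


def raw_from_json(doc, attr_id=199):
    """Extract an attribute's raw value from parsed `smartctl -A -j` output."""
    table = doc.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr.get("id") == attr_id:
            return attr.get("raw", {}).get("value")
    return None


def read_crc(device):
    """Query one device. smartctl encodes status bits in its exit code,
    so a non-zero return code is not treated as a failure here."""
    result = subprocess.run(["smartctl", "-A", "-j", device],
                            capture_output=True, text=True)
    return raw_from_json(json.loads(result.stdout))


def watch(devices, interval=10, rounds=6, reader=read_crc, sleep=time.sleep):
    """Poll each device and collect (device, old, new) whenever the count grows."""
    last = {d: reader(d) for d in devices}
    events = []
    for _ in range(rounds):
        sleep(interval)
        for d in devices:
            now = reader(d)
            if now is None:
                continue  # attribute missing or unreadable; keep the old value
            if last[d] is not None and now > last[d]:
                events.append((d, last[d], now))
                print(f"{d}: CRC raw count {last[d]} -> {now}")
            last[d] = now
    return events


# Example call (hypothetical device names, usually needs root):
# watch(["/dev/sda", "/dev/sdb"], interval=5, rounds=12)
```

Because this only compares raw values between polls, it does not depend on the counters having been reset first.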

hargut
  • Thanks, that's why I was trying to find something that could potentially see SMART reporting live from the OS or at the controller level. I dug into clearing the SMART errors, and it does not seem to be a viable option. – Whyudodis Jan 06 '20 at 21:49
  • Added an update to the initial answer. As SMART data is internal data managed by the firmware, I don't see a way other than polling the data from the device. – hargut Jan 07 '20 at 07:06