I have 8 drives in a ProLiant G7, and under disk load one of them makes a clicking sound.

There are NO SMART ERROR MESSAGES.

It is running ESXi, which shows warnings about I/O latency increasing (sometimes to multiple seconds) on the HP internal SCSI disk (this is just the logical disk), but it does not provide drive-specific information.

I do have the HP custom ESXi 6.0 build running, but there is no error on the SMART flags for any of the drives.

I can hear the drive clicking, and I have a spare ready, but I do not know which drive to replace.

youcantexplainthat
  • What version of ESXi? – ewwhite May 03 '21 at 13:48
  • version 6.0 - hp custom image – youcantexplainthat May 03 '21 at 14:22
  • I’m voting to close this question because it's ignoring the tools available to the OP and it's a very, very narrow case. – ewwhite May 04 '21 at 11:30
  • @ewwhite: Irrespective of the comment thread below, the titled question has not been answered: how do you identify a drive that's clicking if there are no SMART errors? I am exploring esxtop and other logs and will update if I discover something. Thus far, the best answer is actually joeqwerty's, which makes me question the source of the noise, something I'm going to investigate further with the case off this weekend. With regard to it being a narrow case: is that really grounds for closing? My experience has been that edge cases often provide new insight. – youcantexplainthat May 04 '21 at 12:28

2 Answers

Are the drives accessible while the server is running? If so, grab yourself a screwdriver and place the tip of the blade on each hard drive. Then place your ear on the tip of the handle and listen for the clicking.

https://youtu.be/U927cYhQXB4?t=39

joeqwerty
  • Thank you for this suggestion. Interestingly, while I can hear the drive I/O while doing this, the clicking does not sound more apparent. Maybe I need to take the server cover off and touch a piece on the inside of the frame rather than the outside of the drive. – youcantexplainthat May 03 '21 at 16:34

You can check status using the hpssacli utility from the CLI on the VMware ESXi host.

Please provide the output of:

/opt/hp/hpssacli/bin/hpssacli ctrl all show config

and

/opt/hp/hpssacli/bin/hpssacli ctrl all show config detail
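
The detail output is long, so it can help to condense it to one line per physical drive before comparing status, firmware revision, and model across the eight bays. The following is only a sketch: the helper name `pd_summary` is made up for illustration, and it assumes the usual layout in which each `physicaldrive <address>` line is followed by indented `Status:`, `Firmware Revision:`, and `Model:` lines. If your output is laid out differently, the patterns will need adjusting.

```shell
#!/bin/sh
# Sketch (hypothetical helper): condense hpssacli detail output, read from
# stdin, into one line per physical drive. The field labels are assumptions
# about the output layout; unmatched fields are reported as "?".
pd_summary() {
    awk '
        # Print the summary line for the drive collected so far, if any.
        function emit() {
            if (pd != "") printf "%s status=%s fw=%s model=%s\n", pd, st, fw, md
        }
        # Drop the label prefix, trailing blanks, and repeated spaces.
        function value(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            gsub(/[ \t]+/, " ", line)
            return line
        }
        # A new drive section starts: flush the previous one and reset fields.
        /^[ \t]*physicaldrive[ \t]/ { emit(); pd = $2; st = fw = md = "?"; next }
        # Ignore controller, array, and logical drive lines before the first drive.
        pd == "" { next }
        # Keep only the first occurrence of each field within a drive section.
        /^[ \t]*Status:/            && st == "?" { st = value($0) }
        /^[ \t]*Firmware Revision:/ && fw == "?" { fw = value($0) }
        /^[ \t]*Model:/             && md == "?" { md = value($0) }
        END { emit() }
    '
}
```

Usage would be along the lines of `/opt/hp/hpssacli/bin/hpssacli ctrl all show config detail | pd_summary`, either on the host or against a saved copy of the output on another machine. This only reports what the controller already knows; it will not reveal a drive that the controller considers healthy.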

Edit:

The drive and array status is healthy. The Smart Array controller has a set of heuristics that determine drive health, which may include SMART data, retries, scrubbing, etc.

Disks are consumable. If one fails or indicates pre-failure, let it fail.

You have the LED indicators, vSphere health (assuming you have vCenter), the iLO, and you can run the commands listed above.
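
On the LED side, the CLI can also light the locate LED on a single bay, which is useful for matching a drive address from the output to a physical slot before pulling anything. A minimal sketch, assuming the controller is in slot 0 and using a made-up wrapper name `blink_bay`; check the slot number and drive addresses against your own `show config` output first.

```shell
#!/bin/sh
# Sketch (hypothetical wrapper): toggle the locate LED for one physical drive.
# HPSSACLI can be overridden; the default is the path used in the commands above.
HPSSACLI="${HPSSACLI:-/opt/hp/hpssacli/bin/hpssacli}"

blink_bay() {
    # $1 = controller slot (assumed, e.g. 0)
    # $2 = drive address as printed by hpssacli, e.g. 1I:1:1
    # $3 = on or off
    "$HPSSACLI" ctrl slot="$1" pd "$2" modify led="$3"
}
```

For example, `blink_bay 0 1I:1:1 on` followed later by `blink_bay 0 1I:1:1 off`. This does not detect the clicking drive; it only confirms which bay an address refers to.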

ewwhite
  • Please see output here: https://pastebin.com/ruWKhuna – youcantexplainthat May 03 '21 at 15:47
  • From the perspective of the controller, your array is healthy. I don't think there's any action you need to take. – ewwhite May 03 '21 at 15:57
  • I'm not convinced there isn't a problem. There is a morning daily process that is disk intensive that's been crashing while I see huge i/o latency almost every morning for the last couple weeks (when a different drive failed). I've since noticed the clicking. Is there anything else I can do to vet the drives? – youcantexplainthat May 03 '21 at 16:32
  • @youcantexplainthat: Seven of the drives are operating on a firmware version with a potential data loss issue. – Greg Askew May 03 '21 at 17:03
  • @greg-askew - Really? HPD8 is the latest version available in the HP SPP update manager for G7s. ...but thank you for pointing out that one of them is outdated. I'm not sure I know how to update those drivers without using the HP SUM. – youcantexplainthat May 03 '21 at 18:01
  • @youcantexplainthat: The HPDA version was released after the last G7 SPP. Given the age I would be careful updating it though. https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00040080en_us – Greg Askew May 03 '21 at 21:13
  • @greg-askew - I just noticed that the model number is different on that 8th drive - EH0300FBQDD, which explains why the SUP/SPP did not update it. I might give it a try - thank you. – youcantexplainthat May 03 '21 at 21:54
  • @youcantexplainthat: The eighth drive is a different model not affected by the issue. It also seems less expensive so I'm guessing it is newer, probably a spare replacement for a previous failed drive. – Greg Askew May 03 '21 at 23:31
  • @youcantexplainthat Firmware can be updated directly from the OS. You search for the drive part number on HPE's support site and download the firmware executable for your platform. But this is also silly. I don't know what type of answer you want. Turn the server off and see which drives spin up if you insist on replacing something now. – ewwhite May 04 '21 at 11:33
  • @ewwhite: I agree - I'm skeptical it will resolve either the noise or the performance issue. ...but since this is a test machine, it's worth a try and I will update either way. These minor edge case issues are useful for others. – youcantexplainthat May 04 '21 at 12:30