We have an HPC setup with four OSS servers (OSS1 to OSS4) and two MDS nodes (MDS1 and MDS2). It had been running without any problems until yesterday. This morning I found that OSS4 was shut down. I checked the OSS3 logs and found that a fencing event had occurred. I have powered OSS4 back on and it is running now.
In the OSS4 logs I saw some "unreadable" errors, shown below:
Feb 26 04:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 04:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 05:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 05:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 06:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 06:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
Feb 26 07:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
/dev/sda is a local hard disk. Is it possible that the node fencing was caused by this error?
Will running e2fsck resolve this issue?
I have attached /var/log/messages from OSS3 and OSS4.
Could anybody please analyse the log files and advise me on what to do?