
I'm working in an HDP-3.1.0.0 environment; the HDFS version is 3.1.1.3.1, and the cluster is composed of 2 NameNodes and 4 DataNodes. After a restart of the HDP services (stop all and start all), the cluster seems to be working well, but I see the following alert:

[screenshot of the Ambari alert]

How can I investigate this problem?

The services in my cluster don't have problems, except for the HBase Region Servers (0/4 live) and the Ambari Metrics Collector. I'm not using HBase, so I didn't pay attention to it; could it be the root cause? I have tried to start the Ambari Metrics Collector, but it always fails.
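For reference, here is a minimal sketch of the standard command-line health checks I can run (assuming a typical HDP layout, with the commands executed as the `hdfs` user on a NameNode host):

```bash
# Overall capacity plus per-DataNode status (live/dead, remaining space)
sudo -u hdfs hdfs dfsadmin -report

# HA state of both NameNodes
sudo -u hdfs hdfs haadmin -getAllServiceState

# Filesystem consistency: corrupt, missing, or under-replicated blocks
sudo -u hdfs hdfs fsck / | tail -n 20
```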

  • Have you checked the logs? The location might be somewhere in `/var/log/`, most probably `/var/log/hadoop/hdfs` (a log-inspection sketch follows these comments) – Technext Jul 30 '20 at 09:48
  • Sorry for the delay. In one of the DataNode logs, I can see this: `2020-08-24 02:40:58,278 ERROR datanode.DataNode (DataXceiverServer.java:run(174)) - DataNode is out of memory. Will retry in 30 seconds.` followed by `2020-08-24 02:41:57,958 ERROR datanode.DirectoryScanner (DirectoryScanner.java:getDiskReport(558)) - Error compiling report for the volume, StorageId: DS-dc437bb5-6de7-49f0-bf70-c707c7351c36 java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space` (a heap-size sketch follows below) – GiuVi Aug 24 '20 at 09:10
  • The cluster worked very well for a year, but since the end of July it has had a lot of problems: services take a very long time to restart, and the DataNodes become dead even when I try to restart them. I can't understand what the root cause could be. – GiuVi Aug 24 '20 at 09:15
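
As suggested in the comments, a minimal sketch of how the DataNode logs can be inspected on each node (the paths assume a standard HDP install; the log file name embeds the local hostname, so the wildcard is a placeholder):

```bash
# DataNode log files on an HDP node (one per hostname)
ls /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log

# Recent memory-related errors in the DataNode logs
grep -E "OutOfMemoryError|out of memory" \
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -n 20

# Same idea for the NameNode logs
grep -i " ERROR " /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 20
```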
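Given the `java.lang.OutOfMemoryError: Java heap space` in the DataNode log above, one common remedy is to raise the DataNode heap. A sketch of the relevant fragment of `hadoop-env.sh`, assuming the cluster is managed through Ambari (the 4096 MB value is illustrative only, not a sizing recommendation):

```bash
# Fragment of hadoop-env.sh as rendered by Ambari; the -Xms/-Xmx flags in
# HADOOP_DATANODE_OPTS set the DataNode heap. Values are illustrative.
export HADOOP_DATANODE_OPTS="-server -Xms4096m -Xmx4096m ${HADOOP_DATANODE_OPTS}"
```

In an Ambari-managed cluster this should be changed from the UI ("DataNode maximum Java heap size" under HDFS > Configs) rather than by editing the file, since Ambari regenerates `hadoop-env.sh` on restart.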
