
I am trying to write to an HDFS directory at hdfs:///home/bryan/test_file/ by submitting a Spark job to a Dataproc cluster.

I get an error that the NameNode is in safe mode. I have a solution to get it out of safe mode, but I am concerned this could be happening for another reason.

Why is the Dataproc cluster in safe mode?

ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job streaming job 1443726448000 ms.0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /home/bryan/test_file/_temporary/0. Name node is in safe mode.
The reported blocks 125876 needs additional 3093 blocks to reach the threshold 0.9990 of total blocks 129098.
The number of live datanodes 2 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
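
For reference, the safe-mode commands I am referring to are the stock Hadoop ones, nothing Dataproc-specific as far as I can tell:

    hdfs dfsadmin -safemode get    # report whether the NameNode is in safe mode
    hdfs dfsadmin -safemode leave  # force the NameNode out of safe mode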

2 Answers


What safe mode means

The NameNode stays in safe mode until the DataNodes report which blocks are online. This is done to make sure the NameNode does not start re-replicating blocks that actually have sufficient (but not yet reported) replicas.
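
The numbers in your error come straight from the standard HDFS safe-mode settings. A sketch of the relevant hdfs-site.xml properties (stock Hadoop property names; the values shown are the usual defaults, and your image may differ):

    <!-- hdfs-site.xml: properties governing safe mode -->
    <property>
      <name>dfs.namenode.safemode.threshold-pct</name>
      <value>0.999</value> <!-- the "threshold 0.9990 of total blocks" in the error -->
    </property>
    <property>
      <name>dfs.namenode.safemode.min.datanodes</name>
      <value>0</value> <!-- the "minimum number 0" of live DataNodes in the error -->
    </property>
    <property>
      <name>dfs.namenode.safemode.extension</name>
      <value>30000</value> <!-- ms to wait after thresholds are met before leaving safe mode -->
    </property>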

Why this happened

Generally this should not occur with a Dataproc cluster as you describe. In this case I'd suspect a virtual machine in the cluster did not come online properly or ran into an issue (networking or otherwise), and the cluster therefore never left safe mode. The bad news is this means the cluster is in a bad state. Since Dataproc clusters are quick to start, I'd recommend you delete the cluster and create a new one. The good news is that these errors should be quite uncommon.
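
A sketch of that delete-and-recreate cycle with the gcloud CLI (the cluster name is a placeholder, and flags such as --region or --num-workers depend on your setup and gcloud version):

    gcloud dataproc clusters delete my-cluster
    gcloud dataproc clusters create my-cluster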

James

0

The reason is probably that you started the master node (which hosts the NameNode) before starting the workers. If you shut down all the nodes, start the workers first, and then start the master node, it should work. I suspect that when the master node starts first, it checks whether the workers are there; if they are offline, it goes into safe mode. In general this should not happen because of the heartbeat mechanism, but it is what it is, and restarting the master node will resolve the matter. In my case it was with Spark on Dataproc.
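
If you want to try a restart in place before deleting anything, a sketch of that ordering on the cluster nodes (assuming the image exposes the standard Hadoop service names; depending on the image you may need sudo service ... restart instead):

    # on each worker, bring the DataNode up first
    sudo systemctl restart hadoop-hdfs-datanode
    # then restart the NameNode on the master so block reports arrive promptly
    sudo systemctl restart hadoop-hdfs-namenode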

HTH

Mich Talebzadeh