
I am running Hadoop 0.20.2 (yes, it's a legacy app). I have a simple master-slave setup with 2 nodes. The cluster starts up fine; jps on the master shows:

4513 TaskTracker
4225 DataNode
4116 NameNode
4565 Jps
4329 SecondaryNameNode
4410 JobTracker

And jps on the slave shows:

2409 Jps
2363 TaskTracker
2287 DataNode

However, if I run a command that interacts with HDFS, such as:

hadoop dfs -ls /

it takes a couple of minutes and then one of the datanodes dies. In the datanode log I can see the following, which looks like the known "directory is already locked" issue:

2017-07-05 16:12:59.986 INFO main org.apache.hadoop.hdfs.server.common.Storage - Cannot lock storage /srv/shared/hadoop/dfs/data. The directory is already locked.
Cannot lock storage /srv/shared/hadoop/dfs/data. The directory is already locked.

I have tried stopping all daemons, deleting dfs/data and reformatting the namenode. After doing that I can successfully start the cluster again with everything up, but as soon as I interact with HDFS or run an MR job, a datanode dies.

The exact steps I am taking, following other posts, are:

1. stop all daemons
2. delete the dfs/data dir
3. run hadoop namenode -format
4. start all daemons
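
For reference, this is roughly the command sequence I run from the master (deleting dfs/data on both nodes), assuming the stock bin/ scripts shipped with the 0.20.2 distribution and the data path shown in the log above:

bin/stop-all.sh
rm -rf /srv/shared/hadoop/dfs/data
bin/hadoop namenode -format
bin/start-all.sh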

Not sure what else I can try.


1 Answer


As Remus Rusanu correctly pointed out, the problem was that both datanodes stored their HDFS data in the same shared mounted folder (/srv/shared/hadoop/dfs/data), so the second datanode found the directory already locked. Giving each node its own dfs.data.dir solved the problem.
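
As a sketch, assuming a node-local path such as /srv/local/hadoop/dfs/data (hypothetical; use whatever local disk each node actually has), the relevant entry in each node's conf/hdfs-site.xml would look something like:

<property>
  <name>dfs.data.dir</name>
  <value>/srv/local/hadoop/dfs/data</value>
</property>

After changing this on both nodes, restart the daemons; each datanode then locks its own local directory instead of fighting over the shared one.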
