I am running Hadoop 0.20.2 (yes, it's a legacy app). I have a simple master-slave setup with 2 nodes, and the cluster starts up fine. jps on the master shows:
4513 TaskTracker
4225 DataNode
4116 NameNode
4565 Jps
4329 SecondaryNameNode
4410 JobTracker
And jps on the slave shows:
2409 Jps
2363 TaskTracker
2287 DataNode
However, if I run a command that interacts with HDFS, like:
hadoop dfs -ls /
it hangs for a couple of minutes and then one of the datanodes dies. Looking in the datanode log I can see the known "directory is already locked" bug:
2017-07-05 16:12:59.986 INFO main org.apache.hadoop.hdfs.server.common.Storage - Cannot lock storage /srv/shared/hadoop/dfs/data. The directory is already locked.
Cannot lock storage /srv/shared/hadoop/dfs/data. The directory is already locked.
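Since the path in the log lives under /srv/shared, before formatting anything I now check whether a stale lock file is sitting in that directory and what dfs.data.dir is set to on each node. A minimal check, assuming the 0.20.2 layout where the config lives in conf/hdfs-site.xml and the storage lock file is named in_use.lock:

ls -l /srv/shared/hadoop/dfs/data/in_use.lock   # stale lock left behind, or held by another datanode?
grep -A1 "dfs.data.dir" conf/hdfs-site.xml      # is every node pointing at this same shared path?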
I have tried stopping all daemons, deleting dfs/data, and reformatting the namenode. After that I can successfully start the cluster again with everything up, but as soon as I interact with HDFS or run a MapReduce job, a datanode dies.
The exact steps I am taking, according to other posts, are (exact commands sketched below):
1. stop all daemons
2. delete the dfs/data dir
3. run hadoop namenode -format
4. start all daemons
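Concretely, the commands I run from the Hadoop install dir are roughly these (the data dir path is taken from the log above; step 2 has to be done on both nodes, and the format in step 3 wipes all HDFS data):

bin/stop-all.sh                        # 1. stop all HDFS and MapReduce daemons
rm -rf /srv/shared/hadoop/dfs/data     # 2. delete the dfs/data dir (on every node)
bin/hadoop namenode -format            # 3. reformat the namenode
bin/start-all.sh                       # 4. start all daemons again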
Not sure what else I can try.