I have been trying to set up a Hadoop cluster; I managed to get it running in pseudo-distributed mode, and my one machine wordcounted Tolstoy's War and Peace in about thirty seconds.
I am now trying to add a second machine to my cluster. To help set it up, I created a user group hadoop with permissions to start, stop, and run jobs on the Hadoop server (though I left editing the configuration files to root only). I ensured that all members of the group hadoop could ssh using their public keys from the master node to the slave node. I installed Hadoop 1.0.0.3 using dpkg. I edited the masters and slaves files on both the master node and the slave node, and changed the configurations to point to the correct NameNode and JobTracker:
In core-site.xml:
fs.default.name=hdfs://$MASTER:9000
In mapred-site.xml:
mapred.job.tracker=$MASTER:9001
where $MASTER is the hostname of my master machine.
My NN, SNN, and JobTracker are starting correctly; however, my slave node is not able to connect to my master node! This is the behavior I see in my DataNode log:
2012-05-25 09:36:23,390 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: $MASTER/10.23.95.197:9000. Already tried 0 time(s).
2012-05-25 09:36:23,390 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: $MASTER/10.23.95.197:9000. Already tried 1 time(s).
...
...
connect to server: $MASTER/10.23.95.197:9000. Already tried 9 time(s).
2012-05-25 09:36:31,394 INFO org.apache.hadoop.ipc.RPC: Server at $MASTER/10.23.95.197:9000 not available yet, Zzzzz...
This repeats over and over. I see the same thing in the TaskTracker log, except the port number listed there is 9001. lsof tells me that the correct processes are listening on both ports. What is going wrong?
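To separate a Hadoop problem from a plain network problem, a raw TCP check from the slave node may help. The following is only a minimal sketch: the function name can_connect and the 3-second timeout are my own choices for illustration, and the host passed in would be whatever $MASTER is on your network.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests reachability of the port; it does not speak the
    Hadoop RPC protocol.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run on the slave, can_connect("<master hostname>", 9000) and can_connect("<master hostname>", 9001) returning False would suggest the ports are not reachable from outside the master at all (firewall, or the daemons bound to a different interface), rather than a misconfiguration in the DataNode or TaskTracker themselves.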
All logs from $MASTER can be found at http://pastebin.com/ZzyKBQVJ
Thanks; please let me know if you have any questions.
java PID 12324 TCP localhost:9000->localhost:52373 ESTABLISHED
java PID 12598 TCP localhost:52373->localhost:9000 ESTABLISHED
12324 is my NameNode process, 12598 is a DataNode running on $MASTER. – ILikeFood May 25 '12 at 17:03
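Both sockets in the lsof output above are on localhost, which might mean that the master's hostname resolves to a loopback address on the master itself (for example through an /etc/hosts entry). That is an assumption, not something the logs confirm. A small sketch to check it on the master, where the function name loopback_addresses is hypothetical:

```python
import ipaddress
import socket


def loopback_addresses(hostname):
    """Return the IPv4 addresses hostname resolves to that are loopback."""
    # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses).
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]
```

If loopback_addresses("<master hostname>") returns a non-empty list when run on the master, the NameNode and JobTracker may be listening only on the loopback interface, which would be consistent with the slave at another address being unable to reach ports 9000 and 9001.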