
After I set up a Hadoop multi-node cluster, I ran the famous wordcount MapReduce example, but I got no output; the task just freezes. Here is what I get:

12/09/12 13:01:29 INFO input.FileInputFormat: Total input paths to process : 3
12/09/12 13:01:29 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/09/12 13:01:29 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/12 13:01:30 INFO mapred.JobClient: Running job: job_201209121300_0002
12/09/12 13:01:31 INFO mapred.JobClient:  map 0% reduce 0%
12/09/12 13:01:45 INFO mapred.JobClient:  map 33% reduce 0%
12/09/12 13:01:48 INFO mapred.JobClient:  map 100% reduce 0%
12/09/12 13:01:54 INFO mapred.JobClient:  map 100% reduce 11%

There are no exceptions in the logs on either the master or the slave, but on the slave's TaskTracker the following line is printed continuously:

2012-09-12 13:23:14,573 INFO org.apache.hadoop.mapred.TaskTracker:     attempt_201209121300_0002_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.04 MB/s) >

Before this I configured Hadoop on a single node, ran the same job, and got the expected output.

P.S.:
1. I have two nodes that work as master and slave. Both IP addresses are in /etc/hosts on both nodes.
2. I can SSH between master and slave without a password (passwordless login).
3. After I run start-dfs.sh on the master, the daemons also start on the slave machine (I checked with jps, as shown below).
4. Here is the tutorial I followed: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
5. The firewall is disabled on both machines.
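
For reference, assuming the dual-role layout from that tutorial (the master also acts as a slave and runs a DataNode), jps after start-dfs.sh should report something like the following; the process IDs are placeholders, and start-mapred.sh additionally brings up the JobTracker and the TaskTrackers:

# on the master
14799 NameNode
14977 SecondaryNameNode
15183 DataNode
16012 Jps

# on the slave
15616 DataNode
15897 Jps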

How can I resolve this issue?

Rajith Delantha
  • Similar post: http://stackoverflow.com/questions/10165549/hadoop-wordcount-example-stuck-at-map-100-reduce-0 – Lorand Bendig Sep 12 '12 at 20:19
  • This could help: http://stackoverflow.com/questions/32511280/hadoop-1-2-1-multinode-cluster-reducer-phase-hangs-for-wordcount-program/32551259#32551259 – Bruce_Wayne Sep 13 '15 at 20:14

2 Answers


Finally I solved it. Here is what I did: I had been checking the wrong logs while the MapReduce job was running. For every job, Hadoop generates per-job logs located in logs/userlogs/job_id/*. When I checked those logs I finally saw the exception being thrown: an UnknownHostException. That revealed my problem: the slave's hostname was missing from the master's /etc/hosts. I added the slave's hostname and IP address to the master's /etc/hosts (see the sketch below) and restarted the Hadoop MapReduce job.
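
A minimal sketch of the kind of /etc/hosts entries needed on the master; the hostnames and IP addresses below are placeholders for your own:

# /etc/hosts on the master (mirror these on the slave too)
192.168.0.1    master
192.168.0.2    slave

Both lines need to be present so every node can resolve every other node by hostname; otherwise the reduce copy phase fails with UnknownHostException.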
We can check these job logs in the web UI:
1. First go to jobtracker.jsp.
2. Click the job.
3. You can see what is running right now, or click map or reduce and check the logs.
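
From the command line, assuming the logs/userlogs/job_id layout described above (the job ID is the one from the question), you can scan the per-attempt logs for exceptions with something like:

# search every task attempt's syslog for exceptions
grep -i exception $HADOOP_HOME/logs/userlogs/job_201209121300_0002/*/syslog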

Rajith Delantha

I would consider playing with the mapred.reduce.slowstart.completed.maps property. By default it is set to 5%, meaning the shuffle starts when 5% of the map tasks are done. Your mappers seem to be finished, but it can happen that the last mapper is stuck somehow even though the progress bar shows 100%. In that case the shuffle will also hang, since it is waiting for the last map to complete.

Besides this, you can also change mapred.reduce.parallel.copies (the number of parallel copy threads used during the shuffle) to see if a different value better fits your hardware; a sketch of both settings follows.
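
A minimal sketch of how these two properties would look in mapred-site.xml; the values are illustrative starting points, not recommendations for your cluster:

<!-- mapred-site.xml (Hadoop 1.x property names) -->
<property>
  <!-- fraction of maps that must finish before reducers start copying; default 0.05 -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
<property>
  <!-- number of parallel copy threads per reducer during the shuffle; default 5 -->
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value>
</property>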

Lorand Bendig