
I've been stuck on this problem for a very long time. I'm trying to run a job on a distributed cluster. I have 2 datanodes and a master with the namenode and jobtracker. I keep getting the following error in the tasktracker.log of each of the nodes:

2012-01-03 08:48:30,910 WARN  mortbay.log - /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/output/file.out.index in any of the configured local directories
2012-01-03 08:48:40,927 WARN  mapred.TaskTracker - getMapOutput(attempt_201201031846_0001_m_000000_2,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_2/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

and this error in hadoop.log of the slave:

2012-01-03 10:20:36,732 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:41,738 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000001_2 from localhost
2012-01-03 10:20:41,738 WARN  mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
    at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
    ... 4 more

2012-01-03 10:20:41,739 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:46,761 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000000_3 from localhost
2012-01-03 10:20:46,762 WARN  mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
    at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
    ... 4 more

This is my configuration:

mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>10.20.1.112:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.</description>
</property>

<property> 
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description> 
</property> 

<property> 
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description> 
</property> 

<property>
  <name>mapred.system.dir</name>
  <value>filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>filesystem/mapreduce/local</value>
</property>

<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>tmp</value>
</property>

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
</property>

core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.20.1.112:9000</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation.
  </description>
</property>

I've tried playing with the tmp dir - didn't help. I've tried playing with mapred.local.dir - didn't help.

I also tried to see what is in the filesystem dir during runtime. I found that the path taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/ exists, but it doesn't have an output folder in it.

Any ideas?

Thanks.

AAaa
  • java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1 - is this a valid URL? Do you have any files at the specified location? Another approach I would try is changing localhost to the machine name (or) 127.0.0.1 – kosa Jan 03 '12 at 19:01
  • I changed 127.0.1.1 in the hosts file of the slaves to slave1/slave2. Now it's the same exception but with slave1/slave2 in the URL – AAaa Jan 03 '12 at 19:11

4 Answers


I think the issue here is: your tasktracker wants to fetch the map output from the master, so the URL should be:

    http://10.20.1.112:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1

but on your task node it tried to get it from

    http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1

That is where the problem occurs. The main problem is not hadoop.tmp.dir, mapred.system.dir or mapred.local.dir. I was facing this problem too, and I resolved it by deleting the "127.0.0.1 localhost" line in /etc/hosts on the master; maybe you can try it!

EDIT

In summary, go to the /etc/hosts file on the node that's causing the error and remove the line 127.0.0.1 localhost.
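
For reference, a minimal sketch of what /etc/hosts could look like after that change; the master IP is taken from the question, while the slave hostnames and IPs are placeholders for your actual machines:

    # /etc/hosts (sketch; slave entries are placeholders)
    # 127.0.0.1   localhost        <- line removed on the node reporting the error
    10.20.1.112   master
    10.20.1.113   slave1
    10.20.1.114   slave2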

planty182
Breakinen
  • **Make sure you replace the localhost on the node that's causing the error.** This seemed to work for me, but it would be great to know what was causing the call to localhost instead of the actual 'master' namenode that should be set in the hosts file... – planty182 Sep 04 '13 at 20:25
  • Also, I'm **not 100% positive that this does solve it**; after doing this and looking at my datanode logs **I see** `2013-09-04 21:34:35,748 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201309042109_0002 2013-09-04 21:34:35,748 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201309042109_0002 being deleted.` but nothing saying that the task took place on this node. This makes me think the **datanode is simply not processing anything**, and the **job is then assigned to another node** when that node has completed its processing – planty182 Sep 04 '13 at 20:57

hadoop.tmp.dir, mapred.system.dir and mapred.local.dir should be absolute paths, not relative ones. The directory location should start with a /. These properties also have defaults, so there is no need to specify them at all.
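
For example, a minimal sketch with absolute paths; the directories below are placeholders, so point them at real, writable locations on each node:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/hadoop/tmp</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/var/hadoop/mapred/local</value>
    </property>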

A couple of suggestions if you are new to Hadoop:

  • Start with the Hadoop tutorial 1 and 2 on setting up Hadoop.

  • Start with the minimum configuration parameters specified in the above tutorials. Once successful, then additional tuning/features can be done. There is no need to specify some of the parameters like mapred.reduce.tasks which default to 2.

  • If you are new to Linux then start with a Hadoop VM like CDH. Here are the instructions.

  • For any queries in SO or forums, mention the version of Hadoop.

Praveen Sripati
  • Thanks for your reply. Some of the parameters I added because of this problem that I'm stuck with; I thought adding them might solve it. I will eventually go to the Hadoop VM you've mentioned, but I already have 3 VMs installed, so I want to try to solve these problems first. Can you post an example of your hosts file if you're using Hadoop? When I change the parameters to absolute paths I get this exception: java.io.IOException: Undefined job output-path – AAaa Jan 04 '12 at 07:55

Although these are just warnings, they still affect performance, so it is worth resolving them. The cause of the error is that the job's intermediate map output file cannot be found. Make the following checks (a shell sketch follows the list):

a. the mapred.local.dir configuration property
b. df -h to check that there is enough space on the cache path
c. free to check that there is enough memory
d. that the cache path has write permissions
e. whether the disk is corrupted
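
A rough sketch of those checks from the shell, assuming mapred.local.dir points at /var/hadoop/mapred/local (a placeholder path):

    # is the local dir present and writable by the user running the tasktracker?
    ls -ld /var/hadoop/mapred/local
    touch /var/hadoop/mapred/local/.write_test && rm /var/hadoop/mapred/local/.write_test

    # enough disk space and memory?
    df -h /var/hadoop/mapred/local
    free -m

    # any disk errors reported by the kernel?
    dmesg | grep -i error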

Ricky

I faced the same problem and solved it by running Hadoop with sudo as its owner, i.e.:

    1) sudo su Owner_Of_Hadoop
    2) sudo ./start-all.sh

Also make sure all the files have the proper permissions.
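
As a sketch of that last check only, where hduser:hadoop and the directories are placeholders for your own Hadoop user, group and local dirs:

    # check who owns the Hadoop local directories
    ls -ld /var/hadoop/tmp /var/hadoop/mapred/local

    # hand them to the user that starts the daemons (placeholder user/group)
    sudo chown -R hduser:hadoop /var/hadoop/tmp /var/hadoop/mapred/local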

Divz