
I am currently learning to use Hadoop mapred and have come across this error:

packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /tmp/hadoop-unjar4635332780289131423/] [] /tmp/streamjob8641038855230304864.jar tmpDir=null
16/10/31 17:41:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:15 INFO mapred.FileInputFormat: Total input paths to process : 1
16/10/31 17:41:17 INFO mapreduce.JobSubmitter: number of splits:2
16/10/31 17:41:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477933345919_0004
16/10/31 17:41:19 INFO impl.YarnClientImpl: Submitted application application_1477933345919_0004
16/10/31 17:41:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1477933345919_0004/
16/10/31 17:41:19 INFO mapreduce.Job: Running job: job_1477933345919_0004
16/10/31 17:41:38 INFO mapreduce.Job: Job job_1477933345919_0004 running in uber mode : false
16/10/31 17:41:38 INFO mapreduce.Job:  map 0% reduce 0%
16/10/31 17:41:56 INFO mapreduce.Job:  map 100% reduce 0%
16/10/31 17:42:19 INFO mapreduce.Job: Task Id : attempt_1477933345919_0004_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

I am unable to work out how to fix this error and have been searching the internet. The code I am using for my mapper is:

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        print '%s\t%s' % (word, 1)

The code for the reducer is:

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)

    try:
        count = int(count)
    except ValueError:
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

if current_word == word:
    print '%s\t%s' % (current_word, current_count)

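As a quick local sanity check, the two scripts can be exercised outside of Hadoop by piping a copy of the input file through them, with sort standing in for the shuffle step (this sketch assumes python resolves to Python 2, since both scripts use the Python 2 print statement, and that testDocument is available as a local file):

$ cat testDocument | python mapper.py | sort -k1,1 | python reducer.py

If either script raises an exception here, the same traceback is what causes the streaming subprocess to exit with code 1.
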
In order to run the task I am using:

hduser@master:/opt/hadoop-2.7.3/share/hadoop/tools/lib $ hadoop jar hadoop-streaming-2.7.3.jar -file /home/hduser/mapper.py -mapper "python mapper.py" -file /home/hduser/reducer.py -reducer "python reducer.py" -input ~/testDocument -output ~/results1

Any help would be appreciated as I am new to Hadoop. If any more logs or information are required please don't hesitate to ask.

hudsond7
  • 143 means out of memory. Check `dmesg` output on the host where your container(s) were started. You can also find more useful information in the YARN NodeManager log (the default path varies by installation). – Misko Nov 03 '16 at 12:21
  • Were you able to solve this? – ishan3243 May 27 '17 at 13:50

1 Answer


Look at the logs for an error in your Python code. For EMR/YARN you can find the logs in the web UI or from the cluster master's shell as shown below (your application id will differ; it is printed when the job starts). There is a lot of output, so redirect it into a file as shown and look for Python stack traces.

$ yarn logs -applicationId application_1503951120983_0031 > /tmp/log 
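The dump can be large; one way to jump straight to any Python errors is to grep the saved file for the start of a CPython traceback (which begins with "Traceback (most recent call last)"):

$ grep -n -A 20 "Traceback" /tmp/log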
gae123