6

I'm trying to run a very simple task with MapReduce.

mapper.py:

#!/usr/bin/env python
import sys
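# echo each line from stdin (note: Python 2's print adds its own newline, so each echoed line is followed by a blank line)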
for line in sys.stdin:
    print line

My text file:

qwerty
asdfgh
zxc

Command line to run the job:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py

Error:

INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

How can I fix this and run the code? When I use cat /home/cloudera/Documents/test.txt | python /home/cloudera/Documents/map.py, it works fine.

UPDATE

Something is wrong with my *.py file. I copied the file from the 'Tom White Hadoop book' GitHub repository and everything works fine.

But I can't understand the reason. It is not the permissions or the charset (if I am not wrong). What else could it be?

Headmaster

8 Answers

16

I faced the same problem.

Issue: when the Python file is created in a Windows environment, the newline character is CRLF, but my Hadoop runs on Linux, which expects LF newlines.


Solution: after changing CRLF to LF, the step ran successfully.
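
A minimal sketch of the fix, assuming the script is map.py (dos2unix, if installed, works as well):

file map.py              # reports "with CRLF line terminators" for a Windows-edited file
sed -i 's/\r$//' map.py  # strip the trailing carriage return from every line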


akshay lad
1

In the -mapper argument you should set the command that will run on the cluster nodes, and there is no /home/cloudera/Documents/map.py file there. Files that you pass with the -file option are placed in the task's working directory, so you can simply refer to the script as ./map.py.

I don't remember what permissions are set on that file, so if it lacks execute permission, invoke it as python map.py.

So the full command is:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper "python map.py" \
-file /home/cloudera/Documents/map.py
fi11er
0

You have an error in your mapper.py or reducer.py, for example:

  1. Not using #!/usr/bin/env python at the top of the files.
  2. A syntax or logical error in your Python code (for example, print has different syntax in Python 2 and Python 3, as illustrated below).
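
A minimal illustration of the print difference (a hypothetical snippet, not from the original post):

print "hello"     # valid in Python 2, but a SyntaxError in Python 3
print("hello")    # valid in both Python 2 and Python 3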
mrr
0

First, check python --version. If the output of python --version is

Command 'python' not found, but can be installed with:

sudo apt install python3       
sudo apt install python        
sudo apt install python-minimal

You also have python3 installed, you can run 'python3' instead.

then install Python with sudo apt install python and rerun your Hadoop job.

This worked on my PC.
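
On newer Ubuntu releases, assuming you simply want the bare python name to resolve to Python 3, the python-is-python3 package achieves the same thing:

sudo apt install python-is-python3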

barbsan
0

On a local Hadoop 3.2.1 installation on macOS, I solved my java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127 issue here: https://stackoverflow.com/a/61624913/4201275

jeugregg
0

Let's assume this is your streaming job as it looks on Windows: the .py files have CRLF newline characters. You need to clean up CRLF to LF, either manually or with this sed command, and you should be good.

!sed -i -e 's/\r$//' WordCount/reducer.py 
!sed -i -e 's/\r$//' WordCount/mapper.py

I used the ! here to tell the Python notebook to run the line as a shell command (I was executing in a VM on Windows):

!hadoop jar {JAR_FILE} \
  -files WordCount/reducer.py,WordCount/mapper.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input {HDFS_DIR}/alice.txt \
  -output {HDFS_DIR}/wordcount-output \
  -cmdenv PATH={PATH}
datayogi
0

I solved the issue by installing python3 on all the Docker containers.
Docker containers:

  • namenode
  • datanode(s)
  • resourcemanager
  • nodemanager

Inside each container, run:

apt update
apt install python3

If that does not solve the issue, then this might:

  • Make your mapper.py and reducer.py executable:
       chmod 777 mapper.py reducer.py

  • And add the shebang at the top of both scripts:
       #!/usr/bin/env python3
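
A quick sanity check, assuming the container names listed above, is to confirm the interpreter now exists in each one:

docker exec namenode python3 --version
docker exec resourcemanager python3 --version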
    
0

As it may be the case for someone else besides myself: it might be due to the shebang at the top of the file(s).

For instance, if you have #!/usr/bin/env python and python is not recognized on your system ($ which python returns blank or "python not found"), Hadoop throws this very unspecific error; exit code 127 is the shell's "command not found" status. To solve it, just change the shebang to an interpreter you actually have, e.g. #!/usr/bin/env python3 (or install the missing one).
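
A quick way to check, assuming the script is map.py:

head -1 map.py         # show the shebang line
which python python3   # see which interpreters are actually on the PATH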

Bruno Melo