I am launching a MapReduce job from Python using the code in [1]. The problem is that the correct output data ends up in the stderr field [3] instead of in the stdout field [2]. Why is the correct data arriving on stderr? Am I using Popen.communicate correctly? Is there a better way to launch a Java execution from Python (not Jython)?
[1] The function I use to launch a job in Hadoop:
import shlex
import subprocess

command = ("/home/xubuntu/Programs/hadoop/bin/hadoop jar "
           "/home/xubuntu/Programs/hadoop/medusa-java.jar mywordcount "
           "-Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1443004585/job.attributes "
           "/input1 /output1")

def launch_job():
    try:
        process = subprocess.Popen(shlex.split(command),
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        print("Out %s" % out)
        print("Error %s" % err)
        if len(err) > 0:  # there is an exception
            raise ValueError("Exception:\n" + err)
    except ValueError as e:
        return str(e)
    return out
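Could it be that Hadoop routes its log4j messages (the INFO/WARN lines) to stderr, so that a non-empty stderr does not by itself mean the job failed? If so, should I be testing the process exit code instead? Here is a minimal, untested sketch of what I mean (the run_hadoop name is mine):

import shlex
import subprocess

def run_hadoop(command):
    # Hadoop's log4j console output goes to stderr, so stderr text
    # alone is not a failure signal; the exit code is.
    process = subprocess.Popen(shlex.split(command),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    if process.returncode != 0:
        raise ValueError("Exception:\n" + err)
    return out

Is checking returncode like this the right approach, or is there a more idiomatic way (e.g. subprocess.check_output)?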
[2] Output in the stdoutdata field:
[2015-09-23 07:16:13,220: WARNING/Worker-17] Out My Setup
My get job name
My get job name
My get job name
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner
---> Job 0: /input1, : /output1-1443006949
10.10.5.192
10.10.5.192:8032
[3] Output in the stderrdata field:
[2015-09-23 07:16:13,221: WARNING/Worker-17] Error 15/09/23 07:15:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/23 07:15:53 INFO client.RMProxy: Connecting to ResourceManager at /10.10.5.192:8032
15/09/23 07:15:54 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 07:15:54 INFO input.FileInputFormat: Total input paths to process : 4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: number of splits:4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442999930174_0009
15/09/23 07:15:54 INFO impl.YarnClientImpl: Submitted application application_1442999930174_0009
15/09/23 07:15:54 INFO mapreduce.Job: The url to track the job: http://hadoop-coc-1:9046/proxy/application_1442999930174_0009/
15/09/23 07:15:54 INFO mapreduce.Job: Running job: job_1442999930174_0009
15/09/23 07:16:00 INFO mapreduce.Job: Job job_1442999930174_0009 running in uber mode : false
15/09/23 07:16:00 INFO mapreduce.Job: map 0% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: map 100% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: Job job_1442999930174_0009 completed successfully
15/09/23 07:16:13 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=423900
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=472
        HDFS: Number of bytes written=148
        HDFS: Number of read operations=20
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=41232
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=41232
        Total vcore-seconds taken by all map tasks=41232
        Total megabyte-seconds taken by all map tasks=42221568
    Map-Reduce Framework
        Map input records=34
        Map output records=34
        Input split bytes=406
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=532
        CPU time spent (ms)=1320
        Physical memory (bytes) snapshot=245039104
        Virtual memory (bytes) snapshot=1272741888
        Total committed heap usage (bytes)=65273856
    File Input Format Counters