
I am launching a MapReduce job from Python using the code in [1]. The problem is that the correct output data ends up in the stderr field [3] instead of the stdout field [2]. Why am I getting the correct data on stderr? Am I using Popen.communicate correctly? Is there a better way to launch a Java execution from Python (not Jython)?

[1] Snippet that I use to launch a job in Hadoop:

import shlex
import subprocess

command = ("/home/xubuntu/Programs/hadoop/bin/hadoop jar "
           "/home/xubuntu/Programs/hadoop/medusa-java.jar mywordcount "
           "-Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1443004585/job.attributes "
           "/input1 /output1")

def launch_job(command):  # wrapper function so the returns below are valid
    try:
        process = subprocess.Popen(shlex.split(command),
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE,
                                   universal_newlines=True)  # decode bytes to str
        out, err = process.communicate()
        print("Out %s" % out)
        print("Error %s" % err)

        if len(err) > 0:  # anything on stderr is treated as an exception
            raise ValueError("Exception:\n" + err)
    except ValueError as e:
        return str(e)  # e.message does not exist in Python 3

    return out

[2] Output that is in the stdoutdata field:

[2015-09-23 07:16:13,220: WARNING/Worker-17] Out My Setup
My get job name
My get job name
My get job name
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner
---> Job 0: /input1, : /output1-1443006949
10.10.5.192
10.10.5.192:8032

[3] Output that is in the stderrdata field:

[2015-09-23 07:16:13,221: WARNING/Worker-17] Error 15/09/23 07:15:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/23 07:15:53 INFO client.RMProxy: Connecting to ResourceManager at  /10.10.5.192:8032
15/09/23 07:15:54 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 07:15:54 INFO input.FileInputFormat: Total input paths to process : 4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: number of splits:4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442999930174_0009
15/09/23 07:15:54 INFO impl.YarnClientImpl: Submitted application application_1442999930174_0009
15/09/23 07:15:54 INFO mapreduce.Job: The url to track the job: http://hadoop-coc-1:9046/proxy/application_1442999930174_0009/
15/09/23 07:15:54 INFO mapreduce.Job: Running job: job_1442999930174_0009
15/09/23 07:16:00 INFO mapreduce.Job: Job job_1442999930174_0009 running in uber mode : false
15/09/23 07:16:00 INFO mapreduce.Job:  map 0% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: Job job_1442999930174_0009 completed successfully
15/09/23 07:16:13 INFO mapreduce.Job: Counters: 30
    File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=423900
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=472
            HDFS: Number of bytes written=148
            HDFS: Number of read operations=20
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=8
    Job Counters 
            Launched map tasks=4
            Data-local map tasks=4
            Total time spent by all maps in occupied slots (ms)=41232
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=41232
            Total vcore-seconds taken by all map tasks=41232
            Total megabyte-seconds taken by all map tasks=42221568
    Map-Reduce Framework
            Map input records=34
            Map output records=34
            Input split bytes=406
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=532
            CPU time spent (ms)=1320
            Physical memory (bytes) snapshot=245039104
            Virtual memory (bytes) snapshot=1272741888
            Total committed heap usage (bytes)=65273856
    File Input Format Counters 

1 Answer

Hadoop (specifically Log4j) just logs all of those INFO messages to stderr by default. From the Hadoop documentation on configuration:

Hadoop logs messages to Log4j by default. Log4j is configured via log4j.properties on the classpath. This file defines both what is logged and where. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Servers log to the "INFO,DRFA", which logs to a file that is rolled daily. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-<server>.log.
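For context, the console appender in a stock Hadoop 2.x log4j.properties looks roughly like the sketch below (paraphrased from memory, so check the copy on your own classpath); note that its target is explicitly System.err:

# Excerpt from a typical Hadoop log4j.properties (may vary by version)
hadoop.root.logger=INFO,console
log4j.rootLogger=${hadoop.root.logger}
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

# In principle (untested), switching the target should move the log
# lines to stdout:
# log4j.appender.console.target=System.out

That ConversionPattern is what produces the 15/09/23 07:15:52 INFO ... lines you are seeing on stderr.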

I have never actually tried redirecting the logs to stdout, so I can't really help with that, but a promising answer from another user suggests:

// Answer by Rajkumar Singh
// To get your stdout and log messages on the console, you can use the
// Apache Commons Logging framework in your mapper and reducer.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Concrete type parameters are shown only for illustration; the
// original answer left them as placeholders.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    public static final Log log = LogFactory.getLog(MyMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Goes to the task's stdout file
        System.out.println("Map key " + key);

        // Goes to the task's syslog file
        log.info("Map key " + key);

        if (log.isDebugEnabled()) {  // original had a typo: isDebugEanbled
            log.debug("Map key " + key);
        }
        context.write(key, value);
    }
}

I suggest giving it a try.
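
Separately, on the Python side: since the log4j console appender writes to stderr, a non-empty err string is expected even when the job succeeds, so len(err) > 0 is the wrong failure test. Here is a minimal sketch that keys off the exit code instead (the run_hadoop_job name is mine; only standard subprocess calls are used):

import shlex
import subprocess

def run_hadoop_job(command):
    # Hypothetical helper: run the hadoop CLI and return its stdout.
    process = subprocess.Popen(shlex.split(command),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    out, err = process.communicate()
    # stderr carries the INFO/WARN log lines even on success, so use
    # the exit code, not the presence of stderr output, to detect failure.
    if process.returncode != 0:
        raise RuntimeError("Hadoop job failed:\n" + err)
    return out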
