
I'm running a simple MapReduce word-count program against Apache Hadoop 2.6.0. Hadoop is running in distributed mode (several nodes). However, I'm not able to see any stderr or stdout in the YARN job history (though I can see the syslog).

The wordcount program is really simple, just for demo purposes.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WordCount {
  public static final Log LOG = LogFactory.getLog(WordCount.class);

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      LOG.info("LOG - map function invoked");
      System.out.println("stdout - map function invoded");
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.jar", "/space/tmp/jar/wordCount.jar");
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/jsun/input"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/user/jsun/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that in the map function of the Mapper class, I added two statements:

LOG.info("LOG - map function invoked");
System.out.println("stdout - map function invoded");

These two statements are there to test whether I can see logging from the Hadoop cluster. The program runs successfully. But if I go to localhost:8088 to see the application history and then "logs", I see nothing under "stdout", and only this under "stderr":

log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I think some configuration is needed to get this output, but I'm not sure which piece is missing. I searched online as well as on Stack Overflow. Some people mentioned container-log4j.properties, but they are not specific about how to configure that file and where to put it.
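
From what I can tell, the stock container-log4j.properties ships inside the Hadoop YARN jars and is put on the task JVM's classpath by the NodeManager. Below is a rough sketch of what it contains, paraphrased from what I found, so treat the exact property names as something to verify against your own distribution:

# container-log4j.properties (sketch, not verbatim)
log4j.rootLogger=${hadoop.root.logger}
hadoop.root.logger=INFO,CLA

# ContainerLogAppender writes syslog into the per-container log directory;
# yarn.app.container.log.dir is injected by the NodeManager at launch time
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.totalLogFileSize=${yarn.app.container.log.filesize}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n

The "No appenders could be found" warning in my stderr suggests the task JVM never picked this file up, which I assume is why people keep pointing at it.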

One thing to note is that I also tried the job with Hortonworks Data Platform 2.2 and Cloudera 5.4, with the same result. I remember that with a previous version of Hadoop (1.x), I could easily see the logging in the same place, so I guess this is something new in Hadoop 2.x.

=======

As a comparison, if I run Apache Hadoop in local mode (i.e., LocalJobRunner), I can see logging in the console like this:

[2015-09-08 15:57:25,992]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:998) INFO:kvstart = 26214396; length = 6553600
[2015-09-08 15:57:25,996]org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402) INFO:Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[2015-09-08 15:57:26,064]WordCount$TokenizerMapper.map(WordCount.java:28) INFO:LOG - map function invoked
stdout - map function invoked
[2015-09-08 15:57:26,075]org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:591) INFO:
[2015-09-08 15:57:26,077]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1457) INFO:Starting flush of map output
[2015-09-08 15:57:26,077]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1475) INFO:Spilling map output

This kind of logging ("map function invoked") is what I expected to see in the Hadoop server logs.
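
From what I have read, in Hadoop 2.x the per-container stdout/stderr/syslog files live under the NodeManager's local log directories while the application runs, and are only collected centrally once log aggregation is switched on. If my reading is right, the relevant switch is this yarn-site.xml property (a sketch based on the docs; yarn.log-aggregation-enable defaults to false):

<!-- yarn-site.xml on every node; restart YARN afterwards -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>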

pythonician_plus_plus

1 Answer


The sysout written in a MapReduce program cannot be seen on the console. This is because MapReduce runs as multiple parallel copies across the cluster, so there is no single console that collects the output.

However, the System.out.println() output from the map and reduce phases can be seen in the job logs. An easy way to access the logs is:

1. Open the JobTracker web console: http://localhost:50030/jobtracker.jsp
2. Click on the completed job
3. Click on the map or reduce task
4. Click on the task number
5. Go to the task logs
6. Check the stdout logs

Please note that if you are not able to locate the URL, just look in the console log for the JobTracker URL.
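
On a YARN cluster (Hadoop 2.x) there is no JobTracker, so the rough equivalent of the steps above, assuming the JobHistory server is running and log aggregation is enabled, is the ResourceManager UI at http://localhost:8088, the JobHistory UI at http://localhost:19888, or the yarn logs CLI:

# dump all container logs (stdout/stderr/syslog) for a finished application;
# substitute the real application id shown in the ResourceManager UI
yarn logs -applicationId <application-id>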

Gyanendra Dwivedi
    The question clearly describes that there is nothing in 'stdout' in the YARN job history, where it was expected, and I'm not trying to find the output on the 'console'. Also, since you say 'job tracker', I guess you mean MapReduce v1, but the question is about YARN. – pythonician_plus_plus Sep 09 '15 at 21:11