55

I want to debug a MapReduce script, and without going into much trouble tried to put some print statements in my program. But I can't seem to find them in any of the logs.

jason
  • 3,471
  • 6
  • 30
  • 43

5 Answers

61

Actually, stdout only shows the System.out.println() output of the non-MapReduce classes (e.g. your driver).

The System.out.println() calls in the map and reduce phases end up in the task logs. An easy way to access the logs is:

http://localhost:50030/jobtracker.jsp -> click on the completed job -> click on the map or reduce task -> click on the task number -> task logs -> stdout logs.
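As a minimal illustration of the difference (the class names here are made up), output from code running in the client JVM appears on your terminal, while output from code running inside a task ends up in the stdout task log above:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PrintWhereDemo {

    // Runs inside a task JVM on the cluster: this println goes to the
    // task's stdout log reached through the JobTracker UI above.
    public static class DemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            System.out.println("map saw key " + key);
        }
    }

    // Runs in the client JVM that submits the job: this println shows up
    // on your terminal, not in the task logs.
    public static void main(String[] args) {
        System.out.println("submitting job...");
        // ... normal Job configuration and submission would go here ...
    }
}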

Hope this helps

Orkun
  • 6,998
  • 8
  • 56
  • 103
rOrlig
  • 2,489
  • 4
  • 35
  • 48
26

Another way is through the terminal:

1) Go into your Hadoop installation directory, then into logs/userlogs.
2) Open your job_id directory.
3) Check the directories with _m_ if you want the mapper output, or _r_ if you're looking for the reducers.

Example, in Hadoop 0.20.2:

> ls ~/hadoop-0.20.2/logs/userlogs/attempt_201209031127_0002_m_000000_0/
log.index   stderr      stdout      syslog

The above means:
Hadoop installation: ~/hadoop-0.20.2
job_id: job_201209031127_0002
_m_: map task, map number: _000000_

4) Open stdout if you used System.out.println, or stderr if you wrote to System.err.
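If you just want to dump every mapper's stdout at once, a one-liner along these lines should work (assuming the same directory layout as in the example above):

cat ~/hadoop-0.20.2/logs/userlogs/attempt_*_m_*/stdout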

P.S. Other Hadoop versions might have a slightly different hierarchy, but they should all be under $Hadoop_Installation/logs/userlogs.

Mark
  • 261
  • 3
  • 3
16

On a Hadoop cluster with YARN, you can fetch the logs, including stdout, with:

yarn logs -applicationId application_1383601692319_0008

For some reason, I've found this to be more complete than what I see in the web interface. The web interface did not list the output of System.out.println() for me.
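If you don't know the application ID, something like the following should list it (the exact options vary a bit across Hadoop versions):

yarn application -list -appStates FINISHED

Also note that yarn logs requires log aggregation to be enabled (yarn.log-aggregation-enable=true) and generally only works after the application has finished.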

MacFreek
  • 3,207
  • 2
  • 31
  • 41
  • 4
  • Thanks for giving an answer for Hadoop 2. Can you tell me why I am getting this error after executing that command? `Logs not available at /tmp/logs/hadoopuser/logs/application_1441282624447_3854` and `Log aggregation has not completed or is not enabled` – Jagadish Talluri Nov 23 '15 at 16:25
  • The job history interface corresponding to Hadoop 2.7 also does not list System.out.println for me, whereas the command provided here does. – Paul Nov 05 '21 at 17:53
8

To get your stdout output and log messages into the task logs, you can use the Apache Commons Logging framework in your mapper and reducer:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Example input/output types; substitute the types your job actually uses.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    public static final Log log = LogFactory.getLog(MyMapper.class);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Goes to the task's stdout file
        System.out.println("Map key " + key);

        // Goes to the task's syslog file
        log.info("Map key " + key);

        if (log.isDebugEnabled()) {
            log.debug("Map key " + key);
        }

        context.write(key, value);
    }
}
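Note that the log.debug() line will only appear if the task log level is set to DEBUG. Assuming Hadoop 2.x and a driver that goes through ToolRunner, the level can be raised per job with the mapreduce.map.log.level / mapreduce.reduce.log.level properties (the jar and class names here are placeholders):

hadoop jar myjob.jar MyDriver -Dmapreduce.map.log.level=DEBUG <input> <output>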
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Rajkumar Singh
  • 1,557
  • 12
  • 5
0

After most of the options above did not work for me, I realized that on my single-node cluster I could use this simple method:

// Requires: import java.io.FileNotFoundException, java.io.FileOutputStream, java.io.PrintStream
static private PrintStream console_log;
static private boolean node_was_initialized = false;

// Lazily opens /tmp/my_mapred_log.txt in append mode on the node running
// this task, then appends one line per call.
private static void logPrint(String line) {
    if (!node_was_initialized) {
        try {
            console_log = new PrintStream(new FileOutputStream("/tmp/my_mapred_log.txt", true));
        } catch (FileNotFoundException e) {
            return;
        }
        node_was_initialized = true;
    }
    console_log.println(line);
}

It can then be used, for example, like this:

public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    logPrint("map input: key-" + key.toString() + ", value-" + value.toString());
    // actual impl of 'map'...
}

After that, the prints can be viewed with cat /tmp/my_mapred_log.txt. To get rid of prints from prior Hadoop runs, you can simply run rm /tmp/my_mapred_log.txt before running Hadoop again.

notes:

  • The solution by Rajkumar Singh is likely better if you have the time to download and integrate a new library.
  • This could work for multi-node clusters if you have a way to access "/tmp/my_mapred_log.txt" on each worker node machine.
  • If for some strange reason you already have a file named "/tmp/my_mapred_log.txt", consider changing the name (just make sure to give an absolute path).
yosef
  • 124
  • 5