55

I want to debug a MapReduce script, and without going into much trouble tried to put some print statements in my program. But I can't seem to find them in any of the logs.

jason
  • 3,471
  • 6
  • 30
  • 43

5 Answers

61

Actually, stdout only shows the System.out.println() output of the non-MapReduce classes (e.g. your driver).

The System.out.println() calls in the map and reduce phases end up in the task logs. An easy way to access the logs is:

http://localhost:50030/jobtracker.jsp -> click on the completed job -> click on the map or reduce task -> click on the task number -> task logs -> stdout logs.
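As a minimal illustration of the difference (the class names here are made up), output from code running in the client JVM appears on your terminal, while output from code running inside a task ends up in the stdout task log above:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PrintWhereDemo {

    // Runs inside a task JVM on the cluster: this println goes to the
    // task's stdout log reached through the JobTracker UI above.
    public static class DemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            System.out.println("map saw key " + key);
        }
    }

    // Runs in the client JVM that submits the job: this println shows up
    // on your terminal, not in the task logs.
    public static void main(String[] args) {
        System.out.println("submitting job...");
        // ... normal Job configuration and submission would go here ...
    }
}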

Hope this helps

Orkun
  • 6,998
  • 8
  • 56
  • 103
rOrlig
  • 2,489
  • 4
  • 35
  • 48
26

Another way is through the terminal:

1) Go into your Hadoop installation directory, then into logs/userlogs.
2) Open your job_id directory.
3) Check the directories with _m_ if you want the mapper output, or _r_ if you're looking for the reducers.

Example, in Hadoop 0.20.2:

> ls ~/hadoop-0.20.2/logs/userlogs/attempt_201209031127_0002_m_000000_0/
log.index   stderr      stdout      syslog

The above means:
Hadoop installation: ~/hadoop-0.20.2
job_id: job_201209031127_0002
_m_: map task, map number: _000000_

4) Open stdout if you used System.out.println, or stderr if you wrote to System.err.
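If you just want to dump every mapper's stdout at once, a one-liner along these lines should work (assuming the same directory layout as in the example above):

cat ~/hadoop-0.20.2/logs/userlogs/attempt_*_m_*/stdout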

P.S. Other Hadoop versions might have a slightly different hierarchy, but they should all be under $Hadoop_Installation/logs/userlogs.

Mark
  • 261
  • 3
  • 3
16

On a Hadoop cluster with YARN, you can fetch the logs, including stdout, with:

yarn logs -applicationId application_1383601692319_0008

For some reason, I've found this to be more complete than what I see in the web interface. The web interface did not list the output of System.out.println() for me.
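If you don't know the application ID, something like the following should list it (the exact options vary a bit across Hadoop versions):

yarn application -list -appStates FINISHED

Also note that yarn logs requires log aggregation to be enabled (yarn.log-aggregation-enable=true) and generally only works after the application has finished.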

MacFreek
  • 3,207
  • 2
  • 31
  • 41
  • 4
  • Thanks for giving an answer for Hadoop 2. Can you tell me why I am getting this error after executing that command? `Logs not available at /tmp/logs/hadoopuser/logs/application_1441282624447_3854` and `Log aggregation has not completed or is not enabled` – Jagadish Talluri Nov 23 '15 at 16:25
  • The job history interface corresponding to Hadoop 2.7 also does not list System.out.println for me, whereas the command provided here does. – Paul Nov 05 '21 at 17:53
8

To get your stdout output and log messages into the task logs, you can use the Apache Commons Logging framework in your mapper and reducer:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Example input/output types; substitute the types your job actually uses.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    public static final Log log = LogFactory.getLog(MyMapper.class);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Goes to the task's stdout file
        System.out.println("Map key " + key);

        // Goes to the task's syslog file
        log.info("Map key " + key);

        if (log.isDebugEnabled()) {
            log.debug("Map key " + key);
        }

        context.write(key, value);
    }
}
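Note that the log.debug() line will only appear if the task log level is set to DEBUG. Assuming Hadoop 2.x and a driver that goes through ToolRunner, the level can be raised per job with the mapreduce.map.log.level / mapreduce.reduce.log.level properties (the jar and class names here are placeholders):

hadoop jar myjob.jar MyDriver -Dmapreduce.map.log.level=DEBUG <input> <output>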
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Rajkumar Singh
  • 1,557
  • 12
  • 5
0

After most of the options above did not work for me, I realized that on my single-node cluster I could use this simple method:

// Requires: import java.io.FileNotFoundException, java.io.FileOutputStream, java.io.PrintStream
static private PrintStream console_log;
static private boolean node_was_initialized = false;

// Lazily opens /tmp/my_mapred_log.txt in append mode on the node running
// this task, then appends one line per call.
private static void logPrint(String line) {
    if (!node_was_initialized) {
        try {
            console_log = new PrintStream(new FileOutputStream("/tmp/my_mapred_log.txt", true));
        } catch (FileNotFoundException e) {
            return;
        }
        node_was_initialized = true;
    }
    console_log.println(line);
}

It can then be used, for example, like this:

public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    logPrint("map input: key-" + key.toString() + ", value-" + value.toString());
    // actual impl of 'map'...
}

After that, the prints can be viewed with cat /tmp/my_mapred_log.txt. To get rid of prints from prior Hadoop runs, you can simply run rm /tmp/my_mapred_log.txt before running Hadoop again.

notes:

  • The solution by Rajkumar Singh is likely better if you have the time to download and integrate a new library.
  • This could work for multi-node clusters if you have a way to access "/tmp/my_mapred_log.txt" on each worker node machine.
  • If for some strange reason you already have a file named "/tmp/my_mapred_log.txt", consider changing the name (just make sure to give an absolute path).
yosef
  • 124
  • 5