
Are the setup and cleanup methods called in each mapper and reducer task, respectively? Or are they called only once, at the start of the overall mapper and reducer jobs?

kee

5 Answers


They are called for each task, so if you have 20 mappers running, the setup / cleanup will be called for each one.

One gotcha is that the standard run method for both Mapper and Reducer does not catch exceptions around the map / reduce methods, so if an exception is thrown in these methods, the cleanup method will not be called.

2020 Edit: As noted in the comments, this statement from 2012 (Hadoop 0.20) is no longer true: cleanup is now called as part of a finally block.
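
To illustrate what the try/finally buys you, here is a minimal plain-Java sketch. It does not use the Hadoop API: `CleanupSketch`, its `events` log, and `runAndCollect` are hypothetical names made up for this example, and the empty-string failure is only an assumed way to trigger an exception in `map`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a single task; NOT the real Hadoop Mapper class.
class CleanupSketch {
    // Records lifecycle calls in the order they happen.
    final List<String> events = new ArrayList<>();

    void setup() { events.add("setup"); }

    // Assumption for the demo: an empty record makes map() fail.
    void map(String record) {
        if (record.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        events.add("map:" + record);
    }

    void cleanup() { events.add("cleanup"); }

    // Same shape as a run() with try/finally: cleanup runs even if map() throws.
    void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }

    // Runs one task and returns its event log, noting whether the task failed.
    static List<String> runAndCollect(List<String> records) {
        CleanupSketch task = new CleanupSketch();
        try {
            task.run(records);
        } catch (IllegalArgumentException e) {
            task.events.add("failed");
        }
        return task.events;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, cleanup, failed]
        System.out.println(runAndCollect(Arrays.asList("a", "", "b")));
    }
}
```

The exception still propagates out of `run`, so the task fails either way; the finally block only guarantees that `cleanup` gets its chance first.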

Chris White
  • One can always call cleanup method in the catch clause of an exception in map/reduce. However this requires intelligent analysis of possible exceptions and putting in `try/catch` clauses to catch them. – abhinavkulkarni Oct 09 '13 at 18:16
  • `One gotcha is the standard run method for both Mapper and Reducer does not catch exceptions around the map / reduce methods` This is not true, or at least not true anymore. The default run() implementation wraps the map() and reduce() methods in a try/finally. [source](https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Mapper.java#L142-L151). – Nick ODell Sep 29 '20 at 18:01
  • @NickODell - Thanks for the update, feel free to edit the answer in future – Chris White Sep 29 '20 at 23:51

One clarification is helpful. The setup/cleanup methods are used for initialization and cleanup at the task level. Within a task, initialization happens first with a single call to the setup() method, and then all calls to the map() [or reduce()] function are made. After that, a single call is made to the cleanup() method before the task exits.
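
As a rough way to picture the counts, here is a small plain-Java model of that lifecycle. It is only a sketch and does not touch the Hadoop API: `LifecycleCounts`, `runTask`, and `runJob` are hypothetical names, and "one task per list of records" is a simplifying assumption standing in for input splits.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of task lifecycles; NOT Hadoop classes.
class LifecycleCounts {
    int setupCalls = 0;
    int mapCalls = 0;
    int cleanupCalls = 0;

    // One task: a single setup, one map call per record, a single cleanup.
    void runTask(List<String> split) {
        setupCalls++;
        try {
            for (String record : split) {
                mapCalls++; // stands in for map(record)
            }
        } finally {
            cleanupCalls++;
        }
    }

    // Assumption: the job runs one task per split.
    static LifecycleCounts runJob(List<List<String>> splits) {
        LifecycleCounts counts = new LifecycleCounts();
        for (List<String> split : splits) {
            counts.runTask(split);
        }
        return counts;
    }

    public static void main(String[] args) {
        LifecycleCounts c = runJob(Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e")));
        // prints 2 5 2 (two tasks, five records in total)
        System.out.println(c.setupCalls + " " + c.mapCalls + " " + c.cleanupCalls);
    }
}
```

So setup and cleanup scale with the number of tasks, while map (or reduce) scales with the number of records (or keys) each task processes.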

user3163904

They are called once per Mapper task or Reducer task. Here is the Hadoop code for the Reducer's run method.

```java
public void run(Context context) throws IOException, InterruptedException {
    setup(context);  // once per task, before any reduce() call
    try {
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
        }
    } finally {
        cleanup(context);  // once per task, even if reduce() throws
    }
}
```
KaiZhao

According to the MapReduce documentation, setup and cleanup are called for each Mapper and Reducer task.

Adaephon

For the reducer, you can call `job.setNumReduceTasks(1);` on the job. With a single reduce task, the reducer's setup and cleanup run only once.

Astronaut