Are the setup and cleanup methods called in each mapper and reducer task, respectively? Or are they called only once, at the start of the overall mapper and reducer jobs?
5 Answers
They are called for each task, so if you have 20 mappers running, the setup / cleanup will be called for each one.
One gotcha is that the standard run method for both Mapper and Reducer does not catch exceptions around the map / reduce methods, so if an exception is thrown in these methods, the cleanup method will not be called.
2020 Edit: As noted in the comments, this statement from 2012 (Hadoop 0.20) is no longer true: cleanup is now called as part of a finally block.
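To illustrate the finally-block behaviour described in the edit, here is a minimal sketch in plain Java. It does not use Hadoop's classes: `CleanupOnFailureSketch`, `runTask`, and the "empty record is bad" rule are hypothetical, and the outer catch only stands in for the framework noticing that the task failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a task's run loop; NOT Hadoop's Mapper class.
class CleanupOnFailureSketch {

    // Runs one simulated task and returns the ordered list of lifecycle events.
    static List<String> runTask(List<String> records) {
        List<String> events = new ArrayList<>();
        events.add("setup");
        try {
            try {
                for (String record : records) {
                    // Hypothetical failure condition: an empty record makes map() throw.
                    if (record.isEmpty()) {
                        throw new IllegalStateException("bad record");
                    }
                    events.add("map:" + record);
                }
            } finally {
                // Runs whether or not the loop above threw.
                events.add("cleanup");
            }
        } catch (IllegalStateException e) {
            // Stand-in for the framework observing the failed task.
            events.add("task failed: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, cleanup, task failed: bad record]
        System.out.println(runTask(Arrays.asList("a", "", "b")));
    }
}
```

In this model the record after the failing one is never mapped, but cleanup still runs before the failure is reported.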

- One can always call cleanup method in the catch clause of an exception in map/reduce. However this requires intelligent analysis of possible exceptions and putting in `try/catch` clauses to catch them. – abhinavkulkarni Oct 09 '13 at 18:16
- `One gotcha is the standard run method for both Mapper and Reducer does not catch exceptions around the map / reduce methods` This is not true, or at least not true anymore. The default run() implementation wraps the map() and reduce() methods in a try/finally. [source](https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Mapper.java#L142-L151). – Nick ODell Sep 29 '20 at 18:01
- @NickODell - Thanks for the update, feel free to edit the answer in future – Chris White Sep 29 '20 at 23:51
One clarification is helpful. The setup/cleanup methods are used for initialization and clean up at task level. Within a task, first initialization happens with a single call to setup() method and then all calls to map() [or reduce()] function will be done. After that another single call will be made to cleanup() method before exiting the task.
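The ordering described above can be sketched with a small stand-in class. This is a plain-Java model, not Hadoop's Mapper: `LifecycleSketch`, `runTask`, and the event log are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a single task; NOT the real Hadoop Mapper.
class LifecycleSketch {
    final List<String> events = new ArrayList<>();

    void setup() { events.add("setup"); }
    void map(String record) { events.add("map:" + record); }
    void cleanup() { events.add("cleanup"); }

    // Same shape as the framework's run(): setup once, map per record, cleanup once.
    void run(List<String> split) {
        setup();
        try {
            for (String record : split) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }

    // Runs one simulated task over its input split and returns the event order.
    static List<String> runTask(List<String> split) {
        LifecycleSketch task = new LifecycleSketch();
        task.run(split);
        return task.events;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, map:b, map:c, cleanup]
        System.out.println(runTask(Arrays.asList("a", "b", "c")));
    }
}
```

Each simulated task gets its own instance, so anything initialized in setup() is local to that task.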

It's called once per Mapper task or Reducer task. Here is the Hadoop code (the Reducer's run method):
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
        }
    } finally {
        cleanup(context);
    }
}

According to the MapReduce documentation, setup and cleanup are called for each Mapper and Reducer task.

On the reducer side, you can call `job.setNumReduceTasks(1);` on the job; that way the reducer's setup and cleanup will only be run once.
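Here is a rough model of why that works: setup and cleanup run once per reduce task, so their call count follows the number of reduce tasks rather than the number of keys. The class `ReduceTaskCountSketch` and its `simulate` method are hypothetical and use no Hadoop classes. The key-to-task assignment imitates hash partitioning and is only an assumption about how the job is configured.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of reduce-task counts; NOT Hadoop code.
class ReduceTaskCountSketch {

    // Returns {setupCalls, reduceCalls, cleanupCalls} for a simulated job.
    static int[] simulate(List<String> keys, int numReduceTasks) {
        int setupCalls = 0;
        int reduceCalls = 0;
        int cleanupCalls = 0;
        for (int task = 0; task < numReduceTasks; task++) {
            setupCalls++; // once per task, even if the task receives no keys
            try {
                for (String key : keys) {
                    // Assumed hash-style partitioning of keys across tasks.
                    int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
                    if (partition == task) {
                        reduceCalls++; // one reduce() call per distinct key
                    }
                }
            } finally {
                cleanupCalls++; // once per task
            }
        }
        return new int[] {setupCalls, reduceCalls, cleanupCalls};
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(Arrays.toString(simulate(keys, 1))); // prints [1, 5, 1]
        System.out.println(Arrays.toString(simulate(keys, 4))); // prints [4, 5, 4]
    }
}
```

The trade-off is that a single reduce task processes every key itself, so the reduce phase loses its parallelism.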
