In terms of performance, there is no reason not to reuse the containers. The Execution Efficiency section of this paper explains it very well, and this is why the default value for this parameter is true.
However, I think there are some cases that explain why this feature is still configurable:
- You may want to disable it as a workaround. For example, this Hive ticket is still unresolved, and the problematic query works fine when tez.am.container.reuse.enabled=false is set. If my production case is critical, I may prefer running my jobs without reusing the containers over being completely blocked.
- The property may conflict with some other properties, and depending on your priorities, you may want to give up some performance. For example, the Configure Tez Container Reuse doc has a warning that says: "Do not use the tez.queue.name configuration parameter because it sets all Tez jobs to run on one particular queue."
- Finally, I saw another warning in the same doc: "Enabling this parameter improves performance by avoiding the memory overhead of reallocating container resources for every task. However, disable this parameter if the tasks contain memory leaks or use static variables."
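
As a rough illustration of the workaround case from the first item, the property can be overridden for a single Hive session instead of cluster-wide. This is only a sketch under assumptions: it presumes Hive on Tez with session-level SET allowed, the query placeholder is hypothetical, and depending on your setup a Tez session that is already running may not pick up the change until a new one is started.

```sql
-- Assumption: Hive on Tez, where session-level SET of Tez properties is permitted.
-- Turn container reuse off only for this session (the default is true).
SET tez.am.container.reuse.enabled=false;

-- Run the problematic query here (placeholder, not a real statement).
-- SELECT ... ;

-- Restore the default afterwards so later queries keep the performance benefit.
SET tez.am.container.reuse.enabled=true;
```

Scoping the override to the affected job keeps the rest of the workload on the default, which is the faster setting.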