Is it true that Tez generally performs better for smaller, interactive queries (results expected in minutes rather than hours), while MR performs better as an execution engine for batch queries (taking hours)? Or can we say that Tez is always the best choice, irrespective of query type?
1 Answer
Tez simplifies processing for both small-scale (low-latency) and large-scale (high-throughput) workloads. The more complex the query, the more it benefits from Tez. For simple queries consisting of a single map step there will most probably be no difference at all, because there is nothing to optimize. Tez represents the query as a DAG (directed acyclic graph) that runs as a single job, which eliminates unnecessary steps such as reading from and writing to durable storage between stages and sorting the output of every map step. It also enables container reuse.

Tez is always the best choice: for simple queries it performs no worse than MR, and for complex queries it performs much better.

Also consider this: MR and Tez are tuned through different sets of configuration parameters, and there are many Tez-specific and many MR-specific ones. Choose Tez and you will simplify your life even in cases where there is nothing to optimize. Finally, Hive-on-MR has been deprecated since the Hive 2 releases.
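
As a minimal sketch of how to check this on your own cluster, you can switch the engine per session and compare the plans Hive produces. `hive.execution.engine` and `tez.am.container.reuse.enabled` are real properties, but the tables `orders` and `customers` and their columns are hypothetical placeholders, and the exact shape of the plan depends on your Hive version and settings.

```sql
-- Run the session on Tez (the other commonly used value is 'mr').
SET hive.execution.engine=tez;

-- Container reuse is a Tez-specific setting; it is normally on by default,
-- so setting it here only makes the choice explicit.
SET tez.am.container.reuse.enabled=true;

-- Hypothetical tables: a join followed by an aggregation is the kind of
-- multi-stage query where Tez can plan one DAG instead of a chain of MR jobs.
EXPLAIN
SELECT c.country, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.country;

-- For comparison, switch back and re-run the same EXPLAIN
-- (expect a deprecation warning on Hive 2).
SET hive.execution.engine=mr;
```

Comparing the two `EXPLAIN` outputs shows how the stages are arranged under each engine, which is usually a quicker check than timing full runs.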
