Join two large tables in Tez

Asked Mar 17 '17 at 08:16

Active Sep 16 '17 at 07:41

Viewed 381 times

Are there any recommendations for speeding up a join between two very large Hive tables (> 2 TB)? The execution engine is Tez. Both tables are unpartitioned and stored in text format. The cluster has 64 nodes with 128 GB of RAM each.

edited Sep 22 '17 at 17:48

Community

asked Mar 17 '17 at 08:16

KBR


The smaller the data volume, the faster the query, so use a columnar format. – David דודו Markovitz Mar 17 '17 at 08:19
Thanks Dudu. Are you suggesting a columnar data format such as Parquet or ORC for the tables? – KBR Mar 17 '17 at 08:27
Definitely. In addition, see https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution (currently for ORC; I think there is a similar project for Parquet). – David דודו Markovitz Mar 17 '17 at 08:34
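
A rough sketch of what the comments suggest might look like the following in HiveQL. The table names (`big_a_text`, `big_b_text`, `big_a_orc`, `big_b_orc`) and the columns (`id`, `col1`, `col2`) are hypothetical placeholders, not taken from the question, and the actual gain depends on the data and the cluster.

```sql
-- Hypothetical names throughout; replace with the real tables and join key.

-- One-time conversion of the text tables to a columnar format (ORC).
CREATE TABLE big_a_orc STORED AS ORC AS SELECT * FROM big_a_text;
CREATE TABLE big_b_orc STORED AS ORC AS SELECT * FROM big_b_text;

-- Run on Tez and enable vectorized execution, which applies to ORC data.
SET hive.execution.engine = tez;
SET hive.vectorized.execution.enabled = true;
SET hive.vectorized.execution.reduce.enabled = true;

-- Select only the columns that are needed, so the columnar reader
-- can skip the rest.
SELECT a.id, a.col1, b.col2
FROM big_a_orc a
JOIN big_b_orc b
  ON a.id = b.id;
```

The conversion itself costs a full read of both text tables, so it probably pays off only if the tables are queried more than once.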


0 Answers