How does bucketing help when joining more than two tables, if it helps at all? (Hive Sort Merge Bucket Join)

Asked Jun 17 '19 at 08:28

Active Jul 22 '19 at 12:27

Viewed 60 times

We understand how map joins and sort-merge-bucket map (SMBM) joins reduce execution time: they eliminate the reduce phase, and with it the shuffle.

Ex: For a join between two tables: select a.col1, b.col2 from a join b on a.col1 = b.col1 (both tables are bucketed on col1 into the same number of buckets)
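For concreteness, the two-table setup described above might look like the following HiveQL sketch. The column types and the bucket count of 32 are assumptions chosen for illustration, not taken from the actual tables. The SORTED BY clause is included because a sort-merge-bucket join needs the buckets sorted on the join key as well as bucketed on it.

```sql
-- Hypothetical DDL: column types and the bucket count (32) are assumptions.
-- Both tables are bucketed AND sorted on the join key, with the same bucket count.
CREATE TABLE a (col1 INT, col2 STRING)
CLUSTERED BY (col1) SORTED BY (col1 ASC) INTO 32 BUCKETS;

CREATE TABLE b (col1 INT, col2 STRING)
CLUSTERED BY (col1) SORTED BY (col1 ASC) INTO 32 BUCKETS;

-- Hive 1.x only: make inserts honor the bucketing and sorting declared above.
-- (These two properties were removed in Hive 2.x, where this is always enforced.)
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Let the optimizer convert the join into a sort-merge-bucket map join.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;

-- Each mapper merges matching buckets of a and b, so no reduce phase is needed.
SELECT a.col1, b.col2
FROM a
JOIN b ON a.col1 = b.col1;
```

Running EXPLAIN on the SELECT should show whether the plan actually uses a sort-merge-bucket map join operator rather than a common (shuffle) join.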

But when joining three or more tables on different columns,

Ex: select a.col1, b.col3, c.col2, d.date from a join b on a.id = b.id join c on a.state = c.state join d on c.date = d.date

In a scenario like this, how does bucketing help if we don't want to split the query into multiple smaller queries?

user3123372

0 Answers