
We know that a map join or a sort-merge-bucket (SMB) map join reduces execution time by eliminating the reduce phase (i.e. eliminating the shuffle).

Ex: for a join between two tables, select a.col1, b.col2 from a join b on a.col1 = b.col1 (both tables are bucketed on col1 into the same number of buckets).
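
For reference, here is a minimal sketch of the two-table setup I mean. The column types, the bucket count of 32, and the sorted by clause are assumptions for illustration only. The settings are the Hive properties for bucket map join and SMB join as far as I know, and some may be unnecessary depending on the Hive version.

```sql
-- Assumed properties to enable bucket map join and SMB map join
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join = true;

-- Both tables bucketed (and sorted) on the join key, with the same bucket count.
-- Column types and the bucket count (32) are hypothetical.
create table a (col1 int, col2 string)
clustered by (col1) sorted by (col1) into 32 buckets;

create table b (col1 int, col2 string)
clustered by (col1) sorted by (col1) into 32 buckets;

-- The join key matches the bucketing column on both sides
select a.col1, b.col2
from a join b on a.col1 = b.col1;
```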

But consider joining three or more tables on different columns:

Ex: select a.col1, b.col3, c.col2, d.date from a join b on a.id = b.id join c on a.state = c.state join d on c.date = d.date

In a scenario like this, how does bucketing help if we don't want to split the query into multiple smaller queries?

user3123372
