I have two large tables that I am joining in Spark SQL like this:

SELECT * FROM table1 A JOIN table2 B ON (A.client = B.client AND A.sitecode = B.sitecode AND A.spec_nbr = B.spec_nbr)

table1 has skewed data, which makes the query run for a long time. I want to work around the skew by using the salting technique.

How do I apply the salting technique in this scenario?

I am not able to find any relevant material on how to apply the salting technique. Any help is appreciated.

lingamaneni
    Does this answer your question? [Skewed dataset join in Spark?](https://stackoverflow.com/questions/40373577/skewed-dataset-join-in-spark) – Eyal Oct 26 '21 at 06:58

1 Answer

You could take a look at this answer and the article linked from it. This question is possibly a duplicate.

https://stackoverflow.com/a/40376978/5723349
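For readers who want the mechanics of salting spelled out, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. It is not taken from the linked answer. The column names come from the question; `salted_join`, `NUM_SALTS`, the `val`/`info` columns, and the sample rows are all made up for illustration. In Spark the same three steps would roughly correspond to adding a random salt column to the skewed table, replicating the other table once per salt value, and joining on the original keys plus the salt.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # assumed number of salt buckets; tune to the degree of skew
KEYS = ("client", "sitecode", "spec_nbr")  # join columns from the question


def salted_join(skewed_rows, other_rows, keys, num_salts=NUM_SALTS, seed=0):
    """Join two lists of dict rows on `keys`, salting the skewed side."""
    rng = random.Random(seed)

    # Step 1: tag every row of the skewed side with a random salt, so one
    # hot key is spread over up to `num_salts` distinct (key, salt) buckets.
    salted = [
        (tuple(row[k] for k in keys) + (rng.randrange(num_salts),), row)
        for row in skewed_rows
    ]

    # Step 2: replicate every row of the other side once per salt value,
    # so each salted row can still find its partner.
    index = defaultdict(list)
    for row in other_rows:
        base = tuple(row[k] for k in keys)
        for salt in range(num_salts):
            index[base + (salt,)].append(row)

    # Step 3: join on the original keys plus the salt.
    joined = [(a, b) for key, a in salted for b in index.get(key, [])]

    # Bucket sizes show how evenly the skewed side is now distributed.
    bucket_sizes = Counter(key for key, _ in salted)
    return joined, bucket_sizes


# Hypothetical data: one hot key with 1000 rows and one cold key with 1 row.
table1 = [
    {"client": "c1", "sitecode": "s1", "spec_nbr": 1, "val": i}
    for i in range(1000)
]
table1.append({"client": "c2", "sitecode": "s2", "spec_nbr": 2, "val": -1})
table2 = [
    {"client": "c1", "sitecode": "s1", "spec_nbr": 1, "info": "hot"},
    {"client": "c2", "sitecode": "s2", "spec_nbr": 2, "info": "cold"},
]

joined, bucket_sizes = salted_join(table1, table2, KEYS)
```

The join result has the same rows as an unsalted join; the only difference is that the hot key's rows are split across several buckets instead of landing in one. The cost is that the non-skewed side grows by a factor of `NUM_SALTS`, so the number of salts is a trade-off between balance and replication.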

Fateax