Apache Spark: Handling Skewed Data with a Composite Key

Question

I have two large tables, and I am joining them in Spark SQL like this:

select * from table1 A join table2 B on A.client = B.client and A.sitecode = B.sitecode and A.spec_nbr = B.spec_nbr

table1 has skewed data, which makes the query run longer. I want to mitigate the skew by using the salting technique.
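Why a few hot keys slow the whole join down: for a shuffle join, Spark hash-partitions both sides on the join key, so every row that shares one (client, sitecode, spec_nbr) value ends up in the same partition and is processed by a single task. The Python sketch below simulates only that partitioning step, without Spark. The partition count, the sample rows, and the use of `zlib.crc32` in place of Spark SQL's Murmur3-based hash are assumptions made for illustration.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical number of shuffle partitions


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a composite join key to a partition id.

    crc32 stands in for Spark's Murmur3 hash (an assumption) so that
    the sketch is deterministic and needs no dependencies.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_sizes(rows, num_partitions=NUM_PARTITIONS):
    """Count the rows each partition receives when shuffling on the composite key."""
    sizes = Counter()
    for row in rows:
        key = (row["client"], row["sitecode"], row["spec_nbr"])
        sizes[partition_for(key, num_partitions)] += 1
    return [sizes[p] for p in range(num_partitions)]


# Hypothetical skewed table1: one composite key owns 1000 of the 1020 rows.
table1 = [{"client": "c1", "sitecode": "s1", "spec_nbr": 1} for _ in range(1000)]
table1 += [{"client": f"c{i}", "sitecode": "s2", "spec_nbr": i} for i in range(2, 22)]

sizes = partition_sizes(table1)
print(sizes)  # one partition holds at least the 1000 hot rows
```

Raising the partition count alone does not fix this, because all rows of the hot key still hash to the same partition; salting works by changing the key itself.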

How can I apply the salting technique in this scenario, where the join key is a composite of three columns?

I am not able to find any relevant material on how to apply the salting technique. Any help is appreciated.
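A minimal sketch of salting a composite-key join, written in plain Python so the logic is visible without a cluster. It assumes table1 is the skewed side and that table2 can afford to be replicated a few times. Each table1 row gets a random salt in [0, SALT_BUCKETS), each table2 row is copied once per salt value, and rows are matched on (client, sitecode, spec_nbr, salt). `SALT_BUCKETS`, the helper names, and the sample data are hypothetical.

```python
import random
from collections import defaultdict

SALT_BUCKETS = 8  # hypothetical: how many pieces each hot key is split into


def composite_key(row):
    return (row["client"], row["sitecode"], row["spec_nbr"])


def plain_join(left, right):
    """Ordinary inner join on the composite key (for comparison)."""
    index = defaultdict(list)
    for right_row in right:
        index[composite_key(right_row)].append(right_row)
    return [(left_row, right_row) for left_row in left
            for right_row in index[composite_key(left_row)]]


def salted_join(left, right, buckets=SALT_BUCKETS, seed=0):
    """Inner join on (composite key, salt); `left` is the skewed side."""
    rng = random.Random(seed)
    # Replicate ("explode") every right row once per salt value.
    right_index = defaultdict(list)
    for right_row in right:
        for salt in range(buckets):
            right_index[(composite_key(right_row), salt)].append(right_row)
    joined = []
    rows_per_salted_key = defaultdict(int)
    for left_row in left:
        # A random salt spreads one hot key over `buckets` distinct join keys.
        salted_key = (composite_key(left_row), rng.randrange(buckets))
        rows_per_salted_key[salted_key] += 1
        for right_row in right_index[salted_key]:
            joined.append((left_row, right_row))
    return joined, rows_per_salted_key


# Hypothetical data: key ("c1", "s1", 1) is hot in table1.
table1 = [{"id": i, "client": "c1", "sitecode": "s1", "spec_nbr": 1} for i in range(1000)]
table1 += [{"id": 1000 + i, "client": "c2", "sitecode": "s2", "spec_nbr": i} for i in range(20)]
table2 = [{"client": "c1", "sitecode": "s1", "spec_nbr": 1, "name": "hot"}]
table2 += [{"client": "c2", "sitecode": "s2", "spec_nbr": i, "name": f"n{i}"} for i in range(20)]

salted, counts = salted_join(table1, table2)
plain = plain_join(table1, table2)
hot = {k: v for k, v in counts.items() if k[0] == ("c1", "s1", 1)}
print(len(salted), len(plain), sorted(hot.values()))
```

The salted join returns the same rows as the plain join, but the 1000 hot rows are now split across up to eight join keys, so they can land in different partitions. In Spark SQL the same idea would add a salt column to table1 with something like `floor(rand() * 8)`, replicate table2 with `explode(sequence(0, 7))` (`sequence` is available from Spark 2.4), and add `A.salt = B.salt` to the join condition. The cost is that table2 grows by the number of buckets, so it is common to salt only the known hot keys or to keep the bucket count small.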

Does this answer your question? [Skewed dataset join in Spark?](https://stackoverflow.com/questions/40373577/skewed-dataset-join-in-spark) — Eyal, Oct 26 '21 at 06:58

score 0 · Answer 1 · answered Jan 28 '19 at 22:00


You could take a look at this answer and the article linked there. This question may be a duplicate.


Fateax
