I have a simple query:
Select *
from A
left join b on A.b = b.b
left join c on A.c = c.c
left join d on A.d = d.d
left join e on A.e = e.e
...
~20 tables
All tables b,c,d,e etc are small and therefore all joins are broadcast joins
The problem is that table A is big (like 300m records) and ill-partitioned. It has ~80 partitions and 15 of them contain 90% of data.
So, when broadcast join happens, only 15 cores do all join math and it takes 2 hours to finish.
My solution was to force shuffling of A by adding "order by"
Select *
from
(select * from A order by A.a) A
left join b on A.b = b.b
left join c on A.c = c.c
left join d on A.d = d.d
left join e on A.e = e.e
...
Here A.a is unique field combination.
Now A is shuffled to 200(by default) semi-equal partitions and resources are used in optimal way, it takes 35 mins to finish.
Is there more elegant way to force shuffle of table A, like some hint magic?