I am running a join in Hive, but the job gets stuck when the reducers reach 99%.
I then found that the data in the table is skewed. For example, table A has 1 million rows and table B has only 10k. In table A, 80% of the values in the joining column are the same and the rest are different, so the Hive reducer gets stuck on that value.
Here is my query:
INSERT INTO TABLE xyz
SELECT m.name, m.country, m.user_type, m.category
FROM A m
JOIN category n ON (m.name = n.name)
WHERE m.country = 2
GROUP BY m.name, m.country, m.user_type, m.category;
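To show the skew concretely, a query along these lines could list the most frequent values of the joining column. It is only a sketch: it assumes that name is the skewed column and reuses the same country filter as the query above; the alias cnt and the limit of 10 are arbitrary.

```sql
-- Count rows per join key in A and list the heaviest keys first.
-- Assumption: m.name is the skewed joining column described above.
SELECT m.name, COUNT(*) AS cnt
FROM A m
WHERE m.country = 2
GROUP BY m.name
ORDER BY cnt DESC
LIMIT 10;
```

If one value accounts for most of the rows, that is the key the last reducer is stuck on.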
Please suggest a possible solution. How can I run a join on this kind of data?
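For reference, these are the Hive settings that appear relevant to this situation. This is an untested sketch: the threshold values shown are the documented defaults rather than values tuned for this data, and whether they help depends on the Hive version and execution engine.

```sql
-- Let Hive convert the join to a map-side join when one table is small
-- (the 10k-row table would be loaded into memory, avoiding the shuffle).
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- size limit in bytes (default)

-- Handle heavily repeated join keys in a separate follow-up job.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;  -- rows per key before it is treated as skewed (default)

-- Spread skewed GROUP BY keys across reducers using a two-stage aggregation.
SET hive.groupby.skewindata=true;
```

These would be set in the same session, before running the INSERT query.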