Below is the query which I am trying to run on Hive with execution engine as tez.
SELECT A.CITY, A.NAME, B.PRICE
,(ROW_NUMBER() OVER (PARTITION BY A.NAME ORDER BY B.PRICE) ) AS RNUM
FROM TABLE1 A
LEFT JOIN TABLE2 B
ON A.NAME = B.NAME
WHERE ( A.COLUMN2 >= B.COLUMN3 AND A.COLUMN2 < B.COLUMN4)
GROUP BY A.CITY, A.NAME, B.PRICE;
- When I run above query on my data in Hive it keeps running for hours without any results, but syntactically above query is correct.
- Both TABLEA and TABLEB has millions of records.
I tried changing the data format, increasing the container size, changing the number of reducers and changing heap size. Whatever parameter I change the query is being stuck.
In my further investigation I noticed that the where condition and window function are causing the query to run infinitely.
Here is my question:
- What is the change that will make the query run end to end on huge data?
Thanks for your help