I have a partitioned table with the following structure:
create table tab1
(
col1 int,
col2 string,
...
)
partitioned by
(col50 int, col51 int)
stored as orc;
The table currently has ~17,000 partitions, and each partition holds at least ~50k records.
The following queries take a long time to run (~90 minutes):
SELECT DISTINCT col2 FROM tab1;
SELECT col2
FROM (SELECT col2, row_number() OVER (PARTITION BY col2 ORDER BY col3) AS rnk FROM tab1) t1
WHERE t1.rnk = 1;
Is there a way to reduce the execution time? Thanks in advance.