2

I have three partitioned tables (Yearly Partitions) in Hive and all the tables have multiple partitions. I am joining all these three tables as part of my requirement. Now I want to run this sql only for the latest partition and not for the ones created before.

I tried to use max(partition) in where clause but it seems it is not supported

I did something like below (not the exact code. Just a code concept)

select
a.*,
b.*,
c.*
from table1 a
left join table2 b on a.ID = b.ID
left join table3 c on a.ID = c.ID
where
a.year = max(a.year) and
b.year = max(b.year) and
c.year = max(c.year)

I got this error

FAILED: SemanticException [Error 10128]: Line 108:23 Not yet supported place for UDAF 'max'

I can use multiple where clauses with a subquery containing "select max(year) from table" for all the table but that's not seem to be a feasible one. Any ideas on how to achieve this ?

UPDATE I tried the where clause with conditions below but it seems only one suq-query is supported in the where clause. Not sure how to resolve this. Appreciating any inputs on this

where
a.year in (select max(year) from table1) and
b.year in (select max(year) from table2) and
c.year in (select max(year) from table3
JKC
  • 2,498
  • 6
  • 30
  • 56
  • why do you think it is not feasible? i'm using that one to get maxes – mangusta Oct 31 '18 at 13:15
  • Ok. For easy reference I mentioned as the table has only year as the partition but in actual case it has more partition columns and I have more tables to join. So I don't think it's feasible. – JKC Oct 31 '18 at 13:24
  • does it matter how many partition types the table has? it's all about modifying "where" clause. moreover, your aggregate function (i.e. max) is applied on partition column, not a usual one, so even the speed is not a concern here – mangusta Oct 31 '18 at 13:56
  • 1
    @mangusta Thanks for your inputs . I will try the same and post it here if there is any performance discrepancy. – JKC Oct 31 '18 at 14:06
  • @mangusta I am not able to introduce more than one subqueries in the where clause. Updated the question to contain more details. Any ideas ? – JKC Nov 02 '18 at 05:28
  • how about using the one in my answer below – mangusta Nov 02 '18 at 06:18

1 Answers1

2

Modified version:

    select
    <columns>
    from  
    (  
     select 
     <columns> 
     from 
     table1 a 
     where a.year in (select max(year) from table1) 
    ) a1
    left join 
    (
     select 
     <columns> 
     from 
     table2 b 
     where b.year in (select max(year) from table2) 
    ) b1 on a1.ID = b1.ID
    left join 
    (
     select 
     <columns> 
     from 
     table3 c 
     where c.year in (select max(year) from table3) 
    ) c1 on a1.ID = c1.ID
;
mangusta
  • 3,470
  • 5
  • 24
  • 47