2

I want to create hive query for the following.

insert into  tempTableName  
select distinct col_a
        ,  first_value(col_b)  
            over (partition by col_a 
            order by nvl(col_c,0) desc, length(col_b) asc, col_b asc) 
from tableA

As hive does not support first value.I want to know what could be the equivalent in simple query for first_value function. Any suggestions ??

user2978621
  • 803
  • 2
  • 11
  • 20
  • As there is growing interest in using Hive SQL through the SQL engine of [Apache Spark](https://spark.apache.org/docs/latest/sql-programming-guide.html), it's worth noting that Spark does support `first_value()`. – Sim Jul 21 '15 at 03:53

2 Answers2

2

I am not exactly familiar with the oracle semantics here, but isn't this just a group by and arg-min? Structs in hive compare in the order of their fields, so you can do something like this:

select col_a,
min(
  named_struct(
    'col_c', -coalesce(col_c, 0),
    'len' , length(col_b),
    'col_b', col_b
  )
).col_b
from tableA
group by col_a
Joe K
  • 18,204
  • 2
  • 36
  • 58
1

HIVE 0.11 does support FIRST_VALUE.

But as per HIVE JIRA, there's an open issue that you cannot have more than one ORDER BY column in first_value. You haven't reported what error you are getting, but if it's FAILED: SemanticException Range based Window Frame can have only 1 Sort Key, then you have to modify the ORDER BY columns.

Edit: If you are not on HIVE 0.11, then I would suggest installing a UDF for FIRST_VALUE. I guess that would be the straightforward way to do this. You might want to take a look at these UDFS.

visakh
  • 2,503
  • 8
  • 29
  • 55