
I have been banging my head against the wall for a while now trying to get a Hive equivalent of MS SQL's IDENTITY column added to a table and auto-incremented. I have found many references to org.apache.hadoop.hive.contrib.udf.UDFRowSequence, but I have no idea where that lives in the Hortonworks 2.3 install on my cluster, or where to start.

I have seen a Java file here which I assume I have to compile, but once I have a .jar, where does it go? I tried using a SerDe jar for another task and could never get Hive to see or use it (see my question on this here). I have also tried to follow this case study on creating a custom UDF here. However, I can find no path like the one they describe in my Hortonworks install (the path looks something like ql/src/java/org/apache/hadoop/hive/ql/udf/generic/).

It seems like every tutorial, guide, or reference on creating a UDF assumes some knowledge that I do not have yet. How can I create and use the UDFRowSequence functionality in a Hortonworks install of Hive?

wergeld

2 Answers


To use a UDF inside Hive, compile the Java code and package the resulting class file into a JAR file. Then open your Hive session and add the JAR to the classpath:

hive> ADD JAR /full/path/to/jar_file.jar;

The path must be the full path on your local filesystem to the location where you put the JAR file.

Next, use a CREATE FUNCTION statement to define a function that uses the Java class:

hive> CREATE TEMPORARY FUNCTION functionname
> AS 'classname with full package name';

You can then use functionname for the rest of that Hive session.
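As a rough illustration of the two statements above applied to the class from the question, the following sketch writes them into a small HiveQL file. The JAR path, the output file path, and the function name row_sequence are placeholders chosen for this example, not values taken from a Hortonworks install.

```shell
# Sketch: write a small HiveQL script that registers a UDF from a JAR.
# Both paths below are hypothetical placeholders; change them for your system.
UDF_JAR="${UDF_JAR:-/tmp/my-udfs.jar}"
HQL_FILE="${HQL_FILE:-/tmp/register_udf.hql}"

# The class name is the one from the question; the function name
# row_sequence is an arbitrary choice for this example.
cat > "$HQL_FILE" <<EOF
ADD JAR ${UDF_JAR};
CREATE TEMPORARY FUNCTION row_sequence
  AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
EOF

echo "Wrote $HQL_FILE"
```

Starting the Hive CLI with `hive -i` followed by that file should run both statements before the prompt appears, after which a query such as `SELECT row_sequence(), some_column FROM some_table;` (table and column names are hypothetical) can be tried.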

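The path ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ is not a directory in your install. It is a location in the Hive source tree, and it corresponds to the package name of the Java class (org.apache.hadoop.hive.ql.udf.generic).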
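Because a package name maps directly to a directory path inside a JAR, one way to check whether a given JAR already contains a class is to look for that path in the JAR's listing. The helper names below are made up for this sketch, and it assumes the `unzip` tool is installed.

```shell
# Sketch: turn a fully qualified Java class name into the entry path
# that class has inside a JAR (dots become slashes, plus ".class").
class_to_jar_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Report (via exit status) whether a JAR contains the given class.
# $1 = path to the JAR, $2 = fully qualified class name. Requires unzip.
jar_has_class() {
    unzip -l "$1" | grep -qF "$(class_to_jar_entry "$2")"
}
```

For example, `jar_has_class /path/to/some.jar org.apache.hadoop.hive.contrib.udf.UDFRowSequence` would succeed only if that JAR (the path is a placeholder) contains the class, which can help find out whether the UDF is already shipped somewhere in the install.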

madhu
  • Is the effect of ADD JAR permanent? In a later Hive session, can I just recreate my TEMPORARY FUNCTION, or do I need to run ADD JAR again as well? If so, how do I make this JAR a permanent part of my Hive? – wergeld Oct 13 '15 at 15:57
  • You can create a .hiverc file in $HIVE_HOME/bin/ and put both statements in it: ADD JAR full path to jar file; and CREATE TEMPORARY FUNCTION myfunction AS 'classname with full package name'; This file is sourced before each invocation of the Hive shell, making it an ideal place to add jars, add files, and register UDFs. That way your own UDFs are available in every session. – madhu Oct 14 '15 at 05:11
  • Where is `$HIVE_HOME`? This is the issue I am facing in my linked question as well. The location the docs give for $HIVE_HOME does not exist on my system. – wergeld Oct 14 '15 at 11:35
  • $HIVE_HOME is your Hive installation directory on the local filesystem. – madhu Oct 14 '15 at 12:24
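The .hiverc approach from the comments could be scripted roughly as follows. This is a sketch only: the helper names are invented here, the function name row_sequence is an arbitrary choice, and the JAR path passed in is whatever full local path applies on your system.

```shell
# Sketch: append a line to a .hiverc file only if it is not already there.
# $1 = path to the .hiverc file, $2 = the exact HiveQL line to add.
add_hiverc_line() {
    hiverc="$1"
    line="$2"
    touch "$hiverc" || return 1
    # -x matches whole lines only, -F treats the line as a fixed string.
    if ! grep -qxF -- "$line" "$hiverc"; then
        printf '%s\n' "$line" >> "$hiverc"
    fi
}

# Add the ADD JAR and CREATE TEMPORARY FUNCTION statements for the
# UDFRowSequence class named in the question.
# $1 = path to the .hiverc file, $2 = full local path to the JAR.
register_row_sequence() {
    add_hiverc_line "$1" "ADD JAR $2;" &&
    add_hiverc_line "$1" "CREATE TEMPORARY FUNCTION row_sequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';"
}
```

A call such as `register_row_sequence "$HOME/.hiverc" /full/path/to/jar_file.jar` can be repeated safely, since existing lines are not duplicated. The Hive CLI also reads a .hiverc in the user's home directory, which may be the easier option when $HIVE_HOME is hard to locate.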

Copy the JAR file into $HIVE_HOME/lib and restart the Hive client:

cp full_path_to_jar_file $HIVE_HOME/lib/
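When $HIVE_HOME is not set, as in the question, one heuristic is to resolve the `hive` launcher found on PATH and assume the usual layout where it sits in the bin directory of the installation. The sketch below makes that assumption explicit; the function names are invented for this example, `readlink -f` requires GNU coreutils, and a packaged install may use wrapper scripts that defeat the guess.

```shell
# Sketch: guess the Hive installation directory.
# Uses $HIVE_HOME when it is set; otherwise resolves the hive launcher
# found on PATH and assumes an <install dir>/bin/hive layout.
guess_hive_home() {
    if [ -n "${HIVE_HOME:-}" ]; then
        printf '%s\n' "$HIVE_HOME"
        return 0
    fi
    hive_bin=$(command -v hive) || return 1
    # readlink -f (GNU coreutils) follows symlinks to the real file.
    hive_bin=$(readlink -f "$hive_bin") || return 1
    dirname "$(dirname "$hive_bin")"
}

# Copy a JAR into the lib directory of the guessed installation.
# $1 = full local path to the JAR file.
install_udf_jar() {
    hive_dir=$(guess_hive_home) || { echo "hive not found on PATH" >&2; return 1; }
    cp "$1" "$hive_dir/lib/"
}
```

Running `guess_hive_home` first and inspecting the printed directory before calling `install_udf_jar` is the safer order, since the guess is only a heuristic.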

Kaustuv