Questions tagged [hive-udf]

Use this tag for questions about user-defined functions (UDFs) for Apache Hive.

Apache Hive is a data warehouse system built on top of Hadoop that provides the following:

  • Tools to enable easy data extract/transform/load (ETL) and summarization
  • Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
  • A mechanism to put structure on this data
  • A query language called Hive Query Language (HiveQL), which is based on SQL, adds features such as DISTRIBUTE BY and TRANSFORM, and lets users familiar with SQL query this data.
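
As a hedged illustration of the TRANSFORM feature mentioned above: Hive streams each input row to an external script as a tab-separated line on standard input and reads tab-separated lines back from standard output. The sketch below is a minimal Python script of that kind. The file name upper_first.py, the function transform_line, and the choice to upper-case the first column are assumptions made for this example, not part of any Hive API.

```python
#!/usr/bin/env python
"""Hypothetical streaming script for Hive's TRANSFORM clause (illustrative only)."""
import sys

# Hive writes NULL column values to the script as the literal string \N.
NULL_MARKER = "\\N"


def transform_line(line):
    """Upper-case the first tab-separated column; leave other columns and NULLs as-is."""
    columns = line.rstrip("\n").split("\t")
    if columns[0] != NULL_MARKER:
        columns[0] = columns[0].upper()
    return "\t".join(columns)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive sends one row per line; emit exactly one transformed row per input line.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

In Hive, a script like this would typically be shipped with ADD FILE and invoked with something like SELECT TRANSFORM(name, city) USING 'python upper_first.py' AS (name, city) FROM some_table; the table and column names here are placeholders.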

How to write a good Hive question:

  1. Add a clear textual description of the problem.
  2. Provide the query and/or table DDL, if applicable.
  3. Provide the exception message.
  4. Provide an example of the input data and the desired output.
  5. Questions about query performance should include the EXPLAIN output for the query.
  6. Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output, or exception messages.
  7. Use proper code and text formatting.

64 questions
0 votes, 1 answer

Using Hive Jars with Pyspark

The problem is how to use Hive JARs in PySpark code. We are following the standard steps below. Create a temporary function in PySpark code with spark.sql(" "): spark.sql("create temporary function public_upper_case_udf as…
0 votes, 2 answers

Hive: Difference between CREATE FUNCTION and CREATE TEMPORARY FUNCTION in Hive UDF

I am new to Hive and am working on a project where I need to create a few UDFs for data wrangling. During my research, I came across two syntaxes for creating a UDF from added JARs: CREATE FUNCTION country AS…
0 votes, 1 answer

Hive UDF : Generic UDF cannot access struct from nested map

Here is my Hive table: create table if not exists dumdum (val map>>); insert into dumdum select map('A',map('1',named_struct('student_id','123a', 'age',11))); insert into dumdum select…
AbtPst
0 votes, 0 answers

Hive UDF return expected result but also added null and newline in result

I have written a Hive UDF in Java for decoding information, using the code below. public Text evaluate(Text str) throws Exception { byte[] keyBytes = (SALT + KEY).getBytes("UTF8"); MessageDigest messageDigest =…
Sachin
0 votes, 0 answers

Hive UDF - How to access column name

Would someone please let me know how to access the column name in a simple Hive UDF? import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; import Utils @Description(name =…
Gaurang Shah
0 votes, 1 answer

Hive query executeQuery() hangs in java JDBC code

I have created a UDTF and I'm running the Java Hive JDBC code below inside it to execute a Hive query and get results. I'm able to connect to the hive2 server successfully, but the code hangs indefinitely without any exception at…
MamtaJ
0 votes, 1 answer

In INSERT INTO TABLE clause UDF is failing

I have created a UDF, translateText(), which calls an API to translate the given text and returns the correct result in a SELECT clause, but when I apply INSERT INTO TABLE as given below: INSERT OVERWRITE TABLE gl_staging_eve.header_text select…
Farooque
0 votes, 1 answer

Writing a UDF in Python using Pandas throwing error

We are trying to write Hive UDFs in Python to clean data. The UDF we tried uses Pandas and throws an error. When we use other Python code without Pandas, it works fine. Kindly help us understand the problem.…
s.c.
0 votes, 1 answer

ImportError Python Hive UDF

I want to put some constants in one Python file and import it into another. I created two files, one with constants and one that imports it, and everything runs fine locally: constants.py: CONST = "hi guy" test_constants.py: from constants import…
Michael K
0 votes, 0 answers

Hive: why CTAS can't read a file whereas select query can

I have put my file at /hadoop/yarn/local/usercache/root/test_abspath and want to read the first line using my UDF. When I ran it using select test('ABCD','ABCD'); I could read the file, but when I tried it using Create table as test_tb select…
TheBeginner
0 votes, 0 answers

Why Hive UDF not working with "Create Table as" query

I have written a UDF that works fine with a select query. I have registered the UDF permanently with database 'db1', e.g. select db1.myUDF(column name, arg) from table_name; but when I try to create a new table from it, the new table does not reflect…
TheBeginner
0 votes, 1 answer

GenericUDF's initialized method being called multiple times

I have a Hive UDF that extends GenericUDF. When I call the UDF via spark.sql, I get the correct results, but the initialized method is called multiple times. I can't understand why that's happening.
0 votes, 0 answers

hive udf with hbase connection on a secure cluster

I am trying to write a Hive UDF that connects to an HBase table, but the program fails with a security exception and throws the following: javax.security.auth.login.LoginException: Unable to obtain password from user at Below is the code…
Raja
0 votes, 1 answer

Update JDBC Database table using storage handler and Hive

I have read that using Hive JDBC storage handler (https://github.com/qubole/Hive-JDBC-Storage-Handler), the external table in Hive can be created on different databases (MySQL, Oracle, DB2) and users can read from and write to JDBC databases using…
Ayan
0 votes, 1 answer

Bug in my HiveUDF

I'm trying to write a Hive UDF that checks a column in a Hive table and concatenates a string with it. My Hive table, cityTab, has this schema and data: Schema: id int name char(30) rank int Data: 1 NewYork 10 2 Amsterdam 30 I…
Metadata