Questions tagged [hiveql]

Variant of SQL used in the Apache Hive data warehouse infrastructure. Use this tag for questions related to the Hive Query Language including HiveQL syntax and HiveQL-specific functions.

HiveQL is the dialect of used in Apache Hive. HiveQL is similar to other dialects of SQL, but has some limitations (e.g. absence of nested subqueries and top-level disjuncts) and additional features (e.g. MapJoin), compared to most other dialects. Apache Hive is a infrastructure package built on the distributed-computing framework, written in Java.

Please read How to write good SQL question section in SQL tag wiki and follow the same rules.

References

4862 questions
1
vote
0 answers

Union ALL with Null Arrays in Hive

I'm looking to union two tables together with slightly different columns. IN the past I have just added the additional columns to the smaller table and used select null as col3. In this recent example the column is an array and selecting null…
ampg1820
  • 11
  • 1
1
vote
1 answer

how to convert jsonarray to multi column from hive

example: there is a json array column(type:string) from a hive table like: "[{"filed":"name", "value":"alice"}, {"filed":"age", "value":"14"}......]" how to convert it into : name age alice 14 by hive sql? I've tried lateral view explode…
yoare
  • 11
  • 2
1
vote
2 answers

How to calculate week number in hive or python? The week should run from Saturday to Friday

I have tried the following: Select weekofyear(current_timestamp) But this calculates the week from Monday to Sunday. I want the week from Saturday to Friday, based on that it should populate week number. In hive if we can't do, how about in…
MJ_mj
  • 11
  • 2
1
vote
1 answer

Loop through parameter table and return union of query results

I have a Hive table and I need to run a query similar to the one below for different values of the parameters date,identifier1,identifier2,lower and upper and union the results together. Select col1, col2, new_time, sum(col3), case when "date"…
hghg hghg
  • 127
  • 9
1
vote
0 answers

Hive regexp_replace on multiple columns

I am using a hive query which has many columns, and I need to replace the comma character in all columns. There are hundreds of columns, some numeric, some text, and doing it one by one one will take me a long time. So is there something like:…
Victor
  • 16,609
  • 71
  • 229
  • 409
1
vote
1 answer

REGEXP_SUBSTR equivalent in HIVE SQL?

Good morning. I am looking for the equivalent in Hive of REGEXP_SUBSTR, which I use in other databases like Teradata. My specific case is: I have a field that has a string. It looks something like this: Row1: ABC ACC ADF AFA BAC CAF Row2: ACC BAC…
lottidah
  • 11
  • 1
  • 3
1
vote
1 answer

How to md5 an entire row in Hive?

Using Hive, I want to hash an entire row in the query. I tried the following (don't mind the ${xxx}, query is built from a bash script): SELECT md5(*) FROM ${DATABASE_NAME_SUFFIXE}.${DATABASE_PREFIXE}_${TABLE_NAME} WHERE ${TABLE_DATE_FIELD} <=…
Itération 122442
  • 2,644
  • 2
  • 27
  • 73
1
vote
2 answers

Difference between % vs * in string comparison in Hive

When trying to list down all tables names in a database having a specific name format, the following query works fine : show tables like '*case*'; while the following does not show tables like '%case%'; On the other hand, when comparing the…
Kshitij Kohli
  • 4,055
  • 4
  • 19
  • 27
1
vote
1 answer

Trying to get multiple rows with the values in the array in hive by lateral view and explode

I have a table with two columns and i am trying get multiple rows with the values in the array.My table is like this. date users 2019-01-01 ["U00001","U00002","U00002"] I am trying to get the output like below date …
Rahul
  • 467
  • 1
  • 8
  • 24
1
vote
1 answer

Truncate date values in Hive

I have the following value (format) for a column in a Hive table : 2019-04-22 08:25:43.243754000000 (yyyy-MM-dd HH:mm:ss.SSSSSSSSSSSS) Is there a way to truncate this value as below ? : 2019-04-22 08:25:43.243754 (yyyy-MM-dd HH:mm:ss.SSSSSS) Thank…
Mamaf
  • 345
  • 4
  • 10
1
vote
1 answer

Hive Query: How to use group by with rank?

I have a table like below year int month int …
tuk
  • 5,941
  • 14
  • 79
  • 162
1
vote
1 answer

Hive SQL: How to subset groups of records by max time

I have records grouped by ID1 and ID2 such as the following: ID1 ID2 date_time Apple pear 2020-03-09T12:11:25:622Z Apple pear 2020-03-09T12:23:36:825Z Apple lemon 2020-03-08T08:01:16:030Z Apple lemon …
lydias
  • 841
  • 1
  • 14
  • 32
1
vote
2 answers

hive - how to automatically append data to hive table every day?

I have a directory in HDFS, where .csv files with fixed structure and column names will be dumped at the end of every day that may look like this: I have a hive table that should have new data appended to it, at the beginning of every day, with…
Naveen Reddy Marthala
  • 2,622
  • 4
  • 35
  • 67
1
vote
2 answers

Hive: Convert string datetime with missing seconds in "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"

I'm using the following code to convert a string datetime variable to datetime, but the converted string is missing SSS part. Code used: cast(FROM_UNIXTIME(UNIX_TIMESTAMP(oldtime, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"),"yyyy-MM-dd HH:mm:ss.SSS") as…
lydias
  • 841
  • 1
  • 14
  • 32
1
vote
1 answer

Merge two hive table(Different column size)- pyspark

I have one hive table with schema Name ,Contact,Address,Subject Name Contact Address Subject abc 1111 Mumbai maths egf 2222 nashik science pqr 3333 delhi history And other table with schema **Name ,Contact**…
vishal
  • 25
  • 8