Questions tagged [hiveql]

Variant of SQL used in the Apache Hive data warehouse infrastructure. Use this tag for questions related to the Hive Query Language including HiveQL syntax and HiveQL-specific functions.

HiveQL is the dialect of used in Apache Hive. HiveQL is similar to other dialects of SQL, but has some limitations (e.g. absence of nested subqueries and top-level disjuncts) and additional features (e.g. MapJoin), compared to most other dialects. Apache Hive is a infrastructure package built on the distributed-computing framework, written in Java.

Please read How to write good SQL question section in SQL tag wiki and follow the same rules.

References

4862 questions
14
votes
2 answers

Difference between `load data inpath ` and `location` in hive?

At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me: 1 create table (name string, number double); load data inpath '/directory-path/file.csv'…
makansij
  • 9,303
  • 37
  • 105
  • 183
14
votes
10 answers

Comparing two tables for equality in HIVE

I have two tables, table1 and table2. Each with the same columns: key, c1, c2, c3 I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE): select count(*)…
Danzo
  • 553
  • 3
  • 13
  • 26
14
votes
3 answers

Finding the first row in a group using Hive

For a student database in the following format: Roll Number | School Name | Name | Age | Gender | Class | Subject | Marks how to find out who got the highest for each class? The below query returns the entire group, but I am interested in finding…
Praveen Sripati
  • 32,799
  • 16
  • 80
  • 117
13
votes
2 answers

DISTRIBUTE BY clause in HIVE

I am not able to understand what this DISTRIBUTE BY clause does in Hive. I know the definition that says, if we have DISTRIBUTE BY (city), this would send each city in a different reducer but I am not getting the same. Let us consider the data as…
User9523
  • 415
  • 1
  • 4
  • 18
13
votes
4 answers

No partition predicate found for Alias even when the partition predicate in present in the query

I have a table pos.pos_inv in hdfs which is partitioned by yyyymm. Below is the query: select DATE_ADD(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),5), to_date(from_unixtime(unix_timestamp(Inv.actvydt,…
jeff
  • 157
  • 1
  • 2
  • 8
13
votes
2 answers

HiveQL - How to find the column value is numeric or not using any UDF?

Basically i would like to return rows based on one column value. If the column contains non numeric values, then return those rows from a hive table. Any UDF is available in Hive?
Shankar
  • 8,529
  • 26
  • 90
  • 159
13
votes
2 answers

Row number functionality in Hive

How can I generate row numbers for an existing table while running a select query? For example: select row_number(), * from emp; I am using hive 0.13. I can't access external jars or udfs in my environment. The underlying files are in parquet…
Gayatri Mahesh
  • 327
  • 1
  • 3
  • 12
13
votes
3 answers

Hive: More clean way to SELECT AS and GROUP BY

I try to write Hive Sql like that SELECT count(1), substr(date, 1, 4) as year FROM *** GROUP BY year But Hive cannot recognize the alias name 'year', it complains that: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or…
twds
  • 333
  • 1
  • 4
  • 15
13
votes
4 answers

How to specify column list in hive insert into query

I have just installed and configured Apache Hive version 1.1.0. Then I have created a table by quering this query: create table person (name1 string, surname1 string); And then I want to add one row by: insert into person (name1, surname1) values…
Vasli Slavik
  • 309
  • 2
  • 4
  • 11
13
votes
6 answers

Find last day of a month in Hive

My question is : Is there a way to do find the last day of a month in Hive, like Oracle SQL function ? : LAST_DAY(D_Dernier_Jour) Thanks.
user2462699
  • 429
  • 2
  • 8
  • 17
12
votes
3 answers

What does the following fields: 'totalSize' and 'rawDataSize' mean in DESCRIBE EXTENDED query output in hive?

If one runs DESCRIBE EXTENDED command on any hive table the result presents totalSize and rawDataSize values near the end of the output. What do these fields mean? Ex: hive > DESCRIBE EXTENDED Output…
Henin RK
  • 298
  • 1
  • 2
  • 14
12
votes
5 answers

How to alter Hive partition column name

I have to change the partition column name (not partition spec), I looked for the commands in hive wiki and some google pages. I can find the options for altering the partition spec, i.e. For example In /table/country='US' I can change US to USA,…
Sunil
  • 131
  • 1
  • 1
  • 6
12
votes
4 answers

Hive LEFT SEMI JOIN for 'NOT EXISTS'

I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a. Here is a citation from Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS…
mel
  • 1,566
  • 5
  • 17
  • 29
11
votes
1 answer

How to create an empty array of struct in hive?

I have a view in Hive 1.1.0, based on a condition, it should return an empty array or an array of struct Here is my code: select case when then array() else…
Joha
  • 935
  • 12
  • 32
11
votes
7 answers

Search a table in all databases in hive

In Hive, how do we search a table by name in all databases? I am a Teradata user. Is there any counterpart of systems tables (present in Teradata) like dbc.tables, dbc.columns which are present in HIVE?
v83rahul
  • 283
  • 2
  • 7
  • 20