Questions tagged [hive]

Apache Hive is a database built on top of Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and to query the data using a SQL-like language called HiveQL. Please DO NOT use this tag for the Flutter database that is also named Hive; use the flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

Tools to enable easy data summarization and extract/transform/load (ETL)
Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
A mechanism to put structure on this data
An advanced query language called Hive Query Language (HiveQL), which is based on SQL with some additional features such as DISTRIBUTE BY and TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, the language also allows traditional map/reduce programmers to plug in their custom mappers and reducers to perform more sophisticated analysis that the built-in capabilities of the language may not support.
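As a minimal sketch of what plugging in a custom mapper can look like, the Python script below could be used with HiveQL's TRANSFORM clause. It assumes Hive's default streaming format: one row per line on stdin, columns separated by tabs, and NULL written as the literal string \N. The file name word_count.py, the input columns (name, description) and the word-count logic are hypothetical, chosen only for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Hive TRANSFORM (streaming) script.

Assumed input row:  name <TAB> description   (hypothetical columns)
Output row:         name <TAB> word_count
"""
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    description = fields[1] if len(fields) > 1 else ""
    # With the default row format, Hive serializes SQL NULL as the string \N.
    if description == "\\N":
        word_count = 0
    else:
        word_count = len(description.split())
    return f"{name}\t{word_count}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive writes rows to the script's stdin and reads result rows from stdout.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

After `ADD FILE word_count.py;`, a query along the lines of `SELECT TRANSFORM(name, description) USING 'python3 word_count.py' AS (name STRING, word_count INT) FROM docs;` would stream each row through the script (`docs` is a hypothetical table, and python3 must be available on the cluster nodes). Keeping the per-row logic in a pure function, separate from the stdin/stdout loop, lets it be tested without Hive.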

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly one of submitting jobs and being notified when they complete, as opposed to real-time queries. In contrast to systems such as Oracle, where analysis runs on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes, Hive query response times for even the smallest jobs can be on the order of several minutes, and larger jobs (e.g., jobs processing terabytes of data) may in general run for hours or even days. Many optimizations and improvements have been made to speed up processing, such as fetch-only tasks, LLAP and materialized views.

To summarize, while low-latency performance is not the top priority among Hive's design principles, the following are Hive's key features:

Scalability (scale out with more machines added dynamically to the Hadoop cluster)
Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
Fault-tolerance
Loose-coupling with its input formats
A rather rich query language, with native support for JSON, XML and regular expressions, the ability to call Java methods, Python and shell transformations, analytic and windowing functions, connections to different RDBMSs through JDBC drivers, and a Kafka connector.
Ability to read and write almost any file format using native and third-party SerDes, such as RegexSerDe.
Numerous third-party extensions, for example the Brickhouse UDFs.
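The idea behind a regex-based SerDe such as RegexSerDe can be sketched outside Hive: a regular expression (supplied as the "input.regex" SerDe property) must match the whole line, and each capture group becomes one column, in order. The Python below only illustrates that mapping and is not Hive's implementation; the log format, the pattern and the column names are hypothetical, and returning an all-None row for a non-matching line mirrors how such lines typically show up as rows of NULLs in query results.

```python
import re

# Hypothetical pattern for a "host status bytes" log line. As with the
# "input.regex" property, capture group i feeds column i.
INPUT_REGEX = r"(\S+) (\d{3}) (\d+|-)"
COLUMNS = ["host", "status", "bytes"]


def deserialize(line, pattern=re.compile(INPUT_REGEX)):
    """Map one text line onto a dict of column values (all strings)."""
    # The whole line must match, not just a prefix of it.
    m = pattern.fullmatch(line)
    if m is None:
        # A non-matching line yields a row where every column is NULL (None).
        return {col: None for col in COLUMNS}
    return dict(zip(COLUMNS, m.groups()))
```

In Hive itself, the corresponding table would be declared with `ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "...")`, with one table column per capture group.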

How to write a good Hive question:

Add a clear textual problem description.
Provide the query and/or table DDL, if applicable.
Provide the exception message.
Provide example input data and the desired output.
Questions about query performance should include EXPLAIN query output.
Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output and exception messages.
Use proper code and text formatting.

21846 questions

8 answers

Alter hive table add or drop column

I have an ORC table in Hive and I want to drop a column from it with ALTER TABLE table_name drop col_name; but I am getting the following exception: Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1'…

hadoop hive

asked Dec 10 '15 at 09:31

Aryan Singh

4 answers

How to calculate Date difference in Hive

I'm a novice. I have an employee table with a column specifying the joining date, and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do…

hadoop hive hiveql

asked May 29 '15 at 05:21

Holmes (reputation 1,059; badges: 2 gold, 17 silver, 25 bronze)

6 answers

Hive query to quickly find table size (number of rows)

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).) I tried DESCRIBE EXTENDED, but that yielded numRows=0 which is obviously not…

hadoop hive

asked Jan 18 '14 at 19:04

xenocyon (reputation 2,409; badges: 3 gold, 20 silver, 22 bronze)

5 answers

how to replace characters in hive?

I have a string column description in a Hive table which may contain tab characters '\t'; these characters are, however, messing up some views when connecting Hive to an external application. Is there a simple way to get rid of all tab characters in that…

hadoop hive

asked Aug 06 '13 at 21:05

user1745713

7 answers

COALESCE with Hive SQL

Since there is no IFNULL, ISNULL, or NVL function supported on Hive, I'm having trouble converting NULL to 0. I tried COALESCE(*column name*, 0) but received the below error message: Argument type mismatch 0: The expressions after COALESCE should…

sql hive

asked Nov 19 '12 at 20:13

Parsa (reputation 1,137; badges: 1 gold, 11 silver, 15 bronze)

4 answers

How to rename a hive table without changing location?

Based on the Hive doc below: Rename Table ALTER TABLE table_name RENAME TO new_table_name; This statement lets you change the name of a table to a different name. As of version 0.6, a rename on a managed table moves its HDFS location as well.…

hadoop hive hiveql

asked Mar 12 '16 at 21:24

Osiris (reputation 1,007; badges: 4 gold, 17 silver, 30 bronze)

3 answers

Select top 2 rows in Hive

I'm trying to retrieve the top 2 rows from my employee list based on salary in Hive (version 0.11). Since it doesn't support a TOP function, are there any alternatives? Or do we have to define a UDF?

hadoop hive hiveql

asked May 25 '15 at 15:41

Holmes (reputation 1,059; badges: 2 gold, 17 silver, 25 bronze)

3 answers

How to calculate median in Hive

I have a Hive table:
name age sal
A 45 1222
B 50 4555
c 44 8888
D 78 1222
E 12 7888
F 23 4555
I want to calculate the median of the age column. Below is my approach: select min(age)…

hive hiveql

asked Nov 11 '14 at 10:51

Amaresh (reputation 3,231; badges: 7 gold, 37 silver, 60 bronze)

1 answer

Add a column in a table in HIVE QL

I'm writing code in Hive to create a table consisting of 1300 rows and 6 columns: create table test1 as SELECT cd_screen_function, SUM(access_count) AS max_count, MIN(response_time_min) as response_time_min, AVG(response_time_avg)…

hadoop hive hiveql

asked Oct 25 '13 at 12:09

user2532312

3 answers

Is there a way to alter column type in hive table?

The current schema is:
hive> describe tableA;
OK
id int
ts timestamp
I want to change the ts column to BIGINT without dropping the table and recreating it. Is it possible?

hive metadata

asked Jul 05 '13 at 22:21

interskh (reputation 2,511; badges: 4 gold, 20 silver, 20 bronze)

3 answers

Compress file on S3

I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed. I know that by compressing it, it'll be about 2.2GB (gzip). How can I download this file locally as quickly as possible when transfer is the…

amazon-s3 compression hive file-transfer emr

asked Jan 24 '13 at 06:24

Matt Joiner (reputation 112,946; badges: 110 gold, 377 silver, 526 bronze)

5 answers

converting to timestamp with time zone failed on Athena

I'm trying to create to following view: CREATE OR REPLACE VIEW view_events AS ( SELECT "rank"() OVER (PARTITION BY "tb1"."innerid" ORDER BY "tb1"."date" ASC) "r" , "tb2"."opcode" , "tb1"."innerid" , "tb1"."date" ,…

sql date hive amazon-athena timestamp-with-timezone

asked Jun 13 '18 at 08:52

Gal Itzhak

8 answers

Transferring hive table from one database to another

I need to move a hive table from one database to another. How can I do that?

hive hiveql

asked Apr 21 '14 at 21:43

user2942227 (reputation 1,023; badges: 6 gold, 19 silver, 26 bronze)

5 answers

How to make shark/spark clear the cache?

When I run my Shark queries, the memory gets hoarded in main memory. This is my top command result: Mem: 74237344k total, 70080492k used, 4156852k free, 399544k buffers Swap: 4194288k total, 480k used, 4193808k free, 65965904k…

hadoop hive apache-spark shark-sql

asked Dec 11 '13 at 11:19

venkat

1 answer

What is hive, Is it a database?

I just started exploring Hive. It has all the structures of an RDBMS, like tables, joins and partitions. What I understand is that Hive still uses HDFS for storage and is an SQL abstraction of HDFS. From this I am not sure whether Hive itself is…

hadoop hbase hive

asked Nov 17 '13 at 12:03

Brainchild (reputation 1,814; badges: 5 gold, 27 silver, 52 bronze)
