Questions tagged [hive]

Apache Hive is a database built on top of Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in a Hadoop-compatible distributed file system. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. Please DO NOT use this tag for the Flutter database that is also named Hive; use the flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

Tools to enable easy data summarization (ETL)
Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
A mechanism to put structure on this data
An advanced query language called Hive Query Language (HiveQL), which is based on SQL with additional features such as DISTRIBUTE BY and TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also lets traditional map/reduce programmers plug in their custom mappers and reducers to do more sophisticated analysis that the built-in capabilities of the language may not support.
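As a rough illustration of such a plug-in, here is a minimal sketch of the row-level logic a Python streaming script for TRANSFORM might contain. It assumes the default behaviour where Hive passes each row to the script as a line of tab-separated column values and reads tab-separated lines back. The two-column input (an id and a name) and the helper names `transform_row` and `main` are hypothetical, chosen only for this example.

```python
import sys


def transform_row(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    # Assumption: each input row has at least two columns, an id and a name.
    fields = line.rstrip("\n").split("\t")
    user_id, name = fields[0], fields[1]
    # Hypothetical custom logic: normalise the name and emit its length.
    cleaned = name.strip().lower()
    return "\t".join([user_id, cleaned, str(len(cleaned))])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write one transformed row per non-blank line."""
    for line in stdin:
        if line.strip():
            stdout.write(transform_row(line) + "\n")
```

Saved as a script that ends with a call to `main()`, this could be shipped with `ADD FILE normalize_names.py;` and invoked roughly as `SELECT TRANSFORM (id, name) USING 'python normalize_names.py' AS (id, name, name_len) FROM users;`, where the script, table, and column names are again hypothetical.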

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly one of submitting jobs and being notified when they complete, as opposed to real-time queries. In systems such as Oracle, analysis runs on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes. In contrast, Hive response times for even the smallest jobs can be on the order of several minutes, and larger jobs (e.g., jobs processing terabytes of data) may run for hours or days. Many optimizations and improvements have been made to speed up processing, such as fetch-only tasks, LLAP, and materialized views.

To summarize, while low-latency performance is not the top priority of Hive's design principles, the following are Hive's key features:

Scalability (scale out with more machines added dynamically to the Hadoop cluster)
Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
Fault-tolerance
Loose coupling with its input formats
A rather rich query language with native support for JSON, XML, and regular expressions; the ability to call Java methods and to use Python and shell transformations; analytic and windowing functions; and the ability to connect to different RDBMSs using JDBC drivers, plus a Kafka connector.
Ability to read and write almost any file format using native and third-party SerDes, including RegexSerDe.
Numerous third-party extensions, for example Brickhouse UDFs.

How to write a good Hive question:

Add a clear textual problem description.
Provide the query and/or table DDL, if applicable.
Provide the exception message.
Provide an example of the input and desired output data.
Questions about query performance should include EXPLAIN query output.
Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output, or exception messages.
Use proper code and text formatting.

21846 questions

6 answers

How to list all hive databases being in use or created so far?

Similar to SHOW TABLES command, do we have any such command to list all databases created so far?

hadoop hive hiveql

asked Nov 05 '13 at 04:33

Raja Reddy

2 answers

metastore_db created wherever I run Hive

A metastore_db folder is created in any directory where I run a Hive query. Is there any way to have only one metastore_db in a defined location and stop it from being created all over the place? Does it have anything to do with hive.metastore.local?

hive hiveql

asked Nov 29 '12 at 11:35

darcyy

12 answers

Inserting Data into Hive Table

I am new to Hive. I have successfully set up a single-node Hadoop cluster for development purposes and, on top of it, I have installed Hive and Pig. I created a dummy table in Hive: create table foo (id int, name string); Now, I want to insert data…

sql insert hadoop hive

asked Jun 15 '12 at 15:19

Tapan Avasthi

3 answers

Amazon EC2 vs. Amazon EMR

I have implemented a task in Hive. Currently it is working fine on my single node cluster. Now I am planning to deploy it on AWS. I don't know anything about the AWS. If I plan to deploy it then what should I choose Amazon EC2 or Amazon EMR? I want…

amazon-ec2 amazon-web-services hive amazon-emr

asked Apr 11 '12 at 05:09

Bhavesh Shah

3 answers

Exporting Hive Table to a S3 bucket

I've created a Hive Table through an Elastic MapReduce interactive session and populated it from a CSV file like this: CREATE TABLE csvimport(id BIGINT, time STRING, log STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; LOAD DATA LOCAL INPATH…

amazon-s3 hive elastic-map-reduce emr

asked Feb 28 '12 at 20:48

seedhead

2 answers

LATERAL VIEW EXPLODE in presto

New to presto, any pointer how can I use LATERAL VIEW EXPLODE in presto for below table. I need to filter on names in my presto query CREATE EXTERNAL TABLE `id`( `id` string, `names` map>, `tags`…

amazon-web-services hive cloud presto trino

asked Jul 12 '18 at 20:48

rkj

2 answers

Does Spark SQL use Hive Metastore?

I am developing a Spark SQL application and I've got few questions: I read that Spark-SQL uses Hive metastore under the cover? Is this true? I'm talking about a pure Spark-SQL application that does not explicitly connect to any Hive…

apache-spark hive apache-spark-sql

asked May 09 '17 at 15:35

user1888243

5 answers

How to load CSV data with enclosed by double quotes and separated by tab into HIVE table?

I am trying to load data from a CSV file in which the values are enclosed by double quotes '"' and tab separated '\t'. But when I try to load that into Hive, it's not throwing any error and the data is loaded without any error, but I think all the data is…

hadoop hive

asked Jun 04 '15 at 07:23

Sharad

11 answers

Hive startup -[ERROR] Terminal initialization failed; falling back to unsupported

I have downloaded hive and modified HADOOP_HOME to HADOOP_HOME=${bin}/../../usr/local/hadoop my actual hadoop path is /usr/local/hadoop in .bashrc i have added the below env variables export HIVE_HOME=/usr/lib/hive/apache-hive-1.1.0-bin export…

java hadoop hive

asked Mar 11 '15 at 21:11

Venkat

4 answers

Required field 'client_protocol' is unset

I am using Hive 0.12, and I'm trying the JDBC from apache. When I try to run the code, I get apache.thrift.TApplicationException. import java.sql.SQLException; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import…

java hadoop jdbc hive

asked Jul 11 '14 at 09:27

user3782579

6 answers

What is use of hcatalog in hadoop?

I'm new to Hadoop. I know that the HCatalog is a table and storage management layer for Hadoop. But how exactly it works and how to use it. Please give some simple example.

hadoop hive hbase hcatalog

asked Mar 20 '14 at 13:00

Vijay_Shinde

6 answers

Hive: Filtering Data between Specified Dates when Date is a String

I'm trying to filter data between September 1st, 2010 and August 31st, 2013 in a Hive table. The column containing the date is in string format (yyyy-mm-dd). I can use month() and year() on this column. But how do I use them to filter data between…

date filter hive

asked Jan 29 '14 at 08:26

mixedbag99

3 answers

What are the advantages of setting "hive.exec.parallel" to false in Hive ?

I came to know that when hive.exec.parallel is set to true in hive i.e set hive.exec.parallel=true; then independent tasks in a query can run in parallel. Thanks to Qubole for this: Are there any advantages of setting this parameter to false?…

hive

asked Aug 13 '13 at 17:51

Mayank Jaiswal

7 answers

JSON output format for Hive Query results

Is there any way to convert the Hive query result in JSON format?

hadoop hive

asked Apr 03 '12 at 14:46

divinedragon

2 answers

Create temporary table in Hive?

Does Hive support temporary tables? I can't find it in the apache docs.

hadoop hive

asked Mar 21 '11 at 23:59

CMaury
