Questions tagged [hive]

Apache Hive is a database built on top of Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. Please DO NOT use this tag for the Flutter database that is also named Hive; use the flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

Tools to enable easy data summarization (ETL)
Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
A mechanism to put structure on this data
An advanced query language called Hive Query Language (HiveQL), which is based on SQL with some additional features such as DISTRIBUTE BY and TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
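As a rough illustration of the custom mapper idea, a streaming script used with TRANSFORM reads rows as tab-separated lines on standard input and writes tab-separated rows to standard output. The sketch below is only an example under stated assumptions, not code from the Hive distribution: the function name `transform_rows`, the two-column input (a user id and a page), and the sample rows are all hypothetical.

```python
import sys


def transform_rows(lines):
    """Yield one tab-separated output row per non-empty input row.

    Assumes each input row has at least two tab-separated columns:
    a user id and a page (hypothetical columns for this sketch).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user_id, page = line.split("\t")[:2]
        # Hypothetical custom logic: normalise the page and add its length.
        page = page.strip().lower()
        yield "\t".join([user_id, page, str(len(page))])


# Hypothetical sample rows standing in for what Hive would stream to the script.
sample_rows = ["42\t /Home \n", "7\t/Cart\n", "\n"]

for row in transform_rows(sample_rows):
    sys.stdout.write(row + "\n")
```

In a real job the same generator would be fed from `sys.stdin` instead of the sample list, and the script would be named in the query's `TRANSFORM ... USING` clause after being shipped to the cluster (for example with `ADD FILE`).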

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly one of submitting jobs and being notified when they are completed, as opposed to real-time queries. In systems such as Oracle, analysis is run on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes. In contrast, Hive response times for even the smallest jobs can be on the order of several minutes, and larger jobs (e.g., jobs processing terabytes of data) may run for hours or days. Many optimizations and improvements have been made to speed up processing, such as fetch-only tasks, LLAP, and materialized views.

To summarize, while low-latency performance is not the top priority of Hive's design principles, the following are Hive's key features:

Scalability (scale out with more machines added dynamically to the Hadoop cluster)
Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
Fault-tolerance
Loose-coupling with its input formats
A rather rich query language with native support for JSON, XML, and regular expressions; the ability to call Java methods and to use Python and shell transformations; analytic and windowing functions; and the ability to connect to different RDBMSs using JDBC drivers and to Kafka using a connector.
Ability to read and write almost any file format using native and third-party SerDes, including RegexSerDe.
Numerous third-party extensions, for example Brickhouse UDFs.
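To make the idea of projecting structure onto raw files a little more concrete, here is a minimal Python sketch of what a regex-based SerDe does conceptually: each capture group of a pattern becomes one column of the row. This is an analogy only, not RegexSerDe's actual implementation. The pattern, the column names, the log line format, and the `deserialize` helper are all hypothetical, and returning all-NULL columns for a non-matching line is how this sketch chooses to handle bad input.

```python
import re

# Hypothetical log format: host, user, quoted request, 3-digit status.
LOG_PATTERN = re.compile(r'(\S+) (\S+) "([^"]*)" (\d{3})')
COLUMNS = ("host", "user", "request", "status")


def deserialize(line):
    """Map one raw text line to a dict of column name -> string value."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        # No match: every column is None (the analogue of SQL NULL).
        return dict.fromkeys(COLUMNS)
    return dict(zip(COLUMNS, match.groups()))


row = deserialize('10.0.0.1 alice "GET /index.html" 200')
print(row["request"], row["status"])
```

The raw file stays untouched; the structure exists only in the pattern applied at read time, which is why the same data can be reinterpreted by changing the table definition rather than rewriting the files.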

How to write a good Hive question:

Add a clear textual problem description.
Provide the query and/or table DDL, if applicable.
Provide the exception message.
Provide an example of the input data and the desired output.
Questions about query performance should include EXPLAIN query output.
Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output, or exception messages.
Use proper code and text formatting.

21846 questions

7 answers

How to convert .txt file to Hadoop's sequence file format

To effectively utilise map-reduce jobs in Hadoop, I need data to be stored in Hadoop's sequence file format. However, currently the data is only in flat .txt format. Can anyone suggest a way I can convert a .txt file to a sequence file?

asked Mar 21 '11 at 11:41

Abhishek Pathak

2 answers

What does msck stands for in Msck repair command

The Hive MSCK REPAIR command is used to repair partitions, but what is the full form of MSCK? I already tried to find it in the Hive docs, but had no luck.

hadoop hive hiveql

asked Dec 30 '17 at 15:36

Kaustubh Deshpande

10 answers

How to create SparkSession with Hive support (fails with "Hive classes are not found")?

I'm getting an error while trying to run the following code: import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.SparkSession; public class App { public static void main(String[] args) throws…

java apache-spark hive apache-spark-sql

asked Sep 12 '16 at 06:31

Subhadip Majumder

1 answer

Spark final task takes 100x times longer than first 199, how to improve

I am seeing some performance issues while running queries using dataframes. I have seen in my research that long-running final tasks can be a sign that data is not distributed optimally, but have not found a detailed process for resolving this…

scala apache-spark hive left-join

asked Jul 22 '16 at 03:46

Dan Ciborowski - MSFT

11 answers

How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?

I'm using HiveContext with SparkSQL and I'm trying to connect to a remote Hive metastore, the only way to set the hive metastore is through including the hive-site.xml on the classpath (or copying it to /etc/spark/conf/). Is there a way to set this…

apache-spark hive apache-spark-sql

asked Aug 13 '15 at 06:04

amarouni

4 answers

Display the SQL definition of a hive view

How can I display the definition of a Hive view in its SQL form? Most relational databases support commands like SHOW CREATE VIEW viewname;

hadoop hive

asked Jul 04 '14 at 19:13

rogue-one

3 answers

How to update partition metadata in Hive , when partition data is manualy deleted from HDFS

What is the way to automatically update the metadata of Hive partitioned tables? If new partition data was added to HDFS (without executing the ALTER TABLE ADD PARTITION command), then we can sync up the metadata by executing the command 'msck…

hive partitioning

asked Jan 14 '14 at 07:43

vinu.m.19

3 answers

How to load a text file into a Hive table stored as sequence files

I have a hive table stored as a sequencefile. I need to load a text file into this table. How do I load the data into this table?

hadoop hive

asked Dec 28 '12 at 03:24

cldo

8 answers

hive sql find the latest record

the table is: create table test ( id string, name string, age string, modified string) data like this: id name age modifed 1 a 10 2011-11-11 11:11:11 1 a 11 2012-11-11 12:00:00 2 b 20 2012-12-10 10:11:12 2 …

sql group-by hive max

asked Nov 23 '12 at 04:20

qiulp

4 answers

SparkSQL - Read parquet file directly

I am migrating from Impala to SparkSQL, using the following code to read a table: my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table') How do I invoke SparkSQL above, so it can return something like: 'select col_A, col_B from…

scala apache-spark hive apache-spark-sql hdfs

asked Dec 21 '16 at 02:03

Edamame

2 answers

distinct vs group by which is better

for the simplest case we all refer to: select id from mytbl group by id and select distinct id from mytbl as we know, they generate same query plan which had been repeatedly mentioned in some items like Which is better: Distinct or Group By In…

sql hadoop hive distinct

asked Aug 07 '15 at 11:01

Chiron

3 answers

How to view the value of a hive variable?

How do you view the value of a hive variable you have set with the command "SET a = 'B,C,D'"? I don't want to use the variable- just see the value I have set it to. Also is there a good resource for Hive documentation like this? The Apache website…

sql apache hql hive

asked Jun 18 '13 at 20:41

abu

3 answers

When creating an external table in hive can I point the location to specific files in a directory?

I have defined a table as such: create external table PageViews (Userid string, Page_View string) partitioned by (ds string) row format as delimited fields terminated by ',' stored as textfile location '/user/data'; I do not want all the files in…

hive external

asked Jun 29 '12 at 21:28

George TeVelde

7 answers

Hive: writing column headers to local file?

Hive documentation lacking again: I'd like to write the results of a query to a local file as well as the names of the columns. Does Hive support this? Insert overwrite local directory 'tmp/blah.blah' select * from table_name; Also, separate…

syntax hive

asked Apr 13 '11 at 23:31

CMaury

13 answers

HIVE Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I am getting the below error on creating a hive database FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/facebook/fb303/FacebookService$Iface Hadoop version:**hadoop-1.2.1** HIVE Version:…

hive

asked Apr 28 '14 at 05:26

user3579986
