Questions tagged [hive]

Apache Hive is a database built on top of Hadoop and facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible distributed file system. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. Please DO NOT use this tag for flutter database which is also named Hive, use flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

Tools to enable easy data summarization (ETL)
Ad-hoc querying and analysis of large datasets data stored in Hadoop file system (HDFS)
A mechanism to put structure on this data
An advanced query language called Hive Query Language which is based on SQL and some additional features such as DISTRIBUTE BY, TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also allows traditional map/reduce programmers the ability to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real-time queries. In contrast to the systems such as Oracle where analysis is run on a significantly smaller amount of data, but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes, Hive queries response times for even the smallest jobs can be of the order of several minutes. However for larger jobs (e.g., jobs processing terabytes of data) in general they may run into hours and days. Many optimizations and improvements were made to spped-up processing such as fetch-only task, LLAP, materialized views, etc

To summarize, while low latency performance is not the top-priority of Hive's design principles, the following are Hive's key features:

Scalability (scale out with more machines added dynamically to the Hadoop cluster)
Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
Fault-tolerance
Loose-coupling with its input formats
Rather reach query kanguage with native suport for JSON, XML, regexp, possibility to call java methods, using python and shell transformations, analytics and windowing functions, possibility to connect to different RDBMS using JDBC drivers, Kafka connector.
Ability to read and write almost any file formats using native and third-party SerDe, RegexSerDe.
Numerous third-party extensions, for example brickhouse UDFs, etc

How to write good Hive question:

Add clear textual problem description.
Provide query and/or table DDL if applicable
Provide exception message
Provide input and desired output data example
Questions about query performance should include EXPLAIN query output.
Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output and exception messages.
Use proper code and text formatting

Official links:

Useful Links:

21846 questions

votes

4 answers

Import data from .avro files to hive table

I created a hive table by following command and avro schema i had. CREATE TABLE table_name PARTITIONED BY (t string, y string, m string, d string, h string, hh string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS…

asked Jan 25 '17 at 11:20

KrunalParmar

1,062
2
18
31

votes

3 answers

How to cast a string to array of struct in HiveQL

I have a hive table with the column "periode", the type of the column is string. The column have values like the…

arrays struct hive explode hiveql

asked Jan 24 '17 at 17:16

Sidi Mahmoud Ould Rhil

votes

2 answers

Passing arguments to hive query

I am trying to pass command line arguments through the below ,but its not working . Can anybody help me with what I am doing wrong here! hive -f test2.hql -hiveconf partition=20170117 -hiveconf -hiveconf datepartition=20170120

hadoop mapreduce hive

asked Jan 19 '17 at 14:16

Babu

votes

1 answer

alter hive multiple column

How do we alter the datatype of multiple columns in Hive ? CREATE TABLE test_change (a int, b int, c int); ALTER TABLE test_change CHANGE a a string b b doube c c decimal(11,2);

hadoop hive alter

asked Jan 12 '17 at 12:44

Vinodh Sukumaran

votes

0 answers

Does order of columns matter during save hive table using Spark?

I have external hive table created : create external table test( b string, a string ) My spark code : case class Test(a:String,b:String) val list = Seq("bv","av")).toDF list.write.mode(SaveMode.Append).saveAsTable("test") The result "bv" is saved…

apache-spark hive hiveql parquet

asked Jan 04 '17 at 08:49

NoodleX

votes

2 answers

Purpose of using HBase in Hadoop instead of Hive

In my project, we are using Hadoop 2, Spark, Scala. Scala is the programming language and Spark is using here for analysing. we are using Hive and HBase both. I can access all details like file etc. of HDFS using Hive. But my confusions are - When…

hadoop hive hbase hdfs

asked Dec 29 '16 at 11:09

Avijit

1,770
5
16
34

votes

2 answers

Difference of performance between internal table and extenal table in hive

I want to do some actions to files on hdfs by using hive temporarily,so i do not want to use internal table.but my data is so huge ,for example 1TB,so I worry about the performance of external table. so I ask the question about difference of…

hadoop hive

asked Dec 24 '16 at 02:46

ElapsedSoul

votes

1 answer

How to handle comma separated decimal values in Hive?

hive hortonworks-data-platform decimal-point

asked Dec 20 '16 at 15:36

Shekhar

11,438
36
130
186

votes

3 answers

Hive - How to print the classpath of a Hive service

I need to check the classpath of the Hive service to see the location of the jars it loads while running the hive queries. I want to update the parquet jars for hive to latest parquet jars to read new parquet format data. I have updated the jars in…

hadoop hive hortonworks-data-platform parquet

asked Dec 17 '16 at 00:46

Munesh

1,509
3
20
46

votes

2 answers

Difference between hive, impala and beeline

I am new to Hadoop eco-system tools. Can anyone help me with understand the difference between hive, beeline and hive. Thanks in advance!

hive impala beeline hivecli

asked Dec 16 '16 at 11:26

Ramkrushna26

votes

1 answer

Hive sql: count and avg

I'm recently trying to learn Hive and i have a problem with a sql consult. I have a json file with some information. I want to get the average for each register. Better in example: country times USA 1 USA 1 USA 1 ES 1 ES …

sql count hive average

asked Dec 05 '16 at 13:57

Bob RO

votes

1 answer

Use hive metastore service WITHOUT Hadoop/HDFS

I know the question is a little bit strange. I love Hadoop & HDFS, but recently work on SparkSQL with Hive Metastore. I want to use SparkSQL as a vertical SQL engine to run OLAP query across different datasources like RDB, Mongo, Elastic ...…

hive apache-spark-sql metastore

asked Nov 16 '16 at 15:58

He Bai

votes

1 answer

Presto failing to query hive table

On EMR I created a dataset in parquet using spark and storing it on S3. I am currently able to create an external table and query it using hive but when I try to perform the same query using presto I obtain an error (the part referred changes at…

hadoop apache-spark hive emr presto

asked Nov 13 '16 at 16:14

Sebastiano Merlino

1,273
12
23

votes

3 answers

Sort a Spark data frame/ Hive result set

I'm trying to retrieve the list of columns from a Hive table and store the result in a spark dataframe. var my_column_list = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table""") But I'm unable to alphabetically sort the dataframe or even the…

scala apache-spark hive

asked Nov 08 '16 at 11:50

Amber

votes

0 answers

Sqoop export of a hive table partitioned on an int column

I have a Hive table partitioned on an 'int' column. I want to export the Hive table to MySql using Sqoop export tool. sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root --table emp --hcatalog-database temp…

hadoop apache-spark hive sqoop

asked Nov 07 '16 at 22:48

Munesh

1,509
3
20
46

Prev 1 2 3

…

99 100 Next