Questions tagged [hive]

Apache Hive is a database built on top of Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. Please DO NOT use this tag for the Flutter database that is also named Hive; use the flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

  • Tools to enable easy data summarization (ETL)
  • Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
  • A mechanism to put structure on this data
  • An advanced query language called Hive Query Language (HiveQL), which is based on SQL with some additional features such as DISTRIBUTE BY and TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also lets traditional map/reduce programmers plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
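As a rough illustration of such a custom mapper, the sketch below shows the kind of script that could be plugged in through TRANSFORM. Hive streams rows to the script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The two-column layout (user_id, text), the word-count logic, and all function names are assumptions made up for this example, not something taken from the Hive documentation.

```python
import sys


def map_row(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumes exactly two columns: a user id and a free-text field
    (a hypothetical layout chosen for illustration).
    """
    user_id, text = line.rstrip("\n").split("\t", 1)
    word_count = len(text.split())
    return f"{user_id}\t{word_count}"


def transform(lines):
    """Yield one output row per non-empty input row."""
    for line in lines:
        if line.strip():
            yield map_row(line)


def main():
    # Hive feeds rows on stdin and collects the results from stdout.
    for row in transform(sys.stdin):
        print(row)
```

A real script would end with a call to main(). Assuming it is saved as word_count.py and registered with ADD FILE, a query along the lines of SELECT TRANSFORM(user_id, text) USING 'python3 word_count.py' AS (user_id, word_count) FROM messages; would invoke it, where the table name messages and the file name are likewise hypothetical.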

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly one of submitting jobs and being notified when they complete, as opposed to real-time queries. In contrast to systems such as Oracle, where analysis is run on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes, Hive query response times for even the smallest jobs can be on the order of several minutes. Larger jobs (e.g., jobs processing terabytes of data) may run for hours or even days. Many optimizations and improvements have been made to speed up processing, such as fetch-only tasks, LLAP, and materialized views.

To summarize, while low-latency performance is not the top priority among Hive's design principles, the following are Hive's key features:

  • Scalability (scale out with more machines added dynamically to the Hadoop cluster)
  • Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
  • Fault-tolerance
  • Loose-coupling with its input formats
  • A rather rich query language with native support for JSON, XML, and regular expressions; the ability to call Java methods; Python and shell transformations; analytic and windowing functions; connections to different RDBMSs through JDBC drivers; and a Kafka connector.
  • Ability to read and write almost any file format using native and third-party SerDes, including RegexSerDe.
  • Numerous third-party extensions, for example Brickhouse UDFs.

How to write a good Hive question:

  1. Add a clear textual description of the problem.
  2. Provide the query and/or table DDL, if applicable.
  3. Provide the exception message.
  4. Provide an example of the input data and the desired output.
  5. Questions about query performance should include the EXPLAIN output for the query.
  6. Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output, or exception messages.
  7. Use proper code and text formatting.

21846 questions
57
votes
7 answers

How does Hive compare to HBase?

I'm interested in finding out how the recently-released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have…
mrhahn
  • 607
  • 1
  • 7
  • 7
57
votes
3 answers

Hive: Convert String to Integer

I am looking for a Built-in UDF to convert values of a string column to integer in my hive table for sorting using SELECT and ORDER BY. I searched in the Language Manual, but no use. Any other suggestions also welcome.
Srinivas
  • 2,479
  • 8
  • 47
  • 69
56
votes
3 answers

Does Hive have a String split function?

I am looking for a in-built String split function in Hive? e.g. if String is: A|B|C|D|E Then I want to have a function like: array split(string input, char delimiter) So that I get back: [A,B,C,D,E] Does such a in-built split function…
user855
  • 19,048
  • 38
  • 98
  • 162
54
votes
18 answers

How to Access Hive via Python?

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated. When I add this to /etc/profile: export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py I can then do the imports as listed in the link, with the…
Matthew Moisen
  • 16,701
  • 27
  • 128
  • 231
52
votes
6 answers

Hive load CSV with commas in quoted fields

I am trying to load a CSV file into a Hive table like so: CREATE TABLE mytable ( num1 INT, text1 STRING, num2 INT, text2 STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; LOAD DATA LOCAL INPATH '/data.csv' OVERWRITE INTO TABLE mytable; …
Martijn Lenderink
  • 535
  • 1
  • 5
  • 5
51
votes
17 answers

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- (on Windows)

I am running Spark on Windows 7. When I use Hive, I see the following error The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- The permissions are set as the following C:\tmp>ls -la total 20 drwxr-xr-x …
user1384205
  • 1,231
  • 3
  • 20
  • 39
51
votes
7 answers

java.net.URISyntaxException when starting HIVE

I am new in HIVE. I have already set up hadoop and it works well, and I want to set up Hive. When I start hive , it shows an error as Caused by: java.net.URISyntaxException: Relative path in absolute URI:…
Exia
  • 2,381
  • 4
  • 17
  • 24
50
votes
3 answers

Create hive table using "as select" or "like" and also specify delimiter

Is it possible to do a create table as select using row format delimited fields terminated by '|'; or to do a create table like row format delimited fields terminated by '|'; The Language…
WestCoastProjects
  • 58,982
  • 91
  • 316
  • 560
49
votes
6 answers

Hadoop/Hive : Loading data from .csv on a local machine

As this is coming from a newbie... I had Hadoop and Hive set up for me, so I can run Hive queries on my computer accessing data on AWS cluster. Can I run Hive queries with .csv data stored on my computer, like I did with MS SQL Server? How do I…
mel
  • 1,566
  • 5
  • 17
  • 29
47
votes
3 answers

What is the difference between Apache Spark SQLContext vs HiveContext?

What are the differences between Apache Spark SQLContext and HiveContext ? Some sources say that since the HiveContext is a superset of SQLContext developers should always use HiveContext which has more features than SQLContext. But the current…
tlarevo
  • 635
  • 1
  • 5
  • 12
47
votes
11 answers

Hive query output to file

I run hive query by java code. Example: "SELECT * FROM table WHERE id > 100" How to export result to hdfs file.
cldo
  • 1,735
  • 6
  • 21
  • 26
46
votes
3 answers

How to make MSCK REPAIR TABLE execute automatically in AWS Athena

I have a Spark batch job which is executed hourly. Each run generates and stores new data in S3 with the directory naming pattern DATA/YEAR=?/MONTH=?/DATE=?/datafile. After uploading the data to S3, I want to investigate it using Athena. Also, I…
46
votes
3 answers

SparkSQL vs Hive on Spark - Difference and pros and cons?

SparkSQL CLI internally uses HiveQL and in case Hive on spark(HIVE-7292) , hive uses spark as backend engine. Can somebody throw some more light, how exactly these two scenarios are different and pros and cons of both approaches?
Gaurav Khare
  • 2,203
  • 4
  • 25
  • 23
45
votes
5 answers

Hive installation issues: Hive metastore database is not initialized

I tried to install hive on a raspberry pi 2. I installed Hive by uncompress zipped Hive package and configure $HADOOP_HOME and $HIVE_HOME manually under hduser user-group I created. When running hive, I got the following error message: hive ERROR…
As high as honor
  • 451
  • 1
  • 5
  • 3
45
votes
3 answers

Explode the Array of Struct in Hive

This is the below Hive Table CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY> ) And this is the data in the above table- 1015826235 …
arsenal
  • 23,366
  • 85
  • 225
  • 331