Questions tagged [hive]

Apache Hive is a database built on top of Hadoop and facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible distributed file system. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. Please DO NOT use this tag for flutter database which is also named Hive, use flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

  • Tools to enable easy data summarization (ETL)
  • Ad-hoc querying and analysis of large datasets data stored in Hadoop file system (HDFS)
  • A mechanism to put structure on this data
  • An advanced query language called Hive Query Language which is based on SQL and some additional features such as DISTRIBUTE BY, TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also allows traditional map/reduce programmers the ability to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real-time queries. In contrast to the systems such as Oracle where analysis is run on a significantly smaller amount of data, but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes, Hive queries response times for even the smallest jobs can be of the order of several minutes. However for larger jobs (e.g., jobs processing terabytes of data) in general they may run into hours and days. Many optimizations and improvements were made to spped-up processing such as fetch-only task, LLAP, materialized views, etc

To summarize, while low latency performance is not the top-priority of Hive's design principles, the following are Hive's key features:

  • Scalability (scale out with more machines added dynamically to the Hadoop cluster)
  • Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
  • Fault-tolerance
  • Loose-coupling with its input formats
  • Rather reach query kanguage with native suport for JSON, XML, regexp, possibility to call java methods, using python and shell transformations, analytics and windowing functions, possibility to connect to different RDBMS using JDBC drivers, Kafka connector.
  • Ability to read and write almost any file formats using native and third-party SerDe, RegexSerDe.
  • Numerous third-party extensions, for example brickhouse UDFs, etc

How to write good Hive question:

  1. Add clear textual problem description.
  2. Provide query and/or table DDL if applicable
  3. Provide exception message
  4. Provide input and desired output data example
  5. Questions about query performance should include EXPLAIN query output.
  6. Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output and exception messages.
  7. Use proper code and text formatting

Official links:

Useful Links:

21846 questions
22
votes
2 answers

sqlContext HiveDriver error on SQLException: Method not supported

I have been trying to use sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver") to get Hive table into Spark without any success. I have done research and read below: How to connect to remote hive server from spark Spark…
HP.
  • 19,226
  • 53
  • 154
  • 253
22
votes
1 answer

Hive Insert Overwrite Table

I'm new to Hive and I wanted to know if insert overwrite will overwrite an existing table I have created. I want to filter an already created table, let's call it TableA, to only select the rows where age is greater than 18. Can I achieve this using…
Anna Mai
  • 233
  • 1
  • 2
  • 7
22
votes
3 answers

Error Message: TOK_ALLCOLREF is not supported in current context - while Using DISTINCT in HIVE

I'm using the simple command: SELECT DISTINCT * FROM first_working_table; in HIVE 0.11, and I'm receiving the following error message: FAILED: SemanticException TOK_ALLCOLREF is not supported in current context. Does anyone know why this is…
user3107144
  • 231
  • 1
  • 2
  • 3
22
votes
3 answers

How to see contents of Hive orc files in linux

Is there a way to see the contents of an orc file that hive 0.11 and above use. I usually cat gz files and decompress them to see the contents eg: cat part-0000.gz | pigz -d | more Note: pigz is a parallel gz program. I would like to know if there…
viper
  • 2,220
  • 5
  • 27
  • 33
22
votes
5 answers

Hive Explode / Lateral View multiple arrays

I have a hive table with the following schema: COOKIE | PRODUCT_ID | CAT_ID | QTY 1234123 [1,2,3] [r,t,null] [2,1,null] How can I normalize the arrays so I get the following result COOKIE | PRODUCT_ID | CAT_ID | QTY 1234123 …
user2726995
  • 2,064
  • 2
  • 21
  • 26
21
votes
6 answers

Hive 2.1.1 MetaException(message:Version information not found in metastore. )

I'm running Hadoop 2.7.3, MySQL 5.7.17 and Hive 2.1.1 on Ubuntu 16.04. When I run ./hive, I keep getting the following warning and exception: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in…
Top.Deck
  • 1,077
  • 3
  • 16
  • 31
21
votes
3 answers

RIGHT() / LEFT() functions

Is there a function in Hiveql that is equivalent to Right() or Left() function from TSQL? For example, RIGHT(col1,10) to get the first 10 characters from col1.
jmich738
  • 1,565
  • 3
  • 24
  • 41
21
votes
5 answers

How to change date format in hive?

My table in hive has a filed of date in the format of '2016/06/01'. but i find that it is not in harmory with the format of '2016-06-01'. They can not compare for instance. Both of them are string . So I want to know how to make them in harmory…
yanachen
  • 3,401
  • 8
  • 32
  • 64
21
votes
7 answers

Connecting to Hive using Beeline

I am trying to connect to hive installed in my machine through Beeline client. when I give the 'beeline' command & connect to Hive, the client is asking for user name & password !connect jdbc:hive2://localhost:10000/default I have no idea what is…
Raj
  • 2,368
  • 6
  • 34
  • 52
21
votes
9 answers

How to transpose/pivot data in hive?

I know there's no direct way to transpose data in hive. I followed this question: Is there a way to transpose data in Hive? , but as there is no final answer there, could not get all the way. This is the table I have: | ID | Code | Proc1 |…
Sunny
  • 369
  • 2
  • 3
  • 11
21
votes
3 answers

hive check comma separated String contains a string

I have a column in hive table list_ids which is a list of ids stored as comma separated string. how can I write a query for this column to check if it stores a particular id Example: list_ids = "abc,cde,efg" I want to something like select *…
user2978621
  • 803
  • 2
  • 11
  • 20
21
votes
3 answers

Impala can't access all hive table

I try to query hbase data through hive (I'm using cloudera). I did a fiew hive external table pointing to hbase but the thing is Cloudera's Impala doesn't have an access to all those tables. All hive external tables appear in the metastore manager…
Nosk
  • 753
  • 2
  • 6
  • 24
21
votes
1 answer

SQL split comma separated row

I have a column with a variable number of comma seperated values: somethingA,somethingB,somethingC somethingElseA, somethingElseB And I want the result to take each value, and create a…
Don P
  • 60,113
  • 114
  • 300
  • 432
21
votes
3 answers

Why is Select Count(*) slower than Select * in hive

When i am running queries in VirtualBox Sandbox with hive. I feel Select count(*) is too much slower than the Select *. Can anyone explain what is going on behind? And why this delay is happening?
Haris N I
  • 6,474
  • 6
  • 29
  • 35
21
votes
1 answer

How can select a column and do a TRANSFORM in Hive?

I was using TRANSFORM USING with Hive 0.8.1, and noticed that this is invalid syntax: SELECT a, TRANSFORM(b, c) USING 'fake.py' AS d, FROM test_table; Removing "a," makes this statement work. What is the correct way of using this?
Enno Shioji
  • 26,542
  • 13
  • 70
  • 109