Questions tagged [hive]

Apache Hive is a database built on top of Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. Please DO NOT use this tag for the Flutter database that is also named Hive; use the flutter-hive tag instead.

Apache Hive is a database built on top of Hadoop that provides the following:

Tools to enable easy data summarization (ETL)
Ad-hoc querying and analysis of large datasets stored in the Hadoop file system (HDFS)
A mechanism to put structure on this data
An advanced query language called Hive Query Language (HiveQL), which is based on SQL with some additional features such as DISTRIBUTE BY and TRANSFORM, and which enables users familiar with SQL to query this data.

At the same time, this language also lets traditional map/reduce programmers plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
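As a minimal sketch of what such a plug-in script can look like, the Python below upper-cases the second column of each row it is given. It assumes Hive's default streaming format for TRANSFORM, in which rows arrive on standard input as tab-separated lines and NULL is written as the literal \N. The names transform_line, stream, and the file name upper_value.py are hypothetical and chosen only for illustration.

```python
import sys


def transform_line(line):
    """Upper-case the second column of one tab-separated row."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: NULL arrives as the literal \N (Hive's default); leave it as-is.
    if len(fields) > 1 and fields[1] != "\\N":
        fields[1] = fields[1].upper()
    return "\t".join(fields)


def stream(stream_in, stream_out):
    """Read rows from stream_in and write the transformed rows to stream_out."""
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")


# When saved as a script for Hive (hypothetical name: upper_value.py),
# end the file with:
#     stream(sys.stdin, sys.stdout)
```

Assuming python3 is available on the cluster nodes, a script like this could be registered with ADD FILE upper_value.py; and then called as SELECT TRANSFORM (id, value) USING 'python3 upper_value.py' AS (id, value) FROM test_hive; where test_hive stands for any table with two columns. Columns returned by TRANSFORM are strings unless they are cast in the AS clause.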

Since Hive is Hadoop-based, it does not and cannot promise low latencies on queries. The paradigm is strictly one of submitting jobs and being notified when they complete, as opposed to real-time queries. In systems such as Oracle, analysis runs on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes. By contrast, Hive response times for even the smallest jobs can be on the order of several minutes, and larger jobs (e.g., jobs processing terabytes of data) may run for hours or days. Many optimizations and improvements have been made to speed up processing, such as fetch-only tasks, LLAP, and materialized views.

To summarize, while low-latency performance is not the top priority among Hive's design principles, the following are Hive's key features:

Scalability (scale out with more machines added dynamically to the Hadoop cluster)
Extensibility (with map/reduce framework and UDF/UDAF/UDTF)
Fault-tolerance
Loose-coupling with its input formats
A rather rich query language with native support for JSON, XML, and regular expressions, the ability to call Java methods, Python and shell transformations, analytic and windowing functions, connections to different RDBMSs through JDBC drivers, and a Kafka connector.
Ability to read and write almost any file format using native and third-party SerDes, such as RegexSerDe.
Numerous third-party extensions, for example Brickhouse UDFs.

How to write a good Hive question:

Add a clear textual problem description.
Provide the query and/or table DDL if applicable.
Provide the exception message.
Provide an example of the input and desired output data.
Questions about query performance should include EXPLAIN query output.
Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output, or exception messages.
Use proper code and text formatting.

21846 questions

6 answers

getting null values while loading the data from flat files into hive tables

I am getting the null values while loading the data from flat files into hive tables. my tables structure is like this: hive> create table test_hive (id int,value string); and my flat file is like this: input.txt 1 a 2 b 3 c 4 d 5 e 6 …

asked Nov 14 '12 at 12:59

user1823697

2 answers

Hive query results in vertical format like MySQL's "\G"?

Is there a way to get Hive to output the results in a columnar-fashion, like the "\G" option available from MySQL? http://dev.mysql.com/doc/refman//5.5/en/mysql-commands.html

hive

asked Jun 24 '12 at 18:34

Idr

5 answers

What is the difference between Apache Pig and Apache Hive?

What is the exact difference between Pig and Hive? I found that both have the same functional meaning because they are used for the same work. The only difference is the implementation. So when should each technology be used? Is there…

hadoop hive apache-pig

asked Apr 23 '12 at 11:47

Ananda

3 answers

In a hadoop cluster, should hive be installed on all nodes?

I am a newbie to Hadoop / Hive and I have just started reading the docs. There are lots of blogs on installing Hadoop in cluster mode. Also, I know that Hive runs on top of Hadoop. My question is: Hadoop is installed on all the cluster nodes.…

hadoop cluster-computing hive

asked Dec 10 '11 at 11:19

Vijay

9 answers

Check if table exists in hive metastore using Pyspark

I am trying to check if a table exists in hive metastore if not, create the table. And if the table exists, append data. I have a snippet of the code below: spark.catalog.setCurrentDatabase("db_name") db_catalog = spark.catalog.listTables(dbName =…

python-3.x apache-spark hive pyspark apache-spark-sql

asked Aug 25 '19 at 13:15

Cryssie

4 answers

Setup Standalone Hive Metastore Service For Presto and AWS S3

I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to set up Presto to be able to query the data in S3 and I know I need to define the data structure as Hive tables through the Hive…

hive presto hive-metastore

asked Feb 22 '18 at 16:47

mhaken

5 answers

How to create hive table from Spark data frame, using its schema?

I want to create a hive table using my Spark dataframe's schema. How can I do that? For fixed columns, I can use: val CreateTable_query = "Create Table my table(a string, b string, c double)" sparksession.sql(CreateTable_query) But I have many…

scala apache-spark hive

asked Feb 15 '17 at 22:58

lserlohn

2 answers

What does "WITH SERDEPROPERTIES ( 'paths' = 'key1, key2, key3') " really do in Hive DDL json serde?

Much appreciated if anyone can provide a reference to this clause. I have been searching online with little luck.

hive ddl amazon-athena

asked Feb 10 '17 at 23:26

Da Qi

3 answers

Delete a database with tables in Hive

I have a database in hive which has around 100 tables. I would like to delete the whole database in a single shot query. How can we achieve that in Hive?

database hive hiveql

asked Feb 09 '17 at 06:39

user7351648

6 answers

How to quit beeline?

I am using CDH 5.5 and need to use beeline. I am pretty new to it and learning it now. I can start beeline but cannot quit as we do in Hive. I need to use Ctrl+z to quit which is not the proper way. Can someone help?

hadoop hive beeline

asked Mar 12 '16 at 19:17

user4503253

7 answers

Unable to exit Hive

I've just installed Hive on my Ubuntu machine (14.04). When I run hive in the terminal, it comes up with Logging initialized using configuration in jar:file:/home/nkhl/Documents/apachehive/lib/hive-common-1.2.1.jar!/hive-log4j.properties which is…

hadoop hive ubuntu-14.04

asked Oct 18 '15 at 18:17

Anonymous Person

3 answers

REGEXP_REPLACE capturing groups

I was wondering if someone could help me understand how to use Hive's regexp_replace function to capture groups in the regex and use those groups in the replacement string. I have an example problem I'm working through below that involves…

regex hadoop hive regexp-replace

asked Feb 18 '15 at 19:26

jatal

3 answers

Hive creating a table but getting FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

Here is the code I am using to create the table: CREATE TABLE vi_vb(cTime STRING, VI STRING, Vital STRING, VB STRING) PARTITIONED BY(cTime STRING, VI STRING) CLUSTERED BY(VI) SORTED BY(cTime) INTO 32 BUCKETS ROW FORMAT DELIMITED FIELDS…

hadoop hive

asked Dec 17 '14 at 21:43

user3121369

1 answer

Does JDBC have a maximum ResultSet size?

Is there a maximum number of rows that a JDBC will put into a ResultSet specifically from a Hive query? I am not talking about fetch size or paging, but the total number of rows returned in a ResultSet. Correct me if I'm wrong, but the fetch size…

java jdbc hive resultset

asked Oct 28 '14 at 22:48

sparks

2 answers

Hive dynamic partitioning

I'm trying to create a partitioned table using dynamic partitioning, but i'm facing an issue. I'm running Hive 0.12 on Hortonworks Sandbox 2.0. set hive.exec.dynamic.partition=true; INSERT OVERWRITE TABLE demo_tab PARTITION (land) SELECT stadt,…

hadoop hive hiveql

asked Jun 16 '14 at 07:29

Baeumla
