Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.

2083 questions

3 answers

Save Impala Shell query results in CSV

How can I save my query results in a CSV file via the Impala Shell? My code: impala-shell -q "use test; select * from teams; -- From this point I need to save the query results to /Desktop (for example). " The problem that I am getting is that I…

export-to-csv impala

asked Apr 14 '18 at 16:04

user6203336

2 answers

Immediate evaluation of CTE

I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct Impala that the results of this CTE should be re-used in…

hadoop impala cloudera-cdh

asked Nov 06 '17 at 09:26

AYK

1 answer

Impala/Hive to get list of tables along with its size

I have used a query in Oracle DB to produce the list of tables in a database along with each table's owner and size. Here is the sample query I have shared. select owner, table_name, round((num_rows*avg_row_len)/(1024*1024)) MB from…

sql oracle hadoop hive impala

asked Apr 20 '17 at 09:22

Manindar

4 answers

Comma delimited string to individual rows - Impala SQL

sql split cloudera impala

asked May 23 '16 at 19:38

ifotopoulos

2 answers

Performance of Apache Drill

Are there any performance benchmarks (genuine ones) that compare Stinger vs Impala vs Drill? Also, which is preferred? My use case will be mainly ad-hoc interactive queries on top of Hive. Thanks.

hadoop hive impala apache-drill apache-tez

asked Aug 22 '15 at 06:44

Sai

2 answers

How to set configuration in Hive-Site.xml file for hive metastore connection?

I want to connect to the MetaStore using Java code. I have no idea how to set the configuration in the Hive-Site.xml file or where I should put the Hive-Site.xml file. Please help. import java.sql.Connection; import java.sql.DriverManager; import…

hadoop hive cloudera impala metastore

asked Apr 07 '15 at 06:25

mohit sharma

1 answer

Jdbc settings for connecting to Impala

What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)? I have tried a few including: jdbc:hive2://myserver:21050/;auth=noSasl And with the following driver: org.apache.hive.jdbc.HiveDriver I have added …

jdbc hive impala

asked Mar 06 '15 at 04:40

WestCoastProjects

1 answer

Calling JDBC to impala/hive from within a spark job and creating a table

I am trying to write a Spark job in Scala that would open a JDBC connection with Impala and let me create a table and perform other operations. How do I do this? Any example would be of great help. Thank you!

scala jdbc apache-spark impala

asked Oct 29 '14 at 15:48

user1189851

1 answer

Impala - convert existing table to parquet format

I have a table that has partitions, and I use Avro files or text files to create and insert into the table. Once the table is done, is there a way to convert it into Parquet? I mean I know we could have done say CREATE TABLE default.test( name_id STRING)…

text-files avro parquet impala

asked Oct 14 '14 at 16:10

user1189851

3 answers

Custom SerDe not supported by Impala, what's the best way to query files in CSV w/double quotes?

I have CSV data with each field surrounded by double quotes. When I created the Hive table, I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When the above table is queried in Impala, I get the error SerDe not found. I added the CSV Serde JAR file in…

csv hadoop double-quotes impala

asked Sep 03 '14 at 10:56

prasannads

1 answer

Implement CREATE AS SELECT in Impala

Please help me with how to implement CREATE TABLE AS SELECT. For a simple create table t1 as select * from t2; I can implement it as Create table t1 like t2; insert into t1 as select * from t2; But how to implement create table t1 as select c1,c2,c3 from…

cloudera impala

asked Oct 23 '13 at 03:17

on_the_shores_of_linux_sea

2 answers

Installing cloudera impala without cloudera manager

Kindly provide a link for installing Impala on Ubuntu without Cloudera Manager. I was not able to install it using the official link. I get "Unable to locate package impala" when using these commands: sudo apt-get install impala # Binaries for…

hadoop hive cloudera impala

asked Jun 17 '13 at 11:33

Naresh

2 answers

How to UPDATE a value in hive table?

I have a flag column in a Hive table that I want to update after some processing. I have tried using Hive and Impala with the query below, but it didn't work: I got an error that it needs to be a Kudu table, while the table I have is a non-Kudu table. Is…

sql hive hiveql impala

asked Jan 11 '21 at 10:00

Omar AlSaghier

2 answers

Why Impala Scan Node is very slow (RowBatchQueueGetWaitTime)?

This query returns in 10 seconds most of the time, but occasionally it needs 40 seconds or more. There are two executor nodes in the swarm, and there is no remarkable difference between the profiles of the two nodes. The following is one of them: …

hadoop hdfs impala olap

asked Aug 14 '20 at 02:34

luochen1990

4 answers

Presto vs Impala: architecture, performance, functionality

Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance? For some reason this excellent question was tagged as opinion-based. Extra question: why did Amazon decide to go…

database-design olap impala presto distributed-database

asked Dec 10 '19 at 21:38

VB_
