Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.
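One way to read the closing "nearly three times" figure is as the ratio of the two average speedups quoted in the same paragraph. The short check below is only a sketch of that arithmetic; the variable names are illustrative and the interpretation is an assumption, not something the whitepaper excerpt states explicitly.

```python
# Average speedups quoted in the excerpt above.
single_user_avg_speedup = 6.7   # single-user queries, on average
multi_user_avg_speedup = 18.0   # multi-user queries, on average

# Assumed reading: how much larger the multi-user advantage is
# than the single-user advantage.
ratio = multi_user_avg_speedup / single_user_avg_speedup
print(f"{ratio:.2f}")
```

Under this reading the ratio is a little under 2.7, which the excerpt rounds to "nearly three times".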

2083 questions
6
votes
3 answers

Save Impala Shell query results in CSV

How can I save my query results in a CSV file via the Impala Shell? My code: impala-shell -q "use test; select * from teams; -- From this point I need to save the query results to /Desktop (for example). " The problem that I am getting is that I…
user6203336
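For readers skimming this listing, one common approach to the question above is to run impala-shell in delimited mode and point it at an output file. The sketch below only assembles such a command line; the helper name `build_impala_shell_cmd` and the file name `teams.csv` are hypothetical, and the flags used (`-B`, `--print_header`, `--output_delimiter`, `-o`, `-q`) are standard impala-shell options. Delimited mode does not quote fields, so values that contain commas would need a different delimiter or post-processing.

```python
import shlex


def build_impala_shell_cmd(query, output_path, delimiter=","):
    """Return an argv list for impala-shell that writes delimited output to a file.

    Hypothetical helper for illustration; it only builds the command and
    does not contact a cluster.
    """
    return [
        "impala-shell",
        "-B",                                # delimited output instead of the pretty-printed table
        "--print_header",                    # include column names as the first row
        f"--output_delimiter={delimiter}",   # field separator used in delimited mode
        "-o", output_path,                   # write results to this file
        "-q", query,                         # query to run non-interactively
    ]


cmd = build_impala_shell_cmd("use test; select * from teams", "teams.csv")
print(shlex.join(cmd))
```

On a machine where impala-shell is installed and can reach an Impala daemon, the list could be passed to `subprocess.run(cmd, check=True)`.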
6
votes
2 answers

Immediate evaluation of CTE

I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct Impala that the results of this CTE should be re-used in…
AYK
  • 3,312
  • 1
  • 17
  • 30
6
votes
1 answer

Impala/Hive to get list of tables along with its size

I have used a query in Oracle DB to produce the list of tables in a database along with their owners and respective table sizes. Here is the sample query I have shared. select owner, table_name, round((num_rows*avg_row_len)/(1024*1024)) MB from…
Manindar
  • 999
  • 2
  • 14
  • 30
6
votes
4 answers

Comma delimited string to individual rows - Impala SQL

Let's suppose we have a table: Owner | Pets ------------------------------ Jack | "dog, cat, crocodile" Mary | "bear, pig" I want to get as a result: Owner | Pets ------------------------------ Jack | "dog" Jack | "cat" Jack |…
ifotopoulos
  • 83
  • 1
  • 1
  • 3
6
votes
2 answers

Performance of Apache Drill

Are there any performance benchmarks (genuine ones) that compare Stinger vs Impala vs Drill? Also, which is preferred? My use case will be mainly ad-hoc interactive queries on top of Hive. Thanks.
Sai
  • 127
  • 1
  • 2
  • 9
6
votes
2 answers

How to set configuration in Hive-Site.xml file for hive metastore connection?

I want to connect to the MetaStore using Java code. I have no idea how to set the configuration in the Hive-Site.xml file or where to place the Hive-Site.xml file. Please help. import java.sql.Connection; import java.sql.DriverManager; import…
mohit sharma
  • 259
  • 1
  • 4
  • 9
6
votes
1 answer

Jdbc settings for connecting to Impala

What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)? I have tried a few including: jdbc:hive2://myserver:21050/;auth=noSasl And with the following driver: org.apache.hive.jdbc.HiveDriver I have added …
WestCoastProjects
  • 58,982
  • 91
  • 316
  • 560
6
votes
1 answer

Calling JDBC to impala/hive from within a spark job and creating a table

I am trying to write a spark job in scala that would open a jdbc connection with Impala and let me create a table and perform other operations. How do I do this? Any example would be of great help. Thank you!
user1189851
  • 4,861
  • 15
  • 47
  • 69
6
votes
1 answer

Impala - convert existing table to parquet format

I have a table that has partitions, and I use Avro files or text files to create and insert into a table. Once the table is done, is there a way to convert it into Parquet? I mean I know we could have done say CREATE TABLE default.test( name_id STRING)…
user1189851
  • 4,861
  • 15
  • 47
  • 69
6
votes
3 answers

Custom SerDe not supported by Impala, what's the best way to query files in CSV w/double quotes?

I have CSV data with each field surrounded by double quotes. When I created the Hive table, I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When that table is queried in Impala, I get the error SerDe not found. I added the CSV Serde JAR file in…
prasannads
  • 609
  • 2
  • 14
  • 28
6
votes
1 answer

Implement CREATE AS SELECT in Impala

Please help me with how to implement CREATE TABLE AS SELECT. For a simple create table t1 as select * from t2; I can implement it as Create table t1 like t2; insert into t1 as select * from t2; But how do I implement create table t1 as select c1,c2,c3 from…
6
votes
2 answers

Installing cloudera impala without cloudera manager

Kindly provide the link for installing Impala on Ubuntu without Cloudera Manager. I was unable to install it using the official link. Unable to locate package impala using these commands: sudo apt-get install impala # Binaries for…
Naresh
  • 5,073
  • 12
  • 67
  • 124
5
votes
2 answers

How to UPDATE a value in hive table?

I have a flag column in Hive table that I want to update after some processing. I have tried using hive and impala using the below query but it didn't work, and got that it needs to be a kudu table while the table I have is a non-kudu table. Is…
Omar AlSaghier
  • 340
  • 4
  • 12
5
votes
2 answers

Why Impala Scan Node is very slow (RowBatchQueueGetWaitTime)?

This query returns in 10 seconds most of the time, but occasionally it needs 40 seconds or more. There are two executor nodes in the swarm, and there is no remarkable difference between the profiles of the two nodes; the following is one of them: …
luochen1990
  • 3,689
  • 1
  • 22
  • 37
5
votes
4 answers

Presto vs Impala: architecture, performance, functionality

Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance? For some reason this excellent question was tagged as opinion-based. Extra question: why did Amazon decide to go…
VB_
  • 45,112
  • 42
  • 145
  • 293