Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.

2083 questions

1 answer

What are the fundamental architectural, SQL compliance, and data use scenario differences between Presto and Impala?

Can some experts give succinct answers on the differences between Presto and Impala from these perspectives? Fundamental architecture design; SQL compliance; real-world latency; any SPOF or fault-tolerance functionality; structured and…

presto impala trino

asked Nov 07 '13 at 16:16

Yellow Duck

1 answer

What to use.. Impala on HDFS, or Impala on Hbase or just the Hbase?

I am working on a Proof of Concept task. The task is to implement a feature of our product using Hadoop technology. The feature is quite simple: we have a UI which will let you insert details about a "Network Issue". All details about such an issue are…

hadoop hbase hdfs impala

asked Jul 09 '13 at 06:15

Ameya

4 answers

Error connecting: Could not connect to localhost:21000

I am trying to install Cloudera Impala on my local machine (32-bit Ubuntu) without Cloudera Manager (they don't support 32-bit Ubuntu; I also tried and failed). I have tried the following commands to download Impala from the repository. $ sudo…

hadoop hive impala

asked Jun 18 '13 at 14:47

Naresh

3 answers

split function does not work in Cloudera Impala

I keep getting an AnalysisException that says "split unknown" when I try to use the split function in Cloudera Impala. It seems to be a valid function listed on the built-in functions page. For reference, I'm using Hue to interact with Impala. Does…

hadoop hive cloudera impala

asked May 15 '13 at 23:48

Emre Colak

5 answers

How to increase superset row limit and timeout cache for SQL Lab and Visualization

I have a dataset that has 1 billion rows. The data is stored in Hive, and I put Impala as a layer between Hive and Superset. Queries run in Superset have a maximum row limit of 100,000. I need to change this so there is no row limit. Furthermore, I…

sql hive visualization impala apache-superset

asked Dec 31 '21 at 09:21

ufukyılmaz

2 answers

Impala add column with default value

I want to add a column to an existing Impala table (and view) with a default value, so that the existing rows also have a value. The column should not allow null values. ALTER TABLE dbName.tblName ADD COLUMNS (id STRING NOT NULL '-1') I went…

cloudera impala hue

asked Jul 24 '20 at 19:23

user2441441

1 answer

Compaction in Impala Tables

I want to learn about compaction in Impala tables but can't find any material to study. What are the different techniques, and where can I find material on them?

cloudera impala

asked Jun 29 '20 at 20:58

Tushar Pandey

1 answer

Override underlying parquet data seamlessly for impala table

I have an Impala table backed by Parquet files which is used by another team. Every day I run a batch Spark job that overwrites the existing Parquet files (creating a new data set: the existing files are deleted and new files are created). Our…

apache-spark parquet impala

asked Mar 10 '20 at 04:44

Kalaiselvam M

1 answer

What is "cold start" in Hive and why doesn't Impala suffer from this?

I'm reading the literature on comparing Hive and Impala. Several sources state some version of the following "cold start" line: It is well known that MapReduce programs take some time before all nodes are running at full capacity. In Hive, every…

hive bigdata impala

asked Nov 08 '19 at 19:21

DivyaJyoti Rajdev

0 answers

Connect to impala using python from Windows machine. Error: 'TSocket' object has no attribute 'isOpen'

I want to access Impala using Python 3.7.3 (Anaconda, Jupyter Notebook) on my Windows machine. I am trying to execute the following code: from impala.dbapi import connect import traceback try: conn = connect(host='myhost.xx.yy', port=21050,…

python impala

asked Aug 22 '19 at 15:03

clex

1 answer

Consistent Hive and Impala Hash?

I am looking for a consistent way to hash something in both the Hive Query Language and the Impala Query Language, where the hashing function produces the same value regardless of whether it is done in Hive or in Impala. To clarify, I want something like…

hadoop hive impala

asked Sep 07 '18 at 16:54

Aur

0 answers

Select all except one impala

I am looking for an approach to exclude a column from an inner SELECT query in Impala. I was able to figure it out in Hive. Has anyone tried it in Impala? Hive: select `(col_name)?+.+` from t1 ; -- To exclude a column in Hive. Impala: I…

hive hiveql hadoop2 impala

asked Jun 12 '18 at 14:59

Govind

2 answers

Cannot ALTER or DROP big Impala partitioned tables - CAUSED BY: MetaException: Timeout when executing

I have several Impala partitioned tables that have more than 50k partitions. They work well except for Hive Metastore operations such as DROP and ALTER ... RENAME, where I get this error message: Query: drop table cars ERROR: ImpalaRuntimeException:…

hadoop hive hadoop2 impala metastore

asked Oct 02 '17 at 10:04

Mohammed Acharki

2 answers

Query to Show only column names in impala

In Hive we can run "show columns in TABLE_NAME" to get only the column names of a table. I want a query that shows only the column names of a table in Impala. How can I get only the column names of a table in Impala?

hadoop hive impala

asked Sep 19 '17 at 11:49

Biswa Patra

1 answer

(Hive, SQL) - How to sort a list of string inside a column?

I have a big data problem in Hive (SQL). SELECT genre, COUNT(*) AS unique_count FROM table_name GROUP BY genre which gives result like: genre | unique_count ---------------------------------- Romance,Crime,Drama,Law |…

sql hadoop hive cloudera impala

asked Feb 10 '17 at 20:06

Afloz
