Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.

2083 questions

1 answer

Hive - Flatten Hierarchy Table into Levels

I have a hierarchy table with a parent-child relationship and up to 15 levels. I need to find all levels of child nodes for each parent node. I tried a recursive query, but it does not work in Hive or Impala. Please suggest a query to solve…

asked Feb 10 '19 at 06:23

A Saraf

2 answers

Impala SQL query with 1 table and finding common with 3 hostnames

I have a single table and am trying to get the destinationhostnames that all users have in common using Impala SQL. proxy table: sourcehostname destinationhostname comp1 google.com comp2 google.com comp1 yahoo.com comp1 …

sql impala

asked Feb 08 '19 at 15:20

sectechguy

1 answer

Hive/Impala changing table counts

I have a list of release dates (some past and some future) and a list of registration numbers. release date registration 01/01/2019 R1 02/01/2019 R2 07/02/2019 R3 I basically want to create a new table that will display…

sql hive impala

asked Feb 05 '19 at 13:57

M. Andrews

1 answer

Impala : Running sum of 1 hour

I want to count the records of each ID within 1 hour. I tried some Impala queries, but without any luck. I have input data as follows: And expected output would be: I tried: select concat(month,'/',day,'/',year,' ',hour,':',minute) time,…

hadoop hive hql impala

asked Feb 05 '19 at 07:53

Manish Saraf Bhardwaj

1 answer

Hive/Impala - Find End Child nodes in Hierarchy Structure table

I have a scenario where I need to find the lowest-level child nodes from a hierarchy table having parent_node_id and child_node_id as below. The source table is in a Hive and Impala database. Please suggest a Hive/Impala query to find the lowest-level child nodes for…

hadoop hive hiveql impala

asked Feb 01 '19 at 15:36

A Saraf

1 answer

Oracle - End Child nodes from Hierarchy table

I am trying to find the lowest-level child nodes from a hierarchy table having parent_node_id and child_node_id as below. It is returning the mid-level child nodes as well. Please help me modify this query to achieve the desired result. Please…

oracle hive impala

asked Jan 31 '19 at 15:35

A Saraf

0 answers

IMPALA - Complex Type - How to check if field contain empty array

As mentioned above, suppose we have a table: ID content 1 [] 2 [] 3 [{"name":"Jack", "age":18, "title": "MR"}] ... ... How can I get all the rows where the content column has a value? select * from t, t.content where t.content is not null…

sql arrays impala

asked Jan 30 '19 at 11:20

user3651247

1 answer

Performance issue with Impala table with merged parquet files

I have a Python utility that uses the PyArrow library to create multiple Parquet files for a single data set, because the data set for one day is huge. Each split Parquet file contains 10K Parquet row groups, and in the end we are…

apache-spark hadoop parquet impala pyarrow

asked Jan 28 '19 at 19:30

Ajay Kharade

2 answers

Find missing records with grouping

I am struggling to implement a SQL query that identifies missing records from 2 Hive tables based on a grouping scenario. Data is as below: Table 1 - Calendar month_last_day 20190131 20190229 20190331 20190430 Table 2 - Items itemid date 101 …

sql hive impala

asked Jan 26 '19 at 15:03

Hemil

1 answer

impala sql transpose multiple columns to rows

I'd like to transpose my columns to rows in Impala using SQL. Below is what I'm working with and the desired output underneath. The data is a few million records and around a hundred columns, but the 2 records are for illustration purposes only.…

sql nosql impala

asked Jan 26 '19 at 00:54

Zee

2 answers

Impala pivot from column to row, column names disappear

I am kind of new to Impala, and to SQL in general. I am trying to do some pivot operations in order to start with this table. Input: Name table: MyName +-----------+---------------------+-----------+ | Column A | Column B | Column C …

sql group-by pivot transpose impala

asked Jan 25 '19 at 10:23

iraciv94

0 answers

Concurrency issues in java accessing Jdbc Impala

What I want to do: have two threads, t1 and t2, where t1 connects to Impala cluster I1 and t2 connects to I2. Both t1 and t2 have the task of executing a set of queries in their respective databases. Below is the pseudo code: …

java multithreading resultset impala

asked Jan 23 '19 at 21:41

Zara

1 answer

Different row count while creating a table or view in Impala

Different row count when trying to create a table and a view in Impala. I am trying to run a query in Impala that has a left outer join with another table. The table structure is as below: SELECT COUNT (*) FROM ( SELECT A.*, B.ORDERED_DATE, …

mysql hadoop hive impala hue

asked Jan 23 '19 at 14:08

Yogesh

1 answer

Using Externally created Parquet files in Impala

First off, apologies if this comes across as poorly worded; I've tried to help myself, but I'm not clear on where it's not right. I'm trying to query data in Impala which has been exported from another system. Up till now it's been exported as a…

parquet create-table impala

asked Jan 23 '19 at 10:50

Tim Edwards

1 answer

Impala - Convert MON-YY to YYYYMM

I have a column, Month_year, in a staging table with the data below. Please suggest a query to get the desired output. Input: +----------+ month_year +----------+ Jan-19 Dec-18 +----------+ Expected…

hive hiveql impala

asked Jan 12 '19 at 15:48

A Saraf
