Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.

2083 questions
0
votes
1 answer

How can I connect to Impala using a keytab?

I am trying to establish a connection to the Impala database through a Python script using a keytab instead of the normal user/password combination, but I am unable to find any tutorials online. The code I am currently using is: conn =…
0
votes
2 answers

How to set duplicate values in a column to zero while not deleting a row in SQL/Impala?

I am trying to find a way to set all but one of the duplicate values in a column to zero without deleting the row. Below is a simplified example that shows the general idea. The column where the duplicate value needs to be set to zero is 'Total…
Maxi
  • 1
  • 1
0
votes
1 answer

Timestamp hour decreases in INSERT OVERWRITE

I have been working with Sqoop, Hive, and Impala. My Sqoop job gets a field from SQL Server in datetime format and writes it to TABLE1, which is stored as a text file. The field in TABLE1 has the timestamp type. After this, I created an HQL script using …
Fernando Delago
  • 105
  • 1
  • 2
  • 8
0
votes
0 answers

Trying to calculate percentages of counts in SQL - HUE IMPALA

Thanks for reading. I'm trying to use SQL to calculate percentages of counts. 1) I need to figure out how many times a single person shops at the same store in one city (London); I've done this with a CASE WHEN (London). I need to figure out what…
SS360
  • 63
  • 1
  • 6
0
votes
1 answer

Syntax Error: ON RIGHT when trying to match a substring in Impala

Does anyone know why I am receiving this error? I am using SQL in Impala and it won't run. There's a yellow underline under mem_register_hsty_view and transparency_services_summary_2018. Here is my code: use sndbx_dx; SELECT …
Coder123
  • 334
  • 6
  • 26
0
votes
1 answer

Encoding impala data while reading from pandas.read_sql

When I read Impala data using the pyhive library and pandas.read_sql, I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 3071: unexpected end of data. The reason for this error might be that data might be…
Shankar Pandala
  • 969
  • 2
  • 8
  • 28
0
votes
3 answers

SQL Group By using Two Keys

I'm looking to write a query to group by ID1, ID2, but only return the IDs where there is >1 unique ID1 for ID2. I have data like this: +------+------+ | ID1 | ID2 | +------+------+ |1 |A | +------+------+ |1 |A …
DukeLuke
  • 315
  • 6
  • 26
0
votes
2 answers

Store value of impala results in a variable in linux

I need to retrieve 239, 631, etc. from the output below and store them in a variable in Linux. This is the output of an Impala query: +-----------------+ | organization_id | +-----------------+ | 239 | | 631 | | 632 …
Yogesh
  • 47
  • 1
  • 10
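
One way to approach the question above is to parse the bordered table that the shell prints. The following is only a minimal sketch in Python, assuming the output has the single-column layout shown in the excerpt; the helper name `parse_single_column` and the sample text are hypothetical, and the sample contains only the three rows visible in the excerpt.

```python
def parse_single_column(table_text):
    """Return the data cells of a one-column, ASCII-bordered result table."""
    values = []
    header_seen = False
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cell = line.strip("|").strip()
        if not header_seen:
            header_seen = True  # the first "|" row is the column header
            continue
        values.append(cell)
    return values


# Hypothetical sample mirroring the layout in the question excerpt.
sample = """\
+-----------------+
| organization_id |
+-----------------+
| 239             |
| 631             |
| 632             |
+-----------------+
"""

org_ids = parse_single_column(sample)
print(" ".join(org_ids))
```

If the installed impala-shell supports the `-B` (delimited) option, running the query with it should print plain rows without borders, which may make this parsing step unnecessary.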
0
votes
1 answer

Hive impala query

Input:

Key  id  ind1  ind2
1    A   Y     N
1    B   N     N
1    C   Y     Y
2    A   N     N
2    B   Y     N

Output:

Key  ind1  ind2
1    Y     Y
2    Y     N

So basically…
hival
  • 21
  • 1
  • 5
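
The excerpt above is cut off before the rule is stated, so the following is only a sketch under an assumption: that each output indicator is Y when any row for that key has Y. Under that assumption, MAX per group works because 'Y' sorts after 'N'. The example runs the query against an in-memory SQLite database from Python purely for illustration; the table name `flags` and column name `grp_key` are hypothetical, and the same GROUP BY shape would still need to be checked in Hive or Impala.

```python
import sqlite3

# Rows copied from the input shown in the question excerpt.
rows = [
    (1, "A", "Y", "N"),
    (1, "B", "N", "N"),
    (1, "C", "Y", "Y"),
    (2, "A", "N", "N"),
    (2, "B", "Y", "N"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (grp_key INTEGER, id TEXT, ind1 TEXT, ind2 TEXT)"
)
conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?)", rows)

# Assumption: an indicator is 'Y' for a key if any of its rows has 'Y'.
# MAX picks 'Y' over 'N' because 'Y' compares greater as a string.
query = """
SELECT grp_key, MAX(ind1) AS ind1, MAX(ind2) AS ind2
FROM flags
GROUP BY grp_key
ORDER BY grp_key
"""
result = conn.execute(query).fetchall()
print(result)
```

For the sample rows this matches the output listed in the question: key 1 gives Y, Y and key 2 gives Y, N.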
0
votes
0 answers

Why does DROP TABLE fail in impala-shell but work in Hue/Impala?

This is very odd to me. I have a small Cloudera cluster that is NOT Kerberized. I am able to run a drop table query in Hue/Impala but not in impala-shell. The code below is from impala-shell: [hadoop01:21000] > use client1; Query: use…
mdivk
  • 3,545
  • 8
  • 53
  • 91
0
votes
2 answers

Delete impala shell history

I face this problem: We have a shared user where we use impala-shell from the same machine for impala queries. I don't want my queries to be visible and I want to be able to clear my impala-shell history. We access impala with: impala-shell an any…
Michail N
  • 3,647
  • 2
  • 32
  • 51
0
votes
1 answer

Impala EOMONTH equivalent

My problem is that EOMONTH doesn't seem to exist in Impala, so I was hoping there is a substitute for it. I only want to return the values that correspond to end-of-month dates. Below is the query I tried, and the last line is where I have…
MLS
  • 108
  • 14
0
votes
1 answer

How to remove headers which cause NumberFormatException with spark sql and impala/hive

while reading from impala with urls like and jdbc:hive2://impalajdbc.data:25004/;auth=noSasl and spark sql val rr = sparkSession.sql("SELECT item_id from someTable LIMIT 10") it complains that Cannot convert column 1 to long:…
doofin
  • 508
  • 2
  • 13
0
votes
1 answer

Insert salutation depending on the number of characters

Where the salutation is >15 characters, the word ‘Hi’ is to be inserted in the field. I thought about using a regex function, but I'm not sure how to implement this: when regexp_like(salutation, > '^[0-9]{15}$') then 'Hi' MR Nigel Humphreys -> "hi" Ms Montjoy…
Anna
  • 444
  • 1
  • 5
  • 23
0
votes
1 answer

Converting a query from Sybase IQ to impala

SELECT f.exch FROM ( SELECT CASE WHEN sourcedesk IN ('GOBUS_NY', 'GOBUS_UK', …
deb
  • 631
  • 1
  • 5
  • 15