Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Introduction from the whitepaper Impala: A Modern, Open-Source SQL Engine for Hadoop:

INTRODUCTION

Impala is an open-source, fully-integrated, state-of-the-art MPP SQL query engine designed specifically to leverage the flexibility and scalability of Hadoop. Impala’s goal is to combine the familiar SQL support and multi-user performance of a traditional analytic database with the scalability and flexibility of Apache Hadoop and the production-grade security and management extensions of Cloudera Enterprise. Impala’s beta release was in October 2012 and it GA’ed in May 2013. The most recent version, Impala 2.0, was released in October 2014. Impala’s ecosystem momentum continues to accelerate, with nearly one million downloads since its GA.

Unlike other systems (often forks of Postgres), Impala is a brand-new engine, written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, YARN, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile). To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload.

...

Impala is the highest performing SQL-on-Hadoop system, especially under multi-user workloads. As Section 7 shows, for single-user queries, Impala is up to 13x faster than alternatives, and 6.7x faster on average. For multi-user queries, the gap widens: Impala is up to 27.4x faster than alternatives, and 18x faster on average – or nearly three times faster on average for multi-user queries than for single-user ones.

2083 questions
0
votes
1 answer

ORDER BY through between table IMPALA

I'm searching for an Impala query idea. Let me try to explain my problem: it is all about sorting IDs. I have a table with different types of IDs: a head ID and a kind of sub ID (for each head ID there are up to 150 sub IDs). Through a window…
0
votes
1 answer

Impala - Calculate Percentage From One Table

I have this kind of table, named table_A:

| Cust | Amount | Src_cust |
| A | 2000 | B |
| A | 3000 | C |
| A | 1000 | B |
| C | 1000 | B |

Result:

| Cust | Percentage | Src_cust |
| A …
Nicky Apriliani
  • 321
  • 4
  • 25
0
votes
1 answer

Selecting string after the last \\ using regex with Impala SQL

I have a dataset with a column with processes and the path. I am trying to use regex with Impala to strip off the executable. The dataset looks like…
sectechguy
  • 2,037
  • 4
  • 28
  • 61
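The pattern idea behind this question can be sketched outside Impala. The following is a minimal Python sketch, not Impala SQL: it matches the run of non-backslash characters anchored at the end of the string. The function name `last_path_component` and the sample path in the usage note are hypothetical, and whether the same pattern ports unchanged to Impala's regex functions depends on its string-escaping rules, which this sketch does not cover.

```python
import re

# One or more non-backslash characters, anchored at the end of the string.
_LAST_SEGMENT = re.compile(r"[^\\]+$")


def last_path_component(path: str) -> str:
    """Return the text after the last backslash, or '' if the path ends with one."""
    match = _LAST_SEGMENT.search(path)
    return match.group(0) if match else ""
```

For example, `last_path_component(r"C:\Windows\System32\cmd.exe")` yields `cmd.exe`, and a string without any backslash is returned unchanged.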
0
votes
0 answers

DoubleType to Timestamp in Impala

How do I convert a DoubleType into a Timestamp in Impala? I have the following DoubleType value: month_date: 201710. I would have thought something like: SELECT to_timestamp(cast(t1.month_date as string), 'yyyy-MM') FROM old t1; I am getting…
Anna
  • 444
  • 1
  • 5
  • 23
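One thing visible in the excerpt is that the sample value 201710 contains no hyphen, while the attempted pattern is 'yyyy-MM'. The following is a minimal Python sketch of the conversion logic only, not Impala SQL; it assumes the double always holds a whole number of the form yyyyMM, and the function name `month_number_to_datetime` is hypothetical.

```python
from datetime import datetime


def month_number_to_datetime(month_date: float) -> datetime:
    """Convert a yyyyMM value stored as a float (e.g. 201710.0) to a datetime."""
    # Drop the fractional part first so the text is "201710", not "201710.0".
    text = str(int(month_date))
    # The pattern has no separator because the value has none.
    return datetime.strptime(text, "%Y%m")
```

The result for 201710.0 is midnight on the first day of October 2017. The day defaults to 1 because the input carries no day component.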
0
votes
1 answer

Impala single insert statement creating multiple files

I have an Impala managed table and I am trying to execute a single statement insert query in Impala using JDBC connection. Sample query - insert into employee (ID,NAME,AGE,ADDRESS,SALARY) VALUES (1, 'Ramesh', 32, 'Mumbai', 20000 ) But after…
Avijit
  • 1,770
  • 5
  • 16
  • 34
0
votes
0 answers

Configuring Kerberos authentication to run Impala queries as part of test automation in Jenkins build

We have a module of Java-Spark code that runs Impala queries. As part of automation, I would like to have a JUnit test that runs Impala queries and compares the results with the expected ones. The issue is that it needs to pass through Kerberos authentication to be…
Zara
  • 33
  • 1
  • 7
0
votes
1 answer

Convert Unixtime to MMddyyyy

I'm trying to convert a column that holds Unix time (e.g. 1542862806000) to a regular DTS: select unix_timestamp(column_name) from table; But I get the error: AnalysisException: No matching function with signature: unix_timestamp(BIGINT). My column type is…
TheNewGuy
  • 559
  • 1
  • 10
  • 27
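The sample value 1542862806000 has 13 digits, which suggests milliseconds since the epoch rather than seconds; that reading is an assumption, since the excerpt is truncated. The following is a minimal Python sketch of the conversion logic, not Impala SQL: scale to seconds, then format as MMddyyyy. It interprets the value in UTC, and the function name `millis_to_mmddyyyy` is hypothetical.

```python
from datetime import datetime, timezone


def millis_to_mmddyyyy(epoch_millis: int) -> str:
    """Format a millisecond Unix timestamp as MMddyyyy, interpreted in UTC."""
    # Integer division drops the millisecond part.
    seconds = epoch_millis // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%m%d%Y")


print(millis_to_mmddyyyy(1542862806000))  # prints 11222018
```

A different time zone can shift the date for timestamps near midnight, so the choice of UTC here is a design assumption rather than a requirement.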
0
votes
2 answers

Impala - How to compare date time type 'mm dd yy 00:00AM'

In Impala, when I try to compare dates, I get the wrong result. For example: select 'Nov 23 2018 3:02AM' > 'Dec 1 2018 12:00AM' returns True. When I use the cast() function, select cast('Dec 1 2018 12:00AM' as timestamp), which will…
user3651247
  • 238
  • 1
  • 7
  • 19
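The True result in this excerpt is what a character-by-character string comparison produces, because "N" sorts after "D". The following is a minimal Python sketch of that effect, not Impala SQL; it contrasts comparing the raw strings with comparing parsed values. The format string is an assumption inferred from the two sample literals, and `%b` relies on an English locale.

```python
from datetime import datetime

# Assumed layout of the sample literals: month name, day, year, 12-hour time.
FORMAT = "%b %d %Y %I:%M%p"


def parse(text: str) -> datetime:
    """Parse a string such as 'Nov 23 2018 3:02AM' into a datetime."""
    return datetime.strptime(text, FORMAT)


earlier = "Nov 23 2018 3:02AM"
later = "Dec 1 2018 12:00AM"

# Lexicographic comparison: "N" > "D", so this is True although Nov 23 is earlier.
string_result = earlier > later

# Chronological comparison on parsed values gives the expected False.
parsed_result = parse(earlier) > parse(later)
```

The same principle applies in any engine: values need to be converted to a date or timestamp type with a pattern that matches their layout before they are compared.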
0
votes
0 answers

Impala ODBC Driver Syntax Error (Encountered DECIMAL LITERAL)

I'm attempting to do a simple INSERT INTO statement on an Impala table that has the following schema: field1 (date) field2 (string) field3 (string) field4 (string) field5 (string) field6 (bigint) I am using Impala pyODBC drivers to do this. Here's…
0
votes
0 answers

check data in a weekly circle in Impala

I have a people data set with students (student=1) and I need to monitor these students on a weekly basis. How can I filter data for a specific date and then monitor it after 7 days and after 14 days? Something like this. Only this part…
Anna
  • 444
  • 1
  • 5
  • 23
0
votes
1 answer

Finding the best way to pull the last 7 days with Impala when datetime is string

I am trying to work with a dataset that we started pulling in, and of course the "devicereceipttime" is stored as a string, and I can't convince anyone to change it right now. However, the "year", "month", "day" and "hour" are broken out into separate…
sectechguy
  • 2,037
  • 4
  • 28
  • 61
0
votes
1 answer

Impala/Hive Filling in Missing Values Similar to LOCF (last observation carry forward)

I have time series data in Impala in this format. One record gets created when, and only when, there is a change; the updated value represents the new data. --------------------------------------- | Product | Year | Week | UpdatedValue…
B.Mr.W.
  • 18,910
  • 35
  • 114
  • 178
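The idea of last observation carried forward can be sketched independently of the SQL dialect. The following is a minimal Python sketch, not Impala or Hive SQL: it walks the weeks in order and repeats the most recent recorded change for weeks that have no record. The function name `carry_forward` and the sample values in the usage note are hypothetical, and it handles a single product for simplicity.

```python
from typing import Dict, List, Optional, Tuple


def carry_forward(
    changes: Dict[int, float], weeks: List[int]
) -> List[Tuple[int, Optional[float]]]:
    """Fill every week with the most recent changed value seen so far.

    `changes` maps a week number to the value recorded in that week.
    Weeks before the first change stay None because nothing is known yet.
    """
    filled: List[Tuple[int, Optional[float]]] = []
    last_seen: Optional[float] = None
    for week in weeks:
        if week in changes:
            last_seen = changes[week]  # a new observation replaces the old one
        filled.append((week, last_seen))
    return filled
```

With changes recorded in weeks 2 and 5, `carry_forward({2: 10.0, 5: 12.5}, [1, 2, 3, 4, 5, 6])` leaves week 1 empty, repeats 10.0 through week 4, and repeats 12.5 from week 5 onward.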
0
votes
3 answers

Join two tables on id fields using Impala

I have two tables in HDFS that I want to join using Impala. One is Employee_Logs, the other is HR_Data. Queries: select e.employee_id, e.action from Employee_Logs e where e.employment_status_desc = 'Active' select h.employee_id, h.name from…
sectechguy
  • 2,037
  • 4
  • 28
  • 61
0
votes
1 answer

Impala double values not getting loaded correctly

I have created a simple table in Impala like below: CREATE TABLE IF NOT EXISTS my_db.employee (name STRING, salary double ); And my insert statement is like below: insert into employee (name, salary) VALUES ("Prasad", 158.17) But the…
prasad
  • 63
  • 2
  • 9
0
votes
2 answers

smaller than 8 digits long in impala

I have customer numbers some of which are longer than 8 digits. How can I flag them so they are not counted? I tried the following: SELECT t1.updte_user as staff_number, (CASE WHEN (CAST(t1.updte_user) AS INT ) Integer not null check…
Anna
  • 444
  • 1
  • 5
  • 23