Questions tagged [apache-kudu]

For questions related to Apache Kudu

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

Fast processing of OLAP workloads.

Integration with MapReduce, Spark and other Hadoop ecosystem components.

Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.

Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.

Strong performance for running sequential and random workloads simultaneously.

Easy to administer and manage with Cloudera Manager.

High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available as long as a majority of its replicas is up: with 2f+1 replicas, up to f replicas can fail. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.

Structured data model.
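
To make the high-availability item above concrete, here is a minimal sketch of the majority arithmetic behind Raft-style replication. The function names are hypothetical and are not part of any Kudu API; the code only assumes that a tablet stays available while a strict majority of its replicas is reachable.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can fail while a majority survives."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    # With 2f+1 replicas, f failures are tolerated (f = 1 for 3, f = 2 for 5).
    for n in (1, 3, 5, 7):
        print(f"replicas={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Note that an even replica count adds no extra tolerance under this rule: 4 replicas tolerate 1 failure, the same as 3.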

By combining all of these properties, Kudu targets families of applications that are difficult or impossible to implement on current-generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

Reporting applications where newly-arrived data needs to be immediately available for end users

Time-series applications that must simultaneously support:

queries across large amounts of historic data

granular queries about an individual entity that must return very quickly

Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

1 answer

Apache Drill and Apache Kudu - not able to run "select * from " using Apache Drill, for the table created in Kudu through Apache Impala

I'm able to connect to Kudu through Apache Drill and can list tables fine. But when I try to fetch data from the table "impala::default.customer" below, none of the options I have tried works. The table in Kudu was created…

impala apache-drill apache-kudu

asked Oct 01 '21 at 05:33

Vikas Kumar

0 answers

Nifi is appending additional character while reading from oracle database

I am a novice with NiFi. I want to read an Oracle database table "ST_MATIC", which has the records below, using "QueryDatabaseTable". KEY,START_TIME,MATIC1 1,2021-07-27 01:07:28,M1 I need to load this into a Kudu table. START_TIME is a TIMESTAMP column in…

apache-nifi kudu apache-kudu

asked Jul 27 '21 at 11:04

stacktesting

1 answer

ETL choice, building an ETL that deals with SQL query engine (impala) or native database directly?

I am trying to build an ETL that maps the source tables to a dimensional, star-schema model. Our data warehouse is basically Impala on top of a Kudu database. My question is, should I: A- build an ETL that deals with Kudu tables directly using Python…

impala kudu apache-kudu

asked Apr 16 '21 at 22:52

Atheer Abdullatif

1 answer

Impala | KUDU Show PARTITION BY HASH. Where my row are?

I want to test CREATE TABLE with PARTITION BY HASH in Kudu. This is my CREATE clause: CREATE TABLE customers ( state STRING, name STRING, purchase_count int, PRIMARY KEY (state, name) ) PARTITION BY HASH (state) PARTITIONS 2 STORED AS…

impala kudu apache-kudu

asked Jan 12 '21 at 19:48

icalvete

1 answer

what's the meaning of short scans?

Recently, I have been learning about Apache Kudu. I came across this passage: For workloads involving many short scans, where the overhead of contacting remote servers dominates, performance can be improved if all of the data for the scan is located in the…

database apache-kudu

asked Dec 08 '20 at 01:22

zhongshan li

1 answer

How can I configure the Kudu test harness to avoid "Block cache capacity exceeds the memory pressure threshold"

I am trying to follow the guide to using the KuduTestHarness in the Getting Started guide. I have created the following simple test case. import org.apache.kudu.test.KuduTestHarness; import static org.junit.Assert.assertTrue; import…

java unit-testing kudu apache-kudu

asked Oct 07 '20 at 11:25

Steve Hindmarch

1 answer

IMPALA - How to get range partition size

For a Parquet table I use SHOW FILES IN db_name.parquet_table_name to get all partition names, sizes and paths. For range partitions I use SHOW RANGE PARTITIONS db_name.kudu_table_name. This gives me only the partition ranges but…

impala apache-kudu range-partitions

asked Sep 07 '20 at 18:26

mongotop

0 answers

Spark SQL: Update if exists, else ignore

Is there a way to quickly execute an "update if exists, ignore otherwise" operation with Spark SQL when storing into Kudu? Context: an IoT platform that receives lots of data and has limited resources. I'm inserting/updating data into a Kudu Storage…

scala apache-spark apache-spark-sql kudu apache-kudu

asked Jul 07 '20 at 09:11

Cheloute

1 answer

kudu impala 2.11 hue escape character does not work

I use CDH with Impala 2.11. I have a test table stored in Kudu. I wrote the following SQL in Hue: select * from test where name = '\'' to find names consisting of only a single quote mark, but it doesn't work. Why?

escaping character impala hue apache-kudu

asked Apr 25 '20 at 03:39

i-robot

1 answer

Migrate data away from a kudu disk

Question (TL;DR;) What I am looking for is a way to tell kudu to replicate data away from a directory (/data/0 in the context below), or to decommission a directory. Is it possible? Context I have a kudu setup with multiple data directories (all on…

cloudera-cdh apache-kudu

asked Apr 08 '20 at 14:43

Guillaume

1 answer

How do I load data from a Java object to Kudu table?

I have Java code that converts a JSON string to a Java object and stores the values of that string in the object. What I need to do next is store these values in a Kudu table. I just want to know how this can be done using Docker…

java json docker kudu apache-kudu

asked Mar 26 '20 at 17:54

Rashi

1 answer

unable to insert or upsert data from kafka topic to kudu table using lenses kudu sink connector

lenses kudu sink connector version = kafka-connect-kudu-1.2.3-2.1.0 kudu table schema CREATE TABLE IF NOT EXISTS table_name( su_id bigint not null, su_tenant_id int null, su_bu_id int null, su_user_type string null, su_acpd_id int null, su_user_code…

apache-kafka-connect lenses apache-kudu

asked Mar 26 '20 at 14:37

Mohan Rajan K

1 answer

Spark - Kudu predicate pushdown

I'm using Kudu and Spark Streaming for a real-time dashboard. My problem is that when I join the batch from Spark Streaming with a Kudu table, it doesn't push down the predicate and takes 2-3 seconds to fetch the entire table in Spark and…

apache-spark spark-streaming spark-streaming-kafka apache-kudu

asked Oct 29 '19 at 21:42

M. Alexandru

1 answer

KUDU for JDBC replication purposes, but not for Off-loaded Analytics

Given the quote from Apache KUDU official documentation, namely: https://kudu.apache.org/overview.html Kudu isn't designed to be an OLTP system, but if you have some subset of data which fits in memory, it offers competitive random access …

apache-kudu

asked Oct 28 '19 at 07:17

thebluephantom

1 answer

Spark dataframe cast column for Kudu compatibility

(I am new to Spark, Impala and Kudu.) I am trying to copy a table from an Oracle DB to an Impala table having the same structure, in Spark, through Kudu. I am getting an error when the code tries to map an Oracle NUMBER to a Kudu data type. How can…

scala apache-spark impala apache-kudu

asked May 15 '19 at 19:27

radumanolescu
