Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available even if f replicas fail, given 2f+1 replicas in total. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
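
The 2f+1 figure in the high-availability bullet is ordinary majority-quorum arithmetic. The sketch below only illustrates that arithmetic; it is not Kudu code, and the function names `max_tolerated_failures` and `has_majority` are hypothetical helpers introduced for this example.

```python
def max_tolerated_failures(replicas: int) -> int:
    """Return the largest f such that 2f + 1 <= replicas.

    Hypothetical helper: illustrates quorum arithmetic only,
    it does not call or model any Kudu API.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (replicas - 1) // 2


def has_majority(replicas: int, failed: int) -> bool:
    """True if the surviving replicas still form a strict majority."""
    if not 0 <= failed <= replicas:
        raise ValueError("failed must be between 0 and replicas")
    return (replicas - failed) > replicas // 2


if __name__ == "__main__":
    # Print the tolerated failure count for a few common replication factors.
    for n in (1, 3, 5, 7):
        print(f"{n} replicas tolerate {max_tolerated_failures(n)} failure(s)")
```

Under this arithmetic an even replica count gains nothing over the next lower odd count (4 replicas tolerate 1 failure, the same as 3), which is one common reason replication factors are usually chosen to be odd.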

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current-generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly-arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
0 votes, 1 answer

Apache Drill and Apache Kudu - not able to run "select * from " using Apache Drill, for the table created in Kudu through Apache Impala

I'm able to connect to Kudu through Apache Drill, and able to list tables fine. But when I have to fetch data from the table "impala::default.customer" below, I tried different options but none is working for me. The table in Kudu was created…
0 votes, 0 answers

NiFi appends an additional character while reading from an Oracle database

I am a novice with NiFi. I want to read an Oracle database table "ST_MATIC", which has the records below, using "QueryDatabaseTable". KEY,START_TIME,MATIC1 1,2021-07-27 01:07:28,M1 I need to load this into a Kudu table. START_TIME is a TIMESTAMP column in…
stacktesting
0 votes, 1 answer

ETL choice: build an ETL that works through the SQL query engine (Impala) or against the native database directly?

I am trying to build an ETL that maps the source tables to a dimensional star-schema model. Our data warehouse is basically Impala on top of a Kudu database. My question is, should I: A- build an ETL that deals with Kudu tables directly using Python…
0 votes, 1 answer

Impala | Kudu: SHOW PARTITION BY HASH. Where are my rows?

I want to test CREATE TABLE with PARTITION BY HASH in Kudu. This is my CREATE clause: CREATE TABLE customers ( state STRING, name STRING, purchase_count int, PRIMARY KEY (state, name) ) PARTITION BY HASH (state) PARTITIONS 2 STORED AS…
icalvete
0 votes, 1 answer

What's the meaning of short scans?

Recently I have been learning about Apache Kudu. I came across this passage: For workloads involving many short scans, where the overhead of contacting remote servers dominates, performance can be improved if all of the data for the scan is located in the…
0 votes, 1 answer

How can I configure the Kudu test harness to avoid "Block cache capacity exceeds the memory pressure threshold"

I am trying to follow the guide to using the KuduTestHarness in the Getting Started guide. I have created the following simple test case. import org.apache.kudu.test.KuduTestHarness; import static org.junit.Assert.assertTrue; import…
0 votes, 1 answer

IMPALA - How to get range partition size

For a Parquet table I use SHOW FILES IN db_name.parquet_table_name to get all my partition names, sizes and paths. For range partitions I use SHOW RANGE PARTITIONS db_name.kudu_table_name. This gives me only the partition ranges but…
mongotop
0 votes, 0 answers

Spark SQL: Update if exists, else ignore

Is there a way to quickly execute an "update if exists, ignore otherwise" when working with Spark SQL and storing into Kudu? Context: an IoT platform with lots of data received and limited platform resources. I'm inserting/updating data into a Kudu Storage…
Cheloute
0 votes, 1 answer

Kudu, Impala 2.11, Hue: escape character does not work

I use CDH with Impala 2.11. I have a test table stored in Kudu. I wrote the following SQL in Hue: select * from test where name = '\'' to find names consisting of only a single quote mark, but it doesn't work. Why?
i-robot
0 votes, 1 answer

Migrate data away from a Kudu disk

Question (TL;DR) What I am looking for is a way to tell Kudu to replicate data away from a directory (/data/0 in the context below), or to decommission a directory. Is it possible? Context I have a Kudu setup with multiple data directories (all on…
Guillaume
0 votes, 1 answer

How do I load data from a Java object into a Kudu table?

I have Java code that converts my JSON string to a Java object and stores the string's values in that object. What I need to do next is store these values in a Kudu table. I just want to know how this can be done using Docker…
Rashi
0 votes, 1 answer

Unable to insert or upsert data from a Kafka topic to a Kudu table using the Lenses Kudu sink connector

lenses kudu sink connector version = kafka-connect-kudu-1.2.3-2.1.0 kudu table schema CREATE TABLE IF NOT EXISTS table_name( su_id bigint not null, su_tenant_id int null, su_bu_id int null, su_user_type string null, su_acpd_id int null, su_user_code…
0 votes, 1 answer

Spark - Kudu predicate pushdown

I'm using Kudu and Spark Streaming for a real-time dashboard. My problem is that when I join the batch from Spark Streaming with a Kudu table, it doesn't do a predicate pushdown on it and takes 2-3 seconds to fetch the entire table in Spark and…
0 votes, 1 answer

Kudu for JDBC replication purposes, but not for off-loaded analytics

Given the following quote from the official Apache Kudu documentation (https://kudu.apache.org/overview.html): Kudu isn't designed to be an OLTP system, but if you have some subset of data which fits in memory, it offers competitive random access …
thebluephantom
0 votes, 1 answer

Spark dataframe cast column for Kudu compatibility

(I am new to Spark, Impala and Kudu.) I am trying to copy a table from an Oracle DB to an Impala table having the same structure, in Spark, through Kudu. I am getting an error when the code tries to map an Oracle NUMBER to a Kudu data type. How can…
radumanolescu