Questions tagged [apache-kudu]

For questions related to Apache Kudu

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

Fast processing of OLAP workloads.

Integration with MapReduce, Spark and other Hadoop ecosystem components.

Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.

Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.

Strong performance for running sequential and random workloads simultaneously.

Easy to administer and manage with Cloudera Manager.

High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available as long as a majority of its replicas is up: with 2f+1 replicas, up to f replicas can fail. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.

Structured data model.
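
As a rough illustration of the Raft availability arithmetic mentioned in the list above, the following sketch computes the majority size and the number of tolerated failures for a given replica count. The function names are made up for this example and are not part of any Kudu API.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can fail while a majority survives."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    # Odd replica counts of the form 2f+1 tolerate exactly f failures.
    for n in (1, 3, 5, 7):
        print(f"{n} replicas: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Note that an even replica count tolerates no more failures than the next smaller odd count (4 replicas still tolerate only 1 failure), which is why odd counts such as 3 or 5 are the usual choice.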

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

Reporting applications where newly-arrived data needs to be immediately available for end users

Time-series applications that must simultaneously support:

queries across large amounts of historic data

granular queries about an individual entity that must return very quickly

Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

vote

0 answers

Does Spark respect kudu's hash partitioning similar to bucketed joins on parquet tables?

I'm trying out Kudu with Spark. I want to join 2 tables with the following schema- # This table has around 1 million records TABLE dimensions ( id INT32 NOT NULL, PRIMARY KEY (id) ) HASH (id) PARTITIONS 32, RANGE (id) ( PARTITION…

apache-spark kudu apache-kudu

asked Oct 16 '20 at 11:21

Anmol

vote

1 answer

Kudu Conditional UPSERT INTO

Does Kudu support conditions on the UPDATE portion of UPSERT INTO? Can I provide a conditional clause to only update given values based on a comparison between the insert values and destination table? The actual use case is to update a timestamp…

impala apache-kudu

asked Sep 19 '20 at 00:58

yogda

vote

1 answer

NIFI - Using one ReplaceText Processor how to add brackets at the beginning and end of each line

I receive a log file with the following 10000 rows every 5 seconds. log_datetime1 host_name1 log_message1 log_datetime2 host_name2 log_message2 log_datetime3 host_name3 log_message3 I want to send them to a Kudu or Parquet table as the following…

json apache-nifi parquet processor apache-kudu

asked Aug 10 '20 at 16:15

mongotop

vote

1 answer

How to improve Kudu reads with Spark?

I have a process that, given a new input, retrieves related information from our Kudu database and then does some computation. The problem lies in the data retrieval: we have 1,201,524,092 rows, and for any computation it takes forever to start…

scala apache-spark apache-spark-sql kudu apache-kudu

asked Jul 06 '20 at 12:02

Shelen

vote

1 answer

Installing Apache Kudu on WSL

I am trying to install Apache Kudu and run the C++ examples on my Ubuntu distribution (18.04) on WSL. I am following the instructions for Ubuntu at https://kudu.apache.org/docs/installation.html Everything runs smoothly until I get to step 6 where I…

c++ gradle cmake apache-kudu

asked Jun 08 '20 at 20:34

rcong767

vote

1 answer

How to setup EMR cluster which supports Impala?

One of the AWS documents states that Impala is supported on EMR clusters, but I can't see an Impala option while creating an EMR cluster. How can I get an EMR cluster with impala-shell installed on it? Thanks in advance.

amazon-web-services hadoop amazon-emr impala apache-kudu

asked Jun 05 '20 at 06:26

Joseph N

vote

1 answer

Run kudu fsck in a kerberised CDH cluster

I am trying to have Cloudera Manager run a check on a Kudu cluster, which eventually comes down to the following command, run as the kudu user: kudu cluster ksck master_host The output of this command is: Not authorized: leader master liveness check…

kerberos cloudera-cdh apache-kudu

asked Apr 03 '20 at 15:09

Guillaume

vote

0 answers

Kudu's MaintenanceMgr thread is very high on disk IO, read and write almost identical

This problem has been affected by online use, but when the program reads data in batches, the scanner is prone to timeouts or a "java.io.IOException: Couldn't get scan data" exception. Can someone help diagnose and optimize this? Thank…

kudu apache-kudu

asked Sep 03 '19 at 03:07

cheng.W.ye

vote

1 answer

Why doesn't Kudu fail when inserting duplicate primary key?

From Impala documentation: In most relational databases, if you try to insert a row that has already been inserted, the insertion will fail because the primary key would be duplicated. Impala, however, will not fail the query. Instead, it will…

sql impala apache-kudu

asked May 31 '19 at 14:41

ecitta

vote

1 answer

Why impala-jdbc throws exception when casting from BigDecimal to DECIMAL?

I'm writing to a Kudu table using impala-jdbc 2.6.4.1005. I got this error when inserting a BigDecimal with value 7896163500 to DECIMAL(20,2). [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state:…

jdbc cloudera impala apache-kudu

asked May 31 '19 at 08:03

ecitta

vote

0 answers

Configure Hive Metastore for presto and query data from s3 and apache kudu

I am pretty new to Presto and Hive. In one of our applications we want to use Presto to query data from Apache Kudu and AWS S3. As far as I know, Presto has its own catalog (meta) service, but we want to configure the Hive metastore (without Hadoop and…

hive presto apache-kudu hive-metastore

asked Mar 07 '19 at 02:25

suraj chopade

vote

1 answer

Using parts of the primary key to improve searching in KUDU

I have a primary key composed of three columns (id_grandparent, id_parent, id_row) which is residing in KUDU. I want my lookups to be fast (hbase-like) when looking by id_grandparent. I'm using Impala and Spark to do lookups, let's assume both of…

apache-kudu

asked Feb 28 '19 at 18:05

BiS

vote

1 answer

To install Kudu, do we require Java to be installed?

Is Java a prerequisite for installing Apache Kudu? I am planning to install Kudu in a separate VM. What are all the prerequisites?

java apache-kudu

asked Feb 01 '19 at 07:32

Buvi

vote

0 answers

How to get a range of rows(e.g. 1000th~2000th rows) with Apache Kudu?

I'm using Apache Kudu for study, but how can I get a specific range of rows? For example, I want to get the 1000th to the 2000th rows. I have found some client APIs about search bound with key: Status AddLowerBound(const KuduPartialRow& key); …

apache-kudu

asked Jan 29 '19 at 11:32

Ming Zhang

vote

0 answers

Apache Kudu TServer goes down when I use CTAS (Create Table As) hence my insertion fails

I have a situation where I have a table in Cloudera Impala (Parquet format). The table statistics are: Size: 23GB, Rows: 67M, RowSize: approx. 5KB, Columns: 308. My Cloudera cluster has 6 nodes in total (Disk: 84TB each, RAM: 251GB each). Kudu Master…

bigdata cloudera impala apache-kudu

asked Nov 23 '18 at 16:01

Shahab Niaz
