Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Masters use the Raft consensus algorithm, which ensures availability even if f replicas fail, given 2f+1 replicas in total. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
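
The 2f+1 figure in the high-availability bullet is plain majority-quorum arithmetic. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of any Kudu API.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return replicas // 2 + 1


def max_tolerated_failures(replicas: int) -> int:
    """Largest f such that a majority is still alive after f failures.

    With replicas = 2f + 1 this returns exactly f.
    """
    return replicas - majority(replicas)


# Illustrative replica counts (assumed values, not Kudu defaults).
for n in (1, 3, 5, 7):
    print(f"replicas={n} majority={majority(n)} tolerated_failures={max_tolerated_failures(n)}")
```

Under this arithmetic an even replica count tolerates no more failures than the odd count just below it, which is why odd counts are the usual choice.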

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly-arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
1
vote
0 answers

Does Spark respect kudu's hash partitioning similar to bucketed joins on parquet tables?

I'm trying out Kudu with Spark. I want to join 2 tables with the following schema- # This table has around 1 million records TABLE dimensions ( id INT32 NOT NULL, PRIMARY KEY (id) ) HASH (id) PARTITIONS 32, RANGE (id) ( PARTITION…
Anmol
  • 379
  • 5
  • 19
1
vote
1 answer

Kudu Conditional UPSERT INTO

Does Kudu support conditions on the UPDATE portion of UPSERT INTO? Can I provide a conditional clause to only update given values based on a comparison between the insert values and destination table? The actual use case is to update a timestamp…
yogda
  • 11
  • 4
1
vote
1 answer

NIFI - Using one ReplaceText Processor how to add brackets at the beginning and end of each line

I have the following 10000 rows of log file every 5 seconds. log_datetime1 host_name1 log_message1 log_datetime2 host_name2 log_message2 log_datetime3 host_name3 log_message3 I want to send them to kudu or parquet table as the following…
mongotop
  • 7,114
  • 14
  • 51
  • 76
1
vote
1 answer

How to improve Kudu reads with Spark?

I have a process that, given a new input, retrieves related information from our Kudu database and then does some computation. The problem lies in the data retrieval: we have 1.201.524.092 rows and for any computation, it takes forever to start…
Shelen
  • 159
  • 1
  • 8
1
vote
1 answer

Installing Apache Kudu on WSL

I am trying to install Apache Kudu and run the C++ examples on my Ubuntu distribution (18.04) on WSL. I am following the instructions for Ubuntu at https://kudu.apache.org/docs/installation.html Everything runs smoothly until I get to step 6 where I…
rcong767
  • 11
  • 1
1
vote
1 answer

How to setup EMR cluster which supports Impala?

One of the AWS documents states that Impala is supported on EMR clusters, but I can't see an Impala option while creating an EMR cluster. How can I get an EMR cluster with impala-shell installed on it? Thanks in advance.
Joseph N
  • 540
  • 8
  • 28
1
vote
1 answer

Run kudu fsck in a kerberised CDH cluster

I am trying to have Cloudera Manager run a check on a Kudu cluster, which eventually comes down to the following command, run as the kudu user: kudu cluster ksck master_host The output of this command is: Not authorized: leader master liveness check…
Guillaume
  • 2,325
  • 2
  • 22
  • 40
1
vote
0 answers

Kudu's MaintenanceMgr thread is very high on disk IO, read and write almost identical

This problem has been affected by online use, but when the program reads data in batches, the scanner is prone to timeout or "java.io.IOException: Couldn't get scan data" exception. Can someone help answer and help optimize this question, thank…
cheng.W.ye
  • 11
  • 3
1
vote
1 answer

Why doesn't Kudu fail when inserting duplicate primary key?

From Impala documentation: In most relational databases, if you try to insert a row that has already been inserted, the insertion will fail because the primary key would be duplicated. Impala, however, will not fail the query. Instead, it will…
ecitta
  • 41
  • 7
1
vote
1 answer

Why impala-jdbc throws exception when casting from BigDecimal to DECIMAL?

I'm writing to a Kudu table using impala-jdbc 2.6.4.1005. I got this error when inserting a BigDecimal with value 7896163500 to DECIMAL(20,2). [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state:…
ecitta
  • 41
  • 7
1
vote
0 answers

Configure Hive Metastore for presto and query data from s3 and apache kudu

I am pretty new to Presto and Hive. In one of our applications we want to use Presto to query data from Apache Kudu and AWS S3. As far as I know, Presto has its own catalog (meta) service, but we want to configure Hive Metastore (without Hadoop and…
suraj chopade
  • 2,833
  • 3
  • 13
  • 15
1
vote
1 answer

Using parts of the primary key to improve searching in KUDU

I have a primary key composed of three columns (id_grandparent, id_parent, id_row) which is residing in KUDU. I want my lookups to be fast (hbase-like) when looking by id_grandparent. I'm using Impala and Spark to do lookups, let's assume both of…
BiS
  • 501
  • 4
  • 17
1
vote
1 answer

To install kudu do we required java to be installed?

Does installing Apache Kudu require Java as a prerequisite? I am planning to install Kudu in a separate VM; what are all the prerequisites?
Buvi
  • 11
  • 3
1
vote
0 answers

How to get a range of rows(e.g. 1000th~2000th rows) with Apache Kudu?

I'm using Apache Kudu for study, but how can I get a specific range of rows? For example, I want to get the 1000th to the 2000th rows. I have found some client APIs about search bound with key: Status AddLowerBound(const KuduPartialRow& key); …
Ming Zhang
  • 11
  • 1
1
vote
0 answers

Apache Kudu TServer goes down when I use CTAS (Create Table As) hence my insertion fails

I have a table in Cloudera Impala (Parquet format). The table statistics are: Size: 23GB, Rows: 67M, RowSize: approx. 5KB, Columns: 308. My Cloudera cluster has 6 nodes in total (Disk: 84TB each, RAM: 251GB each). Kudu Master…
Shahab Niaz
  • 170
  • 10