Questions tagged [apache-kudu]

For questions related to Apache Kudu

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

Fast processing of OLAP workloads.

Integration with MapReduce, Spark and other Hadoop ecosystem components.

Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.

Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.

Strong performance for running sequential and random workloads simultaneously.

Easy to administer and manage with Cloudera Manager.

High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available as long as a majority of its replicas is up: with 2f+1 replicas, up to f can fail. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.

Structured data model.
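
The Raft figure in the high-availability item above (f failures tolerated with 2f+1 replicas) is plain majority arithmetic. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and are not part of any Kudu API.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def max_tolerated_failures(replicas: int) -> int:
    """Largest f such that 2f + 1 <= replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (replicas - 1) // 2


if __name__ == "__main__":
    # Print the majority size and tolerated failures for common replica counts.
    for n in (1, 3, 5, 7):
        print(f"replicas={n} majority={majority(n)} tolerated_failures={max_tolerated_failures(n)}")
```

Note that an even replica count adds no fault tolerance over the next lower odd count: under this arithmetic, 4 replicas tolerate the same single failure as 3.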

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

Reporting applications where newly-arrived data needs to be immediately available for end users

Time-series applications that must simultaneously support queries across large amounts of historic data and granular queries about an individual entity that must return very quickly

Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

votes

1 answer

How to get current kudu master or tserver flag value?

Master and tserver flags can be accessed from the Kudu web interfaces (by default http://127.0.0.1:8051/varz and http://127.0.0.1:8050/varz). But I couldn't find a way to get them from the command line. For example, how to get tserver_master_addrs from a…

apache-kudu

asked Jul 14 '17 at 18:15

ramazan polat

votes

1 answer

Hive Hbase JOIN performance & KUDU

Reading the Cloudera documentation using Impala to join a Hive table against HBase smaller tables as stated below, then in the absence of a Big Data appliance such as OBDA and a largish HBase dimension table that is mutable: If you have join…

asked Jun 06 '17 at 12:53

thebluephantom

votes

1 answer

Multi-tenancy implementation with Apache Kudu

I am implementing a big data system using Apache Kudu. Preliminary requirements are as follows: support multi-tenancy; the front end will use Apache Impala JDBC drivers to access data; customers will write Spark jobs on Kudu for analytical use…

multi-tenant impala apache-kudu

asked Apr 25 '17 at 22:24

Sauchin

votes

0 answers

Streamsets throws exception (MANUAL_FLUSH buffer) while using Kudu client

I'm a newbie in Streamsets and Kudu technologies and I'm trying several solutions to reach my goal: I've got a folder containing some Avro files and these files need to be processed and afterward sent to a Kudu…

apache-kudu streamsets

asked Apr 20 '17 at 11:00

Christian D'Amico

votes

1 answer

Error While inserting rows into Kudu using Spark Shell

I am new to Apache Kudu. I installed it on my Ubuntu system and later created a table in it using the Apache Spark shell. Now I am trying to insert data into that table using insertRows(); for that I am using the command given below,…

apache-spark insert apache-kudu

asked Apr 18 '17 at 12:48

Pavan Kumar

votes

1 answer

Connecting Apache Drill to Kudu

Is there a way to connect Apache Drill to Kudu? I have seen that Drill 1.5 added experimental support for Kudu, and there is a drill-storage-kudu on GitHub, but I can't figure out how to make it work... Is this now less experimental? Thanks

apache-drill apache-kudu

asked Apr 10 '17 at 20:02

Jice

votes

1 answer

Installing Apache Kudu on my mac (Mac Os Sierra 10.12.1) fails to compile during "thirdparty/build-if-necessary.sh"

When I try to install Apache Kudu I get this error. I couldn't find any information to solve this problem, and the only report I could find says that the problem was solved after installing Xcode, but I already have Xcode…

macos apache-kudu

asked Dec 05 '16 at 15:52

Orbar

votes

1 answer

how can I calculate how much storage existing kudu table actually uses

I would like to calculate how big (in GB) an existing Kudu table actually is. Does anybody know how to do this?

cloudera impala apache-kudu

asked Sep 29 '16 at 07:22

qwertz1123

votes

1 answer

Cannot connect Impala-Kudu to Apache Kudu (without Cloudera Manager): Get TTransportException Error

I have successfully installed Kudu on Ubuntu (Trusty) as per the official Kudu documentation (see http://kudu.apache.org/docs/installation.html). The setup has one node running the master and a tablet server and another node running a tablet server…

apache cloudera impala apache-kudu

asked Sep 11 '16 at 12:57

user1478046

-1

votes

1 answer

Configuring Nutch to write to Apache Kudu

I am trying to configure Apache Nutch to write to Apache Kudu, but I cannot find any information about how to do it. I know I can write to Cassandra and HBase, but there is nothing about Kudu. The Hadoop distribution that I am using is CDH…

hadoop nosql web-crawler nutch apache-kudu

asked Feb 12 '19 at 09:56

Vitaly Olegovitch

-1

votes

1 answer

Read Impala table with SparkSQL

I was trying to execute a query that had functions like lead .. over .. partition and Union. This query works well when I try to run it on impala but fails on Hive. I need to write a Spark job that performs this query. It is failing as well in…

hive pyspark impala apache-spark-1.6 apache-kudu

asked Aug 28 '17 at 19:47

New Coder

-1

votes

1 answer

Is there any good big data store for update and delete queries?

I am using Hive and HBase as back-end stores. Hive is really good for raw data storage, but you can't run update and delete queries if you want good performance. Currently I am using Phoenix on top of HBase. It is giving me good performance and SQL…

hbase hadoop2 impala apache-kudu

asked Aug 11 '16 at 10:43

sdk

-2

votes

1 answer

Scala Play, Apache Spark and KuduContext incompatibilities

I don't know if this is happening because Scala is so version-restrictive or because all the libraries are deprecated and not updated. I have a little project in Scala Play with Apache Spark. I want to use the latest versions of the libraries, so I…

scala apache-spark playframework apache-kudu

asked Jul 17 '20 at 10:03

AlleXyS

-2

votes

2 answers

This row was already applied and cannot be modified

When I run my code in the test environment to test my new Kudu insert code, it reports: This row was already applied and cannot be modified. I have already tried to debug my code to see what the problem is, but it is…

apache-kudu

asked Aug 19 '19 at 04:08

Jelly
