Questions tagged [apache-kudu]

For questions related to Apache Kudu

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

Fast processing of OLAP workloads.

Integration with MapReduce, Spark and other Hadoop ecosystem components.

Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.

Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.

Strong performance for running sequential and random workloads simultaneously.

Easy to administer and manage with Cloudera Manager.

High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available as long as a majority of its replicas are up: with 2f+1 replicas in total, up to f of them can fail. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.

Structured data model.
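
The Raft availability bound in the list above comes down to majority arithmetic. The following is a minimal sketch of that arithmetic only; it assumes nothing about Kudu's internals, and the function names are hypothetical, chosen for illustration.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return replicas // 2 + 1


def max_tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can fail while a majority survives."""
    return replicas - majority(replicas)


# With 2f+1 replicas, f failures are tolerated.
for n in (1, 3, 5, 7):
    print(n, "replicas:", majority(n), "needed,", max_tolerated_failures(n), "may fail")
```

An even replica count buys no extra tolerance: four replicas still tolerate only one failure, which is why odd counts such as 3 or 5 are the usual choice.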

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

Reporting applications where newly-arrived data needs to be immediately available for end users

Time-series applications that must simultaneously support:

queries across large amounts of historic data

granular queries about an individual entity that must return very quickly

Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

1 answer

How to create a Kudu table in the Cloudera QuickStart VM

I have been trying to create a Kudu table in Impala using the Cloudera QuickStart VM, following this example: https://kudu.apache.org/docs/quickstart.html CREATE TABLE sfmta PRIMARY KEY (report_time, vehicle_tag) PARTITION BY HASH(report_time)…

asked May 19 '18 at 21:17

Joseratts

1 answer

How do you retrieve the MIN value of an Apache Kudu table column?

I am using PySpark to connect to my Kudu database. I want to retrieve the minimum value of a column, subject to a set of predicates, but I can't find an option for this in the API. client = kudu.connect(host="myhost", port=1234) table =…

python apache-kudu

asked Apr 27 '18 at 12:33

rams

1 answer

Apache Kudu does not natively support range deletes or updates

Clarification requested on Kudu. The Kudu guides state the following: Row delete and update operations must also specify the full primary key of the row to be changed. Kudu does not natively support range deletes or updates. The first part…

apache-kudu

asked Apr 05 '18 at 09:58

thebluephantom
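
The rule quoted in this question (deletes and updates must name the full primary key) implies that a range delete has to be expressed as a scan followed by one delete per key. Below is a minimal sketch of that pattern, using a plain Python dict as a stand-in for a table. It does not use the Kudu client API, and the key layout (host, ts) and the function name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical table: primary key (host, ts) -> value.
Key = Tuple[str, int]


def delete_time_range(table: Dict[Key, float], host: str, start: int, end: int) -> int:
    """Delete rows for `host` with start <= ts < end, one full key at a time."""
    # Step 1: "scan" with a range predicate to collect the full primary keys.
    keys = [k for k in table if k[0] == host and start <= k[1] < end]
    # Step 2: issue one delete per row, each naming the full primary key.
    for key in keys:
        del table[key]
    return len(keys)


table = {("a", 1): 0.5, ("a", 2): 0.7, ("a", 3): 0.9, ("b", 2): 1.1}
deleted = delete_time_range(table, "a", 1, 3)
print(deleted, sorted(table))
```

A query engine layered on Kudu can hide this loop behind a single statement; Impala's DELETE with a WHERE clause on a Kudu table is one such option.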

0 answers

Spark Streaming - Arb. State - Upsert to Kudu

Hoping someone can help. I'm trying to stream some data and keep the current state of IoT devices in Kudu. I'm currently using a ForeachWriter for the sink. Sadly, it only works when there is a single row; if there is more than one row it hangs…

apache-spark spark-streaming apache-kudu

asked Apr 04 '18 at 20:23

Irianna

1 answer

Spark2 submitting with master yarn gives error "URL must be set"

I am getting the exception org.apache.spark.SparkException: A master URL must be set in your configuration. I used spark2-submit with the options deploy-mode = cluster and master = yarn. From my understanding, I should not be getting this exception with…

apache-spark cloudera-cdh apache-kudu

asked Mar 14 '18 at 23:05

Alter

0 answers

Cannot connect to Kudu from Spark

I cannot properly connect to Kudu from Spark; the error says "Kudu master has no leader". Environment: CDH 5.14, Kudu 1.6, Spark 1.6.0 standalone and 2.2.0. When I use Impala in Hue to create and query Kudu tables, it works flawlessly. However, connecting from Spark…

apache-spark pyspark apache-spark-sql apache-kudu

asked Mar 02 '18 at 12:53

Susensio

1 answer

Does filtering on a KuduRDD happen locally in the Spark application or on the Kudu server?

If I execute a filter on a KuduRDD, does the Spark job first read all the data from the Kudu table and apply the filter within the Spark application, or does the filtering happen on the Kudu server so that the Spark application receives only the filtered data?

apache-spark apache-kudu

asked Jan 20 '18 at 16:28

gszecsenyi

1 answer

How to read Kudu through Impala in spark2

I want to read Kudu through Impala in spark2-shell, but every approach I tried has failed. Starting spark2-shell: spark2-shell --jars…

impala apache-spark-2.0 apache-kudu

asked Jan 10 '18 at 09:56

Autumn

3 answers

Loading data from HDFS to Kudu

I'm trying to load data to a Kudu table but getting a strange result. In the Impala console I created an external table from the four HDFS files imported by Sqoop: drop table if exists hdfs_datedim; create external table hdfs_datedim ( ... ) row…

hdfs impala sqoop apache-kudu

asked Dec 19 '17 at 16:17

Jay

1 answer

Hadoop Key-Value store with remote deploy

My application is launched from a remote PC via spark-submit in yarn-cluster mode with a Kerberos keytab and principal, following this guide: https://spark.apache.org/docs/latest/running-on-yarn.html. The advantages of this approach are that I have my own…

hadoop hadoop-yarn hazelcast ignite apache-kudu

asked Dec 13 '17 at 14:34

Andrei Iatsuk

0 answers

Does Impala scan inconsistency (READ_LATEST mode) only arise during a leader change?

When I use Impala to transfer a large amount of data (about 100G) in one go and run select count(1) immediately afterwards, I get the wrong total count. When I execute the same SQL again, the total count is correct. I want to know whether, besides a leader change, there is…

impala raft apache-kudu

asked Dec 05 '17 at 01:05

Tony Li

1 answer

How to configure Apache Kudu with Impala in DC/OS?

We need to configure the Kudu master and Kudu tablet servers in DC/OS. The architecture is similar to the one pictured in the question. How should we configure the services in DC/OS so that they scale correctly? We need to have the Impala daemon, Kudu tablet server and Hadoop DataNode…

impala dcos apache-kudu

asked Nov 28 '17 at 15:41

Ariel Debernardi

2 answers

How to visualize data in Apache Kudu?

Is it possible to visualize data in Apache Kudu? Is there any guideline for it?

data-visualization apache-kudu

asked Sep 08 '17 at 15:26

Rilwan

2 answers

Kudu with PySpark2: Error with KuduStorageHandler

I am trying to read data stored in Kudu using PySpark 2.1.0: >>> from os.path import expanduser, join, abspath >>> from pyspark.sql import SparkSession >>> from pyspark.sql import Row >>> spark = SparkSession.builder \ .master("local") \ …

hive cloudera-cdh apache-spark-sql apache-spark-2.0 apache-kudu

asked Aug 24 '17 at 22:33

New Coder

1 answer

Migration from a single-master to a multi-master Apache Kudu configuration

We have changed the configuration of our Apache Kudu cluster by adding 2 new Kudu masters to the original one. PROBLEM: When we start Kudu, it starts with the old leader (the original master) and everything works. But after a while the leader changes to the…

cloudera-cdh impala apache-kudu

asked Jul 26 '17 at 15:10

BigDataMiner
