Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available even if f replicas fail, provided there are 2f+1 replicas in total. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
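The 2f+1 figure in the high-availability bullet is ordinary majority-quorum arithmetic: a Raft group can elect a leader and commit writes only while a strict majority of its replicas is alive. A minimal Python sketch of that arithmetic follows; the function names are illustrative and not part of any Kudu API.

```python
def tolerated_failures(total_replicas: int) -> int:
    """Largest f with 2f + 1 <= total_replicas: failures a majority quorum survives."""
    if total_replicas < 1:
        raise ValueError("need at least one replica")
    return (total_replicas - 1) // 2


def replicas_needed(f: int) -> int:
    """Replica count required to keep a majority after f failures."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def has_quorum(total_replicas: int, alive: int) -> bool:
    """True when the live replicas form a strict majority of the group."""
    return alive > total_replicas // 2


for n in (1, 3, 5):
    print(f"{n} replicas tolerate {tolerated_failures(n)} failure(s)")  # 1 -> 0, 3 -> 1, 5 -> 2
```

Note that 4 replicas tolerate only 1 failure, the same as 3, which is why odd replica counts are the usual choice.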

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly-arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

0 votes, 1 answer

How to create a Kudu table in the Cloudera QuickStart VM?

I have been trying to create a Kudu table in Impala using the Cloudera QuickStart VM, following this example (https://kudu.apache.org/docs/quickstart.html): CREATE TABLE sfmta PRIMARY KEY (report_time, vehicle_tag) PARTITION BY HASH(report_time)…
Joseratts
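The quickstart DDL in this question partitions by HASH(report_time). Conceptually, each row's report_time is hashed and the hash modulo the bucket count picks a hash bucket (and, with no range partitioning, the tablet), so rows sharing a report_time always land together regardless of vehicle_tag. The rough Python model below assumes zlib.crc32 as a stand-in hash; Kudu uses its own internal hash function, so real bucket numbers will differ. The rows and the bucket count are hypothetical.

```python
import zlib
from collections import Counter


def hash_bucket(value: object, num_buckets: int) -> int:
    """Assign a partition-column value to one of num_buckets hash buckets.

    zlib.crc32 is only a stand-in: Kudu computes its own hash, so these
    bucket numbers will not match what Kudu actually assigns.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be positive")
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


# Hypothetical (report_time, vehicle_tag) rows; report_time as epoch seconds.
rows = [(1_514_764_800 + 60 * i, tag) for i in range(30) for tag in (101, 202)]

num_buckets = 4
# Only report_time feeds the hash, so both vehicle_tags for a given time share a bucket.
per_bucket = Counter(hash_bucket(report_time, num_buckets) for report_time, _ in rows)
print(sorted(per_bucket.items()))
```

Because vehicle_tag does not feed the hash, a skewed report_time distribution would skew the buckets too; that is a property of the chosen hash column, not of the stand-in hash.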
0 votes, 1 answer

How do you retrieve the MIN value of an Apache Kudu table column?

I am using PySpark to connect to my Kudu database. I want to retrieve the minimum value in a column, subject to a set of predicates, but I can't seem to find an option for this in the API: client = kudu.connect(host="myhost", port=1234) table =…
rams
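One way to get a filtered MIN, assuming the Kudu client offers no server-side aggregate, is to push the predicates into the scan and reduce the returned rows on the client. The sketch below shows only the client-side reduction over plain tuples; the scanner calls in the comment are based on the kudu-python README and are an assumption to check against the installed client version. For a PySpark DataFrame, df.filter(...).agg(F.min("col")) with pyspark.sql.functions imported as F expresses the same thing, and Impala accepts SELECT MIN(col) ... WHERE ... directly.

```python
from typing import Any, Callable, Iterable, Optional, Sequence

Row = Sequence[Any]
Predicate = Callable[[Row], bool]


def min_with_predicates(
    rows: Iterable[Row], column: int, predicates: Sequence[Predicate] = ()
) -> Optional[Any]:
    """Return the smallest non-None row[column] among rows passing every predicate."""
    best = None
    for row in rows:
        if all(pred(row) for pred in predicates):
            value = row[column]
            if value is not None and (best is None or value < best):
                best = value
    return best  # None when no row qualifies


# Stand-in for tuples from a filtered Kudu scan, e.g. (assumed kudu-python calls):
#   scanner = table.scanner()
#   scanner.add_predicate(table['region'] == 'west')
#   rows = scanner.open().read_all_tuples()
rows = [("west", 42.0), ("east", 3.5), ("west", 17.25), ("west", None)]

print(min_with_predicates(rows, column=1))  # 3.5
print(min_with_predicates(rows, column=1, predicates=[lambda r: r[0] == "west"]))  # 17.25
```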
0 votes, 1 answer

Apache Kudu does not natively support range deletes or updates

Clarification requested on Kudu. The Kudu guides state the following: Row delete and update operations must also specify the full primary key of the row to be changed. Kudu does not natively support range deletes or updates. The first part…
thebluephantom
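The quoted limitation means every delete or update has to name one row by its full primary key, so a "range delete" becomes two steps: scan for the keys that fall in the range, then issue one point delete per key. SQL layers such as Impala accept DELETE ... WHERE on Kudu tables and, conceptually, perform these steps on the user's behalf. The in-memory model below illustrates the two steps; the dict standing in for a table and the (host, ts) composite key are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # hypothetical composite primary key: (host, ts)


def range_delete(table: Dict[Key, str], host: str, ts_from: int, ts_to: int) -> int:
    """Emulate DELETE ... WHERE host = ? AND ts BETWEEN ? AND ? with point deletes."""
    # Step 1: scan for the full primary key of every row in the range.
    # Collecting into a list first avoids mutating the dict while iterating it.
    doomed = [key for key in table if key[0] == host and ts_from <= key[1] <= ts_to]
    # Step 2: one point delete per row, each naming the complete primary key.
    for key in doomed:
        del table[key]
    return len(doomed)


table = {("a", 1): "x", ("a", 5): "y", ("a", 9): "z", ("b", 5): "w"}
deleted = range_delete(table, "a", 2, 9)
print(deleted, sorted(table))  # 2 [('a', 1), ('b', 5)]
```

In this sketch the scan and the deletes are separate steps, so a row inserted into the range between them would survive; the model makes no claim about Kudu's transactional guarantees.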
0 votes, 0 answers

Spark Streaming - Arbitrary State - Upsert to Kudu

Hoping someone can help. I'm trying to stream some data and keep the current state of IoT devices in Kudu. I'm currently using a ForeachWriter for the sink; sadly, it only works when there is a single row. If there is more than one row, it hangs…
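The excerpt does not include enough code to diagnose the hang, but the state-keeping part can be stated precisely: collapse each micro-batch to the newest reading per device, then upsert one row per device key. The sketch below models that with a dict in place of the Kudu table; the (device_id, event_time, value) layout is hypothetical, and the stale-event guard is an extra assumption, since a plain UPSERT overwrites unconditionally.

```python
from typing import Dict, Iterable, Tuple

Reading = Tuple[str, int, float]      # hypothetical (device_id, event_time, value)
State = Dict[str, Tuple[int, float]]  # device_id -> (event_time, value)


def latest_per_device(batch: Iterable[Reading]) -> State:
    """Keep only the newest reading for each device within one batch."""
    latest: State = {}
    for device_id, event_time, value in batch:
        current = latest.get(device_id)
        if current is None or event_time >= current[0]:
            latest[device_id] = (event_time, value)
    return latest


def upsert_batch(state: State, batch: Iterable[Reading]) -> None:
    """Apply one upsert per device key: insert if absent, overwrite if present."""
    for device_id, (event_time, value) in latest_per_device(batch).items():
        current = state.get(device_id)
        # Assumed guard: skip events older than the stored state.
        if current is None or event_time >= current[0]:
            state[device_id] = (event_time, value)  # stands in for one UPSERT per key


state: State = {"dev-1": (100, 20.0)}
upsert_batch(state, [("dev-1", 105, 21.5), ("dev-2", 101, 7.0), ("dev-1", 103, 19.0), ("dev-1", 90, 0.0)])
print(state)  # {'dev-1': (105, 21.5), 'dev-2': (101, 7.0)}
```

Deduplicating within the batch first means a batch with many rows for the same device still produces a single upsert per key, and the outcome does not depend on the order in which the rows arrive.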
0 votes, 1 answer

Submitting to Spark2 with master yarn gives error "A master URL must be set"

I am getting an exception: org.apache.spark.SparkException: A master URL must be set in your configuration. I used spark2-submit with the options deploy-mode = cluster and master = yarn. From my understanding, I should not be getting this exception with…
Alter
0 votes, 0 answers

Cannot connect to Kudu from Spark

I cannot properly connect to Kudu from Spark; the error says "Kudu master has no leader". Versions: CDH 5.14, Kudu 1.6, Spark 1.6.0 standalone and 2.2.0. When I use Impala in Hue to create and query Kudu tables, it works flawlessly. However, connecting from Spark…
Susensio
0 votes, 1 answer

Does filtering a KuduRDD happen locally in the Spark application or on the Kudu server?

If I execute a filter on a KuduRDD, does the Spark job first read all the data from the Kudu table and execute the filter within the Spark application, or does the filtering happen on the Kudu server, so that the Spark application receives only the filtered data?
gszecsenyi
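The two alternatives this question contrasts can be modeled directly: with predicate pushdown the server evaluates the filter and ships only matching rows, while without it every row crosses the network and the application filters afterwards. The result is identical; the volume transferred is not. This toy Python model (function names hypothetical) illustrates the trade-off only and does not say which strategy a given KuduRDD call uses.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def filter_on_server(rows: Sequence[T], pred: Callable[[T], bool]) -> Tuple[List[T], int]:
    """Pushdown: the server applies pred, so only matching rows are shipped."""
    shipped = [r for r in rows if pred(r)]
    return shipped, len(shipped)


def filter_on_client(rows: Sequence[T], pred: Callable[[T], bool]) -> Tuple[List[T], int]:
    """No pushdown: every row is shipped, then filtered in the application."""
    shipped = list(rows)
    return [r for r in shipped if pred(r)], len(shipped)


def is_recent(r: int) -> bool:
    """Hypothetical selective filter keeping the last 10 of 1,000 rows."""
    return r >= 990


rows = list(range(1_000))
server_result, server_shipped = filter_on_server(rows, is_recent)
client_result, client_shipped = filter_on_client(rows, is_recent)
print(server_result == client_result, server_shipped, client_shipped)  # True 10 1000
```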
0 votes, 1 answer

How to read Kudu through Impala in Spark 2?

I want to read Kudu through Impala in spark2-shell, but it has failed in many ways :( Entering spark2-shell: spark2-shell --jars…
Autumn
0 votes, 3 answers

Loading data from HDFS to Kudu

I'm trying to load data into a Kudu table but am getting a strange result. In the Impala console I created an external table from the four HDFS files imported by Sqoop: drop table if exists hdfs_datedim; create external table hdfs_datedim ( ... ) row…
Jay
0 votes, 1 answer

Hadoop Key-Value store with remote deploy

My application is launched from a remote PC via spark-submit in yarn-cluster mode with a Kerberos keytab and principal, following this guide: https://spark.apache.org/docs/latest/running-on-yarn.html. The advantages of this approach are that I have my own…
Andrei Iatsuk
0 votes, 0 answers

Does Impala scan inconsistency (READ_LATEST mode) only arise during a leader change?

When I use Impala to transfer a large amount of data (about 100G) in one go and run select count(1) immediately afterwards, I get the wrong total count. When I execute the same SQL again, the total count is correct. I want to know: besides a leader change, is there…
Tony Li
0 votes, 1 answer

How to configure Apache Kudu with Impala in DC/OS?

We need to configure the Kudu master and Kudu tablet server in DC/OS. We need an architecture similar to the one in the attached diagram. How should the services be configured in DC/OS to scale correctly? We need to have Impala Daemon, Kudu Tablet Server and Hadoop Data Node…
0 votes, 2 answers

How to Visualize data in Apache Kudu?

Is it possible to visualize data in Apache Kudu? Are there any guidelines for it?
Rilwan
0 votes, 2 answers

Kudu with PySpark2: Error with KuduStorageHandler

I am trying to read data stored in Kudu using PySpark 2.1.0 >>> from os.path import expanduser, join, abspath >>> from pyspark.sql import SparkSession >>> from pyspark.sql import Row >>> spark = SparkSession.builder \ .master("local") \ …
0 votes, 1 answer

Migration from single-master to multi-master Apache Kudu configuration

We have changed the configuration of our Apache Kudu cluster. We have added 2 new Kudu masters to the original one. PROBLEM: When we start Kudu, it starts with the old leader (the original master) and everything works. But after a while the leader changes to the…
BigDataMiner