Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict serialized consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Masters use the Raft consensus algorithm, which ensures availability even if f replicas fail, given 2f+1 total replicas. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
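The 2f+1 rule in the high-availability bullet is ordinary Raft majority-quorum arithmetic. The following minimal Python sketch of that arithmetic is an illustration only, not Kudu code, and the function names are hypothetical. It shows how many replica failures a tablet can survive for a given replica count:

```python
# Illustrative Raft majority-quorum arithmetic (not part of any Kudu API).

def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest f such that the surviving replicas still form a majority."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} replicas: majority={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With 2f+1 replicas the sketch reports exactly f tolerated failures (3 replicas survive 1 failure, 5 survive 2). It also shows why odd replica counts are the natural choice: 4 replicas still tolerate only 1 failure, the same as 3.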

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current-generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
1 vote, 0 answers

Kudu Client fails with exceptions after running for a few days

I have a Scala/Spark/Kafka process that I run. When I first start the process I create a KuduClient Object using a function I made that I share between classes. For this job I only create the KuduClient once, and let the process run continuously.…
alex
1 vote, 3 answers

How do I find the Kudu master name or port for Kudu in my Cloudera cluster?

I am trying to write a Spark dataframe to Kudu DB, but I do not know the Kudu master. The cluster I am using is a Cloudera cluster. How do I find Kudu master in the cluster?
1 vote, 2 answers

Apache Kudu slow insert, high queuing time

I have been using the Spark Data Source to write to Kudu from Parquet, and the write performance is terrible: about 12,000 rows/second, with each row roughly 160 bytes. We have 7 Kudu nodes, each with 24 cores, 64 GB RAM and 12 SATA disks. None of the…
Tung Vs
1 vote, 1 answer

How to insert data from Kafka to Kudu using Spark streaming

I have a Spark streaming application that listens to a Kafka topic. When I get the data, I need to process it and send it to Kudu. Currently I am using the org.apache.kudu.spark.kudu.KuduContext API and calling the insert action with the DataFrame. In order…
LubaT
1 vote, 2 answers

NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive

I am trying to create a Kudu table using Impala-shell. Query: CREATE TABLE lol ( uname STRING, age INTEGER, PRIMARY KEY(uname) ) STORED AS KUDU TBLPROPERTIES ( 'kudu.master_addresses' = '127.0.0.1' ); CREATE TABLE t (k INT PRIMARY KEY) STORED…
user9518134
1 vote, 1 answer

Timestamp Primary key Kudu

I am trying to load data into a Kudu table through Envelope. One of the primary key columns is a timestamp. DDL: CREATE TABLE BAL ( client_id int bal_id int, effective_time timestamp, prsn_id int, bal_amount double, prsn_name string, PRIMARY KEY…
Abhinav Singh
1 vote, 1 answer

Truncate Kudu table using Spark

What is the best way to truncate a Kudu table from Spark? Is there any analogue of SQL "TRUNCATE TABLE_NAME;" or "DELETE FROM TABLE_NAME;"? The only thing I found is kuduContext.deleteRows, but it requires explicitly specifying the rows to delete. Or I…
Vladimir Kravets
1 vote, 1 answer

Running Kudu in a docker and master to tserver two-way connection / circular link issues - docker composition

How can you run Kudu under Docker, given that it requires two containers (one for the master and one for the tserver) that need to connect to each other by DNS? Kudu can be run under Docker using the following commands: docker run --name…
Danny Varod
1 vote, 1 answer

How to write and update using the Kudu API in Spark 2.1

I want to write and update using the Kudu API. This is the Maven dependency: org.apache.kudu:kudu-client:1.1.0
Autumn
1 vote, 1 answer

Sqoop syntax to import into a Kudu table

We'd like to test Kudu and need to import data. Sqoop seems like the correct choice. I have found references saying that you can import into Kudu, but no specifics. Is there any way to import into Kudu using Sqoop?
Jay
1 vote, 1 answer

Kudu nested field

I have questions about Kudu with nested fields. I have JSON from Kafka like this: { "ts": 32, "status": "success", "uid": "3232", "url": "http://some_url", "syncpixel": "http://some_url", "dfp": { "DFP_UABrowser": "Chrome 61", …
1 vote, 1 answer

SPARK KUDU Complex Update statements directly or via Impala JDBC Driver possible?

If I look at the Impala shell or Hue, I can write fairly complicated Impala UPDATE statements for Kudu, e.g. an update with a sub-select and so on. Fine. Looking at the old JDBC connection methods for, say, MySQL via Spark/Scala, there is not a lot…
thebluephantom
1 vote, 1 answer

Which Amazon Web Services *native* offering is closest to Apache Kudu?

I am looking for a native offering, such as any of the RDS solutions, ElastiCache or Amazon Redshift, not something that I would have to host myself. From the Apache Kudu site (https://kudu.apache.org/): Kudu provides a combination of fast…
Mateo
1 vote, 0 answers

Delete impala reference to Kudu table

I have an Impala/Kudu setup where I have the following table: CREATE TABLE IF NOT EXISTS impala_table (id STRING), PRIMARY KEY (id)) distribute BY hash(id) into 5 buckets STORED AS kudu TBLPROPERTIES('kudu.table_name' = 'impala_tabl',…
T. Bombeke
1 vote, 1 answer

Apache Kudu with Apache Spark NoSuchMethodError: exportAuthenticationCredentials

I have this function with Spark and Scala: import org.apache.kudu.client.CreateTableOptions import org.apache.spark.sql.functions._ import org.apache.spark.sql.types._ import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession} import…
Josemy