Questions tagged [apache-kudu]

For questions related to Apache Kudu

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

Fast processing of OLAP workloads.

Integration with MapReduce, Spark and other Hadoop ecosystem components.

Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.

A strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.

Strong performance for running sequential and random workloads simultaneously.

Easy to administer and manage with Cloudera Manager.

High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available even if f replicas fail, given 2f+1 replicas in total. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.

Structured data model.
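The Raft availability rule in the list above (f failed replicas tolerated out of 2f+1) can be worked through with a little arithmetic. The following is a minimal sketch, not Kudu code: the function names `tolerated_failures` and `majority` are hypothetical and only illustrate the formula.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest f such that 2f + 1 <= replicas (hypothetical helper)."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return (replicas - 1) // 2


# Print the majority size and tolerated failures for common replica counts.
for n in (1, 3, 5):
    print(f"replicas={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

With 3 replicas a single failure is tolerated, and with 5 replicas two are. An even replica count adds no extra tolerance: 4 replicas still tolerate only one failure.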

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current-generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

Reporting applications where newly-arrived data needs to be immediately available for end users

Time-series applications that must simultaneously support:

queries across large amounts of historic data

granular queries about an individual entity that must return very quickly

Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data

134 questions

0 answers

Kudu Client fails with exceptions after running for a few days

I have a Scala/Spark/Kafka process that I run. When I first start the process I create a KuduClient Object using a function I made that I share between classes. For this job I only create the KuduClient once, and let the process run continuously.…

apache-spark cloudera apache-kudu

asked Oct 24 '18 at 18:46

alex

3 answers

How to find the Kudu master name or port in my Cloudera cluster?

I am trying to write a Spark dataframe to Kudu DB, but I do not know the Kudu master. The cluster I am using is a Cloudera cluster. How do I find the Kudu master in the cluster?

python-3.x pyspark cloudera impala apache-kudu

asked Sep 13 '18 at 21:19

Karthik reddy

2 answers

Apache Kudu slow insert, high queuing time

I have been using Spark Data Source to write to Kudu from Parquet, and the write performance is terrible: about 12,000 rows/second. Each row is roughly 160 bytes. We have 7 Kudu nodes, each with 24 cores, 64 GB RAM, and 12 SATA disks. None of the…

performance apache-spark apache-kudu data-ingestion

asked Aug 13 '18 at 04:55

Tung Vs

1 answer

How to insert data from Kafka to Kudu using Spark streaming

I have a Spark streaming application that listens to a Kafka topic. When data arrives, I need to process it and send it to Kudu. Currently I am using the org.apache.kudu.spark.kudu.KuduContext API and calling the insert action with the data frame. In order…

apache-spark apache-kafka spark-streaming apache-kudu

asked Aug 08 '18 at 13:12

LubaT

2 answers

NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive

I am trying to create a Kudu table using Impala-shell. Query: CREATE TABLE lol ( uname STRING, age INTEGER, PRIMARY KEY(uname) ) STORED AS KUDU TBLPROPERTIES ( 'kudu.master_addresses' = '127.0.0.1' ); CREATE TABLE t (k INT PRIMARY KEY) STORED…

impala apache-kudu

asked Aug 07 '18 at 06:27

user9518134

1 answer

Timestamp Primary key Kudu

I am trying to load data into a Kudu table through envelope. One of the primary key columns is a timestamp. DDL: CREATE TABLE BAL ( client_id int bal_id int, effective_time timestamp, prsn_id int, bal_amount double, prsn_name string, PRIMARY KEY…

apache apache-kudu

asked May 05 '18 at 06:25

Abhinav Singh

1 answer

Truncate Kudu table using Spark

What is the best way to truncate a Kudu table from Spark? Is there any analogue of SQL "TRUNCATE TABLE_NAME;" or "DELETE FROM TABLE_NAME;"? I only managed to find kuduContext.deleteRows, but it requires explicitly specifying the rows to delete. Or I…

apache-spark truncate apache-kudu

asked Apr 24 '18 at 16:18

Vladimir Kravets

1 answer

Running Kudu in a docker and master to tserver two-way connection / circular link issues - docker composition

How can you run Kudu under Docker, which requires two containers (one for the master and one for the tserver), when the two containers need to connect to each other by DNS? Kudu can be run under Docker using the following commands: docker run --name…

docker docker-compose apache-kudu

asked Apr 05 '18 at 12:45

Danny Varod

1 answer

How to write and update using the Kudu API in Spark 2.1

I want to write and update using the Kudu API. This is the Maven dependency: org.apache.kudu kudu-client 1.1.0 …

scala apache-spark apache-kudu

asked Jan 09 '18 at 11:42

Autumn

1 answer

sqoop syntax to import to kudu table

We'd like to test Kudu and need to import data. Sqoop seems like the correct choice. I find references that you can import to Kudu but no specifics. Is there any way to import to Kudu using Sqoop?

sqoop apache-kudu

asked Dec 13 '17 at 19:10

Jay

1 answer

Kudu nested field

I have questions about Kudu with nested fields. I have JSON from Kafka like this: { "ts": 32, "status": "success", "uid": "3232", "url": "http://some_url", "syncpixel": "http://some_url", "dfp": { "DFP_UABrowser": "Chrome 61", …

apache-spark nested apache-kudu

asked Nov 16 '17 at 13:14

Артем Кулбасов

1 answer

SPARK KUDU Complex Update statements directly or via Impala JDBC Driver possible?

If I look at the Impala Shell or Hue, I can write fairly complicated Impala update statements for Kudu, e.g. an update with a sub-select and so on. Fine. Looking at the old JDBC connection methods for, say, MySQL via Spark/Scala, there is not a lot…

apache-spark impala apache-kudu

asked Nov 08 '17 at 11:24

thebluephantom

1 answer

Which Amazon Web Services native offering is closest to Apache Kudu?

I am looking for a native offering, such as any of the RDS solutions, ElastiCache, or Amazon Redshift, not something that I would have to host myself. From the Apache Kudu site, https://kudu.apache.org/ : Kudu provides a combination of fast…

sql amazon-web-services bigdata bigtable apache-kudu

asked Oct 17 '17 at 23:16

Mateo

0 answers

Delete impala reference to Kudu table

I have an Impala Kudu setup where I have the following table: CREATE TABLE IF NOT EXISTS impala_table (id STRING), PRIMARY KEY (id)) distribute BY hash(id) into 5 buckets STORED AS kudu TBLPROPERTIES('kudu.table_name' = 'impala_tabl',…

impala apache-kudu

asked Oct 09 '17 at 07:57

T. Bombeke

1 answer

Apache Kudu with Apache Spark NoSuchMethodError: exportAuthenticationCredentials

I have this function with Spark and Scala: import org.apache.kudu.client.CreateTableOptions import org.apache.spark.sql.functions._ import org.apache.spark.sql.types._ import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession} import…

scala apache-spark apache-kudu

asked Mar 22 '17 at 09:17

Josemy
