Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.
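To illustrate what "columnar" buys for analytical (OLAP) scans: a column store keeps each column's values together, so an aggregate over one column does not have to read the others. The following is a minimal, generic Python sketch of that idea. The function name `to_columnar` and the sample rows are hypothetical, and this is not Kudu's on-disk format or client API.

```python
from typing import Any, Dict, List


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumes every row has the same keys; otherwise the column lists
    would no longer line up by position.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical sample data, for illustration only.
rows = [
    {"uid": 1, "nick": "alice", "amount": 12.5},
    {"uid": 2, "nick": "bob", "amount": 7.5},
    {"uid": 3, "nick": "carol", "amount": 30.0},
]

columns = to_columnar(rows)

# An OLAP-style aggregate reads only the "amount" column;
# the "uid" and "nick" values are never touched.
total_amount = sum(columns["amount"])
print(total_amount)
```

In a real columnar engine the per-column layout also enables type-specific encoding and compression, which this sketch does not model.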

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Masters use the Raft consensus algorithm, which keeps a tablet available as long as a majority of its replicas are up: with 2f+1 replicas, up to f can fail. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
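The 2f+1 rule in the high-availability item is majority-quorum arithmetic: a Raft group can make progress only while a strict majority of its replicas is reachable. A minimal Python sketch of that calculation follows. The function name `raft_fault_tolerance` is hypothetical and not part of any Kudu API.

```python
from typing import Tuple


def raft_fault_tolerance(replicas: int) -> Tuple[int, int]:
    """Return (majority quorum size, maximum tolerated failures) for a Raft group."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    quorum = replicas // 2 + 1       # strict majority of all replicas
    tolerated = replicas - quorum    # equals (replicas - 1) // 2
    return quorum, tolerated


for n in (1, 3, 4, 5):
    quorum, f = raft_fault_tolerance(n)
    print(f"{n} replica(s): quorum {quorum}, tolerates {f} failure(s)")
```

Note that 4 replicas tolerate no more failures than 3, which is why odd replica counts of the form 2f+1 are the usual choice.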

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current-generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
2 votes, 2 answers

Kudu table comments not showing up. What should I do?

This is my create statement for impala-shell: CREATE TABLE IF NOT EXISTS tmp.demo0011( uid Bigint, comment'user uid' nick String, comment'nickname' primary key(uid) ) partition by hash(uid) partitions 128 stored as kudu tblproperties ( …
mrzhang
2 votes, 1 answer

Impala concurrent query delay

My cluster configuration is as follows: a 3-node cluster with 128 GB RAM and a 16-core hyperthreaded processor per node. All 3 nodes run a Kudu master, a Kudu T-Server, and an Impala server; one of the nodes has the Impala catalogue and Impala…
Prog_G
2 votes, 1 answer

How to test a Spring Batch step that reads from a database and writes to a file?

I would like to know the best approach to testing the following scenario in a Spring Batch job. The job consists of two steps: 1) the first step reads from a database using an ItemReader (from Apache Kudu via Impala) and writes into a…
2 votes, 2 answers

Kerberos authentication in Kudu for a spark2 job

I am trying to write some data to Kudu, but the worker cannot find the Kerberos token, so I am not able to insert data into the Kudu database. Here is my spark2-submit statement: spark2-submit --master yarn "spark.yarn.maxAppAttempts=1"…
Lukas
2 votes, 1 answer

Kudu table column containing created timestamp

We are trying to create a Kudu table that contains a column holding the timestamp at which each record is inserted. We tried the following: create table clcs.table_a ( store_nbr string, load_dttm timestamp default now(), …
srikanth ramesh
2 votes, 4 answers

Spark structured stream to kudu context

I want to read a Kafka topic and then write it to a Kudu table with Spark streaming. My first approach: // sessions and contexts val conf = new SparkConf().setMaster("local[2]").setAppName("TestMain") val sparkSession =…
Jihun No
2 votes, 0 answers

Impala Kudu table: how to bulk update?

I need to perform updates on a Kudu table. Is there any option to do the update in bulk? The flow is as follows: 1. Fetch 1000 rows. 2. Process the rows and calculate a new value for each row. 3. Update the Kudu table with the new values. Updating row by row with one DB…
2 votes, 1 answer

Load a text file into Apache Kudu table?

How do you load a text file into an Apache Kudu table? Does the source file need to be in HDFS first? If Kudu doesn't share the same HDFS space as other Hadoop ecosystem programs (e.g., Hive, Impala), is there an Apache Kudu equivalent of: hdfs dfs…
boethius
2 votes, 3 answers

How to access an Apache Kudu table created from Impala using Apache Spark

I downloaded the Apache Kudu quickstart VM and followed the examples exactly as they appear on this page: https://kudu.apache.org/docs/quickstart.html. I created the table named "sfmta", but when I tried to access the Kudu…
Joseratts
1 vote, 0 answers

"Type VARCHAR(n) is not supported in Kudu" error when creating a table in Impala 3.4.0

I'm trying to create a table with a varchar(30) column in Impala 3.4.0/Kudu 1.14.0. According to this Jira ticket, Impala 3.4.0 is exactly the version where support for varchar columns was first included. Is there any problem with my understanding…
1 vote, 0 answers

KuduSink fails to start

I'm trying to write an ETL pipeline from Kafka to HDFS using Flink. I'm using the Bahir KuduSink and a PojoOperationMapper. It throws an exception before starting. I've included my code, pom, and exception stack trace. Is there something obvious I'm…
1 vote, 0 answers

Docker (compose) networking

I have a setup that, without any configuration change, sometimes works and sometimes doesn't, and I would welcome any help understanding why (and making it work 100% of the time). Setup: Platform: Windows 10 WSL2, Ubuntu 21.04, docker compose 1.29.2, docker engine…
Guillaume
1 vote, 0 answers

Installing Apache Kudu in Docker on a Windows machine

When installing Apache Kudu in Docker by executing the command below: KUDU_QUICKSTART_IP=$(ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' | tail -1) I get the following error: tail: option used in invalid context -- 1 How to avoid…
1 vote, 0 answers

Querying a Kerberized database

I have an Impala Kudu database secured via Kerberos. Even if I specify the database name in the connection string, it is ignored and I need to include it in my query (which is annoying because I have a lot of dynamically generated queries). Due of…
AlleXyS
1 vote, 1 answer

Spark Scala DateType schema execution error

I get an execution error when I try to create a schema for a DataFrame in Spark Scala. The error says: Exception in thread "main" java.lang.IllegalArgumentException: No support for Spark SQL type DateType at…
user2728349