Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Master use the Raft consensus algorithm, which ensures availability even if f replicas fail, given 2f+1 available replicas. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
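
The availability arithmetic in the High availability bullet above can be sketched as follows. This is a minimal illustration of the general Raft majority rule (with 2f+1 replicas, a majority of f+1 survives f failures), not Kudu code; the function names `majority_size` and `max_tolerated_failures` are hypothetical and exist only for this example.

```python
def majority_size(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority (assumed helper)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def max_tolerated_failures(replicas: int) -> int:
    """Largest f such that the surviving replicas still form a majority."""
    return replicas - majority_size(replicas)


# Print replica count, majority size, and tolerated failures for common setups.
for n in (1, 3, 5, 7):
    print(n, majority_size(n), max_tolerated_failures(n))
```

Under this rule, an even replica count buys no extra fault tolerance over the odd count below it (4 replicas tolerate one failure, the same as 3), which is why the bullet is phrased in terms of 2f+1.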

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly-arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
0
votes
1 answer

While using the Kudu Client API to create a table, how do I set the order of columns in the Primary Key?

I'm trying to write code that transfers some tables from SQL Server to Kudu using the Java KuduClient API. In SQL Server I have a table with a composite primary key (for example: PRIMARY KEY ([ID], [DATA_SOURCE])). How do I guarantee that on Kudu the Primary…
0
votes
3 answers

Using Slick with Kudu/Impala

Kudu tables can be accessed via Impala, and thus through its JDBC driver. Thanks to that, they are accessible via the standard Java/Scala JDBC API. I was wondering whether it is possible to use Slick for this. Or, if not, is any other high-level Scala DB framework supporting…
abalcerek
  • 1,807
  • 1
  • 22
  • 27
0
votes
0 answers

Apache Drill Kudu query doesn't support range + hash multilevel partition

Drill Kudu query doesn't support range + hash multilevel partition. Kudu table : CREATE TABLE test1 ( id int , name string, value string, primary key(id, name) ), PARTITION BY HASH (name) PARTITIONS 8, PARTITION BY RANGE (id) ( …
ibiu
  • 1
0
votes
1 answer

How to model a one to many relation in Apache Kudu?

I am trying to model a one-to-many relation in Apache Kudu. To sum up, Apache Kudu doesn't have foreign keys, array data types, or JSON support, so the usual ways to model it aren't available. How can I model the relation?
Vitaly Olegovitch
  • 3,509
  • 6
  • 33
  • 49
0
votes
0 answers

what is the best practice from Cloudera to migrate the parquet-based impala to kudu-based impala

We are using Cloudera as our Hadoop environment. Can someone please provide any guidance on how to integrate or migrate existing Parquet/Impala to Kudu/Impala to hopefully get a performance improvement in our existing pipeline? Our existing…
mdivk
  • 3,545
  • 8
  • 53
  • 91
0
votes
1 answer

GeoMesa on Kudu Import with spatial data error

I use the example given in the tutorial to operate on my data, but after I import the data into Kudu, I find that the last field is not Geometry type. Could you please tell me how to solve this problem?
MaLong
  • 1
  • 1
0
votes
1 answer

pyspark: insert into dataframe if key not present or row.timestamp is more recent

I have a Kudu database with a table in it. Every day, I launch a batch job which receives new data to ingest (an ETL pipeline). I would like to insert the new data if the key is not present; if the key is present, update the row only if the…
Federico Ponzi
  • 2,682
  • 4
  • 34
  • 60
0
votes
0 answers

Error encountered writing to Kudu using Spark / Scala

I'm trying to write data into Kudu from Spark and I'm getting this error java.lang.UnsupportedOperationException at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:296) at…
M. Alexandru
  • 614
  • 5
  • 20
0
votes
1 answer

Is it possible to easily create a Kudu table from a PySpark Dataframe?

Ideally the following code snippet would work: import kudu from kudu.client import Partitioning df = … #some spark dataframe # Connect to Kudu master server client = kudu.connect(host='…', port=7051) # infer schema from spark…
Rick
  • 2,080
  • 14
  • 27
0
votes
2 answers

how to read from Kudu to python

I am trying to retrieve data from Kudu, but I am not able to install the kudu-python package in Anaconda or on my server. Can I get some help with it? The documentation on the internet is not really clear.
0
votes
1 answer

How do I handle this error that I am facing when trying to write from SQL to Kudu via PySpark

I want to write a huge table from SQL to a Kudu table, but I am not able to write it. With the following code: kuduDF.write.format('org.apache.kudu.spark.kudu') .option('kudu.master',kudu_master) …
0
votes
1 answer

Any suggestions for analytical columnar DB which can be modified?

I need to build a customer 360-degree database, which requires: a wide-column table, where each customer is one row, with lots of columns (say > 1000). We have ~20 batch update analytics jobs running daily. Each analytics job queries and updates a small…
Tung Vs
  • 113
  • 10
0
votes
1 answer

Unable to start Kudu master

While starting kudu-master, I am getting the below error and unable to start kudu cluster. F0706 10:21:33.464331 27576 master_main.cc:71] Check failed: _s.ok() Bad status: Invalid argument: Unable to initialize catalog manager: Failed to initialize…
Sangeeta
  • 491
  • 5
  • 22
0
votes
1 answer

not able to create table in kudu using impala-shell

I was doing R&D on Hadoop, Hive, Impala, and Kudu. I installed the Hadoop, Hive, Impala, and Kudu servers. I have configured --kudu_master_hosts=: in /etc/default -> impala file, i.e. like below: IMPALA_SERVER_ARGS=" \ -log_dir=${IMPALA_LOG_DIR} \ …
Akshay P
  • 1
  • 3
0
votes
2 answers

Can I add more than 300 columns in Apache Kudu?

I have been asked to create a Kudu table. I know that Kudu is a columnar store, but my company's database table has about 285 columns, which can fit in the Kudu table. Is it possible to dynamically add columns in excess of the 300 column…
HJSG
  • 41
  • 1
  • 6