Highest Voted 'apache-hudi' Questions

2

votes

1 answer

Custom Payload class in Python for precombine and combineAndGet in Apache Hudi And Pyspark

We are migrating our code base from spark-java to PySpark. We were handling custom aggregations for merging data using preCombine() and combineAndGetUpdateValue() and had implemented this in our Spark-Java code. Example below: package…

asked Apr 21 '22 at 06:32

Sanchit Kulkarni

41
7

2

votes

0 answers

How to access hudi metrics

How can Hudi metrics be accessed programmatically? After a commit, I would like to get metrics such as records updated / records inserted and log them to a database. I tried setting hoodie.metrics.on=true and hoodie.metrics.reporter.type=INMEMORY.…

apache-spark hadoop apache-hudi

asked Nov 08 '21 at 07:34

Joha

935
12
32

2

votes

1 answer

Connect Redshift Spectrum/ AWS EMR with Hudi directly or via AWS Glue Data Catalog

I'm trying to understand how to properly connect Redshift Spectrum with Hudi data. It looks like I can directly create a Redshift external table for data managed in Apache Hudi, as described in the following documentation…

amazon-web-services amazon-s3 amazon-emr amazon-redshift-spectrum apache-hudi

asked Sep 12 '21 at 13:48

alexanoid

24,051
54
210
410

2

votes

0 answers

How to get the last commit for a key when reading an Apache Hudi MOR table in incremental mode?

I have a MOR table with key = acctid. When I do 3 commits on the same key and try to read in incremental mode, I see only the 1st commit. Is there any way to read the last commit, or all commits, for a given key using incremental mode? Please check details…

pyspark merge apache-hudi

asked Jul 19 '21 at 18:54

sannidhi

23
5

2

votes

0 answers

java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName

Has anyone run into this problem when integrating Hudi with the Spark shell? I just started learning Hudi from the official documentation. The environment versions are CDH-5.16.2 and spark-2.3.0. import…

scala apache-spark apache-hudi

asked May 07 '21 at 07:58

shiwei

21
2

2

votes

2 answers

Apache Spark and Hudi: tons of output files

I'm trying to read data from many different .csv files (all with the same "structure"), perform some operations with Spark, and finally save them in Hudi format. To store data in the same Hudi table I thought the best approach would be to use the…

apache-spark pyspark apache-spark-sql apache-hudi

asked Apr 14 '21 at 15:03

Baobabbo

21
6

2

votes

1 answer

How to run hudi on dataproc and write to gcs bucket

I want to write to a GCS bucket from Dataproc using Hudi. To write to GCS using Hudi, the documentation says to set the property fs.defaultFS to the value gs:// (https://hudi.apache.org/docs/gcs_hoodie). However, when I set fs.defaultFS on Dataproc to be a GCS bucket, I get…

google-cloud-dataproc apache-hudi dataproc

asked Apr 12 '21 at 15:54

Funzo

1,190
2
14
25

2

votes

1 answer

Writing spark DataFrame In Apache Hudi Table

I am new to Apache Hudi and am trying to write my DataFrame to a Hudi table using the Spark shell. For the first write, I am not creating any table beforehand and am writing in overwrite mode, so I expect it to create the Hudi table. I am writing the code below. …

apache-spark hive apache-hudi

asked Mar 19 '21 at 09:54

Rahul Patidar

47
1
3
9

2

votes

2 answers

Issue for Integrating Hudi with Kafka using Avro Schema

I am trying to integrate Hudi with a Kafka topic. Steps followed: Created a Kafka topic in Confluent with the schema defined in the schema registry. Using kafka-avro-console-producer, I am trying to produce data. Running Hudi Delta Streamer in continuous mode…

apache-spark avro confluent-platform confluent-schema-registry apache-hudi

asked Feb 25 '21 at 16:36

Prashant

21
5

2

votes

0 answers

Flink's hive streaming vs iceberg/hudi/delta

There are some open-source data lake solutions that support CRUD/ACID/incremental pull, such as Iceberg, Hudi, and Delta. I think they have done what Flink's Hive streaming wants to do, and even do it better. So I would ask what the real power of Flink's Hive…

apache-flink delta-lake apache-hudi iceberg

asked Nov 28 '20 at 05:59

Tom

5,848
12
44
104

2

votes

2 answers

More than 1 column in record key in spark Hudi Job while making an upsert

I am currently doing a POC on Delta Lake, where I came across a framework called Apache Hudi. Below is the data I am trying to write using the Apache Spark framework. private val INITIAL_ALBUM_DATA = Seq( Album(800,810, "6 String Theory", Array("Lay…

apache-spark apache-spark-sql apache-hudi

asked Aug 29 '20 at 10:30

user3199285

177
2
12

2

votes

1 answer

Error while running Apache Hudi deltastreamer

I am trying to run Hudi DeltaStreamer on AWS EMR. I followed the steps in this blog: https://cwiki.apache.org/confluence/pages/viewrecentblogposts.action?key=HUDI But when I run the spark-submit below, I get an error: Exception in thread "main"…

apache-spark spark-streaming parquet apache-hudi

asked Jan 31 '20 at 15:02

raghuvd

665
1
5
6

2

votes

1 answer

Spark streaming - Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file

I'm using Spark to write my JSON data to S3. However, I keep getting the error below. We are using Apache Hudi for updates. This only happens for some data; everything else works fine. Caused by: org.apache.parquet.io.ParquetDecodingException: Can…

apache-spark spark-streaming parquet hoodie apache-hudi

asked Dec 26 '19 at 19:55

mythic

535
7
21

1

vote

1 answer

deltastreamer.HoodieDeltaStreamer exception: Filesystem closed

I am using HoodieDeltaStreamer to connect to Kafka and store data in a Hudi table. Hudi version: 0.10.1, Spark: 3.2.4, Hadoop: 3.3.5. Only one spark-submit job is running. cmd: spark-submit --class…

apache-spark data-lake apache-hudi

asked Jun 27 '23 at 09:25

Ankit Bansal

2,162
8
42
79

1

vote

0 answers

Unarchiving Apache Hudi archived commits

Is it possible to unarchive an archived commit with Apache Hudi? For example, I've set the following configuration and have 4 commits: hoodie.keep.max.commits = 3; commit1, commit2, commit3, commit4. On the 4th commit, commit1 is archived and moved to…

apache-hudi

asked May 10 '23 at 08:52

James Burton

37
6
