Highest Voted 'apache-hudi' Questions

1 vote, 0 answers

java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation while fetching data from Hudi

I am trying to view some data from Hudi using the code below in Spark: import org.apache.hudi.DataSourceReadOptions; val hudiIncQueryDF = spark.read().format("hudi").option(DataSourceReadOptions.QUERY_TYPE,…

apache-spark google-cloud-dataproc apache-hudi

asked May 30 '22 at 07:57 by radhika sharma (reputation 499)
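
The Scala snippet above configures an incremental read through DataSourceReadOptions. A minimal sketch of the equivalent option set for PySpark follows. It assumes the string option keys hoodie.datasource.query.type, hoodie.datasource.read.begin.instanttime and hoodie.datasource.read.end.instanttime of Hudi 0.x releases; the helper name incremental_read_options is hypothetical. It only builds the options dictionary, so it runs without Spark.

```python
def incremental_read_options(begin_instant, end_instant=None):
    """Build DataFrameReader options for a Hudi incremental query.

    Hypothetical helper. Instant values are Hudi commit times such as
    "20220530000000".
    """
    if not begin_instant:
        raise ValueError("an incremental query needs a begin instant time")
    opts = {
        "hoodie.datasource.query.type": "incremental",
        # Records written by commits strictly after this instant are returned.
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound: commits at or before this instant.
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Usage with PySpark (not executed here; assumes a SparkSession `spark`
# and a Hudi table at `base_path`):
#   df = (spark.read.format("hudi")
#         .options(**incremental_read_options("20220530000000"))
#         .load(base_path))
```

Keeping the keys as plain strings avoids depending on the Scala constant names, which differ between Hudi versions.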

1 vote, 1 answer

pyspark: Get hudi last/latest commit using pyspark

I run an incremental query with Spark and Hudi every hour and save each query's begin and end time in a database (say, MySQL). For the next incremental query, I use the end time of the previous query, fetched from MySQL, as the begin time. Incremental query…

python dataframe apache-spark pyspark apache-hudi

asked May 16 '22 at 04:18 by MOHD NAYYAR (reputation 41)
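
One way to find the latest commit without tracking it externally, sketched under assumptions: in Hudi 0.x the table's .hoodie directory holds one timeline file per instant, completed instants are named <instant>.commit (Copy on Write), <instant>.deltacommit (Merge on Read) or <instant>.replacecommit, and pending ones carry .requested or .inflight suffixes. The hypothetical helper latest_completed_instant picks the newest completed instant from such a listing; how the directory is listed (Hadoop FileSystem, boto3 on S3, and so on) is left out. Since the begin instant of a Hudi incremental query is exclusive, the returned value can serve as the next query's begin time.

```python
# Completed action suffixes in a Hudi 0.x timeline (assumption; see lead-in).
COMPLETED_ACTIONS = ("commit", "deltacommit", "replacecommit")


def latest_completed_instant(timeline_files):
    """Return the newest completed instant time, or None if there is none.

    timeline_files: file names listed from the table's .hoodie directory,
    e.g. "20220516041800.commit". Pending instants such as
    "20220516041800.commit.requested" or "20220516041800.inflight" are skipped.
    """
    latest = None
    for name in timeline_files:
        instant, sep, action = name.partition(".")
        # Keep only names of the form "<digits>.<completed action>".
        if not sep or not instant.isdigit() or action not in COMPLETED_ACTIONS:
            continue
        # Instant times start with yyyyMMddHHmmss, so string order is time order.
        if latest is None or instant > latest:
            latest = instant
    return latest


# Hypothetical listing; in practice it would come from the table's .hoodie folder.
listing = [
    "hoodie.properties",
    "20220516030000.commit",
    "20220516040000.commit",
    "20220516050000.commit.requested",
    "20220516050000.inflight",
]
print(latest_completed_instant(listing))  # 20220516040000
```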

1 vote, 1 answer

class "org.apache.flink.streaming.api.operators.MailboxExecutor" not found

When I use Hudi 0.10.1 with Flink 1.14.0, I get the exception "not found class org.apache.flink.streaming.api.operators.MailboxExecutor". I found that "MailboxExecutor" exists in Flink 1.13.1. How can I fix this? Should I compile with Flink 1.14?

apache-flink flink-streaming flink-sql apache-hudi

asked Mar 22 '22 at 12:09 by Michael Ran (reputation 11)

1 vote, 0 answers

Can we incrementally query a Hudi table based on a custom column (Spark SQL)

I'm trying to ingest historical data into a data catalog using Apache Hudi upserts. As the data goes back years and months, I wanted to iterate over each month, adding the historical date as a queryable column. The problem is: incremental queries in…

apache-spark apache-hudi

asked Mar 11 '22 at 13:43 by Denis Moura (reputation 21)

1 vote, 1 answer

Getting duplicate records while querying Hudi table using Hive on Spark Engine in EMR 6.3.1

I am querying a Hudi table using Hive, which runs on the Spark engine in an EMR 6.3.1 cluster. The Hudi version is 0.7. I inserted a few records and then updated them using Hudi Merge on Read. This internally creates new files under the same…

apache-spark hive amazon-emr apache-hudi

asked Mar 10 '22 at 18:37 by vijayinani (reputation 2,548)

1 vote, 0 answers

Committing hudi files manually

I am using Spark 3.x with Apache Hudi 0.8.0. While trying to create a Presto table with the hudi-hive-sync tool, I get the following error: Got runtime exception when hive syncing java.lang.IllegalArgumentException: Could not find any data…

hive presto apache-hudi

asked Jan 21 '22 at 01:41 by Shasu (reputation 458)

1 vote, 1 answer

What does each section of the Parquet file name written with Apache Hudi represent?

Apache Hudi writes out each Parquet file with a name like the following: 0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet. I'm trying to understand what each section of the file name represents. Here is my current understanding, but I would like…

apache-spark parquet apache-hudi

asked Nov 17 '21 at 19:54 by cauthon (reputation 161)
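
A sketch of how such a name can be split, assuming the base-file layout of Hudi 0.x: <fileId>_<writeToken>_<instantTime>.parquet, where the write token is <taskPartitionId>-<stageId>-<taskAttemptId> of the Spark task that wrote the file and the instant time is the commit time. In the example the file ID is a UUID with a -0 suffix. The helper parse_base_file_name and its field names are hypothetical, and the meaning of each part should be checked against the Hudi version in use.

```python
import re
from dataclasses import dataclass

# Assumed layout: <fileId>_<taskPartitionId>-<stageId>-<taskAttemptId>_<instantTime>.parquet
_BASE_FILE = re.compile(
    r"^(?P<file_id>[^_]+)_(?P<partition>\d+)-(?P<stage>\d+)-(?P<attempt>\d+)"
    r"_(?P<instant>\d+)\.parquet$"
)


@dataclass
class BaseFileName:
    file_id: str            # file group ID (UUID plus a numeric suffix)
    task_partition_id: int
    stage_id: int
    task_attempt_id: int
    instant_time: str       # commit time that produced this file version


def parse_base_file_name(name: str) -> BaseFileName:
    m = _BASE_FILE.match(name)
    if m is None:
        raise ValueError(f"not a Hudi base file name: {name!r}")
    return BaseFileName(
        file_id=m["file_id"],
        task_partition_id=int(m["partition"]),
        stage_id=int(m["stage"]),
        task_attempt_id=int(m["attempt"]),
        instant_time=m["instant"],
    )


parts = parse_base_file_name(
    "0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet"
)
print(parts.file_id, parts.task_partition_id, parts.stage_id,
      parts.task_attempt_id, parts.instant_time)
# prints: 0743209d-51cb-4233-a7cd-5bb712fba1ff-0 21 64 5300 20211117172738
```

Under this reading, a later write to the same file group would keep the file ID but produce a new write token and instant time, which is why file names change after an upsert.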

1 vote, 2 answers

EMR Hudi cannot create hive connection jdbc:hive2://localhost:10000/

I am trying to save a Hudi table from a Jupyter notebook with Hive sync enabled. I am using EMR 5.28.0 with AWS Glue enabled as the catalog: # Create a DataFrame inputDF = spark.createDataFrame([("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"), …

apache-spark pyspark amazon-emr aws-glue apache-hudi

asked Oct 07 '21 at 16:48 by dytyniak (reputation 364)
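
The URL in the title, jdbc:hive2://localhost:10000, is the default of hoodie.datasource.hive_sync.jdbcurl, which Hive sync uses when it connects through HiveServer2. One option worth trying, offered as an assumption rather than a confirmed fix, is to sync through the metastore client (backed by Glue on a Glue-enabled EMR cluster) instead of JDBC. The sketch assumes a Hudi release that supports hoodie.datasource.hive_sync.use_jdbc (later releases express this as hoodie.datasource.hive_sync.mode=hms); the helper name and the database, table and partition values are hypothetical.

```python
def hive_sync_options(database, table, partition_fields, use_jdbc=False):
    """Hive sync options for a Hudi write (hypothetical helper).

    With use_jdbc=False, sync is expected to go through the metastore client
    instead of HiveServer2 at hoodie.datasource.hive_sync.jdbcurl
    (default jdbc:hive2://localhost:10000). Assumes a Hudi release that
    supports hoodie.datasource.hive_sync.use_jdbc.
    """
    return {
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": ",".join(partition_fields),
        # Maps partition path segments to the partition fields above.
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        "hoodie.datasource.hive_sync.use_jdbc": str(use_jdbc).lower(),
    }
```

These entries would be merged into the rest of the Hudi write options passed to inputDF.write.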

1 vote, 0 answers

AWS Partitioned Hudi

I have a dataset of around 180,000,000 records in CSV that I transform into Hudi Parquet through a Glue job. It is partitioned by one column. Everything is written successfully, but reading the Hudi data in a Glue job takes too long (>30 min). I tried to read only…

amazon-web-services aws-glue amazon-athena apache-hudi

asked Oct 01 '21 at 11:39 by robotic_arm13 (reputation 13)

1 vote, 1 answer

Hudi partition and upsert are not working

What is wrong with this config? The partition keys are not working in Hudi, and all the records in the Hudi dataset get updated during the upsert, so I couldn't extract the delta from the tables. commonConfig = {'className' :…

pyspark apache-hudi

asked Aug 29 '21 at 05:50 by Suganya (reputation 37)
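
For comparison, a sketch of the write options that usually govern partitioning and upsert behaviour in Hudi 0.x: the record key and partition path fields, the precombine field used to choose between incoming records with the same key, and the operation. More than one partition or record key column generally needs a key generator such as ComplexKeyGenerator. Class names assume the org.apache.hudi.keygen package of later 0.x releases; the helper and column names are hypothetical, and whether this matches the truncated commonConfig cannot be told from the excerpt.

```python
def upsert_options(table, record_keys, partition_fields, precombine_field):
    """Core Hudi write options for an upsert (hypothetical helper)."""
    multi = len(record_keys) > 1 or len(partition_fields) > 1
    # Assumed class names from the org.apache.hudi.keygen package.
    keygen = ("org.apache.hudi.keygen.ComplexKeyGenerator" if multi
              else "org.apache.hudi.keygen.SimpleKeyGenerator")
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.operation": "upsert",
        # Incoming records whose key matches an existing row update that row.
        "hoodie.datasource.write.recordkey.field": ",".join(record_keys),
        # Column(s) used to build the partition path.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        # Among incoming records with the same key, the largest value wins.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.keygenerator.class": keygen,
    }
```

Only rows whose record keys appear in the incoming batch should receive a new _hoodie_commit_time; if every row changes, the record key field is a natural thing to check first.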

1 vote, 0 answers

Partition pruning not working on Hudi dataset

We have created a Hudi dataset with a two-level partition such as s3://somes3bucket/partition1=value/partition2=value, where partition1 and partition2 are of type string. When running a simple count query using the Hudi format in spark-shell, it…

apache-spark apache-spark-sql apache-hudi

asked Aug 09 '21 at 16:07 by Raj (reputation 2,368)

1 vote, 1 answer

What is the timestamp format of _hoodie_commit_time column in Apache Hudi?

I'm exploring the Apache Hudi framework and following the quick-start guide. I'm trying out the incremental query functionality, where the _hoodie_commit_time column determines the incremental pull. I was wondering what the timestamp format is &…

apache-hudi

asked Aug 01 '21 at 05:32 by Anoop Deshpande (reputation 514)
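
To my understanding, _hoodie_commit_time holds the commit's instant time as a digit string, yyyyMMddHHmmss (14 digits, as in 20211117172738), and later releases may append milliseconds as yyyyMMddHHmmssSSS (17 digits). The sketch below converts either form to a Python datetime under that assumption; the helper name is hypothetical, and the result is a naive datetime because the time zone depends on how the writer is configured.

```python
from datetime import datetime


def parse_commit_time(instant: str) -> datetime:
    """Convert a Hudi instant time string to a naive datetime.

    Assumes yyyyMMddHHmmss (14 digits) or yyyyMMddHHmmssSSS (17 digits).
    """
    if not instant.isdigit() or len(instant) not in (14, 17):
        raise ValueError(f"unexpected instant time: {instant!r}")
    parsed = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    if len(instant) == 17:
        # The trailing three digits are milliseconds.
        parsed = parsed.replace(microsecond=int(instant[14:]) * 1000)
    return parsed


print(parse_commit_time("20211117172738"))     # 2021-11-17 17:27:38
print(parse_commit_time("20211117172738123"))  # 2021-11-17 17:27:38.123000
```

Because the value is a fixed-format digit string, comparing two commit times as strings gives the same order as comparing them as timestamps.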

1 vote, 0 answers

After upsert is performed on the original table, writeToken in the Parquet file name of Hudi changes, resulting in Incremental query failure

0 Suspected cause: every time we upsert the target table, Hudi generates a log file and compacts it, causing any incremental query from before that point in time to fail. 1 Here are all the operations performed on the original table. 1.1 Operation 1 (Update): An…

apache-spark hadoop apache-hudi

asked Jul 30 '21 at 12:01 by fanxinglanyu (reputation 11)

1 vote, 1 answer

Is there a way to use Apache Hudi on AWS glue?

I am exploring Apache Hudi for incremental loads, using S3 as the source and saving the output to a different S3 location through an AWS Glue job. Are there any blogs or articles that could help as a starting point?

apache-spark amazon-s3 aws-glue apache-hudi

asked Apr 28 '21 at 10:32 by shikeb (reputation 11)

1 vote, 2 answers

Unable to run spark.sql on AWS Glue Catalog in EMR when using Hudi

Our setup has a default data lake on AWS, using S3 as storage and the Glue Catalog as our metastore. We are starting to use Apache Hudi and got it working by following the AWS documentation. The issue is that, when using the…

amazon-emr aws-glue aws-glue-data-catalog apache-hudi

asked Apr 09 '21 at 19:55 by gabra (reputation 9,484)
