Highest Voted 'apache-hudi' Questions

0 votes

1 answer

AWS S3 (ap-south-1) returns Bad Request for Hudi DeltaStreamer job

I'm trying to run a DeltaStreamer job to push data to an S3 bucket using the following command: spark-submit \ --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \ --conf…

asked Jul 26 '21 at 07:00

ProgramSpree

372
5
21

0 votes

2 answers

Why is apache-hudi creating a COPY_ON_WRITE table even though I specified MERGE_ON_READ?

I am trying to create a simple Hudi table with the MERGE_ON_READ table type. After executing the code, I still see hoodie.table.type=COPY_ON_WRITE in the hoodie.properties file. Am I missing something here? Jupyter Notebook for this code:…

pyspark apache-hudi

asked Jul 10 '21 at 13:14

sannidhi

23
5

0 votes

2 answers

How to prevent Hudi from writing partition columns into data?

Consider the following: data is read from a partitioned structure y=,m=,d=. The Hudi DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY config is set to y=,m=,d=. At first glance I have to remove the y, m, d columns, but without these columns Hudi will not…

java scala apache-spark apache-hudi

asked Jul 06 '21 at 09:19

Cherry

31,309
66
224
364

0 votes

1 answer

Delete in Apache Hudi - Glue Job

I have to build a Glue Job for updating and deleting old rows in an Athena table. When I run my job for deleting, it returns an error: AnalysisException: 'Unable to infer schema for Parquet. It must be specified manually.;' My Glue Job: datasource0 =…

pyspark aws-glue apache-hudi

asked Jul 01 '21 at 14:56

Mateja K

57
2
12

0 votes

1 answer

How can Apache Hudi merge deltas asynchronously?

I'm new to Apache Hudi. In Apache Hudi, the merge-on-read table type merges delta data asynchronously. It is merged when data is queried or when the merge config (interval or unmerged commit count) is met. But Hudi does not have its own background process, otherwise…

data-lake apache-hudi

asked Jun 29 '21 at 10:59

SHRIN

318
3
15

0 votes

2 answers

Need help submitting a Hudi DeltaStreamer job via Apache Livy

I am a little confused about how to pass the arguments as REST API JSON. Consider the spark-submit command below. spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \ --master yarn \ …

apache-spark amazon-emr livy apache-hudi

asked Jun 17 '21 at 14:26

shiva nagesh

87
7

0 votes

1 answer

Issue creating an external Hive table with Hudi

I am trying to create an external file in the Hive metastore using the Apache Hudi framework. It is able to connect to the Hive metastore, but it throws an exception after connecting, when trying to create the table. dataFrame.writeStream …

apache-spark hadoop hive apache-hudi

asked Mar 19 '21 at 12:18

sreekesh.s

158
1
8

0 votes

1 answer

Spark Datasource Hudi table read using instant time

I'm reading a Hudi table using Spark.read.format("hudi") and want to understand how the option hoodie.datasource.read.begin.instanttime works. Is it similar to Hudi's hoodie_commit_ts column available in the parquet files? I'm not able to get the same count…

apache-spark pyspark apache-hudi

asked Mar 09 '21 at 18:32

Ankit Singla

1
1

0 votes

1 answer

Do the latest versions of Hudi (0.7.0, 0.6.0) work with Spark 2.3.0 when reading ORC files?

The documentation says: Hudi works with Spark-2.x & Spark 3.x versions. (https://hudi.apache.org/docs/quick-start-guide.html) But I have not been able to use hudi-spark-bundle_2.11 version 0.7.0 with Spark 2.3.0 and Scala 2.11.12. Is there any…

apache-spark orc apache-hudi

asked Feb 22 '21 at 07:35

Joyan

41
1
7

0 votes

1 answer

Error consuming records caused by SdkInterruptedException when inserting into Hudi Table

I have a Hudi table that I created from a migration, so it has billions of rows. There were no problems when migrating, but as soon as I started a streaming job to write fresh data to this table, these errors occurred: ERROR - error…

amazon-web-services amazon-emr apache-hudi

asked Dec 29 '20 at 20:54

Ygor de Fraga

51
9

0 votes

1 answer

Apache Hudi example from spark-shell throws error for Spark 2.3.0

I am trying to run this example (https://hudi.apache.org/docs/quick-start-guide.html) using spark-shell. The Apache Hudi documentation says "Hudi works with Spark-2.x versions". The environment details are: Platform: HDP 2.6.5.0-292 Spark version:…

apache-spark avro spark-avro spark-shell apache-hudi

asked Dec 27 '20 at 08:36

Joyan

41
1
7

0 votes

0 answers

Debezium + Schema Registry Avro Schema: why do I have the "before" and "after" fields, and how do I use that with HudiDeltaStreamer?

apache-kafka apache-kafka-connect confluent-schema-registry debezium apache-hudi

asked Dec 02 '20 at 12:30

oopcode

1,912
16
26

0 votes

3 answers

Databricks - java.lang.NoClassDefFoundError: org/json/JSONException

We can't figure out the following issue: we are trying to use Apache Hudi to save data to the storage. The problem is when we upload a fat jar which includes the org.json package in dependencies, the df.save() application is failing…

classpath databricks azure-databricks apache-hudi

asked Nov 03 '20 at 17:54

eugen-fried

2,111
3
27
48

0 votes

2 answers

Install Hudi ver. 0.6.0 on AWS EMR

Can anyone help me properly install Hudi 0.6.0 on AWS EMR version 6.0.0? I think AWS has added some custom scripts to make Hudi work properly in EMR.

amazon-web-services amazon-emr apache-hudi

asked Sep 08 '20 at 11:54

ASHISH M.G

522
2
7
23

0 votes

1 answer

Using Apache Hudi with Python/PySpark

Has anyone used Apache Hudi in a PySpark environment? If it is possible, are there any code samples available?

pyspark apache-hudi

asked Mar 30 '20 at 13:25

being_felicity

81
1
7
