Questions tagged [apache-hudi]

Apache Hudi is a transactional data lake platform with a focus on streaming and batch data processing (with ACID support). Use this tag for questions specific to Apache Hudi. Do not use this tag for general data lake or Delta Lake questions.

Questions on using Apache Hudi

158 questions
0 votes · 0 answers

Is there any way I can use StreamTableEnvironment in ProcessWindowFunction?

Scenario: I am using Flink to read the MySQL binlog and write to a Hudi table, but I want to partition the binlog data source into windows and batch-insert all the data within a window into the Hudi table when the window closes. My current approach is to use…
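For context, one common alternative to the approach described above: a `StreamTableEnvironment` is generally not usable inside a `ProcessWindowFunction` (it is driver-side, not serializable into operators), so the windowed batching is often expressed directly in Flink SQL with a tumbling-window TVF over the CDC source. This is a hedged sketch; the table and column names (`hudi_sink`, `mysql_binlog_source`, `op_ts`) are made up for illustration.

```python
# Illustrative Flink SQL: batch rows per tumbling window into the Hudi sink
# instead of calling a table environment from inside a ProcessWindowFunction.
# All identifiers below are hypothetical.
insert_sql = """
INSERT INTO hudi_sink
SELECT id, name, MAX(op_ts) AS op_ts
FROM TABLE(
  TUMBLE(TABLE mysql_binlog_source, DESCRIPTOR(op_ts), INTERVAL '1' MINUTE))
GROUP BY id, name, window_start, window_end
"""
# with a StreamTableEnvironment `t_env`:
# t_env.execute_sql(insert_sql)
```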
0 votes · 2 answers

Unable to alter column name for a Hudi table in AWS

I'm unable to alter the column name of a Hudi table. spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO subidentifier") fails to change the column name. I'm unable to alter the column…
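As background: Hudi's support for `ALTER TABLE ... RENAME COLUMN` through Spark SQL depends on schema-on-read evolution being enabled in the session. This is a hedged sketch of something to try, not a verified fix; the table and column names come from the question, while the session setting is an assumption based on Hudi's schema-evolution configuration.

```python
# Assumption: rename requires Hudi's schema-on-read evolution to be enabled
# in the current Spark session before issuing the ALTER TABLE statement.
session_settings = [
    "set hoodie.schema.on.read.enable=true",
]
rename_sql = (
    "ALTER TABLE customer_db.customer "
    "RENAME COLUMN subid TO subidentifier"
)
# with an active SparkSession `spark`:
# for s in session_settings:
#     spark.sql(s)
# spark.sql(rename_sql)
```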
0 votes · 0 answers

Apache Hudi TimestampBasedKeyGenerator issue partitioning by year and month

I am using Apache Hudi version 0.12.0 in AWS Glue Version 4.0. I am trying to get my table to have partitions by month and year, and I cannot get this to work. Here is the code in my Glue Job: base_s3_path =…
cjf280830
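For reference, a hedged sketch of write options for year/month partitioning with `TimestampBasedKeyGenerator`. The field name `created_at` and the input date format are assumptions; the `hoodie.keygen.timebased.*` keys come from Hudi's key generator configuration.

```python
# Illustrative Hudi write options: partition on a timestamp column, emitting
# a two-level yyyy/MM partition path. `created_at` and the input format are
# assumptions for this sketch.
keygen_opts = {
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.TimestampBasedKeyGenerator",
    "hoodie.datasource.write.partitionpath.field": "created_at",
    "hoodie.keygen.timebased.timestamp.type": "DATE_STRING",
    "hoodie.keygen.timebased.input.dateformat": "yyyy-MM-dd HH:mm:ss",
    # the output format drives the partition layout: one level per slash
    "hoodie.keygen.timebased.output.dateformat": "yyyy/MM",
}
```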
0 votes · 0 answers

Issue with reading data from Hudi table incrementally in Spark-shell

I am encountering an error while attempting to read data from a Hudi table incrementally using Spark-shell. Below is the code I am using: import org.apache.hudi.DataSourceReadOptions._ import org.apache.hudi.HoodieDataSourceHelpers import…
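For context, a minimal sketch of a Hudi incremental read, assuming a begin instant time and base path that are hypothetical here (Hudi instants use a `yyyyMMddHHmmss`-style format).

```python
# Illustrative options for an incremental query; the instant time below is
# a made-up placeholder, not from the question.
incremental_opts = {
    "hoodie.datasource.query.type": "incremental",
    "hoodie.datasource.read.begin.instanttime": "20230101000000",
}
# with an active SparkSession `spark` and the table's base path:
# df = spark.read.format("hudi").options(**incremental_opts).load(base_path)
```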
0 votes · 0 answers

Error while trying to stream data from Kafka and store it in apache hudi

I am trying to store Kafka data in Apache Hudi. The Spark version I am using is 3.3.1, with Kafka clients 2.8.1, Hudi, and spark-sql-kafka-0-10_2.12, and while writing that code I get an org/apache/commons/pool2/impl/GenericKeyedObjectPool error. I am…
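As background on this class of error: `NoClassDefFoundError` for `GenericKeyedObjectPool` typically means commons-pool2 is missing from the classpath, which happens when the Kafka connector jar is added by hand without its transitive dependencies. A hedged sketch of letting Spark resolve them instead via `--packages`; the Hudi bundle coordinate and version here are assumptions, not from the question.

```python
# Build a --packages list so Spark resolves transitive dependencies
# (including commons-pool2) rather than relying on a single hand-copied jar.
spark_version = "3.3.1"  # from the question
packages = ",".join([
    f"org.apache.spark:spark-sql-kafka-0-10_2.12:{spark_version}",
    # assumed Hudi bundle/version for a Spark 3.3 cluster:
    "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0",
])
# spark-submit --packages <packages> my_job.py
```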
0 votes · 0 answers

Unable to read Hudi file in Spark Databricks Environment

I am facing this error while running Spark in Databricks. I am trying to read the Hudi file format. I’m using Hudi 0.13.0 with Databricks 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12). Trying to load a Hudi data set from S3, but it failed with this…
0 votes · 2 answers

Change the location of a Hudi table in AWS?

Describe the problem you faced: How can we change the location of a Hudi table to a new location? I have a customer table saved at s3://aws-amazon-com/Customer/ which I want to change to s3://aws-amazon-com/CustomerUpdated/. I'm working on Glue…
0 votes · 1 answer

Performance and Data Integrity Issues with Hudi for Long-Term Data Retention

Our project requires that we perform full loads daily, retaining these versions for future queries. Upon implementing Hudi to maintain 6 years of data with the following setup: "hoodie.cleaner.policy":…
Luiz
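For reference, the shape of the configuration such a long-retention setup typically involves. This is a hedged sketch: the commit counts are illustrative (≈2190 daily commits for 6 years), and the real constraint to note is that the archival bounds (`hoodie.keep.min.commits`) must sit above the cleaner's retained-commit count; a timeline this long is expensive for Hudi to manage, which is likely the performance tension in the question.

```python
# Illustrative retention configuration for ~6 years of daily commits.
# Counts are assumptions; keep.min.commits must exceed commits.retained.
retention_opts = {
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": "2190",
    "hoodie.keep.min.commits": "2191",
    "hoodie.keep.max.commits": "2192",
}
```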
0 votes · 1 answer

Querying Apache Hudi using PySpark on EMR by table name

While writing data to Apache Hudi on EMR using PySpark, we can specify the table name in the configuration. See hudiOptions = { 'hoodie.table.name': 'tableName', 'hoodie.datasource.write.recordkey.field':…
Anurag A S
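A hedged sketch of what such a `hudiOptions` dictionary might look like, completed with assumed field names (`id`, `created`, `updated_at` are placeholders). One way to later query by table name rather than by path is to enable hive sync at write time, so the table is registered in the catalog and reachable via `spark.table("tableName")`.

```python
# Illustrative Hudi write options; record key, partition path, and precombine
# field names are assumptions for this sketch.
hudiOptions = {
    "hoodie.table.name": "tableName",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "created",
    "hoodie.datasource.write.precombine.field": "updated_at",
    # registering via hive sync enables querying by name instead of path:
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.table": "tableName",
}
# df.write.format("hudi").options(**hudiOptions).mode("append").save(base_path)
# later: spark.table("tableName")
```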
0 votes · 1 answer

Is there standard way to get the data lake format from parquet file? (e.g. Apache iceberg, Apache Hudi, Deltalake)

I am writing a Parquet cleanup job using PyArrow. However, I only want to process native Parquet files and skip any .parquet files that belong to Iceberg, Hudi, or Delta Lake tables. This is because these formats require updates to be done through the…
Sam.E
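There is no in-file marker inside the .parquet files themselves, but each table format leaves a characteristic metadata directory next to the data, which allows a best-effort detection. A sketch under that assumption:

```python
import os

def detect_table_format(table_dir):
    """Best-effort detection of which format owns a directory of .parquet
    files, based on each format's marker metadata: Hudi keeps a .hoodie/
    directory, Delta Lake a _delta_log/ directory, and Iceberg a metadata/
    directory containing *.metadata.json files."""
    if os.path.isdir(os.path.join(table_dir, ".hoodie")):
        return "hudi"
    if os.path.isdir(os.path.join(table_dir, "_delta_log")):
        return "delta"
    meta = os.path.join(table_dir, "metadata")
    if os.path.isdir(meta) and any(
            name.endswith(".metadata.json") for name in os.listdir(meta)):
        return "iceberg"
    return "parquet"  # no table-format markers: treat as native Parquet
```

Note this inspects the table root, so for partitioned layouts the check should run against the top-level directory, not each partition folder.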
0 votes · 1 answer

Non Nested AVRO Schema For Postgres Change-Log Events (Debezium <> Confluent Schema Registry)

A purely Avro question first: Is it possible to have an Avro schema compatible with the following message, whose before and after fields must be of the same record type: { "before": null, "after": { "id": 1, "name": "Bob" }, "op":…
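On the Avro part: yes, Avro allows a named record type to be defined once and then referenced by name, so `before` and `after` can share one record type inside `["null", ...]` unions. A sketch, with the record and field names (`Envelope`, `Row`, `id`, `name`) matching the example message rather than any real Debezium schema:

```python
import json

# Sketch: define the row record once under "before", then reference it by
# name ("Row") for "after", so both fields share the same record type.
schema = {
    "type": "record",
    "name": "Envelope",
    "fields": [
        {"name": "before", "type": ["null", {
            "type": "record",
            "name": "Row",
            "fields": [
                {"name": "id", "type": "int"},
                {"name": "name", "type": "string"},
            ],
        }], "default": None},
        {"name": "after", "type": ["null", "Row"], "default": None},
        {"name": "op", "type": "string"},
    ],
}
schema_json = json.dumps(schema)
```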
0 votes · 1 answer

Flink streaming Kinesis to Hudi not writing any data

I'm trying out PyFlink for streaming data from Kinesis into Hudi format, but can't figure out why it is not writing any data. I hope that maybe someone can provide any pointers. Versions: Flink 1.15.4, Python 3.7, Hudi 0.13.0 I use streaming table…
Timo
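One well-known cause of this symptom: the Hudi Flink sink only commits files on checkpoint, so a streaming job with checkpointing disabled appears to write nothing. A hedged sketch; the table name, path, and interval below are illustrative, not from the question.

```python
# The sink commits on checkpoint, so enable checkpointing in the job.
checkpoint_interval_ms = 60_000  # illustrative interval

# Hypothetical Hudi sink DDL for a PyFlink table job:
sink_ddl = """
CREATE TABLE hudi_sink (
  id STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3://my-bucket/hudi_sink',
  'table.type' = 'COPY_ON_WRITE'
)
"""
# with a StreamExecutionEnvironment `env` and StreamTableEnvironment `t_env`:
# env.enable_checkpointing(checkpoint_interval_ms)
# t_env.execute_sql(sink_ddl)
```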
0 votes · 1 answer

HUDI compaction using Flink raises NullPointerException: Value must not be null

I followed the example on Hudi's website. Instead of using hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar, I use hudi-flink1.16-bundle-0.13.0.jar, acquired from here. Command: $FLINK_HOME/bin/flink run \ -c…
Bing-hsu Gao
0 votes · 0 answers

HUDI: how to apply the CDC delete and upsert events?

I am reading https://medium.com/slalom-build/data-lakehouse-building-the-next-generation-of-data-lakes-using-apache-hudi-41550f62f5f and I cannot understand the following piece of code. It seems that upsert CDC events are applied before delete CDC…
BAE
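The key idea behind why apply order within a batch need not matter: Hudi deduplicates records by key using the precombine ("ordering") field, so the event with the latest ordering value wins, whether it is an upsert or a delete. A pure-Python illustration of that resolution logic (not Hudi code):

```python
# Toy model of precombine-based resolution: for each key, keep the event
# with the highest ts, then drop keys whose surviving event is a delete.
def resolve(events):
    """events: list of dicts with 'key', 'ts', 'op' ('u' upsert, 'd' delete).
    Returns the surviving state per key."""
    latest = {}
    for e in events:
        cur = latest.get(e["key"])
        if cur is None or e["ts"] >= cur["ts"]:
            latest[e["key"]] = e
    return {k: v for k, v in latest.items() if v["op"] != "d"}

batch = [
    {"key": 1, "ts": 2, "op": "d"},  # delete appears first in the batch
    {"key": 1, "ts": 1, "op": "u"},  # older upsert applied after it
]
state = resolve(batch)  # key 1 stays deleted: the delete has the later ts
```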
0 votes · 1 answer

Unable to see hive partitions when running Hudi DeltaStreamer with `TimestampBasedKeyGenerator` (but able to see hudi partitions)

In Hudi, I’m using the TimestampBasedKeyGenerator and can see the resulting partitions (e.g. using the hudi cli I'm able to see the partition for 2023-03-05) in my S3 path (e.g. s3://my_bucket/my_table/2023-03-25/): hudi:my_table->show fsview…
Will
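For context on this symptom: Hudi partitions existing while Hive partitions are missing usually points at the hive sync configuration, in particular a partition extractor that does not match the key generator's output layout. A hedged sketch of the relevant options; the field name is an assumption, and the extractor class is one plausible choice for a single-level `2023-03-25`-style path.

```python
# Illustrative hive sync options for DeltaStreamer; partition_fields is an
# assumed column name, and the extractor class must match the partition
# path layout the key generator actually produces.
hive_sync_opts = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.partition_fields": "created_at",
    "hoodie.datasource.hive_sync.partition_extractor_class":
        "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}
```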