Highest Voted 'apache-hudi' Questions

0

votes

2 answers

Does Hudi support the `update` operation?

I get an exception when updating a record in Hudi with Spark SQL, as follows: `update hudi.cow1 set price=1300 where id=2;` 22/10/17 19:24:44 ERROR Executor: Exception in task 0.0 in stage 206.0 (TID 2442) org.apache.avro.AvroRuntimeException: Not a…

apache-hudi

asked Oct 17 '22 at 12:17

Angle Tom

0

votes

1 answer

Can I use incremental, time travel, and snapshot queries with hudi only using spark-sql?

I'm trying to do incremental, snapshot, and time travel queries using spark-sql with hudi, but the only way that I can find to do this is creating a DataFrame with spark.read and then creating a temp view. Is there any way to accomplish this with…

apache-spark-sql apache-hudi

asked Sep 23 '22 at 17:20

bigdatabeginner

0

votes

3 answers

How to add Hudi Package to local AWS Glue Interactive Notebook

I have set up Glue interactive sessions locally by following https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html. However, I am not able to add any additional packages such as Hudi to the interactive session. There are a few magic…

jupyter-notebook aws-glue apache-hudi

asked Sep 09 '22 at 13:02

NarenS

0

votes

1 answer

org.apache.flink.table.api.TableException: Unsupported query: Merge Into

I am working on a Flink streaming job where I need to upsert data into a Hudi table. I am using a MERGE INTO query to upsert the data. Table table = tableEnv.fromDataStream(KafkaStreamTableDataStreamStream); …

flink-streaming flink-sql apache-hudi

asked Aug 31 '22 at 12:29

lucy

0

votes

2 answers

Can I use a MySQL database as destination storage for Apache Hudi?

I am new to Apache Hudi. Please let me know if Apache Hudi provides any configuration for writing data to a MySQL database.

mysql apache-hudi

asked Aug 02 '22 at 08:13

ash

0

votes

2 answers

Hudi data is overwritten on every new batch of Spark Structured Streaming

I am working on a Spark Structured Streaming job that consumes Kafka messages, aggregates them, and saves the data to an Apache Hudi table every 10 seconds. The code below works, but it overwrites the resulting Apache Hudi table data on every batch. I…

pyspark apache-kafka spark-structured-streaming apache-hudi

asked Jul 21 '22 at 05:23

lucy

0

votes

1 answer

Hoodie (Hudi) precombine field failing on NULL

My AWS Glue job for Hudi CDC is failing on a column that is a precombine field (see error message below). I have validated that there are no NULL values in this column (it has an AFTER UPDATE trigger and a default of NOW() set). When I query the…

apache-spark aws-glue cdc apache-hudi hoodie

asked Jun 06 '22 at 19:31

J Weezy

0

votes

0 answers

Hudi Failed to delete for commit time for certain records

I have a COW table and am able to insert and update records using Glue ETL without any issues. However, when I try to delete some of the records, I get the following error: An error occurred while calling…

apache-hudi

asked Apr 08 '22 at 17:09

Sateesh K

0

votes

1 answer

How to update/delete a record in a Hudi table in AWS?

I have a requirement to update or delete a record in a Hudi table. One way to do that is with PySpark/Scala, following the steps in the guide below: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html Also…

amazon-web-services pyspark aws-glue apache-hudi

asked Apr 06 '22 at 14:04

GOPI M

0

votes

1 answer

Apache Hudi Serialize issue

Could someone please help me fix this error? I get the error below when I try to update the data: py4j.protocol.Py4JJavaError: An error occurred while calling o84.save. : org.apache.hudi.exception.HoodieException: hoodie only…

apache-spark google-cloud-platform serialization pyspark apache-hudi

asked Mar 21 '22 at 13:16

Gopinath Thatha

0

votes

1 answer

How to change Hudi table version via Hudi CLI

How do I change the table version via the Hudi CLI? Steps: (1) SSH into EMR; (2) start the Hudi CLI with /usr/lib/hudi/cli/bin/hudi-cli.sh (the version of the Hudi CLI is 1); (3) connect to my table with connect --path s3://bucket/db/table. In the desc of the table I see…

apache-hudi

asked Mar 02 '22 at 17:33

Andreina

0

votes

1 answer

AWS Glue: How to output only the latest file in an S3 bucket

I use AWS Glue and Apache Hudi to replicate data from RDS to S3. If I run the following job, two Parquet files (the initial one and the updated one) are generated in the S3 bucket (basePath). In this case, I want only the latest file, and would like to…

amazon-web-services amazon-s3 aws-glue apache-hudi

asked Nov 30 '21 at 11:06

satohh

0

votes

1 answer

[HUDI]Creating Append only Raw data in HUDI

I am trying to adopt HUDI in our project. I am looking for 3 levels of data. Raw (S3) --> Cleaned (HUDI, append only) ---> Standard (HUDI, upserts) The idea is to keep a Cleaned bucket for clean data with Append only mode. This can be used by…

apache-hudi

asked Nov 16 '21 at 13:02

Amit Joshi

0

votes

1 answer

hudi delta streamer job via apache livy

Please help me pass the --props file and the --source-class file to the Livy API POST request. spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \ --master yarn \ --deploy-mode cluster \ --conf…

apache-spark spark-submit apache-hudi

asked Oct 07 '21 at 11:42

codek

0

votes

0 answers

Issue with Apache Hudi Update and Delete Operation on Parquet S3 File

I am trying to simulate updates and deletes on a Hudi dataset and want to see the state reflected in the Athena table. We use the EMR, S3, and Athena services of AWS. Attempting a record update with a withdrawal object withdrawalID_mutate =…

apache-spark spark-streaming amazon-emr apache-hudi iceberg

asked Aug 07 '21 at 13:11

jishmisc28
