Questions tagged [apache-hudi]

Apache Hudi is a transactional data lake platform with a focus on batch and event processing (with ACID support). Use this tag for questions specific to Apache Hudi. Do not use this tag for general data lake or Delta Lake questions.

Questions on using Apache Hudi

158 questions
0
votes
0 answers

How to pass AWS keys to access S3 with Apache Hudi on LocalStack?

I am using the Docker image localstack/localstack:2.0.2 and attempting to write to S3 in it using PySpark (3.1.1) / Apache Hudi (0.13.0) with the following options: { 'hoodie.table.name': 'foo', ... …
Randomize
  • 8,651
  • 18
  • 78
  • 133
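A common fix for this kind of LocalStack setup is to point Spark's S3A filesystem at the LocalStack endpoint and supply dummy credentials alongside the Hudi writer options. The sketch below is a minimal, hedged example: the endpoint, credentials, record-key and precombine fields are placeholders, and the actual write call (which needs a running Spark session with the Hudi and hadoop-aws jars) is shown commented out.

```python
# Sketch, assuming a LocalStack container on the default edge port 4566.
# S3A settings that route Spark's S3 client at LocalStack instead of AWS:
s3a_conf = {
    "spark.hadoop.fs.s3a.endpoint": "http://localhost:4566",  # LocalStack edge port
    "spark.hadoop.fs.s3a.access.key": "test",         # LocalStack accepts any key
    "spark.hadoop.fs.s3a.secret.key": "test",
    "spark.hadoop.fs.s3a.path.style.access": "true",  # needed for non-AWS endpoints
}

# Hudi writer options, matching the question's snippet; the key and
# precombine fields below are placeholders for your own columns.
hudi_options = {
    "hoodie.table.name": "foo",
    "hoodie.datasource.write.recordkey.field": "id",   # placeholder field
    "hoodie.datasource.write.precombine.field": "ts",  # placeholder field
}

# With a SparkSession built from s3a_conf:
# df.write.format("hudi").options(**hudi_options).mode("overwrite") \
#   .save("s3a://my-bucket/foo")
```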
0
votes
2 answers

Apache Spark: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3Adapter

I have run the following code via IntelliJ and it runs successfully. The code is shown below. import org.apache.spark.sql.SparkSession object HudiV1 { // Scala code case class Employee(emp_id: Int, employee_name: String, department: String,…
pacman
  • 725
  • 1
  • 9
  • 28
0
votes
1 answer

Hudi DeltaStreamer with AWS Glue Data Catalog syncs the database, but not the tables

This is similar to being unable to sync the AWS Glue Data Catalog when you run spark-submit with Hudi DeltaStreamer, except that only the database (and not the tables) gets synced. E.g. you submit: spark-submit \ --conf…
Will
  • 11,276
  • 9
  • 68
  • 76
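When only the database appears in the catalog, the usual suspects are the Hive-sync options passed to the writer. The sketch below lists the relevant Hudi hive-sync settings as a plain dict; the database, table, and partition field names are placeholders, and on EMR these would be passed as `--hoodie-conf` arguments or writer options.

```python
# Hedged sketch: hive-sync options so tables (not just the database) are
# registered in the Glue Data Catalog. Names below are placeholders.
hive_sync_options = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.mode": "hms",             # talk to the metastore (Glue) directly
    "hoodie.datasource.hive_sync.database": "my_database", # placeholder database
    "hoodie.datasource.hive_sync.table": "my_table",       # placeholder table
    "hoodie.datasource.hive_sync.partition_fields": "dt",  # placeholder partition column
}
```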
0
votes
1 answer

Running Hudi DeltaStreamer on EMR succeeds, but does not sync to AWS Glue Data Catalog

When I run Hudi DeltaStreamer on EMR, I see the Hudi files get created in S3 (e.g. I see a .hoodie/ dir and the expected parquet files in S3). The command looks something like: spark-submit \ --conf…
Will
  • 11,276
  • 9
  • 68
  • 76
0
votes
1 answer

Hudi DeltaStreamer error on `User class threw exception: java.lang.NullPointerException` at `...SchemaRegistryProvider.fetchSchemaFromRegistry`

If you run into this error: 23/04/03 16:19:49 INFO Client: client token: N/A diagnostics: User class threw exception: java.lang.NullPointerException at…
Will
  • 11,276
  • 9
  • 68
  • 76
0
votes
1 answer

Writing data from Multi-Cluster into Hudi tables in S3

For multi-cluster writes in S3, Delta Lake uses DynamoDB to atomically check whether a file is present before writing it, because S3 does not support a "put-if-absent" consistency guarantee. Therefore, in order to leverage this feature using Delta Lake…
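Hudi addresses the same S3 limitation with its own optimistic concurrency control and a pluggable lock provider, including a DynamoDB-based one. The sketch below shows the relevant writer options as a plain dict; the DynamoDB table name and region are placeholders.

```python
# Hedged sketch: Hudi multi-writer options using the DynamoDB lock provider.
# Table and region values are placeholders for your own environment.
lock_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.write.lock.provider":
        "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi-locks",       # placeholder table
    "hoodie.write.lock.dynamodb.partition_key": "tablename",
    "hoodie.write.lock.dynamodb.region": "us-east-1",       # placeholder region
    "hoodie.cleaner.policy.failed.writes": "LAZY",          # required with OCC
}
```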
0
votes
0 answers

Getting java.lang.NullPointerException while reading Hudi table data with Spark SQL inside a Spark action

I am getting a weird PySpark Java null pointer exception when calling any action through PySpark. Below is the snippet: py4j.protocol.Py4JJavaError: An error occurred while calling o432.showString. : org.apache.spark.SparkException: Job…
0
votes
1 answer

How to update record in OverwriteWithLatestAvroPayload.preCombine

I have a requirement where I need to combine fields from old and new records in OverwriteWithLatestAvroPayload.preCombine. The default behaviour selects either the old record or the new one based on the ordering value, but in my case I need to combine fields from…
Yabha
  • 11
  • 2
0
votes
0 answers

Flink crashes when I try to create a new table

Hello, I'm working with a Flink-MSK-Hudi architecture and I want to ingest data into my AWS Glue catalog. Ingesting the data into an S3 bucket in Hudi format works fine; the problem is when I set the Hive properties in the Hudi…
0
votes
0 answers

GENERIC_INTERNAL_ERROR: Field new_test2 not found in log schema. Query cannot proceed! Derived Schema Fields:

There's a Hudi table written as Parquet files in S3 that I am trying to query using Athena. At first it worked fine, but after I added a column and queried again I got this error: GENERIC_INTERNAL_ERROR: Field new_test2 not found…
Mee
  • 1,413
  • 5
  • 24
  • 40
0
votes
0 answers

AWS EMR managed auto scaling repeatedly scales task nodes down to 0 and back up while the Spark job is running

Environment: AWS EMR cluster with managed autoscaling turned on, running a Hudi job. Issue: I enabled auto scaling with a minimum of 2 and a maximum of 8 task nodes, and a maximum of 2 core nodes with 2 on-demand capacity. I ran a Spark job, it…
0
votes
0 answers

How to use temporal table join in batch mode Flink SQL?

In order to revise T+1 data due to data delays, I want to execute a temporal table join using Flink SQL in batch mode. The official Flink documentation shows that this join supports running in batch mode, but I got an error when executing the SQL:…
Felix Feng
  • 281
  • 3
  • 7
0
votes
2 answers

Can partitioning data in Apache Hudi optimize AWS Spectrum query?

I'm using AWS Redshift Spectrum to query a Hudi table. As we know, filtering data by the partition column when querying in Spectrum can reduce the amount of data scanned and speed up the query. My question is: if I use Spectrum to…
Rinze
  • 706
  • 1
  • 5
  • 21
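Partition pruning in external engines depends on how the Hudi table is laid out at write time. The sketch below shows the writer options that control partitioning as a plain dict; the partition column `dt` is a placeholder for your own field.

```python
# Hedged sketch: writer options that partition a Hudi table by a column so
# query engines like Spectrum can prune partitions. "dt" is a placeholder.
partition_options = {
    "hoodie.datasource.write.partitionpath.field": "dt",
    "hoodie.datasource.write.hive_style_partitioning": "true",  # dt=2023-01-01/ style paths
}
```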
0
votes
1 answer

pyspark.sql.utils.AnalysisException: 'writeStream' can be called only on streaming Dataset/DataFrame

I have a Glue streaming job, and I need to write the data as a stream after applying some processing, so I did the following: data_frame_DataSource0 = glueContext.create_data_frame.from_catalog( database=database_name, …
Mee
  • 1,413
  • 5
  • 24
  • 40
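This error usually means the processing step turned the streaming DataFrame into a batch one before `writeStream` was called. A common Glue pattern is to let Glue drive the stream and apply the processing inside a per-batch function via `GlueContext.forEachBatch`. The sketch below is hedged: the function and option names for the Glue call are shown commented out since they need a Glue runtime, and `hudi_options`, `path`, and `checkpoint_path` are placeholders.

```python
# Hedged sketch of Glue's per-batch streaming pattern: transformations run
# inside the batch function on each micro-batch, so writeStream is never
# called on a non-streaming DataFrame.

def process_batch(batch_df, batch_id):
    # Apply transformations to the micro-batch here, then write it out.
    transformed = batch_df  # placeholder for real processing
    # transformed.write.format("hudi").options(**hudi_options) \
    #     .mode("append").save(path)

# glueContext.forEachBatch(
#     frame=data_frame_DataSource0,  # streaming DataFrame from create_data_frame.from_catalog
#     batch_function=process_batch,
#     options={"windowSize": "60 seconds", "checkpointLocation": checkpoint_path},
# )
```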
0
votes
1 answer

Flink SQL-Cli: Hudi is abstract

I'm trying to recreate the common Flink example working with Hudi (https://hudi.apache.org/docs/flink-quick-start-guide), but when I try to insert the example data an error appears. Can someone help me with this? The steps that I'm following in my…
Valle1208
  • 43
  • 4