Questions tagged [apache-hudi]

Apache Hudi is a transactional data lake platform focused on batch and event processing with ACID support. Use this tag for questions specific to Apache Hudi. Do not use this tag for general data lake or Delta Lake questions.

Questions on using Apache Hudi

158 questions
0
votes
0 answers

Compaction of a MOR Hudi table keeps the old values

I have a Hudi table that I write as MOR. Here's the config: conf = { 'className': 'org.apache.hudi', 'hoodie.table.name': hudi_table_name, 'hoodie.datasource.write.operation': 'upsert', 'hoodie.datasource.write.table.type':…
Mee
  • 1,413
  • 5
  • 24
  • 40
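For orientation, a minimal MOR upsert configuration of the kind this question describes might look like the sketch below. The table name, record key, precombine field, and path are placeholders rather than the asker's actual values, and inline compaction is only one way to force compaction to run.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# df: a stand-in for the incoming batch; the columns are assumptions.
df = spark.createDataFrame([(1, 'a', 1000)], ['id', 'name', 'ts'])

hudi_options = {
    'hoodie.table.name': 'my_table',                          # placeholder table name
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'id',          # assumed record key column
    'hoodie.datasource.write.precombine.field': 'ts',         # assumed precombine column
    'hoodie.compact.inline': 'true',                          # run compaction as part of the write
    'hoodie.compact.inline.max.delta.commits': '1',           # compact after every delta commit
}

(df.write.format('hudi')
   .options(**hudi_options)
   .mode('append')
   .save('s3://bucket/prefix/my_table'))                      # placeholder path
```

With merge-on-read, a read-optimized query only reflects the last compacted base files, so whether "old" values show up usually comes down to when compaction runs and which query type is used.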
0
votes
1 answer

Why not write data in Hudi or Iceberg format in flink-table-store?

Recently I had a chance to get to know the flink-table-store project. I was attracted by the idea behind it at first glance. After reading the docs, I've had a question in my head for a while. It's about the design of the file storage. It looks…
0
votes
1 answer

How to get the latest version of a Hudi table

I have a Spark Streaming job which listens to a Kinesis stream and then writes it to a Hudi table. What I want to do is, say for example, I added these two records to the Hudi table: | user_id | name | timestamp | -------- | --------------…
Mee
  • 1,413
  • 5
  • 24
  • 40
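As a point of reference, a snapshot query (the default query type) returns the latest committed value for each record key. The sketch below is illustrative only, with a placeholder path and a hypothetical user_id filter.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A snapshot read returns the latest committed row per record key.
latest_df = (spark.read.format('hudi')
             .option('hoodie.datasource.query.type', 'snapshot')  # default; shown for clarity
             .load('s3://bucket/prefix/my_table'))                # placeholder path
latest_df.where("user_id = 'u1'").show()                          # hypothetical key value
```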
0
votes
1 answer

Hudi COW table - Bulk_Insert produces more files when clustering is enabled compared to Insert mode

I am trying to use clustering configurations on a Hudi COW table to keep only a single file in each partition folder when the total partition data size is less than 128 MB. But it seems that clustering is not working with bulk_insert as expected. We…
Vinay Sinha
  • 193
  • 2
  • 13
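For context, inline clustering on a COW table is typically driven by options like the ones sketched below; the byte thresholds are illustrative and not the asker's exact values.

```python
# Hedged sketch of inline clustering options; values are illustrative.
clustering_options = {
    'hoodie.datasource.write.operation': 'bulk_insert',
    'hoodie.clustering.inline': 'true',
    'hoodie.clustering.inline.max.commits': '1',               # cluster after every commit
    # files below this size are candidates to be clustered together (bytes)
    'hoodie.clustering.plan.strategy.small.file.limit': str(128 * 1024 * 1024),
    # target size of the files produced by clustering (bytes)
    'hoodie.clustering.plan.strategy.target.file.max.bytes': str(128 * 1024 * 1024),
}
```

bulk_insert tends to write one file per parallel write task before clustering runs, which may explain why the initial file count is higher than with insert mode.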
0
votes
1 answer

Creating an Athena view on a Hudi table returns soft-deleted records when the view is read using Spark

I have multiple Hudi tables with differing column names, and I built a view on top of them to standardize the column names. When this view is read from Athena, it returns a correct response. But when the same view is read using Spark using…
sashmi
  • 97
  • 1
  • 2
  • 14
0
votes
2 answers

Deleting records from an Apache Hudi table that is part of Glue tables created using an AWS Glue job and Kinesis

I currently have a DynamoDB stream configured which feeds records into a Kinesis data stream whenever an insertion/update happens, and subsequently I have Glue tables which take input from the above Kinesis stream and then display the…
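For reference, a hard delete from a Spark-based Glue job usually amounts to writing the keys to remove with the delete operation. Everything named below (table, key and precombine columns, path) is a placeholder rather than the asker's setup.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Stand-in for the keys to remove; column names are assumptions.
keys_to_delete_df = spark.createDataFrame([(42, 0)], ['id', 'ts'])

delete_options = {
    'hoodie.table.name': 'my_table',                          # placeholder
    'hoodie.datasource.write.operation': 'delete',
    'hoodie.datasource.write.recordkey.field': 'id',          # assumed record key column
    'hoodie.datasource.write.precombine.field': 'ts',         # assumed precombine column
}

(keys_to_delete_df.write.format('hudi')
    .options(**delete_options)
    .mode('append')
    .save('s3://bucket/prefix/my_table'))                     # placeholder path
```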
0
votes
1 answer

Hudi with Spark performs very slowly when trying to write data to the filesystem

I'm trying Apache Hudi with Spark in a very simple demo: with SparkSession.builder.appName(f"Hudi Test").getOrCreate() as spark: df = spark.read.option('mergeSchema', 'true').parquet('s3://an/existing/directory/') hudi_options = { …
Rinze
  • 706
  • 1
  • 5
  • 21
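As a rough illustration of the usual tuning advice for an initial load, a bulk_insert with explicit shuffle parallelism tends to be much cheaper than an upsert. The column names, parallelism, table name, and output path below are assumptions, not the asker's values.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet('s3://an/existing/directory/')        # path quoted from the question

hudi_options = {
    'hoodie.table.name': 'hudi_test',                         # placeholder table name
    'hoodie.datasource.write.recordkey.field': 'id',          # assumed key column
    'hoodie.datasource.write.precombine.field': 'ts',         # assumed precombine column
    'hoodie.datasource.write.operation': 'bulk_insert',       # skips the index lookup that upsert performs
    'hoodie.bulkinsert.shuffle.parallelism': '200',           # illustrative parallelism
}

df.write.format('hudi').options(**hudi_options).mode('overwrite').save('s3://bucket/hudi_test')
```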
0
votes
1 answer

How to encrypt Apache Hudi external table data present in S3 and synced into Hive tables through Spark jobs

Technical background: I am getting table data from Kafka and putting it into Hudi and Hive tables using Spark. I am using AWS EMR. I want to encrypt data in transit within the cluster as well as the synced external table data present in S3 (Data at…
Roobal Jindal
  • 214
  • 2
  • 13
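For the data-at-rest part, one hedged sketch is enabling SSE-KMS on S3A writes at session build time, as below. The KMS key ARN is a placeholder, and on EMR the s3:// (EMRFS) scheme and in-transit encryption are normally handled through an EMR security configuration rather than Spark options.

```python
from pyspark.sql import SparkSession

# Hedged sketch: SSE-KMS for S3A writes; the KMS key ARN is a placeholder.
spark = (SparkSession.builder
         .appName('hudi-sse-kms-sketch')
         .config('spark.hadoop.fs.s3a.server-side-encryption-algorithm', 'SSE-KMS')
         .config('spark.hadoop.fs.s3a.server-side-encryption.key',
                 'arn:aws:kms:us-east-1:111122223333:key/placeholder')
         .getOrCreate())
```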
0
votes
0 answers

PySpark Hudi writing timestamps as binary

I am trying to write a PySpark DataFrame to S3 in Hudi parquet format. Everything is working fine; however, the timestamps are written as binary. I would like to write them in the Hive timestamp format so that I can query the data in Athena. PySpark config as…
Sql_Peter
  • 3
  • 3
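One option commonly pointed at for this symptom is the Hive-sync timestamp flag. The sketch below is an assumption about what a relevant configuration might include, not the asker's actual settings.

```python
# Hedged sketch: keep timestamp columns as TIMESTAMP (rather than bigint/binary) when
# syncing the table to the Hive/Glue catalog that Athena reads.
hudi_options = {
    'hoodie.table.name': 'events',                              # placeholder
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
}
```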
0
votes
0 answers

Delete Apache Hudi rows with a duplicate record key

I ran into some trouble in Hudi when I delete rows with the same record key using spark-sql. e.g. I created a table and set recordKey=empno: CREATE TABLE emp_duplicate_pk ( empno int, ename string, job string, mgr int, hiredate…
mcspter
  • 1
  • 2
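For context, Hudi's Spark SQL layer supports DELETE statements. The sketch below reuses the table name from the question, but the predicate value is purely illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# 7499 is an illustrative empno value, not taken from the question.
spark.sql("DELETE FROM emp_duplicate_pk WHERE empno = 7499")
```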
0
votes
1 answer

How to insert struct and map types in Apache Hudi

I've checked the official documentation and there are no samples about inserting complex types like struct and map. So, what's the syntax? My table definition: spark-sql> desc struct_map; _hoodie_commit_time string NULL _hoodie_commit_seqno string …
Smith Cruise
  • 404
  • 1
  • 4
  • 19
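For reference, Spark SQL's named_struct() and map() builtins are the usual way to construct such values in an INSERT. The column names below are guesses based only on the table name in the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Column names are assumptions; named_struct() and map() build the complex values inline.
spark.sql("""
    INSERT INTO struct_map
    SELECT 1                                              AS id,
           named_struct('city', 'Beijing', 'zip', 100000) AS address,
           map('k1', 'v1', 'k2', 'v2')                    AS tags,
           1000                                           AS ts
""")
```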
0
votes
1 answer

I encountered an error when using Flink to insert data into an Apache Hudi table

Environment: Flink: 1.15.2 Hudi flink: hudi-flink1.15-bundle-0.12.0.jar When I execute the statements: Flink SQL> CREATE TABLE t1( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition`…
he wang
  • 11
  • 2
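For completeness, the DDL in the excerpt matches the Hudi Flink quickstart. Expressed through PyFlink it might look like the sketch below, where the path is a placeholder; the bundle version has to match the Flink version (here 1.15) and the jar must be on the classpath.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Hedged sketch: the quickstart-style DDL from the question, run through PyFlink.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.execute_sql("""
    CREATE TABLE t1 (
        uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
        name VARCHAR(10),
        age INT,
        ts TIMESTAMP(3),
        `partition` VARCHAR(20)
    ) PARTITIONED BY (`partition`) WITH (
        'connector' = 'hudi',
        'path' = 's3://bucket/prefix/t1',
        'table.type' = 'MERGE_ON_READ'
    )
""")
```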
0
votes
0 answers

Issue inserting Hudi data into S3

I am using Hudi to insert data into S3. The Hudi table can be created and data is also inserted into the table with no issue. But when I select from the table, no results are returned. And when I check S3, no related files have been generated either. Where can I…
0
votes
0 answers

Custom HoodieRecordPayload for use in Flink SQL

I am trying to use Apache Hudi with Flink SQL by following Hudi's Flink guide. The basics are working, but now I need to provide a custom implementation of HoodieRecordPayload as suggested in this FAQ. But when I am passing this config as shown in…
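On the Flink SQL side, a custom payload implementation is normally passed as a table option. The sketch below assumes the option key is 'payload.class' and that com.example.MyPayload exists on the classpath; both are assumptions that depend on the Hudi version in use.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Hedged sketch: passing a custom HoodieRecordPayload class as a table option.
# 'payload.class' and com.example.MyPayload are assumptions, not verified against a release.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.execute_sql("""
    CREATE TABLE t_custom_payload (
        id VARCHAR(20) PRIMARY KEY NOT ENFORCED,
        val VARCHAR(100),
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'hudi',
        'path' = 's3://bucket/prefix/t_custom_payload',
        'table.type' = 'MERGE_ON_READ',
        'payload.class' = 'com.example.MyPayload'
    )
""")
```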
0
votes
1 answer

Flink write to Hudi with different schemas extracted from a Kafka datastream

So I have a Kafka topic which contains Avro records with different schemas. I want to consume from that Kafka topic in Flink and create a datastream of Avro GenericRecord (this part is done). Now I want to write that data to Hudi using schema…
terminal
  • 72
  • 3