Questions tagged [apache-hudi]

Apache Hudi is a transactional data lake platform with a focus on batch and event processing (with ACID support). Use this tag for questions specific to problems with Apache Hudi. Do not use this tag for general data lake or Delta Lake questions.

Questions on using Apache Hudi

158 questions

votes

1 answer

presto with hudi - select * from table

I have a Parquet record created with Hudi from a Spark Kinesis stream and stored in S3. An AWS Glue table is generated from this record. I update the InputRecord type to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat as per…

asked Feb 11 '20 at 18:56

Adam

votes

1 answer

Apache Hudi throwing Dataset not found exception when storing to S3

I am trying to load a simple dataframe as a Hudi dataset into S3 and I am having trouble doing that. I am new to Apache Hudi and I am trying to load the data by running the code locally on my Windows machine. All the Maven dependencies I am…

apache-spark-sql apache-hudi

asked Sep 16 '19 at 06:16

Hariprasad A

votes

1 answer

Spark structured streaming with Apache Hudi

I have a requirement where I need to write a stream to a Hudi dataset using structured streaming. I found there is a provision to do this in the Apache Hudi Jira issues, but wanted to know if anyone has successfully implemented this and has an example. I…

apache-spark streaming spark-structured-streaming apache-hudi

asked Aug 14 '19 at 12:28

Hariprasad A

votes

1 answer

spark-submit Error: java.util.NoSuchElementException: spark.scheduler.mode

I am trying to set up Apache Hudi on an Ubuntu 16.04 server. I cloned the repo https://github.com/apache/incubator-hudi.git and then built it with mvn clean install -DskipTests -DskipITs. The build completed successfully. I then proceeded with…

apache-spark ubuntu-16.04 apache-hudi

asked Jun 20 '19 at 05:54

Nadeem Mehraj

-1

votes

1 answer

Bad Performance using Open Table Format

I have an existing use case where the full data is read daily from multiple Hive tables and then processed/transformed (join, aggregation, filter, etc.) as specified in SQL queries. These SQL queries are defined in a series of YAML files, let's say…

sql apache-spark spark-structured-streaming delta-lake apache-hudi

asked Aug 13 '23 at 21:49

Rituparno Behera

-1

votes

1 answer

How should I choose Hudi table partition key?

In my batch processing data pipeline I have transactions with a booking date and an accounting date. Transactions in the same time window have the same booking date and, within a 2-minute time window, the booking date is just several minutes earlier than…

apache-hudi

asked Jul 16 '23 at 06:19

user1532146

-1

votes

1 answer

Apache Hudi schema evolution

Can anyone share the right approach for handling schema changes in Apache Hudi? Example: renaming a column from col1 to col2 or changing the data type from long to int. (PySpark)

schema apache-hudi

asked Oct 05 '20 at 08:20

Yadhidya Vardhan

-2

votes

1 answer

Why can't I insert datagen in Flink?

Flink SQL> CREATE TABLE sourceT (
>   uuid varchar(20),
>   name varchar(10),
>   age int,
>   ts timestamp(3),
>   `partition` varchar(20)
> ) WITH (
>   'connector' = 'datagen',
>   'rows-per-second' = '1'
> );
[INFO] Execute statement…

sql apache-flink hadoop-yarn apache-hudi

asked Nov 29 '22 at 12:40

Jiangchao Yang
