Questions tagged [apache-hudi]

Apache Hudi is a transactional data lake platform with a focus on batch and event processing (with ACID support). Use this tag for questions about problems specific to Apache Hudi. Do not use it for general data lake or Delta Lake questions.

Questions on using Apache Hudi

158 questions
0
votes
1 answer

presto with hudi - select * from table

I have a parquet record created with hudi off a spark kinesis stream and stored in S3. An AWS glue table is generated from this record. I update the InputRecord type to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat as per…
Adam
0
votes
1 answer

Apache Hudi throwing Dataset not found exception when storing to S3

I am trying to load a simple dataframe as a Hudi dataset into S3 and I am having trouble doing that. I am new to Apache Hudi and I am trying to load the data by running the code locally on my Windows machine. All the Maven dependencies I am…
0
votes
1 answer

Spark structured streaming with Apache Hudi

I have a requirement where I need to write a stream to a Hudi dataset using Structured Streaming. According to the Apache Hudi Jira issues there is a provision for this, but I wanted to know if anyone has successfully implemented it and has an example. I…
0
votes
1 answer

spark-submit Error: java.util.NoSuchElementException: spark.scheduler.mode

I am trying to set up Apache Hudi on an Ubuntu 16.04 server. I cloned the repo https://github.com/apache/incubator-hudi.git and then built it with mvn clean install -DskipTests -DskipITs. The build completed successfully. I then proceeded with…
Nadeem Mehraj
-1
votes
1 answer

Bad Performance using Open Table Format

I have an existing use case where the full data set is read daily from multiple Hive tables and then processed/transformed (join, aggregation, filter, etc.) as specified in SQL queries. These SQL queries are defined in a series of YAML files, let's say…
-1
votes
1 answer

How should I choose Hudi table partition key?

In my batch processing data pipeline I have transactions with booking date and accounting date, the transactions in the same time window have the same booking date and within 2 mins time window, booking date is just several minutes earlier than…
user1532146
-1
votes
1 answer

Apache Hudi schema evolution

Can anyone share the right approach for handling schema changes in Apache Hudi? Example: renaming a column from col1 to col2 or changing the data type from long to int. (PySpark)
-2
votes
1 answer

Why can't I insert datagen data in Flink?

Flink SQL> CREATE TABLE sourceT ( > uuid varchar(20), > name varchar(10), > age int, > ts timestamp(3), > `partition` varchar(20) > ) WITH ( > 'connector' = 'datagen', > 'rows-per-second' = '1' > ); [INFO] Execute statement…