I am trying to adopt Hudi in our project.
I am planning three levels of data:

Raw (S3) --> Cleaned (Hudi, append only) --> Standard (Hudi, upserts)

The idea is to keep a Cleaned bucket that holds clean data in append-only mode, for use by data scientists.
Storing it as a Hudi table will help us comply with GDPR regulations.

I am not sure whether it is a good idea to use Hudi for an append-only bucket.
Are there any issues with doing that?

Please give me some advice.
Amit Joshi

1 Answer

Hudi has a bulk insert operation to support append-only use cases. See: https://hudi.apache.org/docs/write_operations#bulk_insert

You can always run updates/deletes on data that was ingested with bulk insert, which lets you handle GDPR requests at the "cleaned" level. So IMO it makes sense: you get transactions and the ability to have multiple writers perform deletes without stopping ingestion. Hudi will also take care of keeping your file sizes closer to optimal.
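
As a rough sketch of what that could look like from PySpark: the same table is appended to with the `bulk_insert` operation and later cleaned up with the `delete` operation, just by switching `hoodie.datasource.write.operation`. The table name, column names (`event_id`, `event_ts`, `event_date`) and the S3 path are made-up placeholders, and `write_hudi` is a hypothetical helper that assumes `df` is a Spark DataFrame in a session with the Hudi Spark bundle available. Treat it as an illustration, not a tuned configuration.

```python
ALLOWED_OPERATIONS = {"bulk_insert", "insert", "upsert", "delete"}


def hudi_options(table_name, operation, record_key, precombine_field, partition_field):
    """Build the option map handed to Spark's Hudi writer."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported Hudi write operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def write_hudi(df, base_path, options):
    """Hypothetical helper: write a Spark DataFrame to a Hudi table.

    mode("append") adds a new commit to the existing table
    instead of overwriting it.
    """
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Placeholder names, not taken from the question.
TABLE = "cleaned_events"
BASE_PATH = "s3://example-bucket/cleaned/events"

# Regular ingestion into the "cleaned" level: append-only via bulk_insert.
append_opts = hudi_options(TABLE, "bulk_insert", "event_id", "event_ts", "event_date")

# GDPR erasure: same table and keys, only the operation changes.
erase_opts = hudi_options(TABLE, "delete", "event_id", "event_ts", "event_date")
```

Ingestion would then call `write_hudi(new_batch_df, BASE_PATH, append_opts)`, and an erasure job would call `write_hudi(records_to_erase_df, BASE_PATH, erase_opts)`, where `records_to_erase_df` holds the rows (with their record key and partition columns) that must be removed.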