0

I need to update or delete records in a Hudi table. One way to do that is with PySpark or Scala, following the steps in the guide below:

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html

Is it also possible to do this with the aws-cli?

Which would be the better way to run this: calling it from Lambda or from Glue?

GOPI M
  • 27
  • 7

1 Answer

1

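You can use the aws-cli to submit Spark jobs to EMR as steps, or use notebooks for ad hoc analysis. The aws-cli does not modify the Hudi table itself; it only submits the Spark job that performs the update or delete. Submitting Spark jobs to EMR is the preferred approach.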
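As a rough sketch of what that could look like from Python, here is the boto3 equivalent of `aws emr add-steps`, plus a helper that builds the Hudi write options the Spark job would use. The function names, the step name, and the idea of passing extra `spark-submit` arguments through a parameter are all assumptions made for illustration; the cluster ID and S3 path are placeholders you would supply, and the Hudi jars and Spark settings from the guide linked in the question still have to be passed to `spark-submit`.

```python
def hudi_write_options(table_name, record_key, precombine_field, operation="upsert"):
    """Build Hudi datasource write options for an upsert or a delete.

    Hypothetical helper; the option keys are the standard Hudi write configs.
    """
    if operation not in ("upsert", "delete"):
        raise ValueError("operation must be 'upsert' or 'delete'")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def build_spark_step(script_s3_uri, spark_submit_args=(), name="hudi-upsert-delete"):
    """Describe an EMR step that runs spark-submit on a PySpark script in S3.

    spark_submit_args is where the Hudi-related --jars / --conf values
    from the EMR guide would go (assumption: the caller provides them).
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     *spark_submit_args, script_s3_uri],
        },
    }


def submit_step(cluster_id, step):
    """Submit the step to a running EMR cluster and return the step ID."""
    import boto3  # imported here so the helpers above work without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

Inside the PySpark script itself, the options would typically be applied with something like `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` holds the records to upsert or delete and `base_path` is the table location. The same `submit_step` call could be made from a Lambda function if you want the job triggered by an event.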

gbharat
  • 276
  • 1
  • 4
  • Thanks for the answer. I used a similar approach. Developed a Glue job with pyspark code and then used the aws-cli commands to trigger the job from git. – GOPI M Apr 21 '22 at 15:07