Can anyone help me properly install Hudi 0.6.0 on AWS EMR version 6.0.0? I think AWS adds some custom scripts to make Hudi work properly on EMR.

ASHISH M.G
  • If you are not tied to a particular EMR version, you can use the latest AWS EMR release, 6.3.0, which ships with Hudi 0.7.0 and includes some major performance improvements. When creating the EMR cluster, you don't need to do anything beyond selecting the Spark component. When you run a step for a PySpark + Hudi app, pass these spark-submit options: --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false --jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar – Felix K Jose Jun 16 '21 at 19:17
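
For readers scripting their EMR steps, here is a minimal sketch of how the options from the comment above could be assembled into a single spark-submit command. The two `--conf` values and the two JAR paths are taken from the comment; the helper name `build_spark_submit` and the script location `s3://my-bucket/scripts/hudi_job.py` are hypothetical placeholders, not part of any AWS or Hudi API.

```python
import shlex

# Spark settings quoted in the comment above.
HUDI_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.hive.convertMetastoreParquet": "false",
}

# JAR paths as given in the comment; verify they exist on your cluster.
HUDI_JARS = [
    "/usr/lib/hudi/hudi-spark-bundle.jar",
    "/usr/lib/spark/external/lib/spark-avro.jar",
]


def build_spark_submit(app_path):
    """Return the spark-submit argument list for a PySpark + Hudi app.

    Hypothetical helper for illustration only.
    """
    args = ["spark-submit"]
    for key, value in HUDI_CONF.items():
        args += ["--conf", f"{key}={value}"]
    # spark-submit expects --jars as one comma-separated value.
    args += ["--jars", ",".join(HUDI_JARS)]
    args.append(app_path)
    return args


# Hypothetical script location; replace with your own.
command = build_spark_submit("s3://my-bucket/scripts/hudi_job.py")
print(shlex.join(command))
```

The printed line can be pasted into a shell on the master node, or the list can be passed as the arguments of an EMR step. Keeping the settings in a dictionary makes it easy to add further `--conf` entries without editing a long command string.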

2 Answers

The version of Hudi installed with Amazon EMR 5.30.1 is 0.5.2-incubating. That means that if you want to use version 0.6.0, you have to install it yourself, and there's no guarantee that it will work with the AWS ecosystem (Glue metastore, S3, Redshift, etc.). Knowing AWS, I'd wait until the official version is released; these previews are usually too buggy, and putting one into production is a risk.

The version of Hudi installed with Amazon EMR 5.31.0 is 0.6.0. See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html

Artur Shamsutdinov