I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark (Scala) jobs, Sqoop jobs, etc.
I have come across the following options that could cover my workflow scheduling needs:
- Cloud Composer
- Dataproc Workflow Templates with Cloud Scheduler
- Installing Oozie on an autoscaling Dataproc cluster
Please let me know which option would be the best in terms of performance, cost, and migration complexity.