Hello fellow developers,
I have recently started learning GCP and am working on a POC that requires a pipeline to schedule Dataproc jobs written in PySpark. Currently, I have a Jupyter notebook on my Dataproc cluster that reads data from GCS and writes it to BigQuery. It works fine in Jupyter, but I want to run that notebook as part of a pipeline.
On Azure, we can schedule pipeline runs using Azure Data Factory. Could you please point me to the GCP tool that would achieve similar results?
My goal is to schedule runs of multiple Dataproc jobs.