I have a .yaml workflow template containing 5 independent PySpark jobs, meaning all 5 should run concurrently on GCP Dataproc, and I have scheduled this .yaml file in crontab to run every 30 minutes.
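For reference, here is a minimal sketch of the setup. The bucket paths, step IDs, cluster name, and region below are placeholders rather than the real values, and the cron line assumes the standard `gcloud dataproc workflow-templates instantiate-from-file` invocation. None of the steps declare `prerequisiteStepIds`, so Dataproc should submit all five concurrently:

```yaml
# Dataproc workflow template with 5 independent PySpark jobs.
# No prerequisiteStepIds are set anywhere, so all five steps
# should be submitted to the cluster concurrently.
#
# Triggered from crontab every 30 minutes (a single line in the real crontab):
# */30 * * * * gcloud dataproc workflow-templates instantiate-from-file --file=/path/to/jobs.yaml --region=us-central1
jobs:
  - stepId: job-1
    pysparkJob:
      mainPythonFileUri: gs://my-bucket/jobs/job_1.py
  - stepId: job-2
    pysparkJob:
      mainPythonFileUri: gs://my-bucket/jobs/job_2.py
  - stepId: job-3
    pysparkJob:
      mainPythonFileUri: gs://my-bucket/jobs/job_3.py
  - stepId: job-4
    pysparkJob:
      mainPythonFileUri: gs://my-bucket/jobs/job_4.py
  - stepId: job-5
    pysparkJob:
      mainPythonFileUri: gs://my-bucket/jobs/job_5.py
placement:
  clusterSelector:
    clusterLabels:
      goog-dataproc-cluster-name: my-cluster
```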
The cluster also has enough memory to run all of these jobs in parallel. But sometimes, all of a sudden, the jobs start executing one by one even though they are supposed to run in parallel. If I restart the cluster, the jobs behave correctly again and execute in parallel as expected.
Could anyone tell me if I am missing something, or whether I need to add anything on the configuration side?
This issue appears often and is resolved only by restarting the cluster, which is not a recommended approach for us since many other services are running on the same cluster. I would like to know the root cause of this problem as well as the solution.