Please bear with me because I am still quite new to Spark.
I have a GCP DataProc cluster which I am using to run a large number of Spark jobs, 5 at a time.
The cluster is 1 master + 16 workers, with 8 cores / 40 GB memory / 1 TB storage per node, so 128 vcores across the workers.
Now I might be misunderstanding something or not doing something correctly, but I currently have 5 jobs running at once, and the Spark UI shows that only 34 of the 128 vcores are in use. The cores also don't appear to be evenly distributed: the jobs were submitted simultaneously, yet the allocation is 2/7/7/11/7, and each running container has only one core allocated to it.
I have passed the flags --executor-cores 4 and --num-executors 6, but they don't seem to have made any difference.
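Roughly, each job is submitted like the sketch below; the jar path, main class, and the executor-memory value are just placeholders, not my exact command:

    # Sketch of how each of the 5 jobs is submitted (class/jar/memory are placeholders)
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 6 \
      --executor-cores 4 \
      --executor-memory 8g \
      --class com.example.MyJob \
      gs://my-bucket/my-job.jar

I believe the equivalent when submitting through gcloud dataproc jobs submit spark would be to pass --properties=spark.executor.cores=4,spark.executor.instances=6, but either way the allocation in the UI looks the same.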
Can anyone offer some insight/resources as to how I can fine-tune these jobs to use all of the available resources?