I want to:
- Have multiple Python environments on my PySpark Dataproc cluster
- Specify, at job submission time, which environment the job should run in (see the sketch after this list)
- Persist the environments so I can use them as needed. I won't tear down the cluster, but I will occasionally stop it, and I want the environments to survive stops the same way they would on a normal VM
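
For context, this is roughly the kind of invocation I'm hoping for: environments that already live on the cluster nodes and are selected per job via Spark's interpreter properties instead of being shipped with the job. The cluster name, region, bucket, and the `/opt/envs/...` path below are just placeholders I made up for illustration.

```bash
# Hypothetical: environments pre-created on every cluster node (e.g. under /opt/envs)
# and chosen per job by pointing Spark's Python at the right interpreter.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/etl.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --properties=spark.pyspark.python=/opt/envs/project-a/bin/python,spark.pyspark.driver.python=/opt/envs/project-a/bin/python
```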
Currently, I know how to submit a job with the entire environment packaged via conda-pack (sketch below). The problem with that approach is that it ships the full environment payload on every submission, and it still doesn't address managing multiple environments for different projects.
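
For reference, this is roughly my current workflow, following the conda-pack pattern from the Spark docs. The bucket, environment, and file names are placeholders, and the `#environment` fragment is Spark's archive-alias syntax, which I'm assuming is passed through by `gcloud`.

```bash
# Pack the whole conda environment and stage it in GCS (done once per env change).
conda pack -n my_project_env -o my_project_env.tar.gz
gsutil cp my_project_env.tar.gz gs://my-bucket/envs/

# Every job submission ships and unpacks the full environment on the cluster.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/etl.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --archives=gs://my-bucket/envs/my_project_env.tar.gz#environment \
    --properties=spark.pyspark.python=./environment/bin/python,spark.pyspark.driver.python=./environment/bin/python
```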