I want to:
- Have multiple Python environments on my PySpark Dataproc cluster
- Specify, at job submission time, which environment the job should run in (see the sketch after this list)
- Persist the environments so I can use them as needed. I won't tear down the cluster, but I will occasionally stop it, and I want the environments to survive stops the same way they would on a normal VM
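
For context, this is roughly the kind of invocation I'm hoping for: environments that already live on the cluster nodes and are selected per job via Spark's interpreter properties instead of being shipped with the job. The cluster name, region, bucket, and the `/opt/envs/...` path below are just placeholders I made up for illustration.

```bash
# Hypothetical: environments pre-created on every cluster node (e.g. under /opt/envs)
# and chosen per job by pointing Spark's Python at the right interpreter.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/etl.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --properties=spark.pyspark.python=/opt/envs/project-a/bin/python,spark.pyspark.driver.python=/opt/envs/project-a/bin/python
```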
Currently, I know how to submit a job with the entire environment packaged via conda-pack (sketch below). The problem with that approach is that it ships the full environment payload on every submission, and it still doesn't address managing multiple environments for different projects.
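
For reference, this is roughly my current workflow, following the conda-pack pattern from the Spark docs. The bucket, environment, and file names are placeholders, and the `#environment` fragment is Spark's archive-alias syntax, which I'm assuming is passed through by `gcloud`.

```bash
# Pack the whole conda environment and stage it in GCS (done once per env change).
conda pack -n my_project_env -o my_project_env.tar.gz
gsutil cp my_project_env.tar.gz gs://my-bucket/envs/

# Every job submission ships and unpacks the full environment on the cluster.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/etl.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --archives=gs://my-bucket/envs/my_project_env.tar.gz#environment \
    --properties=spark.pyspark.python=./environment/bin/python,spark.pyspark.driver.python=./environment/bin/python
```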