
I want to be able to set the following environment variables when submitting a job via dataproc submit:

  1. SPARK_HOME
  2. PYSPARK_PYTHON
  3. SPARK_CONF_DIR
  4. HADOOP_CONF_DIR

How can I achieve that?

figs_and_nuts
  • Can you clarify what your goal is when setting these variables? In general Dataproc will configure the environment for jobs so that e.g. SPARK_HOME is set correctly. Are you trying to override the default locations? – Jerry Ding Jan 06 '22 at 20:18
  • Thank you @JerryDing for your time :) Dataproc does not ship with PySpark 3.2. PySpark 3.2.0 released the pandas API for PySpark, and I have to write our pipelines against it. So I am creating the cluster with an env YAML that installs pyspark as a package, and then overriding the above-mentioned env variables so jobs use this PySpark 3.2.0. Any suggestions or improvements are welcome. – figs_and_nuts Jan 07 '22 at 03:03
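
For reference, cluster creation along the lines described in that comment might look like the sketch below. The environment file, bucket, cluster name, and region are illustrative, and the dataproc:conda.env.config.uri cluster property is an assumption based on Dataproc's conda-related cluster properties, not something stated in the question.

```sh
# environment.yaml (illustrative) -- conda env that installs PySpark 3.2.0:
#   name: pyspark32
#   dependencies:
#     - python=3.8
#     - pip
#     - pip:
#         - pyspark==3.2.0

# Create the cluster, pointing Dataproc at the conda env config stored in GCS
# (the dataproc:conda.env.config.uri property is assumed here).
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --properties=dataproc:conda.env.config.uri=gs://my-bucket/environment.yaml
```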

1 Answer


Check the doc Setting environment variables on Dataproc cluster nodes for how to set environment variables for the different components in Dataproc.
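
For job-level overrides, one option (a sketch; the cluster name, region, job file, and Python path are placeholders, not values from the question) is to pass the environment variables as Spark properties at submit time: spark.yarn.appMasterEnv.* sets them for the driver and spark.executorEnv.* sets them for the executors.

```sh
# Sketch: cluster name, region, job file, and Python path below are
# hypothetical placeholders.
gcloud dataproc jobs submit pyspark gs://my-bucket/my_job.py \
  --cluster=my-cluster \
  --region=us-central1 \
  --properties=spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/conda/envs/pyspark32/bin/python,spark.executorEnv.PYSPARK_PYTHON=/opt/conda/envs/pyspark32/bin/python
```

SPARK_HOME, SPARK_CONF_DIR, and HADOOP_CONF_DIR can be passed the same way (e.g. spark.yarn.appMasterEnv.SPARK_HOME=...), although, as noted in the comments, Dataproc already configures those on the cluster nodes by default.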

Dagang