Simple question, but I can't find a simple guide on how to set an environment variable in Databricks. Also, is it important to set the environment variable on both the driver and the executors (and would you do this via spark.conf)? Thanks
2 Answers
Before creation:
You can set environment variables while creating the cluster.
Click on Advanced Options => Enter Environment Variables.
After creation:
Select your cluster => click on Edit => Advanced Options => Edit or enter new Environment Variables => Confirm and Restart.
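Once the cluster restarts, you can confirm the variable is visible on both the driver and the executors (the second part of the question) with a quick check like the sketch below; MY_ENV_VAR is a placeholder for whatever name you entered:
%python
import os

# Driver: the cluster-level environment variable should be visible here.
print(os.environ.get("MY_ENV_VAR"))

# Executors: run a trivial Spark job and read the variable inside each task.
print(sc.parallelize(range(2)).map(lambda _: os.environ.get("MY_ENV_VAR")).collect())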
OR
Alternatively, you can achieve the desired result with a cluster init script, for example by appending your environment variable declarations to the file /databricks/spark/conf/spark-env.sh. You can create the init script as follows:
%scala
// Create a legacy global init script on DBFS that writes a custom
// Spark driver defaults file on cluster start.
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh",
  """|#!/bin/bash
     |
     |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
     |[driver] {
     |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
     |}
     |EOF
     |""".stripMargin, true)
For more details, refer to “Databricks – Spark Configuration”.
Hope this helps.

– CHEEKATLAPRADEEP
As an aside: these variables can be accessed via `os.getenv("myenvname")`. Commenting here because this info was amazingly difficult to find. – defraggled Nov 05 '21 at 01:27
Can we save the variables in a .env file and load them in another file? – Shubh Apr 12 '22 at 04:02
Use a Databricks cluster policy. When the policy is selected for a cluster, the configured environment variables are added automatically, for example:
"spark_env_vars.MY_ENV_VAR": {
  "value": "2.11.2",
  "type": "fixed"
}
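If you would rather create the policy programmatically, here is a rough sketch against the Cluster Policies REST API (the workspace URL, token, and policy name are placeholders; adjust them for your workspace):
%python
import json
import requests

# Placeholders: substitute your workspace URL and a personal access token.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

# Policy definition: every cluster created from this policy gets MY_ENV_VAR.
definition = {
    "spark_env_vars.MY_ENV_VAR": {
        "type": "fixed",
        "value": "2.11.2",
    }
}

resp = requests.post(
    f"{host}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json={"name": "env-var-policy", "definition": json.dumps(definition)},
)
print(resp.json())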

– JayaChandra S Reddy