Simple question, but I can't find a simple guide on how to set an environment variable in Databricks. Also, is it important to set the environment variable on both the driver and the executors (and would you do this via spark.conf)? Thanks
2 Answers
Before creation:
You can set environment variables while creating the cluster.
Click on Advanced Options => Enter Environment Variables.
After creation:
Select your cluster => click on Edit => Advanced Options => Edit or enter new Environment Variables => Confirm and Restart.
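Once the cluster restarts, you can confirm the variable is visible on both the driver and the executors (the second part of the question) with a quick check like the sketch below; MY_ENV_VAR is a placeholder for whatever name you entered:
%python
import os

# Driver: the cluster-level environment variable should be visible here.
print(os.environ.get("MY_ENV_VAR"))

# Executors: run a trivial Spark job and read the variable inside each task.
print(sc.parallelize(range(2)).map(lambda _: os.environ.get("MY_ENV_VAR")).collect())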
OR
Alternatively, you can achieve the desired result with a cluster init script, for example by appending your environment variable declarations to the file /databricks/spark/conf/spark-env.sh. You can create the init script as follows:
%scala
// Create a legacy global init script on DBFS that writes a custom
// Spark driver defaults file on cluster start.
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh",
  """|#!/bin/bash
     |
     |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
     |[driver] {
     |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
     |}
     |EOF
     |""".stripMargin, true)
For more details, refer to “Databricks – Spark Configuration”.
Hope this helps.

– CHEEKATLAPRADEEP
As an aside: these variables can be accessed via `os.getenv("myenvname")`. Commenting here because this info was amazingly difficult to find. – defraggled Nov 05 '21 at 01:27
Can we save the variables in a .env file and load them in another file? – Shubh Apr 12 '22 at 04:02
Use a Databricks cluster policy. When the policy is selected for a cluster, the configured environment variables are added automatically, for example:
"spark_env_vars.MY_ENV_VAR": {
  "value": "2.11.2",
  "type": "fixed"
}
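If you would rather create the policy programmatically, here is a rough sketch against the Cluster Policies REST API (the workspace URL, token, and policy name are placeholders; adjust them for your workspace):
%python
import json
import requests

# Placeholders: substitute your workspace URL and a personal access token.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

# Policy definition: every cluster created from this policy gets MY_ENV_VAR.
definition = {
    "spark_env_vars.MY_ENV_VAR": {
        "type": "fixed",
        "value": "2.11.2",
    }
}

resp = requests.post(
    f"{host}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json={"name": "env-var-policy", "definition": json.dumps(definition)},
)
print(resp.json())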

– JayaChandra S Reddy