How to set environment variables in pipelines in Azure ML SDK v2 with jobs.create_or_update()?

Question

I am changing some of our code from Azure ML's SDK v1 to v2. However, when I invoke pipelines with components via ml_client.jobs.create_or_update, I just can't get them to use my environment variables. Here is what I am doing:

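from pathlib import Path

from azure.ai.ml import Input, load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

# Load the component definition that sits next to this script.
preprocessing_component = load_component(
    source=Path(__file__).parent / "preprocessing_component.yaml"
)

@pipeline()
def example_train_pipeline(input_data_path):
    preprocess_step = preprocessing_component(
        input_data_path=input_data_path
    )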

pipeline_job = example_train_pipeline(
    input_data_path=Input(
        type=AssetTypes.URI_FILE,
        path="xxx",
    )
)

pipeline_job.settings.default_compute = e.cluster_name
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name=experiment_name
)

I tried to set .env_variables when creating my AZ ML environment (which is loaded for this pipeline's component in the YAML). This produced a message stating that the parameter was deprecated and that I should use RunConfig.environment_variables instead. The thing is, I can't find docs on how to use a RunConfig with ml_client.jobs.create_or_update. I tried passing a RunConfig with variables set via run_config.environment_variables to create_or_update, but this had no apparent effect.

score 1 · Accepted Answer · answered Aug 12 '23 at 08:34

Azure ML SDK v2 introduced several changes, including the way environment variables are set. The steps are:

Define the Environment Variables:

env_vars = {
    "MY_VARIABLE": "value",
    "ANOTHER_VARIABLE": "another_value"
}

Set Environment Variables in the Environment: Instead of setting the environment variables in the RunConfig, you set them directly in the Environment. If you have an existing environment, you can set its environment variables like this:
```
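# The import path is azureml.core (from the azureml-core package, i.e. SDK v1),
# not azure.ml.core.
from azureml.core import Environment

# ws is an existing azureml.core.Workspace object.
my_env = Environment.get(workspace=ws, name="my_environment_name", version="my_environment_version")
my_env.environment_variables = env_vars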
```
If you're creating a new environment, you can set the environment variables when you create it:
```
my_env = Environment(name="my_new_environment")
my_env.environment_variables = env_vars
```
Use the Environment in the Pipeline: Make sure that the environment you've set the environment variables for is the one being used by your pipeline's components. This is typically set in the component YAML file, but you can also set it programmatically if needed.
Submit the Job: When you submit the job using ml_client.jobs.create_or_update, the environment variables you've set should be available to your pipeline's components.
```
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name=experiment_name
)
```
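One way to confirm that the variables actually reach a component is to read them at the start of the component's entry script. The following is a minimal sketch using only the standard library; `read_required_env` is a hypothetical helper (not part of any Azure ML SDK), and the variable names are the ones from the `env_vars` example above.

```python
import os


def read_required_env(names, environ=None):
    """Return {name: value} for every requested variable.

    Raises KeyError listing all missing names, so a misconfigured job
    fails early with a clear message instead of deep inside the step.
    `environ` defaults to os.environ; passing a dict is handy for testing.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

In the component script you could call `read_required_env(["MY_VARIABLE", "ANOTHER_VARIABLE"])` before doing any work; if the job logs show the `KeyError`, the variables were not passed through to that step.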

This works. Still curious that a) I get `Property environment_variables is deprecated. Use RunConfiguration.environment_variables to set runtime variables.` when setting the variables in the environment and b) they don't show up in the UI under run configuration. — DoubleSteakHouse, Aug 14 '23 at 08:05
