
We generate the environment's Dockerfile programmatically. Here is what the resulting file looks like:

    FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

    RUN rm /bin/sh && ln -s /bin/bash /bin/sh
    RUN echo "source /opt/miniconda/etc/profile.d/conda.sh && conda activate" >> ~/.bashrc

    RUN echo $'channels:\n\
      - anaconda\n\
      - conda-forge\n\
      - defaults\n\
    dependencies:\n\
      - python=3.8.10\n\
      - pip:\n\
          - azureml-sdk==1.50.0\n\
          - azureml-dataset-runtime==1.50.0\n\
          - azure-storage-blob\n\
          - numpy==1.23.5\n\
          - pandas==2.0.0\n\
          - scipy==1.5.2\n\
          - scikit-learn==1.2.2\n\
          - azure-eventgrid==4.9.0\n\
      - conda:\n\
          - conda=23.3.0' > conda_env.yml
    RUN source /opt/miniconda/etc/profile.d/conda.sh && \
        conda activate && \
        conda install conda && \
        pip install cmake && \
        conda env update -f conda_env.yml

    ENV cluster_identity_name=clisyer-ide-name
    ENV cluster_identity_id=1234567
    ENV data_drift_event_topic_name=someName
    ENV sa_name=someStorage

The image builds successfully, and the build logs show the environment variables as expected: [screenshot: build logs]

But when I retrieve this environment programmatically:

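    import logging
    environments = ws.environments  # assumption: `ws` is the azureml.core Workspace (dict of name -> Environment)
    if environment_name in environments:
        restored_environment = environments[environment_name]
        logging.info('Found environment: %s:%s', restored_environment.name, restored_environment.version)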

The log shows the correct name and version. But printing the environment variables returns this:

[screenshot: printed environment variables]

Only the example variable is there, not the ones we set in the Dockerfile.

However, when I inspect the definition of the fetched environment, I can see the JSON containing the ENV definitions: [screenshot: environment definition JSON]
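Since the Dockerfile text, ENV lines included, does come back with the environment definition, one workaround is to read the values from there. A minimal sketch, assuming the Dockerfile content is reachable on the retrieved object (in the v1 SDK this should be `restored_environment.docker.base_dockerfile`, but verify that for your version). `parse_dockerfile_env` is a hypothetical helper and handles only single-line `ENV` instructions.

```python
import re
import shlex
from typing import Dict

# Matches "ENV <rest>" case-insensitively; the rest holds the variable definitions
_ENV_RE = re.compile(r"ENV\s+(.+)$", re.IGNORECASE)


def parse_dockerfile_env(dockerfile: str) -> Dict[str, str]:
    """Collect variables set by single-line ENV instructions in Dockerfile text.

    Supports `ENV key=value [key=value ...]` and the legacy `ENV key value` form.
    Line continuations and ${var} substitution are not handled.
    """
    env: Dict[str, str] = {}
    for raw_line in dockerfile.splitlines():
        match = _ENV_RE.match(raw_line.strip())
        if not match:
            continue
        rest = match.group(1).strip()
        first_word = rest.split(None, 1)[0]
        if "=" in first_word:
            # key=value form: shlex strips the quotes around values
            for token in shlex.split(rest):
                key, _, value = token.partition("=")
                env[key] = value
        else:
            # Legacy form: everything after the key is the value
            parts = rest.split(None, 1)
            env[parts[0]] = parts[1] if len(parts) > 1 else ""
    return env
```

Usage would then be along the lines of `parse_dockerfile_env(restored_environment.docker.base_dockerfile)`. This only reads what the definition already contains; it does not change what `environment_variables` returns.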

Am I doing something wrong when accessing the environment variables? Can someone please help?

Obiii

2 Answers


I have reproduced the issue with the given Dockerfile.

It seems that environment variables declared with ENV in the Dockerfile are not picked up in the environment's `environment_variables`.

An alternative is to set the environment variables at run time, as recommended.

To do this, you can use `RunConfiguration`. Here is a code snippet:

    from azureml.core.runconfig import RunConfiguration

    run_config = RunConfiguration()
    run_config.environment = env  # `env` is the azureml.core.Environment built earlier
    # Set on the process that executes the script at run time
    run_config.environment_variables = {
        "cluster_identity_name": "clisyer-ide-name",
        "cluster_identity_id": "1234567",
        "data_drift_event_topic_name": "someName",
        "sa_name": "someStorage",
    }

You can then use this configuration when submitting any job: [screenshot: job submission code]

With this, I was able to include the environment variables, and they were accessible to the executed script. See the documentation for the RunConfiguration class for more details.
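On the script side, variables set this way are read from the process environment like any others. A small sketch of a fail-fast check at the start of the script; `read_required_env` and `REQUIRED_VARS` are hypothetical names, with the variable names taken from the `RunConfiguration` snippet.

```python
import os
from typing import Dict, Iterable

# Variable names from the RunConfiguration example; adjust as needed
REQUIRED_VARS = (
    "cluster_identity_name",
    "cluster_identity_id",
    "data_drift_event_topic_name",
    "sa_name",
)


def read_required_env(names: Iterable[str] = REQUIRED_VARS) -> Dict[str, str]:
    """Return the named variables from os.environ, raising if any are missing."""
    names = tuple(names)  # materialize so a generator is not consumed twice
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

Calling `read_required_env()` at the top of the training script makes a missing variable fail immediately with a clear message, instead of surfacing later as a `KeyError` or a `None` deep in the code.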

RishabhM
  • Hi, thanks for the answer, but we don't want to use run_config: the OS variables are accessible when the script runs with run_config, but if you retrieve the environment again and access its variables, there are none. – Obiii Jul 20 '23 at 11:14

We ended up building custom Docker images with ENV instructions, pushing the images to Azure Container Registry (ACR), then creating the Azure ML environment from the ACR repository and registering that environment in the workspace.

This way, the ENV variables are baked into the image and are available whenever the image is pulled from ACR.

    from azureml.core import Environment

    def get_environment(workspace, environment_name, **kwargs):
        # `dockerfile` (the generated Dockerfile content, ENV lines included) is defined
        # elsewhere; the remaining keyword options are not used in this excerpt
        new_env = Environment.from_dockerfile(environment_name, dockerfile)
        # Registering makes the environment retrievable from the workspace by name
        return new_env.register(workspace)

    environment_active_monitoring = get_environment(
        workspace=ws,
        environment_name=e.aml_env_name_active_monitoring,  # type: ignore
        conda_dependencies_file=e.aml_env_active_monitoring_conda_dep_file,  # type: ignore
        env_vars=env_vars,
        tag=e.docker_tag,
        create_new=e.rebuild_env_active_monitoring,  # type: ignore
        gpu_accelerated=False)
Obiii