How to install custom Python package and run with Azure Machine Learning job?

Question

I am trying to train my model with an Azure Machine Learning job.

However, I run the job as a CLI app (with Click), where I import some functions from other files. In my CI/CD pipeline I install that custom package with:

pip install .

However, when the job is created in Azure ML, it cannot import those functions. It gives an error:

Traceback (most recent call last):
  File "mlops_i4t/machine_learning/model_utils.py", line 4, in <module>
    from mlops_i4t.preprocessing import get_final_dataframe
ModuleNotFoundError: No module named 'mlops_i4t'

How can I pip install a custom package and use it in an Azure ML job?

You might need environments (set up either programmatically or manually): https://learn.microsoft.com/en-us/azure/machine-learning/concept-environments – superfuzzy, Oct 07 '22 at 12:36
@superfuzzy Yes, I saw that, but where can I provide something like `pip install ...`? I can only set predefined environments like "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest". – Niels Hoogeveen, Oct 07 '22 at 13:14
I usually have a helper script in my repo that is basically `run_in_azure.py`. Have a look at https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-set-up-training-targets for how to set that up. – superfuzzy, Oct 07 '22 at 17:01

superfuzzy · Answer 1 · 2022-10-10T11:09:14.520


Edit on more complex setups (like your own package)

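Create your own Dockerfile, e.g.:

    from azureml.core import Environment, Workspace

    # Assumption: a workspace config.json is available locally
    ws = Workspace.from_config()

    # Inline Dockerfile used as the base of the environment
    dockerfile = r"""
    FROM nvcr.io/nvidia/pytorch:22.02-py3

    RUN python3 -m pip install --upgrade pip setuptools wheel

    RUN pip install numpy .....
    COPY requirements.azure.txt .
    RUN pip install -r requirements.azure.txt
    ....
    """

    myEnv = Environment(name="your-env-name")
    myEnv.docker.base_image = None  # unset, since base_dockerfile is used instead
    myEnv.docker.base_dockerfile = dockerfile
    # Python packages come from the image; Azure ML does not build a conda env
    myEnv.python.user_managed_dependencies = True
    myEnv.docker.arguments = ["--privileged"]
    myEnv.register(workspace=ws)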

See below on how to use this myEnv.
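
For a custom package specifically, the Dockerfile also has to copy the package source into the image and install it. Below is a minimal sketch that renders such a Dockerfile from Python. The function name, its defaults, and the `/tmp` and `/opt/package` target paths are hypothetical, and the `COPY` steps only succeed if the given paths are inside the Docker build context that is used for the image build.

```python
def render_dockerfile(
    base_image: str = "nvcr.io/nvidia/pytorch:22.02-py3",
    requirements_path: str = "requirements.azure.txt",
    package_dir: str = ".",
) -> str:
    """Return Dockerfile text that installs requirements, then the local package."""
    lines = [
        f"FROM {base_image}",
        "RUN python3 -m pip install --upgrade pip setuptools wheel",
        # Install third-party requirements first so this layer stays cached
        # when only the package source changes.
        f"COPY {requirements_path} /tmp/requirements.txt",
        "RUN pip install -r /tmp/requirements.txt",
        # Copy the package source (it needs a pyproject.toml or setup.py)
        # and install it into the image's Python environment.
        f"COPY {package_dir} /opt/package",
        "RUN pip install /opt/package",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paths are relative to the build context, not to the training script.
    print(render_dockerfile(requirements_path="src/requirements.txt"))
```

The returned string could be assigned to `base_dockerfile` in the same way as the hand-written one, provided the build has access to the copied files.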

I assume you know about ScriptRunConfig and similar classes (otherwise see my comment).

Then the way I typically set up my environment is:

    from azureml.core import Environment, ScriptRunConfig
    from azureml.core.runconfig import RunConfiguration

    myenv = Environment.from_pip_requirements(
        name="your-env-name",
        file_path="requirements.azure.txt",
    )

    # Specify a (GPU) base image
    myenv.docker.enabled = True  # newer azureml-core releases may warn that this is deprecated
    myenv.docker.base_image = (
        "mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04"
    )

    # Either go with RunConfiguration (more general)
    train_run_config = RunConfiguration()
    train_run_config.environment = myenv
    ....

    # OR use the simpler ScriptRunConfig
    # (launch_cmd and compute are assumed to be defined elsewhere)
    run_config = ScriptRunConfig(
        source_directory=".",
        command=launch_cmd,
        compute_target=compute,
        environment=myenv,
    )
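
The `launch_cmd` passed to `ScriptRunConfig` is not shown in the snippet. One detail that may be relevant to the `ModuleNotFoundError` in the question: when Python runs a file by its path, it puts that file's directory first on `sys.path`, not the working directory, so a package that sits in the working directory is not importable unless it has been installed. With `python -m`, the working directory comes first instead. The sketch below reproduces the difference locally with a throwaway package (`demo_pkg` and its modules are made-up names). Whether this applies to a given job depends on how `source_directory` and the command are set up.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create demo_pkg/ with a submodule that imports from its own package."""
    sub = root / "demo_pkg" / "machine_learning"
    sub.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (sub / "__init__.py").write_text("")
    (root / "demo_pkg" / "preprocessing.py").write_text(
        "def get_value():\n    return 42\n"
    )
    (sub / "model_utils.py").write_text(
        "from demo_pkg.preprocessing import get_value\nprint(get_value())\n"
    )


def run_in(root: Path, args: list) -> subprocess.CompletedProcess:
    """Run the current interpreter with args, using root as working directory."""
    # Drop variables that would change how sys.path is built.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONPATH", "PYTHONSAFEPATH")
    }
    return subprocess.run(
        [sys.executable, *args], cwd=root, env=env,
        capture_output=True, text=True,
    )


def compare_launch_styles() -> tuple:
    """Return (result_by_path, result_by_module) for the same module."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_demo_package(root)
        by_path = run_in(root, ["demo_pkg/machine_learning/model_utils.py"])
        by_module = run_in(root, ["-m", "demo_pkg.machine_learning.model_utils"])
    return by_path, by_module


if __name__ == "__main__":
    by_path, by_module = compare_launch_styles()
    print("by path   ->", by_path.returncode, by_path.stderr.strip().splitlines()[-1:])
    print("by module ->", by_module.returncode, by_module.stdout.strip())
```

Under that assumption, a hypothetical `launch_cmd` such as `["python", "-m", "mlops_i4t.machine_learning.model_utils"]`, combined with a `source_directory` that contains the `mlops_i4t` folder, would avoid the import error without installing the package. Installing the package in the environment remains the more robust option.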

The Docker image serves as a base; you can build your own or pick one of Azure's default images.

The crucial part is from_pip_requirements. I typically keep my requirements in a separate requirements.azure.txt, since my local install might, for example, not have a GPU.

In that requirements file you can also have pip install a prebuilt wheel of your local package; see the Stack Overflow question "Install local wheel file with requirements.txt" (https://stackoverflow.com/questions/62956690/install-local-wheel-file-with-requirements-txt).
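
A different route to the same goal, sketched here under assumptions rather than taken from the answer, is to upload the wheel to the workspace with the SDK v1 call `Environment.add_private_pip_wheel` and add the returned URL as a pip dependency, so the remote image build does not need access to a local file path. The helper names `find_latest_wheel` and `add_local_wheel` are hypothetical, and `env` and `workspace` are assumed to be an existing `Environment` and `Workspace`.

```python
from pathlib import Path


def find_latest_wheel(dist_dir: str, package_name: str) -> Path:
    """Return the most recently modified wheel for package_name in dist_dir."""
    # Wheel file names use "_" where the distribution name has "-".
    prefix = package_name.replace("-", "_") + "-"
    wheels = [p for p in Path(dist_dir).glob("*.whl") if p.name.startswith(prefix)]
    if not wheels:
        raise FileNotFoundError(f"no wheel for {package_name!r} in {dist_dir!r}")
    return max(wheels, key=lambda p: p.stat().st_mtime)


def add_local_wheel(env, workspace, wheel_path: Path) -> str:
    """Upload a wheel to the workspace and add it to env's pip dependencies."""
    # Imported lazily so find_latest_wheel works without azureml-core installed.
    from azureml.core import Environment

    whl_url = Environment.add_private_pip_wheel(
        workspace=workspace, file_path=str(wheel_path), exist_ok=True
    )
    env.python.conda_dependencies.add_pip_package(whl_url)
    return whl_url
```

A typical flow would be to build the wheel in CI with `pip wheel . --no-deps -w dist`, then call `add_local_wheel(myenv, ws, find_latest_wheel("dist", "mlops_i4t"))` before submitting the run.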

I really hope this makes it clear now :) Otherwise feel free to leave some comment.

edited Oct 10 '22 at 11:09

answered Oct 07 '22 at 17:11

superfuzzy


Yes it does, but I only now understand where to put pip packages (e.g. pandas, scikit-learn). However, I am still confused about how to install a private custom Python package. Maybe an idea is to push it as a Docker image and use that as the environment? – Niels Hoogeveen Oct 10 '22 at 09:52
You can either use the power of pip (I did not test this, though): https://stackoverflow.com/questions/62956690/install-local-wheel-file-with-requirements-txt. Or you can, as you said, build your own Docker image (what I normally do). Apart from solving your problem, this also makes everything easier because you only have to worry about Docker, not Azure. – superfuzzy Oct 10 '22 at 11:02
I tried to create an env from a Dockerfile. However, I think the error occurs because I provide that environment in a command, but it expects azure.ai.ml.entities._assets.environment.Environment. So I think I need to switch to the ScriptRunConfig class. – Niels Hoogeveen Oct 10 '22 at 11:26
Try it out :) Also, don't forget to `COPY . .` to be able to successfully build your package. – superfuzzy Oct 10 '22 at 11:28
I tried, but now I get an error when building the Docker image. It says: Step 4/28 : COPY requirements.txt . COPY failed: file not found in build context or excluded by .dockerignore: stat ./requirements.txt: file does not exist Any suggestions? I guess it is because the source dir is within my package. src = ScriptRunConfig(source_directory="./src/mlops_i4t", script="train.py", compute_target=computeName, environment=env) – Niels Hoogeveen Oct 10 '22 at 12:57
Yes, you most likely need `COPY src/requirements.txt .`; that is, you need the relative path from the Dockerfile to your requirements.txt. – superfuzzy Oct 11 '22 at 09:59
