
I was looking into training in Azure ML using a custom Docker container that already has a training script, but so far I haven't found anything in the docs. Is it possible to upload a custom container (containing the training script) to a container registry and then use it for training?

I read all the docs but couldn't find anything. In Vertex AI, we have the option to upload a custom container with the training script inside it and trigger it through Vertex AI. I'm looking for something similar in Azure ML.

  • Welcome to Stack Overflow! It might be helpful if you include some code allowing other users to see your initial approach to this problem. – John Harrington Dec 21 '22 at 16:19

1 Answer


You can certainly run a Docker image that has the training script already baked in. You just need to keep two things in mind:

  1. AzureML won't execute the CMD of your Docker image. Instead, it will execute the command you provide with your job.
  2. When AzureML executes that command in your container, it does so in the folder where it placed your code files. Since you aren't supplying any code files, that will be an empty folder, so you should navigate to the right folder before executing your command.
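
As a sketch of what "baked in" could look like, here is a hypothetical Dockerfile; the base image, script name, and paths are illustrative assumptions, not something from the question or answer:

```dockerfile
# Hypothetical image with a training script baked in.
# Base image, script name, WORKDIR, and dependencies are illustrative assumptions.
FROM python:3.10-slim

WORKDIR /app
COPY train.py /app/train.py
RUN pip install --no-cache-dir scikit-learn

# Note: AzureML will NOT run this CMD; it runs the command you supply in the job YAML.
CMD ["python", "train.py"]
```

You would then build and push this image to a registry your workspace can reach, and point the job's command at /app/train.py.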

See here for an example using the simple Docker image danielschneider/hello, which contains Python and a hello.py file that prints out hello world.

If you run it locally, it does just that:

docker run -it danielschneider/hello python hello.py
> hello world

Here is how you run that same command on a cluster node in AzureML:

  1. Create a yaml file with the following contents and save it as test.yaml:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command

environment: 
  image: danielschneider/hello

command: |
  cd /  
  python hello.py

# replace below cpu-cluster with the name of your compute target
compute: azureml:cpu-cluster 
  2. Use the CLI to execute the job:
az ml job create -f test.yaml

(This does, of course, require the Azure CLI and the AzureML extension to be installed -- see here for instructions.)

Some details on the above YAML

The $schema and the type are not required, but they help VSCode give you IntelliSense when you edit the file.

The environment is obviously pointing to the docker.io location of my hello image. You can point to any public image, or to private images in the workspace container registry (or any other container registry connected to your workspace). In addition, you can point to registered environments from your workspace or an attached registry.
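
For instance, the environment block could reference a private image or a registered environment instead; the registry and environment names below are made up for illustration:

```yaml
# Option 1: an image in a container registry connected to the workspace
# (hypothetical registry and image name)
environment:
  image: myregistry.azurecr.io/train:1

# Option 2: a registered AzureML environment, referenced as azureml:<name>:<version>
# environment: azureml:my-training-env:3
```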

The command is what is executed in the container. As you can see, I navigate to the root directory before running the script, since that is where hello.py is located in the Docker image. Note that the | is standard YAML notation to start a multi-line string.
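
To illustrate the | notation: the literal block scalar preserves line breaks, so the two lines are handed to the shell as one multi-line command (cd runs first, then the script). A single-line equivalent joins the steps with &&:

```yaml
# Multi-line form, as in the job YAML above:
command: |
  cd /
  python hello.py

# Equivalent single-line form:
# command: cd / && python hello.py
```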

For compute, you would obviously have to use whatever compute you want to run the image on.

Daniel Schneider