I've followed the documentation pretty well as outlined here.

I've set up my Azure Machine Learning environment as follows:

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()

from azureml.core import Environment
from azureml.core import ContainerRegistry

myenv = Environment(name = "myenv")

myenv.inferencing_stack_version = "latest"  # This will install the inference specific apt packages.

# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..." 
myenv.docker.arguments = None

# Environment variable (I need Python to look at folders under /root)
myenv.environment_variables = {"PYTHONPATH":"/root"}

# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python" 

from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep

myenv.register(workspace=ws) # works!

I have a score.py file configured for inference (not relevant to the problem I'm having)...

I then set up the inference configuration:

from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

I set up my compute cluster:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "theclustername" 

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")

    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)

from azureml.core.webservice import AksWebservice

# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)

Everything succeeds up to this point. I then try to deploy the model for inference:

from azureml.core.model import Model

model = Model(ws, name="thenameofmymodel")

# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'

# Deploy the model
aks_service = Model.deploy(ws,
                           aks_service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           overwrite=True)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

And it fails saying that it can't find the environment. More specifically, my environment version is version 11, but it keeps trying to find an environment with a version number that is 1 higher (i.e., version 12) than the current environment:

Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here: 
Error:
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}

I have tried to manually edit the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see anything wrong with this code?

Update

Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error has now changed to the following:

Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
  "code": "DeploymentFailed",
  "statusCode": 404,
  "message": "Deployment not found"
}

Solution

The answer from Anders below is indeed correct regarding the use of azure ML environments. However, the last error I was getting was because I was setting the container image using the digest value (a sha) and NOT the image name and tag (e.g., imagename:tag). Note the line of code in the first block:

myenv.docker.base_image = "4fb3..." 

Here I reference the digest value, but it should be changed to the image name and tag:

myenv.docker.base_image = "imagename:tag"

Once I made that change, the deployment succeeded! :)
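
For anyone hitting the same thing, a small sanity check along these lines might catch a digest-only value before deploying. This is only a heuristic sketch: looks_like_name_and_tag is a hypothetical helper (not part of azureml), and it only checks the shape of the string, not whether the image actually exists in the registry.

```python
def looks_like_name_and_tag(base_image: str) -> bool:
    """Return True if base_image has the form 'name:tag' rather than a digest.

    Heuristic only: rejects digest references ('sha256:...' or 'name@sha256:...')
    and strings with no explicit tag.
    """
    # Digest references either start with the algorithm prefix or use '@'.
    if "@" in base_image or base_image.lower().startswith("sha256:"):
        return False
    # Split on the last colon; everything after it should be the tag.
    name, sep, tag = base_image.rpartition(":")
    # A '/' after the last colon means the colon belonged to a registry port.
    return bool(name and sep and tag) and "/" not in tag


# The two values from the post above.
for candidate in ("4fb3...", "imagename:tag"):
    print(candidate, "->", looks_like_name_and_tag(candidate))
```

Running the check on myenv.docker.base_image before calling Model.deploy would have flagged my original value.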

Brian Goodwin
  • Thanks for all the context! – Anders Swanson Aug 17 '20 at 21:57
  • Are all four of the code chunks you shared above executed within the same `.py` or `.ipynb`? Or are the chunks split across multiple files? – Anders Swanson Aug 17 '20 at 22:11
  • They are executed in the same notebook! :) thank you for your answer- going to try it out ASAP! – Brian Goodwin Aug 17 '20 at 23:18
  • @BrianGoodwin I am curious why you are setting this to true: myenv.python.user_managed_dependencies = True? I ask because you are defining the conda packages, so why not let Azure ML manage the environment and dependencies? – Aravind Yarram Sep 08 '20 at 18:07
  • Because the docker image already contains the (more complex) dependencies, including custom packages that aren't available via pip, Conda, etc. So I set up an environment variable that forces the python interpreter to be what I want. The Conda dependencies are there simply for the only one that's required by azureml. I hope this helps! – Brian Goodwin Sep 10 '20 at 20:58

1 Answer

One concept that took me a while to get was the bifurcation of registering and using an Azure ML Environment. If you have already registered your env, myenv, and none of the details of your environment have changed, there is no need to re-register it with myenv.register(). You can simply get the already registered env using Environment.get() like so:

myenv = Environment.get(ws, name='myenv', version=11)

My recommendation would be to name your environment something new: like "model_scoring_env". Register it once, then pass it to the InferenceConfig.
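
A rough sketch of that "register once, then reuse" pattern is below. It is deliberately SDK-agnostic so it can run anywhere: get_or_register is a hypothetical helper, and it assumes the lookup raises an exception when nothing is registered under the name (catching Exception broadly also hides auth or network errors, so narrow it in real code). The commented wiring assumes your Environment object was created with the name "model_scoring_env".

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def get_or_register(fetch: Callable[[], T], register: Callable[[], T]) -> T:
    """Return the registered object if the lookup succeeds; otherwise register it."""
    try:
        return fetch()
    except Exception:
        # Assumption: the lookup raises when nothing is registered under that name.
        return register()


# Hypothetical wiring with the objects from the question (not executed here):
# env = get_or_register(
#     fetch=lambda: Environment.get(ws, name="model_scoring_env"),
#     register=lambda: myenv.register(workspace=ws),
# )
# inference_config = InferenceConfig(entry_script="score.py", environment=env)
```

The point is that re-running the notebook no longer calls register() every time, so the environment you pass to InferenceConfig is the one that already exists in the workspace.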

Anders Swanson
  • Thank you! Your approach seems to be on the right track... my error is now changing (see the update I posted). Do you think I should just nix everything and start from square one? Maybe my azureml has some weird configuration that I didn't set up correctly? No idea. Mystified here. – Brian Goodwin Aug 18 '20 at 01:14
  • Do you actually need a custom docker image? And does the end-to-end notebook example work for you without issue? – Anders Swanson Aug 18 '20 at 01:36
  • Yes, I have some dependencies that I can't really wrap up and ship via the CondaDependencies classes. With the exception of the last deployment step, everything works great. Even so, you'd think that azureml should be able to accommodate custom images given that they have the means to set it up. Let me try to set up a public GitHub repo with the code. – Brian Goodwin Aug 18 '20 at 01:54
  • I do wonder though if the error is due to some "behind the curtains" checks that azure does... I should try to work around it without a docker image. I'll give it a try. – Brian Goodwin Aug 18 '20 at 01:59
  • I’m thinking 50-50 odds on backend error vs. one tiny error we don’t know we’re making. – Anders Swanson Aug 18 '20 at 02:01
  • My tried and true approach is to clone an example notebook and start tweaking/iterating towards your setup until you find the error. – Anders Swanson Aug 18 '20 at 02:12
  • Solid advice @AndersSwanson - thanks again. I’ll try that! – Brian Goodwin Aug 18 '20 at 02:14
  • np. I’ll share this Q with some people who know more about deployments in the AM – Anders Swanson Aug 18 '20 at 02:16
  • thanks again for your help! I edited my post with the solution that enabled the deployment to succeed... turned out to be a simple one line of code like you said LOL. But your answer definitely solved the larger (original) problem I was facing with Azure ML doing the environment versioning. – Brian Goodwin Aug 20 '20 at 18:51