
I am trying to list all the jobs in my workspace that belong to a certain experiment. This is trivial with v1 of the SDK (relevant documentation), but I am unable to do it with v2.

More generally, I can't figure out how to query jobs with filters of any kind. The closest thing I can find in the documentation is this list method combined with a list comprehension, but that is far too slow.

What would be the best way to do this using the Azure ML SDK v2? Is it possible to do it at all?

First, this is the code using v1. It does what I want and takes less than a second to complete:

#My version of azureml-core is 1.49.0
import azureml.core
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my_experiment_name")
runs = exp.get_runs()

This is the closest thing I came up with using v2 of the SDK, against the same workspace as the previous example. It also works, but it takes around seven minutes to complete:

# My version of azure-ai-ml is 1.5.0
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="my_subscription_id",
    resource_group_name="my_resource_group_name",
    workspace_name="my_workspace_name",
)

# jobs.list() iterates over every job in the workspace; the filter runs client-side.
runs = [run for run in ml_client.jobs.list() if run.experiment_name == "my_experiment_name"]

1 Answer


I tried this in my environment and got the results below.

You asked how to get all jobs associated with an experiment in the Azure ML Python SDK v2, and noted that your v2 approach works but takes around seven minutes to complete.

The v1 SDK uses the get_runs() method of the Experiment class, which retrieves the runs for a given experiment directly from the Azure ML service. This call is faster because it returns only the data for that experiment.

The v2 SDK, on the other hand, uses the jobs.list() method of the MLClient class, which retrieves all of the jobs in the workspace; these are then filtered with a list comprehension. This request is slower because it fetches far more data than is needed.
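Because the filtering happens on the client either way, one way to avoid paying the listing cost more than once is to group the jobs by experiment in a single pass. The following is only a minimal sketch: `group_jobs_by_experiment` is a hypothetical helper name, it assumes nothing more than that each job object exposes an `experiment_name` attribute (as in the question's code), and the `SimpleNamespace` objects are stand-ins for what `ml_client.jobs.list()` would yield.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_jobs_by_experiment(jobs):
    """Walk the job iterable once and bucket the jobs by experiment name."""
    grouped = defaultdict(list)
    for job in jobs:
        # getattr guards against objects without the attribute (an assumption,
        # not something the SDK is known to require).
        grouped[getattr(job, "experiment_name", None)].append(job)
    return dict(grouped)


# Stand-in data; in practice you would pass ml_client.jobs.list() instead.
fake_jobs = [
    SimpleNamespace(name="job-a", experiment_name="my_experiment_name"),
    SimpleNamespace(name="job-b", experiment_name="Default"),
    SimpleNamespace(name="job-c", experiment_name="my_experiment_name"),
]

by_experiment = group_jobs_by_experiment(fake_jobs)
print([job.name for job in by_experiment["my_experiment_name"]])
```

This does not make the listing itself any faster; it only means that looking up a second or third experiment costs nothing extra once the first pass is done.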

You can use the code below to list the jobs associated with an experiment using the Azure ML Python SDK v2.

Code:

from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential

ml_client = MLClient(
    credential=InteractiveBrowserCredential(),
    subscription_id="Your-subscription-id",
    resource_group_name="Your-resource-grp",
    workspace_name="Your-workspace-name",
)

# List every job in the workspace and keep only those in the target experiment.
for run in ml_client.jobs.list():
    if run.experiment_name == "Default":
        print("Name:", run.display_name)
        print("Type:", run.type)
        print("Compute:", run.compute)
        print("Status:", run.status)

Output:

Name: data326
Type: automl
Compute: testcompute
Status: Completed
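If only the first few matching jobs are needed, the filter can be written as a generator so that iteration stops as soon as enough matches are found. This is a sketch under assumptions: `iter_experiment_jobs` and `fake_job_listing` are hypothetical names, the fake listing stands in for `ml_client.jobs.list()`, and any time saved depends on the real listing being fetched lazily, page by page, which is assumed here rather than verified.

```python
from itertools import islice
from types import SimpleNamespace


def iter_experiment_jobs(jobs, experiment_name):
    """Lazily yield only the jobs that belong to experiment_name."""
    return (job for job in jobs if job.experiment_name == experiment_name)


def fake_job_listing():
    # Stand-in for ml_client.jobs.list(): yields jobs one at a time.
    for i in range(1000):
        experiment = "Default" if i % 2 == 0 else "other"
        yield SimpleNamespace(name=f"job-{i}", experiment_name=experiment)


# Take the first three matches; the rest of the listing is never consumed.
first_three = list(islice(iter_experiment_jobs(fake_job_listing(), "Default"), 3))
print([job.name for job in first_three])
```

When every job in the experiment is required, this still has to walk the whole listing, so it is no faster than the loop above.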


Sourav
  • This is basically what I got. I am surprised, the documentation paints the v2 of the SDK as a direct upgrade, so I expected to be able to do everything I could with the v1. But the method with the v2 is unacceptably slow. – tamboles98 Apr 24 '23 at 08:11
  • @tamboles98, Okay, but did the solution work? – Sourav Apr 24 '23 at 08:13