Azure MLflow run ID not found using Python SDK azure-ai-ml v2

Question

I'm using Azure ML jobs to run an experiment with the Python SDK v2, and I haven't been able to access the run logs after the run completed. I'm not sure what is happening, or whether I'm missing some permission or a previous step. It just says "run 'xxxx' not found":

from mlflow.tracking import MlflowClient

# Use MlFlow to retrieve the job that was just completed
run_id = 'musing_steelpan_xxxx'

finished_mlflow_run = MlflowClient().get_run(run_id)

The run_id actually exists, and I'm the owner of the workspace and cluster.

MlflowException                           Traceback (most recent call last)
Cell In [5], line 6
      3 # Use MlFlow to retrieve the job that was just completed
      4 run_id = 'musing_steelpan_hnlbhxf9qy'
----> 6 finished_mlflow_run = MlflowClient().get_run(run_id)

File /miniconda/envs/benchmark/lib/python3.8/site-packages/mlflow/tracking/client.py:150, in MlflowClient.get_run(self, run_id)
    112 def get_run(self, run_id: str) -> Run:
    113     """
    114     Fetch the run from backend store. The resulting :py:class:`Run <mlflow.entities.Run>`
    115     contains a collection of run metadata -- :py:class:`RunInfo <mlflow.entities.RunInfo>`,
   (...)
    148         status: FINISHED
    149     """
--> 150     return self._tracking_client.get_run(run_id)

File /miniconda/envs/benchmark/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py:72, in TrackingServiceClient.get_run(self, run_id)
     58 """
     59 Fetch the run from backend store. The resulting :py:class:`Run <mlflow.entities.Run>`
     60 contains a collection of run metadata -- :py:class:`RunInfo <mlflow.entities.RunInfo>`,
  (...)
     69          raises an exception.
     70 """
     71 _validate_run_id(run_id)
   ...
    648     )
    649 run_info = self._get_run_info_from_dir(run_dir)
    650 if run_info.experiment_id != exp_id:

MlflowException: Run 'musing_steelpan_xxxx' not found

What type of run is this? And where did you get the run_id from? — Daniel Schneider, Dec 13 '22 at 18:30
I'm running an Azure ML training job (see the reference [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-model?tabs=python)), and I copied the run_id from the Azure Portal workspace where the run was executed. — Simón Cerda, Dec 13 '22 at 19:12

score 0 · Answer 1 · answered Dec 15 '22 at 10:47


In some cases (e.g. for jobs inside a pipeline or inside a sweep), the display_name shown at the top of the portal page (which the user can change) is not the same as the job's name (which is immutable), shown further down in the portal.

Did you take the name or the display_name from the portal (or are they the same)?
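To make the distinction concrete, here is a minimal sketch that always picks the immutable `name` as the MLflow run id, assuming (as this answer implies) that the run id MLflow expects is the job's `name`. The helper `mlflow_run_id_for` is hypothetical; it accepts any object with `name` and `display_name` attributes, such as the job entity returned by azure-ai-ml's `MLClient.jobs.get()`, and the example uses a `SimpleNamespace` stand-in so it runs without Azure access.

```python
from types import SimpleNamespace
from typing import Any


def mlflow_run_id_for(job: Any) -> str:
    """Hypothetical helper: return the id to pass to MlflowClient.get_run().

    Uses the immutable `name`, never the user-editable `display_name`.
    """
    name = getattr(job, "name", None)
    if not name:
        raise ValueError("job has no immutable 'name'; cannot derive a run id")
    display_name = getattr(job, "display_name", None)
    if display_name and display_name != name:
        # Portal title and job name differ: copying the title would fail the lookup.
        print(f"note: display_name {display_name!r} differs from name {name!r}")
    return name


# Hypothetical stand-in for a sweep child job whose title was renamed in the portal.
child = SimpleNamespace(name="musing_steelpan_xxxx", display_name="my-renamed-trial")
run_id = mlflow_run_id_for(child)
```

Raising on a missing `name` rather than falling back to `display_name` is deliberate: a silent fallback would reproduce exactly the "run not found" error being debugged.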


Daniel Schneider


Thanks Daniel for the help, but the name is indeed the same. I even used the properties JSON file to check it. I feel I'm missing some permission to access the run, but I can't find anything related. – Simón Cerda Dec 15 '22 at 12:25

score 0 · Answer 2 · answered Dec 18 '22 at 21:29

Here is another idea: you might not be connected to the right workspace. You set the workspace via the MLFLOW_TRACKING_URI environment variable or as a direct parameter to the MLflow client. Go to the Azure Portal and look at the workspace properties; there you will find the MLflow Tracking URI for the workspace.

Then you can plug it into the code below; this should print out 100 runs of your workspace (I believe the first 100...):

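import mlflow

# Point the client explicitly at the workspace's MLflow tracking URI
client = mlflow.tracking.MlflowClient(tracking_uri="<your mlflow tracking uri>")
runs = client.search_runs(experiment_ids=[])
for run in runs:
    print(run.info.run_id)  # run_id replaces the deprecated run_uuid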

For the above code to work you need to:

pip install mlflow azureml-mlflow azureml-core
az login
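A quick way to check this theory before touching Azure: the `_get_run_info_from_dir` frame in the question's traceback appears to come from MLflow's local file-based store, which would suggest (an inference from the frame name, not something the error states) that `MlflowClient()` was reading a local directory instead of the workspace. The sketch below, using only the standard library, reports which URI a client would most likely use and whether it is an `azureml://` workspace URI. All function names are hypothetical, the helper ignores any earlier `mlflow.set_tracking_uri()` call in the process (a simplification), and the URI template is an assumption based on the format shown in the Azure ML docs; copying the exact value from the workspace properties is safer.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Assumed template for an Azure ML workspace tracking URI (prefer the portal value).
AZUREML_URI_TEMPLATE = (
    "azureml://{region}.api.azureml.ms/mlflow/v1.0"
    "/subscriptions/{subscription_id}"
    "/resourceGroups/{resource_group}"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/{workspace}"
)


def build_azureml_tracking_uri(region: str, subscription_id: str,
                               resource_group: str, workspace: str) -> str:
    """Hypothetical helper: fill the assumed template with workspace details."""
    return AZUREML_URI_TEMPLATE.format(
        region=region,
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace=workspace,
    )


def effective_tracking_uri(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit tracking_uri argument first, then the
    MLFLOW_TRACKING_URI environment variable. None means neither is set, so
    MLflow would fall back to its local default store.
    (Simplification: mlflow.set_tracking_uri() is not considered.)"""
    return explicit or os.environ.get("MLFLOW_TRACKING_URI") or None


def points_at_azureml(uri: Optional[str]) -> bool:
    """True only when the URI uses the azureml:// scheme."""
    return uri is not None and urlparse(uri).scheme == "azureml"


uri = effective_tracking_uri()
if not points_at_azureml(uri):
    print(f"MlflowClient() would not query an Azure ML workspace (uri={uri!r})")
```

If the check fails, setting `MLFLOW_TRACKING_URI` to the portal value (or passing it as `tracking_uri=`) before creating the client should make `get_run` query the workspace.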
