Issue with the datadrift notebook

Question

When running this Data Drift sample notebook, I'm having trouble with one particular cell:

exp = Experiment(ws, datadrift._id)
dd_run = Run(experiment=exp, run_id=run)
RunDetails(dd_run).show()

This generates the following traceback:

(...)
ImportError: cannot import name 'get_run_ids_and_metric_types_filter_expression'

I believe there might be a version issue with this notebook. I'm running AzureML SDK 1.0.60, and this sample is drawn from the 1.0.60 version of the notebook (at least the one in the master branch as of today).

Or is this an issue with my environment?

While inspecting the output logs of the run, I also realized that the job itself fails with a traceback:

The experiment failed. Finalizing run...
Traceback (most recent call last):
  File "datadrift_run.py", line 173, in <module>
    run.run(target_date)
  File "datadrift_run.py", line 100, in run
    drift_main(arguments_drift)
  File "/mnt/batch/tasks/shared/LS_root/jobs/playground-olivier/azureml/13f371b5-1985-44c2-921c-fd66b0dbe852_1568646629244/mounts/workspacefilestore/azureml/13f371b5-1985-44c2-921c-fd66b0dbe852_1568646629244/_generate_script.py", line 363, in main
    'datadrift_id': args.datadrift_id
  File "/mnt/batch/tasks/shared/LS_root/jobs/playground-olivier/azureml/13f371b5-1985-44c2-921c-fd66b0dbe852_1568646629244/mounts/workspacefilestore/azureml/13f371b5-1985-44c2-921c-fd66b0dbe852_1568646629244/_generate_script.py", line 75, in _get_drift_metrics
    diff_metrics = dsdo.run()
  File "/azureml-envs/azureml_9a12ab39ef186b06eb543bbc347567d8/lib/python3.6/site-packages/azureml/data/_dataset_diff.py", line 840, in run
    base_profile_metrics = get_dataprofile_metrics(self.base_datasetprofile, self.config)
  File "/azureml-envs/azureml_9a12ab39ef186b06eb543bbc347567d8/lib/python3.6/site-packages/azureml/data/_dataset_diff.py", line 163, in get_dataprofile_metrics
    column_type = column_type_classifier[(dp.columns[c].value_counts is None, dp.columns[c].histogram is None)]
KeyError: 'usaf'

These two errors are unrelated, but both are generated by the same notebook.

After I removed 'usaf' from the feature_list variable in the original notebook, the whole notebook worked fine for me. It's safe to do, as the 'usaf' column doesn't exist in the training dataset. — James Gan, Sep 16 '19 at 20:42
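The workaround in this comment amounts to keeping only the features that exist in both datasets being compared. A minimal sketch of that filtering in plain Python is below. The helper name `keep_shared_features` and every column name other than 'usaf' are illustrative assumptions, not part of the notebook or the AzureML SDK.

```python
def keep_shared_features(feature_list, baseline_columns, target_columns):
    """Return the features present in both column collections, preserving order."""
    shared = set(baseline_columns) & set(target_columns)
    return [name for name in feature_list if name in shared]


# Hypothetical column names for illustration; only 'usaf' comes from the thread.
baseline_columns = ["latitude", "longitude", "temperature"]
target_columns = ["usaf", "latitude", "longitude", "temperature"]
feature_list = ["usaf", "latitude", "temperature"]

# 'usaf' is dropped because it is missing from the baseline (training) columns.
filtered = keep_shared_features(feature_list, baseline_columns, target_columns)
print(filtered)
```

Filtering like this, rather than editing the list by hand, would also catch any other column that is missing from one of the two datasets.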
I don't get the ImportError. Maybe it's a version issue. I created my environment from scratch this morning with the newest packages. When did you install azureml-sdk and azureml-contrib-datadrift, and which versions? — James Gan, Sep 16 '19 at 20:44
Yes, this should be an environment or dependency issue. We run stringent gated tests on all these notebooks, so if anything breaks it will come down to the dependencies and their versions, or the environment itself. I would build a clean environment (Python 3.6.5 is recommended) and reinstall `azureml-sdk` and `azureml-contrib-datadrift`. — Trevor Bye, Sep 19 '19 at 19:38
It worked with a new environment. Mine was a notebook VM that had pip install --upgrade run on it. Probably things didn't go as expected. Thanks! — omartin2010, Sep 19 '19 at 21:45
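Since the thread settles on mismatched package versions as the likely cause, one way to answer the "which version?" question is to print what is actually installed in the kernel's environment. The following is a sketch using only the Python standard library (with a `pkg_resources` fallback for interpreters older than 3.8). The helper name `installed_version` is an assumption for illustration; the two package names are the ones mentioned in the comments.

```python
def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        # Available in the standard library from Python 3.8 onward.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for older interpreters such as Python 3.6.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Package names taken from the comments above.
packages = ["azureml-sdk", "azureml-contrib-datadrift"]
report = {name: installed_version(name) for name in packages}

for name, ver in report.items():
    print(name, ver if ver is not None else "not installed")
```

Running this inside the notebook itself, rather than in a separate shell, checks the environment the kernel really uses. If the versions differ from each other or from the notebook's release, a fresh environment is probably the simplest fix, as it was here.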
