
I am having a problem mounting a dataset (created from an Azure Data Lake datastore). I download the dataset by name and try to pass it as input to a TensorFlow estimator. The script parameter I provide is:

'--data-folder': dataset.as_named_input('trainigdata').as_mount('tmp/dataset')
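For context, the entry script reads this parameter like any other command-line argument; a minimal sketch using stdlib argparse (the simulated command line below is illustrative — at run time AzureML substitutes the resolved mount point):

```python
import argparse

# Sketch of how the training script receives the mount path passed through
# the '--data-folder' script parameter. Once the FUSE mount succeeds, the
# dataset appears as a regular directory at this path.
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', dest='data_folder', type=str,
                    help='local path where the dataset is mounted')

# Simulated command line for illustration only.
args = parser.parse_args(['--data-folder', '/tmp/dataset'])
print(args.data_folder)  # the directory to read training data from
```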

But I get the following exception:

Mounting trainigdata to tmp/dataset
ERROR - Uncaught exception from FUSE operation opendir, returning errno.EINVAL.
Traceback (most recent call last):
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/fuse.py", line 734, in _wrapper
    return func(*args, **kwargs) or 0
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/fuse.py", line 954, in opendir
    path.decode(self.encoding))
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/fuse.py", line 1076, in __call__
    return getattr(self, op)(*args)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/fuse/dprepfuse.py", line 297, in opendir
    self._open_dirs[path] = self._list_entries(path)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/fuse/dprepfuse.py", line 145, in _list_entries
    .to_pandas_dataframe(extended_types=True)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/_loggerfactory.py", line 131, in wrapper
    return func(*args, **kwargs)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/dataflow.py", line 706, in to_pandas_dataframe
    ExecuteAnonymousActivityMessageArguments(anonymous_activity=Dataflow._dataflow_to_anonymous_activity_data(dataflow_to_execute)))
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/_aml_helper.py", line 38, in wrapper
    return send_message_func(op_code, message, cancellation_token)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/api.py", line 88, in execute_anonymous_activity
    response = self._message_channel.send_message('Engine.ExecuteActivity', message_args, cancellation_token)
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/engine.py", line 74, in send_message
    raise_engine_error(response['error'])
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/api/errorhandlers.py", line 22, in raise_engine_error
    raise ExecutionError(error_response)
azureml.dataprep.api.errorhandlers.ExecutionError: Could not execute the specified transform.|session_id=101b574b-cdd2-4975-a5bd-0e57c9fc061f
Logging warning in history service: ERROR:: Dataset  failed. . Exception Details:Traceback (most recent call last):
  File "/mnt/batch/tasks/shared/LS_root/jobs/env/azureml/trainprediction_aks_1571941512_8d9344d7/mounts/workspaceblobstore/azureml/trainprediction_AKS_1571941512_8d9344d7/azureml-setup/context_managers.py", line 208, in __enter__
    self.datasets.__enter__()
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/data/context_managers.py", line 119, in __enter__
    context_manager.__enter__()
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/fuse/daemon.py", line 92, in __enter__
    self._wait_until_mounted()
  File "/azureml-envs/azureml_f73412f070d144d39c8a826b53bde771/lib/python3.6/site-packages/azureml/dataprep/fuse/daemon.py", line 142, in _wait_until_mounted
    while not os.path.exists(self.mount_point) or len(os.listdir(self.mount_point)) == 0:
OSError: [Errno 22] Invalid argument: '/mnt/batch/tasks/shared/LS_root/jobs/env/azureml/trainprediction_aks_1571941512_8d9344d7/mounts/workspaceblobstore/azureml/trainprediction_AKS_1571941512_8d9344d7/tmp/dataset'

Can someone help with this?

theduck

2 Answers


Currently, mounting ADLS Gen2 for training jobs goes through the blob APIs. A simpler solution is to sign up for multi-protocol access (MPA) (https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access), which lets you use the native Datastore API and mount point in AML with ADLS Gen2.

Do you need folder-level ACLs for data access? If so, note that ACLs are not currently supported; support is on the roadmap.

Ram

Unfortunately, I am unable to reproduce this error using the latest azureml-sdk.

Instead of using a relative mount path, can you please try the following:

'--data-folder': dataset.as_named_input('trainigdata').as_mount('/tmp/dataset')

Specifically, change tmp/dataset to /tmp/dataset.
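For what it's worth, the OSError in the question suggests why the relative path fails: it gets resolved against the job's working directory, which itself sits inside the workspaceblobstore FUSE mount. A small stdlib sketch of that path arithmetic (the resolution behavior is an assumption read off the traceback, not confirmed):

```python
import posixpath

# Working directory of the run, taken from the traceback in the question:
# it already lives inside the workspaceblobstore FUSE mount.
working_dir = ('/mnt/batch/tasks/shared/LS_root/jobs/env/azureml/'
               'trainprediction_aks_1571941512_8d9344d7/mounts/workspaceblobstore/'
               'azureml/trainprediction_AKS_1571941512_8d9344d7')

# A relative mount point ends up nested inside that blob mount...
relative_mount = posixpath.join(working_dir, 'tmp/dataset')
# ...while an absolute one is an ordinary local directory.
absolute_mount = '/tmp/dataset'

assert 'workspaceblobstore' in relative_mount
assert 'workspaceblobstore' not in absolute_mount
print(relative_mount)  # matches the path in the OSError
```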

Thanks!

Rongduan Zhu