
I am following this article from Microsoft to create an Azure ML pipeline with two steps, and I want to use the data written by step1 in step2. According to the article, the code below should pass the path of the data written by step1 into the script used for step2 as an argument:

datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data", destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig = aml_run_config,
    arguments = ["--output_path", step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig = aml_run_config,
    arguments = ["--pd", step1_output_data.as_input]

)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])

But when I access the pd argument in step2.py, it prints the string

"<bound method OutputFileDatasetConfig.as_mount of <azureml.data.output_dataset_config.OutputFileDatasetConfig object at 0x7f8ae7f478d0>>"

Any idea how to pass blob storage location used by step1 to write data in step2?

Radhi
  • You should try to follow the following notebook, the steps are described and you will also find the underlying python scripts used, especially the `train.py` script. https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb – arhr Feb 15 '21 at 13:31

1 Answer


You will probably find what you need here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines. Particularly, note the section Read OutputFileDatasetConfig as inputs to non-initial steps:

# get adls gen 2 datastore already registered with the workspace
datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(
    name="processed_data",
    destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig = aml_run_config,
    arguments = ["--output_path", step1_output_data]
    )

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig = aml_run_config,
    arguments = ["--pd", step1_output_data.as_input()]
    )

pipeline = Pipeline(workspace=ws, steps=[step1, step2])
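With `as_upload()` on the output and `as_input()` on the input, each script receives a plain filesystem path string. Here is a hypothetical sketch of what step1.py and step2.py might contain (the file contents and helper names are assumptions; only the argument names come from the pipeline definition above):

```python
# Hypothetical combined sketch of step1.py / step2.py. Only --output_path
# and --pd come from the pipeline definition; everything else is made up.
import argparse
import os


def write_processed_data(output_path):
    """step1.py: write results under the path injected via --output_path."""
    os.makedirs(output_path, exist_ok=True)
    with open(os.path.join(output_path, "data.csv"), "w") as f:
        f.write("id,value\n1,42\n")


def read_processed_data(input_path):
    """step2.py: list the CSV files found under the path injected via --pd."""
    return sorted(f for f in os.listdir(input_path) if f.endswith(".csv"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_path")  # used by step1.py
    parser.add_argument("--pd")           # used by step2.py
    args, _ = parser.parse_known_args()
    if args.output_path:
        write_processed_data(args.output_path)
    if args.pd:
        print(read_processed_data(args.pd))
```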

Your mistake is probably that `as_input()` is a method of `OutputFileDatasetConfig`, not a property: in step2 you passed the bound method object itself instead of calling it, so the script received the method's repr string rather than the mount path.
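The repr in the question is exactly what Python produces when a bound method is stringified instead of called. A minimal stand-alone illustration (the `Demo` class and its return value are made up, not AzureML API):

```python
# Passing obj.method hands the bound-method object to str(); passing
# obj.method() hands over the value the method returns.
class Demo:
    def as_input(self):
        return "/mnt/processed_data"


d = Demo()
print(str(d.as_input))    # '<bound method Demo.as_input of <__main__.Demo ...>>'
print(str(d.as_input()))  # '/mnt/processed_data'
```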

user787267