I have a requirement to use Azure Machine Learning to develop a pipeline. In this pipeline we don't pass data as inputs/outputs between steps, but variables (for example a list or an int). I have looked through the Microsoft documentation but could not find anything fitting my case. I also tried to use the PipelineData class but could not retrieve my variables.

  1. Is this possible?
  2. Is this a good approach?

Thanks for your help.

Curiousme
  • You can refer to [PipelineData Class](https://learn.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py#methods) and [Please rework the pipeline interactions with azureml.data.OutputFileDatasetConfig](https://github.com/Azure/azure-sdk-for-python/issues/23565#issuecomment-1078626800) – Ecstasy Mar 29 '22 at 04:38

2 Answers

I know I'm a bit late to the party but here we go:

Passing variables between AzureML Pipeline Steps

To directly answer your question: to my knowledge, it is not possible to pass variables directly between PythonScriptSteps in an AzureML Pipeline.

The reason is that the steps are executed in isolation, i.e. the code runs in different processes or even on different compute targets. The only interfaces a PythonScriptStep has are (a) command line arguments, which need to be set prior to submission of the pipeline, and (b) data.
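
For values that are already known when the pipeline is submitted, interface (a) may be enough. The following is a minimal sketch of how a step script could turn such command line arguments back into an int and a list. The argument names `--batch_size` and `--ids` are hypothetical and only serve as an illustration; values computed inside an earlier step cannot be passed this way.

```python
import argparse


def parse_step_args(argv=None):
    """Parse values that were fixed when the pipeline was submitted."""
    parser = argparse.ArgumentParser()
    # Command line arguments arrive as strings, so convert them explicitly.
    parser.add_argument("--batch_size", type=int, default=32)  # hypothetical name
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--ids", type=int, nargs="+", default=[])  # hypothetical name
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Stands in for a step declared with
    # arguments=["--batch_size", "16", "--ids", "1", "2", "3"]
    args = parse_step_args(["--batch_size", "16", "--ids", "1", "2", "3"])
    print(args.batch_size, args.ids)
```

Passing `argv` explicitly keeps the parsing logic testable outside of a pipeline run; inside a real step you would call `parse_step_args()` without arguments so that `sys.argv` is used.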

Using datasets to pass information between PythonScriptSteps

As a workaround you can use PipelineData to pass data between steps. The previously posted blog post may help: https://vladiliescu.net/3-ways-to-pass-data-between-azure-ml-pipeline-steps/

As for your concrete problem:

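# pipeline.py

from azureml.core import Workspace
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

# Assumption: a workspace config.json is available locally; adapt this to your setup.
ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# This will make Azure create a unique directory on the datastore every time the pipeline is run.
variables_data = PipelineData("variables_data", datastore=datastore)

# `variables_data` will be mounted on the target compute and a path is given as a command line argument.
# Note: compute_target and source_directory are omitted here; set them for your own workspace.
write_variable = PythonScriptStep(
    script_name="write_variable.py",
    arguments=[
        "--data_path",
        variables_data
    ],
    outputs=[variables_data],
)

# Listing `variables_data` as an input makes this step run after `write_variable`.
read_variable = PythonScriptStep(
    script_name="read_variable.py",
    arguments=[
        "--data_path",
        variables_data
    ],
    inputs=[variables_data],
)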

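In your scripts you'll want to serialize the variable / object that you're trying to pass between steps. The value of `--data_path` is only a path to a directory, so the object itself has to be written to and read from a file inside it. This example uses pickle, but you could of course use JSON or any other serialization method: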

# write_variable.py

import argparse
import pickle
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--data_path")
args = parser.parse_args()

obj = [1, 2, 3, 4]

Path(args.data_path).mkdir(parents=True, exist_ok=True)
with open(args.data_path + "/obj.pkl", "wb") as f:
    pickle.dump(obj, f)

Finally, you can read the variable in the next step:

# read_variable.py

import argparse
import pickle

parser = argparse.ArgumentParser()
parser.add_argument("--data_path")
args = parser.parse_args()


with open(args.data_path + "/obj.pkl", "rb") as f:
    obj = pickle.load(f)

print(obj)
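
As a JSON-based variant of the two scripts above, here is a minimal sketch that can be run locally. The file name `variables.json` and the helper names `write_variables` and `read_variables` are hypothetical, and a temporary directory stands in for the path that AzureML would pass as `--data_path`.

```python
import json
import tempfile
from pathlib import Path

FILE_NAME = "variables.json"  # hypothetical file name


def write_variables(data_path, variables):
    """Serialize a dict of simple values into the step's output directory."""
    out_dir = Path(data_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / FILE_NAME, "w", encoding="utf-8") as f:
        json.dump(variables, f)


def read_variables(data_path):
    """Load the dict back from the directory the next step receives."""
    with open(Path(data_path) / FILE_NAME, encoding="utf-8") as f:
        return json.load(f)


# Local round trip; the temporary directory replaces the mounted path.
with tempfile.TemporaryDirectory() as tmp:
    write_variables(tmp, {"my_list": [1, 2, 3], "my_int": 42})
    restored = read_variables(tmp)
    print(restored)
```

JSON only covers basic types (lists, dicts, numbers, strings, booleans, null), but unlike pickle it stays readable and does not execute code when loaded.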
Till

The approach you are trying will not fully work as it stands. There are, however, a few ways to pass variables between pipeline steps, and they require passing the variables as a dataset. The implementation is described in the following resources:

https://vladiliescu.net/3-ways-to-pass-data-between-azure-ml-pipeline-steps/

How to use Pipeline parameters on AzureML

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-pipeline-parameter

https://learn.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py

Sairam Tadepalli
  • Thanks for your answer! I'm not sure I entirely get your explanation. Let's say at the end of step1 I created a list [1,2,3] that I want step2 to use, how can I use the PipelineParameters? This is why I tried to use PipelineData as referred in the first link you shared. I was able to store the output in the container but as an input for step2, I'm only getting the Path to that object. So I wonder how I can access it. – Curiousme Mar 29 '22 at 12:35