In my code I read data as an MLTable using the Azure ML SDK v2. I then convert it into a pandas DataFrame, do some work on the data, and want to save it back as an MLTable so that I can use it as part of my pipeline.

Unfortunately, I was unable to find any documentation with a straightforward answer to that.

My code looks like this:

import mltable

# subscription_id, resource_group and workspace_name are defined elsewhere
path = {
    'file': f'azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/my_container/paths/raw_data/my_data.csv'
}

tbl = mltable.from_delimited_files(paths=[path])
df = tbl.to_pandas_dataframe()

I also wonder whether I should instead save it as a URI file in the Data section in order to use it in my pipeline component. If so, I would be happy to get some input on how to convert a pandas DataFrame into a URI file. I found no examples in the official documentation.

Thank you in advance.

Egorsky

1 Answer

To create a URI file from a DataFrame, you need to write the DataFrame to a CSV file and then upload that CSV file to a datastore.

To do this you can use the AzureMachineLearningFileSystem class. Below is a code snippet that does this:

import pandas as pd
from azureml.core import Workspace
from azureml.fsspec import AzureMachineLearningFileSystem

# Load workspace details from the local config file
ws = Workspace.from_config()

# Write a sample DataFrame to a local CSV file
data = {'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']}
df_sample = pd.DataFrame(data)
df_sample.to_csv('sample_data.csv', index=False)

# Point the file system at the root of the target datastore
uri = f'azureml://subscriptions/{ws.subscription_id}/resourcegroups/{ws.resource_group}/workspaces/{ws.name}/datastores/workspaceblobstore/paths/'
fs = AzureMachineLearningFileSystem(uri)

# Upload the local file to raw_data2/ in the datastore
fs.upload(lpath='sample_data.csv', rpath='raw_data2/sample_data.csv', recursive=False, overwrite='MERGE_WITH_OVERWRITE')

This will create a CSV file in the datastore.
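
For reference, the full URI of the uploaded file should be the base datastore URI from the snippet followed by the rpath value. Here is a minimal sketch of assembling it with plain string handling. The helper name datastore_uri and the placeholder IDs are made up for illustration and are not part of any Azure SDK.

```python
def datastore_uri(subscription_id, resource_group, workspace_name, datastore, path=""):
    """Build an azureml:// datastore URI following the pattern used above.

    Hypothetical helper for illustration only, not an Azure SDK function.
    """
    base = (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore}/paths/"
    )
    # Drop any leading slash so the result has exactly one separator.
    return base + path.lstrip("/")


# Placeholder values, not real identifiers.
csv_uri = datastore_uri(
    "my-sub-id", "my-rg", "my-ws", "workspaceblobstore", "raw_data2/sample_data.csv"
)
print(csv_uri)
```

The resulting string can be used as the 'file' entry of the paths argument, in the same way as the URI in the question.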

You can then read the uploaded CSV file through its URI with mltable.
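
If you want an MLTable rather than a plain URI file, one option is to put the CSV in a folder next to a file named MLTable that describes how to read it. The sketch below writes such a folder locally using only the standard library. The function name write_mltable_folder and the file name data.csv are hypothetical, and the YAML keys (paths, transformations, read_delimited) reflect my understanding of the MLTable file format, so please check them against the current schema before relying on this.

```python
import tempfile
from pathlib import Path

# Assumed minimal MLTable definition for a single delimited file.
MLTABLE_TEMPLATE = """\
type: mltable
paths:
  - file: ./{csv_name}
transformations:
  - read_delimited:
      delimiter: ','
      encoding: utf8
      header: all_files_same_headers
"""


def write_mltable_folder(csv_text, folder, csv_name="data.csv"):
    """Write csv_text and a matching MLTable file into folder.

    Hypothetical helper for illustration; returns the path of the MLTable file.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / csv_name).write_text(csv_text, encoding="utf-8")
    mltable_file = folder / "MLTable"
    mltable_file.write_text(MLTABLE_TEMPLATE.format(csv_name=csv_name), encoding="utf-8")
    return mltable_file


# Same sample data as above, already serialized to CSV text.
csv_text = "column1,column2\n1,A\n2,B\n3,C\n"
mltable_path = write_mltable_folder(csv_text, tempfile.mkdtemp())
print(mltable_path.read_text(encoding="utf-8"))
```

With pandas, csv_text can come from df.to_csv(index=False), which returns a string when no path is given. The whole folder should then be uploadable with the same fs.upload call by pointing lpath at the folder and setting recursive=True. As far as I know, the mltable package can also write the MLTable file for you via the table's save method, which may be preferable to a hand-written template.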

RishabhM