I am trying to use the AutoML feature of Azure Machine Learning (AML). The sample notebook uses Dataset.Tabular.from_delimited_files(train_data), which only takes data from an https path. How can I pass a pandas dataframe directly to the AutoML config instead of using the Dataset API? Alternatively, how can I convert a pandas dataframe to a tabular dataset that I can pass to the AutoML config?
1 Answer
You could quite easily save your pandas dataframe to parquet, upload the data to the workspace's default blob store, and then create a Dataset from there:
# ws = <your AzureML workspace>
# df = <contains a pandas dataframe>
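import os

from azureml.core.dataset import Dataset

# Write the dataframe to a local parquet file
os.makedirs('mydata', exist_ok=True)
df.to_parquet('mydata/myfilename.parquet')
# Upload the folder to the workspace's default datastore and build a Dataset from it
dataref = ws.get_default_datastore().upload('mydata')
dataset = Dataset.Tabular.from_parquet_files(path=dataref.path('myfilename.parquet'))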
dataset.to_pandas_dataframe()
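The steps above could be wrapped in a small helper. This is only a minimal sketch, assuming azureml-core 1.x and a parquet engine such as pyarrow are installed. The function names `stage_dataframe` and `dataframe_to_tabular_dataset`, the `target_path` default, and the file name are hypothetical and chosen for illustration. Instead of the data reference used above, it passes a `(datastore, path)` tuple to `from_parquet_files`.

```python
import os


def stage_dataframe(df, folder="mydata", filename="myfilename.parquet"):
    """Write df to folder/filename as parquet and return the local path."""
    os.makedirs(folder, exist_ok=True)
    local_path = os.path.join(folder, filename)
    df.to_parquet(local_path)  # needs a parquet engine such as pyarrow
    return local_path


def dataframe_to_tabular_dataset(ws, df, folder="mydata",
                                 filename="myfilename.parquet",
                                 target_path="mydata"):
    """Upload df to the workspace's default datastore and return a TabularDataset."""
    # Imported lazily so the staging helper works without the Azure ML SDK.
    from azureml.core import Dataset

    stage_dataframe(df, folder, filename)
    datastore = ws.get_default_datastore()
    # Files inside `folder` end up under `target_path` on the datastore.
    datastore.upload(src_dir=folder, target_path=target_path, overwrite=True)
    return Dataset.Tabular.from_parquet_files(
        path=(datastore, f"{target_path}/{filename}")
    )
```

With a workspace `ws` and a dataframe `df` in hand, `dataset = dataframe_to_tabular_dataset(ws, df)` should give you a tabular dataset equivalent to the one built above.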
Or you can just create the Dataset from local files in the portal: http://ml.azure.com
Once you have created it in the portal, it will provide you with the code to load it, which will look something like this:
# azureml-core of version 1.0.72 or higher is required
from azureml.core import Workspace, Dataset
subscription_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
resource_group = 'ignite'
workspace_name = 'ignite'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='IBM-Employee-Attrition')
dataset.to_pandas_dataframe()
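To close the loop with the question, the resulting dataset can then be handed to the AutoML config. The following is a rough sketch, assuming the azureml-train-automl package is installed. The helper names `build_automl_settings` and `make_automl_config` are hypothetical, and the task type and label column are placeholders you would replace with your own.

```python
def build_automl_settings(training_data, label_column_name, task="classification"):
    """Collect the keyword arguments that will be passed to AutoMLConfig."""
    return {
        "task": task,                            # assumed task type; adjust as needed
        "training_data": training_data,          # the TabularDataset created earlier
        "label_column_name": label_column_name,  # name of the target column
    }


def make_automl_config(dataset, label_column_name, compute_target=None):
    """Build an AutoMLConfig from a dataset; compute_target is optional."""
    # Imported lazily; requires the azureml-train-automl package.
    from azureml.train.automl import AutoMLConfig

    settings = build_automl_settings(dataset, label_column_name)
    if compute_target is not None:
        settings["compute_target"] = compute_target
    return AutoMLConfig(**settings)
```

For example, `make_automl_config(dataset, label_column_name='<your label column>')` would build a config for a classification task on the dataset loaded above.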

Daniel Schneider