My experiment looks like this:

[Image: Experiment]

I have 10 rows (excluding the header) in "Dataset.csv" and 3 rows (excluding the header) in the CSV being imported by Import Data. Both CSVs have the same schema. I want Add Rows to append the 3 rows to Dataset.csv.

The real "Dataset.csv" has more than 25,000 rows and is expected to grow, so using Export Data to generate a merged dataset (as a new CSV) is not a feasible solution. Is there any way to implement an append for this scenario?

Thanks

Update 1: Dataset.csv is stored in ML Studio's Datasets.

Mujeeb
  • Not quite sure I understand. You're unable to use the CSV generated by "Export Data"? – Jon Aug 28 '18 at 12:51
  • @Jon I am unable to use the CSV to update the existing dataset. And by dataset, I mean ML Studio's dataset (see Update 1). – Mujeeb Aug 28 '18 at 12:59
  • Ahhhh, I see. I did find [this answer](https://stackoverflow.com/a/36132435/186013), which says you can't update a dataset that's uploaded to Azure ML, but it looks like you can upload it under a different name, remove the original dataset, and rename the new one. I believe [this](https://github.com/Azure/Azure-MachineLearning-ClientLibrary-Python) is the SDK for it. – Jon Aug 28 '18 at 13:03
  • Oh, looking at the SDK, it may be possible to just update the dataset with the `update_from_dataframe` method. – Jon Aug 28 '18 at 13:05
  • I will look into these links, but the real problem is that I can't always delete and re-upload the dataset. It's at 400 MB right now and is expected to grow :) – Mujeeb Aug 28 '18 at 13:05
  • The update method should be exactly what you need, I think :) – Jon Aug 28 '18 at 13:06
  • @Jon `update_from_dataframe` worked brilliantly. Thanks! – Mujeeb Aug 29 '18 at 06:44
  • Glad it worked! :) – Jon Aug 29 '18 at 10:27

1 Answer

It turns out the Python SDK has an `update_from_dataframe` method that can update a dataset already uploaded to Azure ML Studio. If you can't switch to a new CSV and need to update the existing dataset, this should do the trick.
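For illustration, here is a minimal sketch of how the append might look with that SDK (the `azureml` package from the Azure-MachineLearning-ClientLibrary-Python repo). It assumes `update_from_dataframe` replaces the dataset's contents with the frame it is given, so the append is done by reading the existing rows, concatenating the new ones, and pushing the merged frame back. The helper names `append_rows` and `append_from_csv`, and their parameters, are hypothetical and not part of the SDK.

```python
import pandas as pd


def append_rows(dataset, new_rows):
    """Append new_rows to a Studio dataset and return the merged frame.

    `dataset` is assumed to expose to_dataframe() and
    update_from_dataframe(), as the SDK's dataset objects do.
    """
    existing = dataset.to_dataframe()

    # Both CSVs are expected to share one schema; fail early if they don't.
    if list(existing.columns) != list(new_rows.columns):
        raise ValueError("new rows do not match the dataset's columns")

    merged = pd.concat([existing, new_rows], ignore_index=True)

    # Assumption: this overwrites the stored dataset with `merged`.
    dataset.update_from_dataframe(merged)
    return merged


def append_from_csv(workspace_id, authorization_token, dataset_name, csv_path):
    """Hypothetical wrapper: look up the dataset by name and append a CSV."""
    # Imported lazily so append_rows can be used without the SDK installed.
    from azureml import Workspace

    ws = Workspace(workspace_id=workspace_id,
                   authorization_token=authorization_token)
    dataset = ws.datasets[dataset_name]
    return append_rows(dataset, pd.read_csv(csv_path))
```

Calling `append_from_csv` with your workspace ID, authorization token, the dataset name (here, "Dataset.csv") and the path to the 3-row CSV would perform the append. Note that the whole merged frame is uploaded each time, so this avoids the delete-and-rename dance but not the transfer of the full dataset.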

Jon
  • Is there any way to achieve the same via Azure Functions? – Mujeeb Aug 30 '18 at 04:10
  • Microsoft's documentation for this: https://learn.microsoft.com/en-us/azure/machine-learning/team-data-science-process/python-data-access – Mujeeb Aug 30 '18 at 11:00
  • Interestingly, Azure Functions doesn't fully support Python. However, it is [in the works](https://feedback.azure.com/forums/355860-azure-functions/suggestions/13250616-will-you-support-python-on-functions). – Jon Aug 30 '18 at 13:09
  • Azure Functions version 2 will no longer have Python support. But have you come across any library in C# that can append rows to the dataset (similar to how the Python SDK does it)? – Mujeeb Sep 03 '18 at 05:03
  • I think it will have Python support in v2 due to [this repo](https://github.com/Azure/azure-functions-python-worker). It is still early in its development so the docs about supporting it may not be updated yet. It would be really weird if they didn't support Python :) – Jon Sep 03 '18 at 10:36