Storing a training dataset in a platform like MLflow

Question

Are there experiment management platforms that also allow storing and managing training datasets (images, in my case)? I am familiar with MLflow, but AFAIK it doesn't support this. Am I right? If no such platform exists, how would you suggest managing training datasets alongside existing platforms?

hsaltan · Answer 1 · 2022-08-09T09:12:29.130

0

You can use a cloud provider's object storage to store and manage your training dataset; every major cloud provider offers one. Your training code can then read the data from the storage bucket, train on it, and track the experiment.
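To keep the dataset, the parameters, and the metrics visible in one place, one option is to record the dataset's location and a content hash on the same MLflow run. The following is only a minimal sketch under assumptions: the helper names `dataset_fingerprint` and `log_run`, the tag names `dataset_uri` and `dataset_sha256`, and the idea of hashing a local copy of the bucket contents are all illustrative and not part of MLflow itself. It uses only the standard MLflow tracking calls `start_run`, `set_tag`, `log_params`, and `log_metrics`.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: str) -> str:
    """Return a SHA-256 digest over the relative paths and contents of all files under root."""
    root_path = Path(root)
    files = [p for p in root_path.rglob("*") if p.is_file()]
    # Sort by relative path so the digest does not depend on directory listing order.
    files.sort(key=lambda p: p.relative_to(root_path).as_posix())
    digest = hashlib.sha256()
    for path in files:
        digest.update(path.relative_to(root_path).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between file name and file contents
        digest.update(path.read_bytes())
    return digest.hexdigest()


def log_run(dataset_uri: str, local_copy: str, params: dict, metrics: dict) -> None:
    """Record dataset identity, parameters, and metrics on a single MLflow run (hypothetical helper)."""
    import mlflow  # imported here so the hashing helper works without MLflow installed

    with mlflow.start_run():
        # Tag names are an assumption; any consistent naming scheme works.
        mlflow.set_tag("dataset_uri", dataset_uri)
        mlflow.set_tag("dataset_sha256", dataset_fingerprint(local_copy))
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

A call might look like `log_run("s3://<your-bucket>/<prefix>", "<local-download-dir>", {"alpha": ..., "l1_ratio": ...}, {"mae": ...})`, where the bucket, directory, and values are placeholders. Each run then shows which dataset version it used next to its parameters and metrics, so no separate spreadsheet is needed.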

edited Aug 09 '22 at 09:12

answered Aug 07 '22 at 12:21


Thank you very much for the answer. If I understand you correctly, your idea will let me store the dataset. However, I want to see three things in one place: 1) which dataset was used, 2) which parameters (alpha and l1, for example) were used for training, and 3) what the resulting metrics were, such as MAE, F1, etc. It won't help with the first one (at least in MLflow), am I right? I mean that I will still have to maintain some kind of Excel file where I write down the name of the dataset or the image file names? – Igor Aug 07 '22 at 14:52
You could save the MLflow output, the model, and the metrics in S3 as well, alongside the dataset. I don't think you need to put them in something like Excel. – hsaltan Aug 08 '22 at 07:21
Please don't post product/service recommendations as answers. – David Makogon Aug 08 '22 at 16:12
