I would like a general learning pipeline (for example, predicting a label from N features) that works regardless of the number of features: one of my input CSVs might have 5 features and another 10. The two CSVs would obviously produce different models. I don't want to combine them in any way; I just want to run the same program on both.

However, to load the features, I need to use

TextLoader(...).CreateFrom<ClassA>()

where ClassA defines my schema. Its properties need to reflect the CSV format, therefore the CSV must always have the same number of columns.

I have noticed CustomTextLoader but it's obsolete. Any ideas? Thank you.

asdf

2 Answers

Taking a look at the source: (https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML/Data/TextLoader.cs)

CreateFrom looks like nothing more than a helper method that populates Arguments.Column and the other Arguments fields, all of which are publicly accessible. This means that you could write your own implementation.

TextLoader tl = new TextLoader(inputFileName);
tl.Arguments.HasHeader = useHeader;
tl.Arguments.Separator = new[] { separator };
tl.Arguments.AllowQuoting = allowQuotedStrings;
tl.Arguments.AllowSparse = supportSparse;
tl.Arguments.TrimWhitespace = trimWhitespace;

Now the important part: you'll need to populate Arguments.Column with an entry for each column in your data set. If you know ahead of time that you'll have 5 or 10 columns, that would be simplest; otherwise, I'd peek into the CSV to find out.

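tl.Arguments.Column = new TextLoaderColumn[numColumns];
// Array elements start out null, so create each entry before setting its fields.
tl.Arguments.Column[0] = new TextLoaderColumn();
tl.Arguments.Column[0].Name = ...
tl.Arguments.Column[0].Source = ... // see the docs
tl.Arguments.Column[0].Type = ...
// and so on.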
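
For the "peek into the CSV" step, here is a minimal sketch of one way to get numColumns. It assumes a plain delimited file whose first line has the same number of fields as every other line, with no quoted fields containing the separator. CsvPeek and CountColumns are hypothetical names made up for illustration; they are not part of ML.NET.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class CsvPeek
{
    // Returns the number of fields on the first line of the file.
    // Assumption: no quoted fields that contain the separator character.
    public static int CountColumns(string path, char separator = ',')
    {
        using (var reader = new StreamReader(path))
        {
            string firstLine = reader.ReadLine();
            if (string.IsNullOrEmpty(firstLine))
                throw new InvalidDataException("File is empty: " + path);

            // A naive split is enough to count columns under the assumption above.
            return firstLine.Split(separator).Length;
        }
    }
}
```

You could then call something like CsvPeek.CountColumns(inputFileName, separator) to size the Column array. If your files can contain quoted separators, a real CSV parser would be a safer choice than the naive split.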
jaket
  • I was hoping for a bit more straightforward solution but you are right, this should work (I haven't tried yet). Thank you. – asdf Sep 15 '18 at 16:54
jaket - thank you for your answer. I can see how that would work for loading the data into the TextLoader. However, how would you then train the model? The pipeline's Train() method also requires you to pass in a type defining the data schema:

 PredictionModel<ClassA, ClassAPrediction> model = pipeline.Train<ClassA, ClassAPrediction>();
Dave