I'm just starting with ML.Net and find myself confused by the rapid evolution of APIs and samples based on various API versions.

My goal is to read in several numeric feature columns and one text column specifying a label ("Brand"), but I get an error on the last line of this snippet:

var trainingDataView = mlContext.Data.ReadFromTextFile<PurchaseData>
    (path: trainDataPath, hasHeader: true, separatorChar: ',');

var dataProcessPipeline = mlContext.Transforms
    .Concatenate(DefaultColumnNames.Features,
                 nameof(PurchaseData.AgeBracket),
                 nameof(PurchaseData.Gender),
                 nameof(PurchaseData.IncomeBracket))
    .Append(mlContext.Transforms.CopyColumns("Label", nameof(PurchaseData.Brand)))
    .AppendCacheCheckpoint(mlContext);

var trainer = mlContext.MulticlassClassification.Trainers
    .StochasticDualCoordinateAscent(featureColumn: DefaultColumnNames.Features);
var trainingPipeline = dataProcessPipeline.Append(trainer);

var trainedModel = trainingPipeline.Fit(trainingDataView);

The error message is: "Schema mismatch for label column 'Label': expected float, double or KeyType, got Text"

Why is the label not expected/allowed to be Text and what can I do to fix it?

DFord
Eric J.

1 Answer

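You'll need to convert your label to a key type. As the error message says, the trainer expects the label column to be float, double or KeyType, and CopyColumns only copies the Brand text as-is. Replace:

.Append(mlContext.Transforms.CopyColumns("Label", nameof(PurchaseData.Brand)))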

With:

.Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: DefaultColumnNames.Label, inputColumnName: nameof(PurchaseData.Brand)))

For a complete example that uses this conversion, see: https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/MulticlassClassification-GitHubLabeler/GitHubLabeler/GitHubLabelerConsoleApp/Program.cs
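
If it helps to see why a "key" satisfies the trainer where text does not, here is a rough, dependency-free sketch of the idea behind a value-to-key mapping: each distinct label string gets a small integer. This is not ML.NET's implementation. The class name, the method name and the brand values are made up for illustration, and numbering the keys from 1 (leaving 0 free for a missing value) is an assumption about the convention, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class ValueToKeySketch
{
    // Hypothetical helper: assigns each distinct label an integer key,
    // in order of first appearance. Keys start at 1 here (assumption:
    // 0 is left free to stand for "missing").
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var map = new Dictionary<string, uint>();
        foreach (var label in labels)
        {
            if (!map.ContainsKey(label))
            {
                map[label] = (uint)map.Count + 1u;
            }
        }
        return map;
    }

    public static void Main()
    {
        // Made-up Brand values standing in for the text label column.
        var brands = new[] { "Acme", "Globex", "Acme", "Initech" };
        var keyMap = BuildKeyMap(brands);

        // Each row's text label is replaced by its integer key.
        foreach (var brand in brands)
        {
            Console.WriteLine($"{brand} -> {keyMap[brand]}");
        }
    }
}
```

Repeated labels map to the same key, so the trainer sees a fixed set of class indices instead of free text. In the real pipeline, MapValueToKey builds this kind of mapping from the training data during Fit.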

amy8374