
I'd like to add a custom column after loading my IDataView from a file. In each row, the column value should be the sum of the previous two values, a sort of Fibonacci series.

I was thinking of creating a custom transformer, but I couldn't find anything that explains how to proceed. I also cloned the ML.NET Git repository to see how other transformers are implemented, but many classes are marked as internal, so I can't reuse them in my project.

Machavity
Luca

2 Answers


There is a way to create a custom transform: CustomMapping.

Here's an example I used for this answer.

The input and output classes:

class InputData
{
    public int Age { get; set; }
}

class CustomMappingOutput
{
    public string AgeName { get; set; }
}

class TransformedData
{
    public int Age { get; set; }

    public string AgeName { get; set; }
}

Then, in the ML.NET program:

using System;
using System.Collections.Generic;
using Microsoft.ML;

MLContext mlContext = new MLContext();

var samples = new List<InputData>
{
    new InputData { Age = 16 },
    new InputData { Age = 35 },
    new InputData { Age = 60 },
    new InputData { Age = 28 },
};

var data = mlContext.Data.LoadFromEnumerable(samples);

Action<InputData, CustomMappingOutput> mapping =
    (input, output) =>
    {
        if (input.Age < 18)
        {
            output.AgeName = "Child";
        }
        else if (input.Age < 55)
        {
            output.AgeName = "Man";
        }
        else
        {
            output.AgeName = "Grandpa";
        }
    };

var pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);

var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);

var dataEnumerable = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: true);

foreach (var row in dataEnumerable)
{
    Console.WriteLine($"{row.Age}\t {row.AgeName}");
}
Jon
  • Hi, thanks for your response. Unfortunately this does not resolve my issue. The "input" parameter contains only the row currently being processed. It seems there is no way to access the previous rows in that context. Is that correct? – Luca Jun 27 '19 at 11:07
  • Right, I don't believe there is a way (currently) that you can access a previous row. – Jon Jun 28 '19 at 10:32
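Because `CustomMapping` sees one row at a time, one possible workaround is to do the windowed computation outside the pipeline: enumerate the rows in order, carry the previous two values as state, and load the result back as a new `IDataView`. The following is a minimal sketch in plain C#, assuming a single numeric column called `Value` and treating the missing predecessors of the first two rows as 0. The `InputRow`, `OutputRow`, `RowWindow`, and `AddPrevTwoSum` names are hypothetical placeholders, not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: "Value" stands in for whatever numeric column the file contains.
public class InputRow
{
    public float Value { get; set; }
}

// Hypothetical output type: the original value plus the new computed column.
public class OutputRow
{
    public float Value { get; set; }

    // Sum of the Value of the two preceding rows (missing predecessors count as 0).
    public float PrevTwoSum { get; set; }
}

public static class RowWindow
{
    // Walks the rows in order and keeps the last two values as state,
    // which a per-row CustomMapping action cannot do.
    public static List<OutputRow> AddPrevTwoSum(IEnumerable<InputRow> rows)
    {
        var result = new List<OutputRow>();
        float prev1 = 0f; // Value of row i-1
        float prev2 = 0f; // Value of row i-2

        foreach (var row in rows)
        {
            // Copy the value immediately, so a reused row object is also safe.
            result.Add(new OutputRow { Value = row.Value, PrevTwoSum = prev1 + prev2 });

            // Shift the window forward by one row.
            prev2 = prev1;
            prev1 = row.Value;
        }

        return result;
    }
}
```

To use it with ML.NET, the rows could be read with `mlContext.Data.CreateEnumerable<InputRow>(data, reuseRowObject: false)`, passed to `RowWindow.AddPrevTwoSum`, and the returned list wrapped with `mlContext.Data.LoadFromEnumerable(...)`. This assumes the data view is enumerated in file order and that the data fits in memory, since the whole result is materialized as a list.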

Easy. I'm assuming you know how to use pipelines.

Here is part of my project, where I merge two columns together:

IEstimator<ITransformer> pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null)
                            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question1", outputColumnName: "question1Featurized"))
                            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question2", outputColumnName: "question2Featurized"))
                            .Append(mlContext.Transforms.Concatenate("Features", "question1Featurized", "question2Featurized"))
                            //.Append(mlContext.Transforms.NormalizeMinMax("Features"))
                            //.AppendCacheCheckpoint(mlContext)
                            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: nameof(customTransform.Label), featureColumnName: "Features"));

As you can see, the two columns question1Featurized and question2Featurized are combined into Features, a new column that can be used like any other column of the IDataView. The Features column does not need to be declared in a separate class.

So in your case, first transform the columns according to their data type: for strings you can do what I did, and for numeric values use a custom transformer (CustomMapping).

The documentation of the Concatenate function might help as well!

marc_s