I am learning ML.NET and trying to use the AutoML API, but I am getting a null reference exception. I have updated the question with what I've learned so far and a minimal amount of code to reproduce the problem.

Put this in VS Code and you too can experience a two-dimensional vector exploding.

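using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

class Program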
{
    static void Main(string[] args)
    {
        var mlContext = new MLContext();

        // create schema for multidimensional vector
        var autoSchema = SchemaDefinition.Create(typeof(InputData));
        var col = autoSchema[1];
        col.ColumnType = new VectorDataViewType(NumberDataViewType.Single, 3, 60);

        // fabricate some data
        var trainingData = new List<InputData>();
        var inputData = new InputData();
        inputData.MultiDimensional = new float[3, 60]; // same 3 x 60 shape as the vector type declared above
        // GetLength (not GetUpperBound) so the last row and column are filled too
        for (int i = 0; i < inputData.MultiDimensional.GetLength(0); i++)
        {
            for (int j = 0; j < inputData.MultiDimensional.GetLength(1); j++)
            {
                inputData.MultiDimensional[i,j] = 5; // doesn't matter
            }
        }
        trainingData.Add(inputData);

        // setup a data view
        IDataView trainingDataView = mlContext.Data.LoadFromEnumerable<InputData>(trainingData, autoSchema);

        // preview it (goes BOOM)
        var preview = trainingDataView.Preview();

        // run the experiment
        var settings = new BinaryExperimentSettings();
        settings.MaxExperimentTimeInSeconds = 60;
        ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
            .CreateBinaryClassificationExperiment(settings)
            .Execute(trainingDataView);
    }
}

public class InputData
{
    public bool Label { get; set; }
    public float[,] MultiDimensional { get; set; }
}

The documentation seems to indicate my setup is correct: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.data.vectortypeattribute.-ctor?view=ml-dotnet#Microsoft_ML_Data_VectorTypeAttribute__ctor_System_Int32___

To fix my multidimensional vector problem, I've tried:

  • Removing the float[,] initializers in InputData
  • Specifying the exact size with [VectorType(3,60)] as appropriate for each property
  • Leaving the [VectorType] attribute off altogether and using autoschema to set it.
  • Leaving the [VectorType] attribute off altogether and not using autoschema, to let ML.NET figure it out on its own
  • Adding just [VectorType()], although the docs say that is for single dimension arrays.

My question now is: what is the correct way to use vectors with more than one dimension in the AutoML part of ML.NET? Is this even possible?

Bill Sambrone
  • Just curious, is there an error in loading the enumerable without the `autoSchema` parameter? – Jon Oct 21 '20 at 16:58
  • That's a great suggestion! I tried it out, same exception though. – Bill Sambrone Oct 21 '20 at 19:22
  • What happens if you do `trainingDataView.Preview()`? Also, is it possible to get a sample of the data? – Jon Oct 21 '20 at 21:35
  • Interesting - I got the same exception with doing Preview. I just now tried adding a [NoColumn] attribute on all multidimensional vectors, and the exception goes away. Is there something I'm doing wrong with these? It seems correct from the documentation: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.data.vectortypeattribute.-ctor?view=ml-dotnet#Microsoft_ML_Data_VectorTypeAttribute__ctor_System_Int32___ – Bill Sambrone Oct 21 '20 at 22:03
  • Good find! I saw you put an issue in [here](https://github.com/dotnet/machinelearning/issues/5446). Hopefully the team can have a fix for you soon :) – Jon Oct 22 '20 at 08:14

1 Answer

Oh wow, found my own question years later, still open. According to the GitHub issue posted in the comments (https://github.com/dotnet/machinelearning/issues/5446), this is still not possible. A GitHub issue from 2022 confirms this is still the case.
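
A possible workaround, offered here as an assumption rather than something the linked issues prescribe, is to flatten the float[,] into a one-dimensional float[] before handing it to ML.NET, since one-dimensional vectors with a fixed [VectorType(size)] are supported. The sketch below shows the idea; FlatInputData, Flattened and Flatten are hypothetical names made up for illustration, and I have not checked how AutoML treats the resulting column.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical replacement for InputData: the 3 x 60 grid is stored flat.
public class FlatInputData
{
    public bool Label { get; set; }

    // One-dimensional vector holding 3 * 60 = 180 values.
    [VectorType(3 * 60)]
    public float[] Flattened { get; set; }
}

public static class FlattenSketch
{
    // Copies a 2D array into a 1D array in row-major order
    // (element [i, j] lands at index i * columns + j).
    public static float[] Flatten(float[,] source)
    {
        var flat = new float[source.Length];
        Buffer.BlockCopy(source, 0, flat, 0, source.Length * sizeof(float));
        return flat;
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Fabricate one row of data, as in the question.
        var grid = new float[3, 60];
        for (int i = 0; i < grid.GetLength(0); i++)
        {
            for (int j = 0; j < grid.GetLength(1); j++)
            {
                grid[i, j] = i * 60 + j;
            }
        }

        var rows = new List<FlatInputData>
        {
            new FlatInputData { Label = true, Flattened = Flatten(grid) }
        };

        // No custom SchemaDefinition is needed; the attribute carries the size.
        IDataView view = mlContext.Data.LoadFromEnumerable(rows);
        var preview = view.Preview();

        Console.WriteLine(view.Schema["Flattened"].Type);
        Console.WriteLine(preview.RowView.Length);
    }
}
```

The trade-off is that the model only sees 180 independent values; the 3 x 60 layout is lost unless you reconstruct it yourself from the index.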

Bill Sambrone