I am learning ML.NET and trying to use the AutoML API, but I am getting a null reference exception. I have updated the question with what I've learned since and a minimal amount of code that reproduces the problem.
Put this in VS Code and you too can watch a two-dimensional vector explode:
```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

class Program
{
    static void Main(string[] args)
    {
        var mlContext = new MLContext();

        // Create a schema and declare the MultiDimensional column (index 1) as a 3 x 60 vector
        var autoSchema = SchemaDefinition.Create(typeof(InputData));
        var col = autoSchema[1];
        col.ColumnType = new VectorDataViewType(NumberDataViewType.Single, 3, 60);

        // Fabricate some data whose shape matches the declared 3 x 60 vector
        var trainingData = new List<InputData>();
        var inputData = new InputData();
        inputData.MultiDimensional = new float[3, 60];
        // GetLength (not GetUpperBound) so the last row and column are filled too
        for (int i = 0; i < inputData.MultiDimensional.GetLength(0); i++)
        {
            for (int j = 0; j < inputData.MultiDimensional.GetLength(1); j++)
            {
                inputData.MultiDimensional[i, j] = 5; // value doesn't matter
            }
        }
        trainingData.Add(inputData);

        // Set up a data view over the in-memory list
        IDataView trainingDataView = mlContext.Data.LoadFromEnumerable<InputData>(trainingData, autoSchema);

        // Preview it (goes BOOM here)
        var preview = trainingDataView.Preview();

        // Run the experiment
        var settings = new BinaryExperimentSettings();
        settings.MaxExperimentTimeInSeconds = 60;
        ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
            .CreateBinaryClassificationExperiment(settings)
            .Execute(trainingDataView);
    }
}

public class InputData
{
    public bool Label { get; set; }
    public float[,] MultiDimensional { get; set; }
}
```
The documentation seems to indicate my setup is correct: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.data.vectortypeattribute.-ctor?view=ml-dotnet#Microsoft_ML_Data_VectorTypeAttribute__ctor_System_Int32___
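For comparison, here is a minimal sketch of the pattern that constructor appears to be designed for: the dimensions given to `[VectorType(3, 60)]` describe the column's shape, while the member backing it is a one-dimensional `float[]` holding all 3 * 60 = 180 values. This assumes the Microsoft.ML NuGet package is referenced; `FlatInputData`, `Features`, and `FlatVectorDemo` are hypothetical names, and it only checks what column type the loader infers, not what AutoML does with the shape afterwards.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: the 3 x 60 shape is declared on the attribute,
// but the values themselves live in a flat float[] of length 180.
public class FlatInputData
{
    public bool Label { get; set; }

    [VectorType(3, 60)]
    public float[] Features { get; set; }
}

public static class FlatVectorDemo
{
    // Loads two rows and returns the column type inferred for "Features".
    public static VectorDataViewType LoadAndInspect()
    {
        var mlContext = new MLContext();

        var rows = new List<FlatInputData>
        {
            new FlatInputData { Label = true,  Features = new float[3 * 60] },
            new FlatInputData { Label = false, Features = new float[3 * 60] },
        };

        // No hand-built SchemaDefinition: the loader reads [VectorType(3, 60)].
        IDataView view = mlContext.Data.LoadFromEnumerable(rows);

        return (VectorDataViewType)view.Schema["Features"].Type;
    }

    public static void Main()
    {
        VectorDataViewType type = LoadAndInspect();
        Console.WriteLine($"Dimensions: [{string.Join(", ", type.Dimensions)}], size: {type.Size}");
    }
}
```

The shape metadata is mainly useful to components that care about it; a binary classification trainer would presumably see the column as a plain 180-element feature vector either way.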
To fix my multidimensional vector problem, I've tried:
- Removing the `float[,]` initializers in `InputData`
- Specifying the exact size with `[VectorType(3,60)]` as appropriate for each property
- Leaving the `[VectorType]` attribute off altogether and using autoschema to set it
- Leaving the `[VectorType]` attribute off altogether and not using autoschema, to let ML.NET figure it out on its own
- Adding just `[VectorType()]`, although the docs say that is for single-dimension arrays
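None of these attempts changes the backing type from `float[,]`. If the loader really only accepts single-dimensional arrays for vector columns (an assumption, not something I've confirmed in the ML.NET source), the data would have to be copied into a row-major `float[]` first. A minimal sketch; `ArrayFlattening.Flatten` is a hypothetical helper, not part of ML.NET:

```csharp
using System;

public static class ArrayFlattening
{
    // Copies a 2-D array into a 1-D array in row-major order,
    // so element [i, j] ends up at index i * cols + j.
    public static float[] Flatten(float[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        var flat = new float[rows * cols];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < cols; j++)
            {
                flat[i * cols + j] = source[i, j];
            }
        }
        return flat;
    }

    public static void Main()
    {
        var grid = new float[3, 60];
        grid[1, 2] = 7f;
        float[] flat = Flatten(grid);
        Console.WriteLine($"{flat.Length} {flat[1 * 60 + 2]}"); // prints "180 7"
    }
}
```

The result could then be assigned to a `float[]` property marked `[VectorType(3, 60)]`. For primitive element types, `Buffer.BlockCopy` should perform the same row-major copy in a single call; the explicit loop just makes the index mapping visible.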
My question now is: what is the correct way to use vectors with more than one dimension in the AutoML part of ML.NET? Is this even possible?