I am trying to build my first ML.NET project. I have built the same model before with Azure ML, the Visual Interface, Python and so on, but now I want to do it in C#.
I was following this tutorial, but with a totally different dataset and purpose.
The dataset has a lot of extra columns, but my data model looks like the following (each LoadColumn attribute points to the index of the column in the dataset):
using Microsoft.ML.Data;

namespace ML_Net
{
    public class Earthquake
    {
        [LoadColumn(1)]
        public int geo_level_1_id { get; set; }
        [LoadColumn(2)]
        public int geo_level_2_id { get; set; }
        [LoadColumn(3)]
        public int geo_level_3_id { get; set; }
        [LoadColumn(4)]
        public int count_floors_pre_eq { get; set; }
        [LoadColumn(5)]
        public int age { get; set; }
        [LoadColumn(6)]
        public int area { get; set; }
        [LoadColumn(7)]
        public int height { get; set; }
        [LoadColumn(8)]
        public int count_families { get; set; }
        [LoadColumn(26)]
        public int has_secondary_use { get; set; }
        [LoadColumn(27)]
        public double square { get; set; }
        [LoadColumn(39)]
        public double difference { get; set; }
        [LoadColumn(40)]
        public int damage_grade { get; set; }
    }

    public class DamagePrediction
    {
        [ColumnName("PredictedLabel")]
        public int damage_grade;
    }
}
The error comes from the training function:
public static IEstimator<ITransformer> BuildAndTrainModel(IDataView trainingDataView, IEstimator<ITransformer> pipeline)
{
    var trainingPipeline = pipeline
        .Append(_mlContext.MulticlassClassification.Trainers
            .SdcaMaximumEntropy("Label", "Features"))
        .Append(_mlContext.Transforms.Conversion
            .MapKeyToValue("PredictedLabel"));

    _trainedModel = trainingPipeline.Fit(trainingDataView);
    _predEngine = _mlContext.Model
        .CreatePredictionEngine<Earthquake, DamagePrediction>(_trainedModel);

    Earthquake building = new Earthquake()
    {
        geo_level_1_id = 1,
        geo_level_2_id = 42,
        geo_level_3_id = 941,
        count_floors_pre_eq = 2,
        age = 0,
        area = 24,
        height = 4,
        count_families = 2,
        has_secondary_use = 0,
        square = 4.898979485566356,
        difference = 0.8989794855663558
    };

    var prediction = _predEngine.Predict(building);
    Console.WriteLine($"=============== Single Prediction just-trained-model - Result: {prediction.damage_grade} ===============");
    return trainingPipeline;
}
The exception says:
Exception thrown: 'System.ArgumentOutOfRangeException' in Microsoft.ML.Data.dll
An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Data.dll
Schema mismatch for feature column 'Features': expected Vector<Single>, got Vector<Int32>
I cannot work out what the problem is. Could you please help me with some ideas?
I work only with numerical data, which is why I didn't add any transformation or featurization, but maybe normalization could help, since some of my columns are floating-point.
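For illustration, here is a minimal sketch of what a conversion and normalization step could look like. It assumes the mismatch comes from concatenating int columns into "Features" while the trainer expects Single values. The class FeaturePipelineSketch and the helper BuildFeaturePipeline are hypothetical names, not part of the tutorial or of the code above, and this has not been verified against this dataset.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class FeaturePipelineSketch
{
    // Hypothetical helper: builds the preprocessing part of the pipeline
    // that would be passed to BuildAndTrainModel.
    public static IEstimator<ITransformer> BuildFeaturePipeline(MLContext mlContext)
    {
        // Every input column of Earthquake except the label.
        string[] featureColumns =
        {
            "geo_level_1_id", "geo_level_2_id", "geo_level_3_id",
            "count_floors_pre_eq", "age", "area", "height",
            "count_families", "has_secondary_use", "square", "difference"
        };

        // One pair per column; with only the output name given,
        // the column is converted in place.
        var conversions = featureColumns
            .Select(name => new InputOutputColumnPair(name))
            .ToArray();

        return mlContext.Transforms.Conversion
            // Multiclass trainers expect the label as a key type.
            .MapValueToKey("Label", "damage_grade")
            // Convert the int and double columns to Single (float).
            .Append(mlContext.Transforms.Conversion.ConvertType(conversions, DataKind.Single))
            // Concatenate needs all inputs to share one type.
            .Append(mlContext.Transforms.Concatenate("Features", featureColumns))
            // Optional: scale the features to a common range.
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));
    }
}
```

An alternative under the same assumption would be to declare the feature properties of Earthquake as float instead of int and double, which would make the ConvertType step unnecessary.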
Thank you in advance for all the ideas!