I've created a relatively simple console app to predict future expenses based on historical expenditure data.
I built it from scratch, following various tutorials and GitHub examples.
Eventually things seemed to be working, but I noticed one issue: when I ran more than one sample input against the model, it returned exactly the same prediction value for every input.
I figured there must be something not quite right with my data processing and transformation pipeline, so I fed exactly the same dataset to ML Model Builder in Visual Studio to see what it would come up with.
Sure enough, I was using the 'wrong' algorithm, and some data transformation adjustments were needed as well.
Now my predictions seem much more reasonable and the RSquared result is much better, but the model still returns the same value for the majority of predictions, like so:
| Expense | Prediction |
|---|---|
| Food | 75 |
| Bills | 165 |
| Parking | 75 |
| Internet | 75 |
| Phone | 75 |
| Garden | 85 |
| Insurance | 75 |
As you can see, most of the values are 75, which simply cannot be true in real life.
However, when I run the same sample inputs against ML Model Builder's generated model, not only does it return different values per expense type, but the values are much more believable as well!
I've closely compared the code generated by ML Model Builder with the code I've written, and everything matches. The dataset is exactly the same, the pipeline and algorithm are identical, and so are the training options. The input and output models are the same. The prediction engine is also created the same way.
The only difference is that my code saves the trained model to a .zip file, whereas ML Model Builder's model seems to be loaded from a .mlnet file. But that shouldn't matter, since running my tests in memory still yields the same results.
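For anyone wanting to reproduce the comparison, a side-by-side check of the two saved models might look roughly like the sketch below. It loads both files, prints the input schema stored with each (which could reveal an extra or differently typed column), and runs the same sample through both. The `ModelInput`/`ModelOutput` shapes, the column names, the file names, and the sample values are all placeholders (assumptions), not the actual ones from my project or from the generated code; a regression model exposing a `Score` column is also assumed.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input shape: the real column names and types must match the training data.
public class ModelInput
{
    [ColumnName("Expense")] public string Expense { get; set; }
    [ColumnName("Month")] public float Month { get; set; }
}

// Hypothetical output shape: assumes a regression trainer that emits a "Score" column.
public class ModelOutput
{
    [ColumnName("Score")] public float Score { get; set; }
}

public static class CompareModels
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Placeholder paths: the hand-written model (.zip) and the Model Builder one (.mlnet).
        ITransformer myModel = mlContext.Model.Load("MyModel.zip", out DataViewSchema mySchema);
        ITransformer builderModel = mlContext.Model.Load("MLModel.mlnet", out DataViewSchema builderSchema);

        // Print the input schema saved with each model to spot any column differences.
        Console.WriteLine("My model input schema:");
        foreach (var column in mySchema)
            Console.WriteLine($"  {column.Name}: {column.Type}");

        Console.WriteLine("Model Builder input schema:");
        foreach (var column in builderSchema)
            Console.WriteLine($"  {column.Name}: {column.Type}");

        using var myEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(myModel);
        using var builderEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(builderModel);

        // Made-up sample values, purely for illustration.
        var samples = new[]
        {
            new ModelInput { Expense = "Food", Month = 1f },
            new ModelInput { Expense = "Bills", Month = 1f },
        };

        foreach (var sample in samples)
        {
            float mine = myEngine.Predict(sample).Score;
            float theirs = builderEngine.Predict(sample).Score;
            Console.WriteLine($"{sample.Expense}: mine={mine}, Model Builder={theirs}");
        }
    }
}
```

If the two printed schemas differ, or the predictions diverge only for certain inputs, that would narrow down whether the difference lies in the stored pipeline rather than in the calling code.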
Where should I be looking for a possible issue? I could use ML Model Builder's generated model, of course, but that would not be ideal.
EDIT1: I've tried re-running ML Model Builder's auto-generated code to retrain the model it previously created via the UI:

```csharp
var data = LoadIDataViewFromFile(mlContext, inputDataFilePath, separatorChar, hasHeader);
var model = RetrainModel(mlContext, data);
SaveModel(mlContext, model, data, outputModelPath);
```

As soon as that's done, the same issue of repeating values comes back.
If I run the UI wizard again using exactly the same CSV file, the previously used test inputs show the expected (different) values again.
It feels like something extra is being used or added to the model when it's trained via the UI?