
I am fairly new to ML.NET. So far, I have simply copied and pasted the code from the ML.NET Tutorial Taxi Fare. But instead of the given CSV files, I am using the BMW.DE historical stock prices from 1996 to date, which I got from Yahoo Finance.

My goal is to predict the "Open" value on the next day.

Sample data from my BMW.DE.csv file:

    Date,Open
    12/19/1996,20.0
    12/20/1996,20.3
    12/23/1996,20.6
    12/27/1996,20.8
    12/30/1996,20.9
    1/2/1997,20.7
    1/3/1997,20.8
    1/6/1997,20.9
    1/7/1997,20.6

My BMW classes:

    public class BmwOpenClass
    {
        [LoadColumn(0)]
        public string Date;

        [LoadColumn(1)]
        public float Open;
    }

    public class PredictedOpen
    {
        [ColumnName("Score")]
        public float Open;
    }

And this is how I trained it:

    public static ITransformer Train(MLContext mlContext, string dataPath)
    {
        IDataView dataView = mlContext.Data.LoadFromTextFile<BmwOpenClass>(dataPath, hasHeader: true, separatorChar: ',');

        var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: "Open")
            .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "DateEncoded", inputColumnName: "Date"))
            .Append(mlContext.Transforms.Concatenate("Features", "DateEncoded"))
            .Append(mlContext.Regression.Trainers.FastTree());

        var model = pipeline.Fit(dataView);

        SaveModelAsFile(mlContext, model);

        return model;
    }

Now, when I evaluate the model (I have a separate CSV file for evaluation purposes), I get the following model quality metrics:

R2 Score: -2.57

RMS Loss: 59.94

I'm pretty sure the -2.57 is not normal because, per the documentation, the closer R2 is to 1, the better the model.

2 Answers

I am not sure that using OneHotEncoding on your Date column as is is a good idea. A full day/month/year date is not a categorical value: nearly every row has its own unique date, so the encoding gives the model nothing that carries over to dates it has not seen. It would be better to split the day, month, and year into separate columns; then you can encode each of them. Also, the date alone is not a very good feature, which may be why the model performs poorly. Try augmenting your dataset with a column for the day of the week and something indicating whether the day is a holiday, and it should do slightly better. Ultimately, for a better model you need more features...
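As a rough illustration of the split described above, here is a minimal sketch in plain C# that turns one CSV row into separate calendar features. The class `BmwOpenFeatures`, the helper `DateFeatures.FromRow`, and the field names are hypothetical (they are not in the question), and the `M/d/yyyy` format is an assumption based on the sample rows. A holiday flag is left out because it would need a trading calendar.

```csharp
using System;
using System.Globalization;

// Hypothetical row type: the parts of the Date column as separate
// numeric features, plus the value to predict.
public class BmwOpenFeatures
{
    public float Year;
    public float Month;
    public float DayOfMonth;
    public float DayOfWeek;   // 0 = Sunday ... 6 = Saturday
    public float Open;
}

public static class DateFeatures
{
    // Parses dates such as "12/19/1996" or "1/2/1997"
    // (assumed month/day/year, as in the sample data).
    public static BmwOpenFeatures FromRow(string date, float open)
    {
        DateTime d = DateTime.ParseExact(
            date, "M/d/yyyy", CultureInfo.InvariantCulture);

        return new BmwOpenFeatures
        {
            Year = d.Year,
            Month = d.Month,
            DayOfMonth = d.Day,
            DayOfWeek = (int)d.DayOfWeek,
            Open = open
        };
    }
}
```

Rows built this way could then be loaded with `mlContext.Data.LoadFromEnumerable(...)` and the four date fields passed to `Concatenate("Features", ...)` in place of `DateEncoded`; whether to one-hot encode `Month` and `DayOfWeek` first is a design choice worth experimenting with.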

amy8374

I am not an expert either, but the R2 score can be negative when the fitted model does worse than a baseline that always predicts the average value. Source: https://www.geeksforgeeks.org/ml-r-squared-in-regression-analysis/

It means that your model is not trained well enough.
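To see why a negative value is possible, here is the usual textbook definition of R2 (assuming the evaluation uses this standard form), where $y_i$ are the actual Open values in the evaluation set, $\hat{y}_i$ are the predictions, and $\bar{y}$ is the mean of the actual values:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

The fraction exceeds 1 whenever the model's squared error is larger than that of always predicting $\bar{y}$, which makes $R^2$ negative. Under this definition, a score of -2.57 would correspond to a squared error roughly 3.57 times that of the constant-mean baseline.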

It might help to have a look at the example here: https://github.com/Schentrup-Software/Automatic-Stock-Trader

walter33