
What I want to achieve.

My data is in the following format (daily natural gas price settlements):

Column A: individual rows from December 2018 to December 2026
Column B: opening price of gas from December 2018 to December 2026
Column C: previous price of gas from December 2018 to December 2026

I want to use a gradient boosting algorithm in Python to predict prices beyond December 2026. As I understand it, the algorithm typically returns an array of some sort after I build a DMatrix and run the subsequent commands, and then a few more steps produce a scatter plot.
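Gradient boosting models such as XGBoost learn a mapping from a feature row to a target, so a single price series first has to be turned into supervised (features, target) pairs. A minimal sketch, assuming the prices are a plain list in chronological order; `make_lag_features` and `n_lags` are hypothetical names used only for illustration. The resulting `X` and `y` are what you would convert to a NumPy array and wrap in `xgboost.DMatrix(X, label=y)`.

```python
from typing import List, Tuple


def make_lag_features(prices: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a chronologically ordered price list into (features, target) pairs.

    Each feature row holds the n_lags prices immediately before the target,
    oldest first, so the model learns "next price given the last n_lags prices".
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(prices)):
        X.append(prices[t - n_lags:t])  # window of past prices
        y.append(prices[t])             # the price that follows the window
    return X, y


# Hypothetical daily settlement prices, oldest first.
prices = [3.1, 3.3, 3.2, 3.5, 3.4]
X, y = make_lag_features(prices, n_lags=2)
print(X)  # [[3.1, 3.3], [3.3, 3.2], [3.2, 3.5]]
print(y)  # [3.2, 3.5, 3.4]
```

Column C in the data above already plays the role of a one-step lag; building more lags this way simply gives the model a longer look-back.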

Question.

Using that array (the generated data), I am lost on what to do next to predict December 2026 and beyond. My scatter plot might just use the training and test data sets to make a prediction, but what about the future years, which are what I am interested in?
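A model trained on lagged prices only predicts one step ahead from a window of known prices. One common way to go past the last available date is recursive (iterative) multi-step forecasting: predict the next value, append it to the window, and repeat. The sketch below assumes any one-step predictor can be passed in as a function; `recursive_forecast` and `mean_model` are hypothetical names, and the averaging "model" is only a stand-in for a trained booster.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    predict_one: Callable[[List[float]], float],
    history: Sequence[float],
    n_lags: int,
    steps: int,
) -> List[float]:
    """Forecast `steps` values past the end of `history`, one step at a time.

    Each prediction is appended to the window and used as an input for the
    next step, so errors can compound the further out you go.
    """
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts: List[float] = []
    for _ in range(steps):
        next_value = predict_one(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return forecasts


# Stand-in one-step model for illustration: the average of the window.
# With a trained XGBoost booster this might instead be something like
# lambda w: float(booster.predict(xgb.DMatrix(np.array([w])))[0])
def mean_model(window: List[float]) -> float:
    return sum(window) / len(window)


print(recursive_forecast(mean_model, [3.0, 4.0], n_lags=2, steps=3))
```

Because each step feeds on earlier predictions rather than observed prices, forecasts far beyond the last date should be treated with growing caution; training a separate model per horizon (the "direct" strategy) is an alternative worth comparing.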


Answer

If you don't have data for the years beyond 2026, you have no way of knowing how well your models perform for those years (this is tautological).

I think one thing you can do in that case is split your data into train, validation, and test sets based on a datetime index. By preventing your model from "seeing the future" in training, you can get a decent idea of how predictable your target is by measuring the model's performance on "future" holdout data after you train. Presumably, as the maintainer of the model, you would then update your predictions (and iterate on training) as new years of data become available.
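A minimal sketch of such a date-based split, assuming the data is a list of `(date, price)` tuples; the function name and cutoff dates are hypothetical. The key point is that rows are ordered by date and never shuffled, so every validation and test row lies strictly after the training period.

```python
from datetime import date
from typing import List, Tuple

Row = Tuple[date, float]


def chronological_split(
    rows: List[Row], val_start: date, test_start: date
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split (date, price) rows into train/validation/test by date, never shuffling.

    Train:      date <  val_start
    Validation: val_start <= date < test_start
    Test:       date >= test_start
    """
    if not val_start < test_start:
        raise ValueError("val_start must be earlier than test_start")
    rows = sorted(rows, key=lambda r: r[0])
    train = [r for r in rows if r[0] < val_start]
    val = [r for r in rows if val_start <= r[0] < test_start]
    test = [r for r in rows if r[0] >= test_start]
    return train, val, test


# Hypothetical rows, deliberately out of order.
rows = [
    (date(2021, 1, 4), 2.6),
    (date(2019, 1, 2), 3.0),
    (date(2022, 1, 3), 3.8),
    (date(2020, 1, 2), 2.2),
]
train, val, test = chronological_split(rows, date(2021, 1, 1), date(2022, 1, 1))
print([r[0].year for r in train], [r[0].year for r in val], [r[0].year for r in test])
```

If you use scikit-learn, `sklearn.model_selection.TimeSeriesSplit` produces a series of expanding train/test folds with the same "no peeking ahead" property, which is closer to how the model would be re-trained as new data arrives.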

I guess I should also point out that you haven't shared a compelling reason why XGBoost, and only XGBoost, will do for this problem. For models that may go into production, I would encourage you to run some regressions or cheaper algorithms and compare their performance. If you haven't checked out some of the model selection tools out there, I think it would be worth your while! An easy one to get started with is grid search: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
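A cheap first comparison is the naive "persistence" forecast, which predicts each day's price as the previous day's price; a more complex model should beat it on the same holdout data before it is worth deploying. The sketch below scores that baseline with MAPE (mean absolute percentage error); the function names and the price values are hypothetical.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Undefined when an actual value is 0."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(prices: Sequence[float]) -> List[float]:
    """Persistence baseline: predict each price as the previous price."""
    return list(prices[:-1])


# Hypothetical held-out prices, oldest first.
test_prices = [2.0, 2.5, 2.0, 4.0]
baseline = naive_forecast(test_prices)  # predictions for test_prices[1:]
print(mape(test_prices[1:], baseline))
```

The same `mape` call can then be applied to a gradient boosting model's predictions for `test_prices[1:]`, so both numbers are computed on identical data.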

Charles Landau
  • Thanks for the detailed explanation. The reason I am pursuing the XGBoost algorithm is that I read a presentation indicating it had the lowest MAPE of all the algorithms tested. You are saying that without data beyond 2026 it would be difficult to predict, but, by analogy, how can weather be predicted? I am following the same mechanism: predicting a future opening or last price based on previous months. – Siddharth Kulkarni Nov 13 '18 at 15:46
  • I'm not saying you can't predict, I'm saying you can't test the real accuracy of predictions without real data :D @SiddharthKulkarni If the response addressed your question, please mark it as answered. – Charles Landau Nov 13 '18 at 15:49