I have time-series data from 2016 to 2021. How could I backcast to get the data from 2010 to 2015 using ARIMA in Python? Could you give me some sample Python code? Thank you very much.
2 Answers
The only possibility I see here is to simply reverse your time series. That means the last observation becomes the first, the second-to-last becomes the second, and so on. You then have a series running from 2021 to 2016.
You can do that by:
df = df.reindex(index=df.index[::-1])
You can then train an ARIMA model on this data and predict the "next" years, running backward from 2015 to 2010. Remember that the first prediction will be for 2015-12-31, so you need to reverse the predictions again to get a series from 2010 to 2015.
Keep in mind that the ARIMA predictions will be very, very bad, since each forecast will be based on earlier forecasts, and so on. ARIMA is not made for predictions over such long time frames, so the results will probably be useless anyway. It is very likely that the predictions will become a straight line after 30 or 40 steps. Also, effectively only the autoregressive part helps in such a case, since the moving-average terms only contribute to the first few steps ahead (as many as the order of the moving-average part).
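To see why long-horizon forecasts flatten out, here is a small sketch using a hand-written AR(1) recursion. The function `ar1_forecast` and the parameter values are hypothetical and chosen only for illustration; nothing here is estimated from data.

```python
def ar1_forecast(last_value, mean, phi, steps):
    """Recursive multi-step forecast for an AR(1) process with known parameters."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        # Each forecast is computed from the previous forecast, not from data.
        previous = mean + phi * (previous - mean)
        forecasts.append(previous)
    return forecasts


# Assumed values: last observation 15, long-run mean 10, AR coefficient 0.8.
path = ar1_forecast(last_value=15.0, mean=10.0, phi=0.8, steps=40)

# The distance from the mean shrinks by a factor of phi at every step,
# so the later forecasts are almost indistinguishable from the mean.
for step in (1, 10, 40):
    print(step, round(path[step - 1], 3))
```

A fitted ARIMA model with differencing behaves similarly, except that the forecasts settle toward a constant level or a straight trend line instead of the mean.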

- Many thanks for your answer. Could you give me a hint as to which method would be better than ARIMA in this case? – PPM Jan 19 '22 at 12:35
- The problem is that you need to forecast very far into the future. You have to keep in mind that a prediction is based on the last observations before it. In the ARIMA case it should not be more than 40 observations; other algorithms like LSTM may be able to handle more. It depends on the data and the algorithm, of course. In your case you are trying to predict hundreds of steps into the future. The predictions for 2013, for example, will be based on predictions that were themselves based on predictions, and so on. I do not think that there is a good method for this case. – Arne Decker Jan 19 '22 at 12:42
- Nope, in my time series, a year is a data point, so I have only 6 data points from 2016 to 2021. – PPM Jan 19 '22 at 12:46
- Okay, that results in another problem. You have only six data points to train your model with, and that will not be enough to create a proper model. In that case I would suggest you try to do it manually. Machine learning requires more data. – Arne Decker Jan 19 '22 at 14:01
Forecasting from a reversed time series would be the solution if you had more data.
However, only having 6 observations is problematic. Creating a forecasting (or backcasting) model requires using some of the observations to train the model and others to validate it. If you train with 4 observations then you only have 2 observations for validation. Is a model good if it forecasts those two well or did you just get lucky? Is it bad if it forecasts one observation well and the other poorly? If you increase the validation set to 3 observations, you get more confidence on whether the model is good or bad but creating the model (with only 3 observations) gets even harder than before.
As others have stated, regardless of which machine learning model you choose, the results are likely to be poor with so little data. If you had monthly data, it might be more fruitful.
If you can't get monthly data, then since you are backcasting into the past, it might be better to estimate the values manually based on some related variables that you have data for (if any). E.g. if your time series is about a company's sales, then maybe you could estimate them from the company's annual revenue (or company size, or something else) if you can get the historical data for that variable. This is not precise, but it can still be more precise than what ARIMA or similar methods would give with the data you have.
