
I am using the AirPassengers dataset to forecast a time series. For the model, I chose auto_arima to produce the forecast. However, the order selected by auto_arima does not seem to fit the data. It produces the chart below.

[Figure: Forecasted]

What can I do to get a better fit?

My code for those that want to try:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from pmdarima import auto_arima

df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
df = df.rename(columns={"#Passengers":"Passengers"})
df.Month = pd.to_datetime(df.Month)
df.set_index('Month',inplace=True)

train,test=df[:-24],df[-24:]

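# auto_arima returns an already-fitted model, so no separate fit() call is needed
model = auto_arima(train,trace=True,error_action='ignore', suppress_warnings=True)

forecast = model.predict(n_periods=24)
# np.asarray keeps only the values, in case predict() returns a pandas object with its own index
forecast = pd.DataFrame(np.asarray(forecast),index = test.index,columns=['Prediction'])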

plt.plot(train, label='Train')
plt.plot(test, label='Valid')
plt.plot(forecast, label='Prediction')
plt.show()

from sklearn.metrics import mean_squared_error
print(mean_squared_error(test['Passengers'],forecast['Prediction']))

Thank you for reading. Any advice is appreciated.

diggledoot
  • Have you tried explicitly setting the seasonality parameter `D` (presumably 12 here)? – Igor Rivin May 19 '20 at 04:24
  • @IgorRivin I did after you mentioned it. There is still no change in the fit, it seems. However, I explicitly set m to 12 and there is a massive improvement. – diggledoot May 19 '20 at 04:28
  • @IgorRivin I have answered my own question, but are there still ways to fit it better? – diggledoot May 19 '20 at 04:33

2 Answers

The problem was that I did not specify m. Here I set m to 12, meaning that each data row is a month and the seasonal cycle repeats every 12 rows. That's how I understand it. (source)

Feel free to comment, I'm not entirely sure as I am new to using ARIMA.

Code:

model = auto_arima(train,m=12,trace=True,error_action='ignore', suppress_warnings=True)

Just add m=12 to denote that the data is monthly, with a cycle of 12 observations.

Result: [Figure: What I want]
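
To illustrate why the period matters, here is a minimal sketch on synthetic data (not AirPassengers; the series `y` and its coefficients are made up for illustration). It assumes only NumPy and shows that differencing at lag 12 removes a 12-step seasonal pattern, while ordinary lag-1 differencing leaves it in. As far as I understand, a seasonal model can only exploit this once it is told the period through m; with the default m=1, auto_arima searches non-seasonal models only.

```python
import numpy as np

# Synthetic monthly series (hypothetical values): linear trend plus a 12-step cycle
t = np.arange(120)
y = 100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

# Ordinary differencing (lag 1) removes the trend but keeps the seasonal wiggle
first_diff = y[1:] - y[:-1]

# Seasonal differencing at lag 12 compares each month with the same month a year earlier
seasonal_diff = y[12:] - y[:-12]

# Spread of each differenced series: large for lag 1, essentially zero for lag 12
first_diff_range = first_diff.max() - first_diff.min()
seasonal_diff_range = seasonal_diff.max() - seasonal_diff.min()

print("range after lag-1 differencing: ", first_diff_range)
print("range after lag-12 differencing:", seasonal_diff_range)
```

The lag-12 difference collapses to a constant (12 months of trend), which is the kind of structure a seasonal ARIMA with m=12 is able to model.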

diggledoot

This series is not stationary, and no amount of differencing will make it so (notice that the amplitude of the variations keeps increasing). However, transforming the data first by taking logs should do better (experiment shows that it does do better, but not what I would call well). Setting the seasonality (as I suggest in the comment) with m=12 and taking logs produces this: [figure] which is essentially perfect.
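
A rough sketch of the point about logs, again on synthetic data rather than AirPassengers (the growth rate and seasonal factor below are assumptions chosen for illustration). The series has multiplicative seasonality, so its yearly swing grows with the level. Seasonal differencing of the raw values still leaves a growing amplitude, while the same differencing after taking logs gives a constant.

```python
import numpy as np

# Synthetic series (hypothetical): exponential growth times a 12-step seasonal factor
t = np.arange(144)
y = 100.0 * np.exp(0.01 * t) * (1.0 + 0.2 * np.sin(2 * np.pi * t / 12))

def yearly_swing(x, period=12):
    """Peak-to-trough range within each complete block of `period` points."""
    n = len(x) // period * period
    blocks = x[:n].reshape(-1, period)
    return blocks.max(axis=1) - blocks.min(axis=1)

# The swing grows year over year on the raw scale, but is constant on the log scale
raw_swing = yearly_swing(y)
log_swing = yearly_swing(np.log(y))

# Lag-12 differencing: still growing on the raw scale, constant after logs
seasonal_diff_raw = y[12:] - y[:-12]
seasonal_diff_log = np.log(y[12:]) - np.log(y[:-12])
raw_diff_swing = yearly_swing(seasonal_diff_raw)

print("raw swing, first vs last year:", raw_swing[0], raw_swing[-1])
print("log swing, first vs last year:", log_swing[0], log_swing[-1])
print("lag-12 diff of logs (first few):", seasonal_diff_log[:3])
```

Applied to the question's code, the same idea would be to fit auto_arima on np.log(train) with m=12 and apply np.exp to the predictions before comparing them with test.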

Igor Rivin
  • Doesn't auto_arima take non-stationarity into account and automatically find the best values for it? source: https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/ – diggledoot May 19 '20 at 04:39
  • No, it can only fit a SARIMA model to the data you give it. It will not transform the data first. If some differenced version of the data is stationary it will succeed, but for this data no amount of differencing (or seasonality) will make it stationary. – Igor Rivin May 19 '20 at 04:41
  • Yes, I tried it. Thank you for your time @Igor Rivin – diggledoot May 19 '20 at 04:53
  • @cswannabe You are welcome. I have to say that this auto_arima is a bit disappointing (the fact that you have to set the seasonality parameter by hand AND it takes a LONG time are both not great). – Igor Rivin May 19 '20 at 13:28