I can make in-sample predictions, but when I try to make out-of-sample predictions I get the following warning:

C:\Users\YannickLECROART\Miniconda3\envs\machinelearning\lib\site-packages\statsmodels\tsa\base\tsa_model.py:531: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  ValueWarning)
<statsmodels.tsa.statespace.mlemodel.PredictionResultsWrapper object at 0x000001F303476A58>

You can find the dataset I use by clicking on the link below.

https://ufile.io/an2cx

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

First of all, I extract the dataset from the Excel file.

df = pd.read_excel("C:\\Users\\YannickLECROART\\Desktop\\comedie.xlsx", index_col=0)

Then, I convert the dataframe into a time series.

df.index = pd.to_datetime(df.index)

I filter the data so that I only keep values between 9 and 10 in the morning.

idx_9 = df.between_time('09:00', '09:59')

I configure the SARIMAX parameters and fit the model.

mod = sm.tsa.statespace.SARIMAX(idx_9,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)

results = mod.fit()

Then I make in-sample predictions to compare them with the observed values.

pred = results.get_prediction(start=1, dynamic=False)
pred_ci = pred.conf_int()

ax = idx_9['2017':].plot(label='Observations')
pred.predicted_mean.plot(ax=ax, label='Prédictions', alpha=.7, figsize=(14, 7))

ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)

ax.set_xlabel('Date')
ax.set_ylabel('Places occupées')
plt.legend()

plt.show()

This is what the plot looks like:

[Plot: observations with in-sample predictions and their confidence interval]

Finally, I want to make out-of-sample predictions and plot them after the observations. This is where I get the warning:

pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()

ax = idx_9.plot(label='Observations', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Prédictions')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Places occupées')
plt.legend()
plt.show()

Could you tell me why I am getting this message and how I can fix it? Thanks in advance.

Yannick

1 Answer

To forecast with dates, your index must be a DatetimeIndex or PeriodIndex with an associated frequency, such as monthly, daily, or minutely.

In your case, I guess you have data for a few minutes each day, which I don't think corresponds to a Pandas frequency. The forecasting is still performed; the model just doesn't know how to assign new dates to the forecasts, so it falls back to an integer index.
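To see the frequency problem in isolation, here is a minimal sketch using made-up timestamps (not the data from the question). It compares a regular daily index with an index that only covers a few minutes each day, and asks pandas to infer a frequency for each.

```python
import pandas as pd

# Regular daily index: every step is exactly one day apart.
daily = pd.date_range("2017-01-01", periods=5, freq="D")
daily_freq = pd.infer_freq(daily)

# Made-up index: three one-minute observations per day on two days.
# The overnight gap makes the spacing irregular.
gappy = pd.to_datetime([
    "2017-01-01 09:00", "2017-01-01 09:01", "2017-01-01 09:02",
    "2017-01-02 09:00", "2017-01-02 09:01", "2017-01-02 09:02",
])
gappy_freq = pd.infer_freq(gappy)

print(daily_freq)  # a daily frequency string
print(gappy_freq)  # None, because no single frequency fits the index
```

An index like the second one carries no frequency, which is the situation the warning describes.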

If you know how to construct the date index for the forecast period, you can do so and pass it as the index argument, e.g.

fcast_index = pd.to_datetime(['2017-04-02 9:00am', '2017-04-02 9:00am', ...])
pred_uc = results.get_forecast(steps=100, index=fcast_index)
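One way to build such an index is sketched below. It assumes one observation per minute between 09:00 and 09:59 on every calendar day, which may not match the actual data; `build_forecast_index` is a hypothetical helper name, not part of pandas or statsmodels.

```python
import pandas as pd

def build_forecast_index(last_timestamp, steps):
    """Return the next `steps` minute stamps that fall in 09:00-09:59.

    Assumes (hypothetically) one observation per minute in that window
    on every calendar day.
    """
    # Each 24-hour span holds 60 in-window minutes, so this many days
    # always yields more than `steps` candidates.
    days_needed = steps // 60 + 2
    candidates = pd.date_range(
        start=last_timestamp + pd.Timedelta(minutes=1),
        periods=days_needed * 24 * 60,
        freq="min",
    )
    # Keep only the stamps inside the daily window (both ends included).
    positions = candidates.indexer_between_time("09:00", "09:59")
    return candidates[positions][:steps]

# Example with a made-up last observation time.
last_obs = pd.Timestamp("2017-04-01 09:59")
fcast_index = build_forecast_index(last_obs, steps=100)
print(fcast_index[0], fcast_index[-1], len(fcast_index))
```

The result could then be passed as `index=fcast_index` in the `get_forecast` call shown above, with `steps` equal to its length.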
cfulton