I'm working on a time series forecast model with pmdarima.

My time series is short but reasonably well behaved. The following code raises an error in sklearn\utils\validation.py:

from pmdarima import auto_arima
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
import datetime
import pandas as pd

datelist = pd.date_range('2018-01-01', periods=24, freq='MS')

sales = [26.000000,27.100000,26.000000,28.014286,28.057143,
         30.128571,39.800000,33.000000,37.971429,45.914286,
         37.942857,33.885714,36.285714,34.971429,40.042857,
         27.157143,30.685714,35.585714,43.400000,51.357143,
         45.628571,49.942857,42.028571,52.714286]


df = pd.DataFrame(data=sales,index=datelist,columns=['sales'])

observations = df['sales']
size = df['sales'].size
shape = df['sales'].shape
maxdate = max(df.index).strftime("%Y-%m-%d")
mindate = min(df.index).strftime("%Y-%m-%d")


asc = seasonal_decompose(df, model='add')

if asc.seasonal[asc.seasonal.notnull()].size == df['sales'].size:
    seasonality = True
else:
    seasonality = False

# Check Stationarity
aftest = adfuller(df['sales'])

if aftest[1] <= 0.05:
    stationarity = True
else:
    stationarity = False

results = auto_arima(observations,
                     seasonal=seasonality,
                     stationary=stationarity,
                     m=12,
                     error_action="ignore")

~\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    584                              " minimum of %d is required%s."
    585                              % (n_samples, array.shape, ensure_min_samples,
--> 586                                 context))
    587 
    588     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

However, if I change the first value of the sales series from 26 to 30, it works.

What could be wrong here?

seimetz

1 Answer

  1. Your example is not reproducible as posted: seasonality and stationarity are not defined in the global scope, so the auto_arima call fails with an error of the form

    NameError: name 'seasonality' is not defined

  2. You have only a few observations, so try explicitly setting the min/max order values for the different ARIMA processes. IMO, this is generally good practice. In your case we can do

    fit = auto_arima(
        observations,
        start_p = 0, start_q = 0, start_P = 0, start_Q = 0,
        max_p = 3, max_q = 3, max_P = 3, max_Q = 3,
        D = 1, max_D = 2, m = 12,
        seasonal = True,
        error_action = 'ignore')
    

    Here we consider processes up to MA(3) and AR(3), as well as SMA(3) and SAR(3).

  3. Let's visualise the original time series data together with the forecast:

    import matplotlib.pyplot as plt

    n_ahead = 10  # number of months to forecast
    preds, conf_int = fit.predict(n_periods = n_ahead, return_conf_int = True)
    # Monthly dates covering the observed data plus the forecast horizon
    xrange = pd.date_range(min(datelist), periods = len(df) + n_ahead, freq = 'MS')

    fig = plt.figure()
    # Observed sales, followed by the point forecasts
    plt.plot(xrange[:df.shape[0]], df["sales"])
    plt.plot(xrange[df.shape[0]:], preds)
    # Shade the confidence interval around the forecasts
    plt.fill_between(
        xrange[df.shape[0]:],
        conf_int[:, 0], conf_int[:, 1],
        alpha = 0.1, color = 'b')
    plt.show()
    

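To illustrate why the small sample matters with m = 12, here is a minimal sketch in plain Python. It only shows the arithmetic of seasonal differencing; the helper `seasonal_diff` is hypothetical and not part of pmdarima. Whether this is the exact path that produced the `shape=(0,)` error is an assumption, since the traceback does not show which internal step received the empty array.

```python
def seasonal_diff(values, m):
    """Return the lag-m differences values[i] - values[i - m].

    Hypothetical helper for illustration only; not a pmdarima function.
    """
    return [values[i] - values[i - m] for i in range(m, len(values))]


# The 24 monthly observations from the question
sales = [26.000000, 27.100000, 26.000000, 28.014286, 28.057143,
         30.128571, 39.800000, 33.000000, 37.971429, 45.914286,
         37.942857, 33.885714, 36.285714, 34.971429, 40.042857,
         27.157143, 30.685714, 35.585714, 43.400000, 51.357143,
         45.628571, 49.942857, 42.028571, 52.714286]

m = 12  # seasonal period used in the question

once = seasonal_diff(sales, m)   # corresponds to D = 1
twice = seasonal_diff(once, m)   # corresponds to D = 2

print(len(sales), len(once), len(twice))  # prints: 24 12 0
```

Each round of seasonal differencing drops m observations, so two years of monthly data leave 12 points after one round and none after two. Fixing D = 1, as in the `auto_arima` call above, keeps the differenced series non-empty.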

Maurits Evers