
I am trying to forecast weekly sales for more than 2000 products. I resample each product's sales data into weekly totals, and each product's time series behaves differently. Seasonal patterns are not obvious, so I decided to run the auto_arima function (pmdarima) in Python under two conditions: one that assumes seasonality and one that does not. For the seasonal case, I assumed a period of 52 weeks because the peaks in the seasonal decomposition of the data repeated after one year. My first question: is it good practice to try both conditions with auto_arima and keep whichever model (ARIMA or SARIMAX) gives the lowest MSE? Second, auto_arima is very slow when it searches for the orders of the SARIMAX model. I would be glad to hear any advice on speeding it up, as well as on my first question.

Thanks.
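A quick way to check the 52-week assumption for each item, before paying for an m=52 search, is to look at the lag-52 sample autocorrelation. The sketch below uses only numpy. `acf_at_lag` and `has_annual_signal` are hypothetical helpers. The 1.96 / sqrt(n) band is the usual rough white-noise approximation, not a formal seasonality test. If you prefer a library call, statsmodels.tsa.stattools.acf returns the full ACF.

```python
import numpy as np


def acf_at_lag(x, lag):
    """Sample autocorrelation of a 1-D series at one lag."""
    x = np.asarray(x, dtype=float)
    if not 1 <= lag < len(x):
        raise ValueError("lag must be between 1 and len(x) - 1")
    xc = x - x.mean()
    denom = float(np.dot(xc, xc))
    if denom == 0.0:
        return 0.0  # constant series: no autocorrelation to measure
    return float(np.dot(xc[:-lag], xc[lag:]) / denom)


def has_annual_signal(weekly_sales, period=52):
    """Rough screen: is the lag-`period` autocorrelation above the
    approximate 95% white-noise band 1.96 / sqrt(n)?"""
    n = len(weekly_sales)
    if n < 2 * period:
        return False  # under two years of data: too little to judge
    return bool(acf_at_lag(weekly_sales, period) > 1.96 / np.sqrt(n))
```

Items that fail this screen could go straight to the non-seasonal search. That would also reduce the number of slow seasonal fits.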

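```python
from math import sqrt

import pandas as pd
from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller

# grouped_df (weekly data grouped by item) and df_stationary_items (list of
# item numbers) come from earlier code that is not shown here.
model_rows, result_rows = [], []  # DataFrame.append was removed in pandas 2.0

for item in df_stationary_items:
    test_df = grouped_df.get_group(item)
    X = test_df['Quantity'].values
    train, test = X[:-1], X[-1:]  # hold out the last week
    try:
        # Select orders on train only, so the held-out week does not leak in.
        stepwise_fit = auto_arima(train, start_p=0, start_q=0,
                                  max_p=6, max_q=6, m=52,
                                  start_P=0, seasonal=True, alpha=0.05,
                                  d=None, D=None, max_D=1, trace=True,
                                  n_jobs=-1,  # has no effect when stepwise=True
                                  error_action='ignore', stepwise=True)
        model_rows.append({"ItemNo": item, "Order": stepwise_fit.order,
                           "SeasonalOrder": stepwise_fit.seasonal_order})

        model = SARIMAX(train, order=stepwise_fit.order,
                        seasonal_order=stepwise_fit.seasonal_order)
        model_fit = model.fit(disp=False)
        predictions = model_fit.predict(start=len(train), end=len(train) + len(test) - 1,
                                        dynamic=False)
        rmse = sqrt(mean_squared_error(test, predictions))  # one test point: equals |error|
        result_rows.append({"ItemNo": item,
                            # assumption: `result` in the original was an ADF test on this item
                            "StationaryP": adfuller(X)[1],
                            "Order": stepwise_fit.order,
                            "SeasonalOrder": stepwise_fit.seasonal_order,
                            "Predicted": predictions[0], "Expected": test[0],
                            "STDEV": test_df['Quantity'].std(), "rmse": rmse})
    except Exception as exc:  # report failures instead of skipping them silently
        print(f"{item}: seasonal fit failed: {exc!r}")

df_models = pd.DataFrame(model_rows)
df_model_results = pd.DataFrame(result_rows)
```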
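On the speed question: with m=52, every candidate SARIMA model carries 52-week seasonal lags, which is what makes the stepwise search slow. A common alternative, sometimes called dynamic harmonic regression, represents the annual cycle with a few sine/cosine pairs. auto_arima then searches only the non-seasonal orders, with those columns passed as exogenous regressors. The sketch below builds the columns with numpy and pandas. `fourier_terms` and its parameters are hypothetical; pmdarima's FourierFeaturizer provides a library version. The approach assumes the seasonal shape stays roughly fixed from year to year.

```python
import numpy as np
import pandas as pd


def fourier_terms(n_periods, period=52.0, k=2, start=0):
    """Sin/cos regressors for the first k harmonics of a seasonal period.

    Row i corresponds to time index t = start + i and holds
    sin(2*pi*j*t/period) and cos(2*pi*j*t/period) for j = 1..k.
    `period` may be non-integer (e.g. 365.25 / 7 for weeks per year).
    """
    if k < 1 or 2 * k >= period:
        raise ValueError("need 1 <= k < period / 2")
    t = np.arange(start, start + n_periods, dtype=float)
    cols = {}
    for j in range(1, k + 1):
        angle = 2 * np.pi * j * t / period
        cols[f"sin_{j}"] = np.sin(angle)
        cols[f"cos_{j}"] = np.cos(angle)
    return pd.DataFrame(cols)


# Hypothetical use with pmdarima (not executed here; `train` is one item's
# training series as in the question):
# X_train = fourier_terms(len(train), k=4)
# X_next = fourier_terms(1, k=4, start=len(train))  # continue the same clock
# fit = auto_arima(train, X=X_train, seasonal=False, error_action='ignore')
# forecast = fit.predict(n_periods=1, X=X_next)
```

In recent pmdarima releases the exogenous argument is named `X`; older releases used `exogenous`. The number of harmonics k could be chosen per item by AIC, which is far cheaper than a seasonal order search.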
     
```python
from math import sqrt

import pandas as pd
from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA  # current class; statsmodels.tsa.arima_model.ARIMA is deprecated
from statsmodels.tsa.stattools import adfuller

models_nonseasonal_rows, results_nonseasonal_rows = [], []

for item in df_stationary_items:
    test_df_nonseasonal = grouped_df.get_group(item)
    X_non = test_df_nonseasonal['Quantity'].values
    train_non, test_non = X_non[:-1], X_non[-1:]  # hold out the last week
    try:
        stepwise_nonseasonal = auto_arima(train_non, error_action='ignore', seasonal=False)
        models_nonseasonal_rows.append({"ItemNo": item, "Order": stepwise_nonseasonal.order})
        model_non = ARIMA(train_non, order=stepwise_nonseasonal.order)
        model_fit_non = model_non.fit()
        predictions_non = model_fit_non.predict(start=len(train_non),
                                                end=len(train_non) + len(test_non) - 1,
                                                dynamic=False)
        rmse_non = sqrt(mean_squared_error(test_non, predictions_non))
        results_nonseasonal_rows.append({"ItemNo": item,
                                         # assumption: `result_non` was an ADF test on this item
                                         "StationaryP": adfuller(X_non)[1],
                                         "Order": stepwise_nonseasonal.order,
                                         "Predicted": predictions_non[0], "Expected": test_non[0],
                                         "STDEV": test_df_nonseasonal['Quantity'].std(),
                                         "rmse": rmse_non})
    except Exception as exc:  # report failures instead of skipping them silently
        print(f"{item}: non-seasonal fit failed: {exc!r}")

df_models_nonseasonal = pd.DataFrame(models_nonseasonal_rows)
df_model_results_nonseasonal = pd.DataFrame(results_nonseasonal_rows)
```
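Both loops score each model on a single held-out week. The "RMSE" is therefore just the absolute error of one forecast, so choosing the lower one is driven largely by noise. A less noisy comparison is rolling-origin (expanding-window) evaluation: forecast each of the last n_test weeks using only the data before it, then average the squared errors. In the sketch below, `rolling_origin_rmse` is a hypothetical helper. The naive and seasonal-naive forecasters are placeholders for a function that refits ARIMA or SARIMAX on `history`.

```python
import numpy as np


def rolling_origin_rmse(y, forecast_fn, n_test=8):
    """One-step-ahead RMSE over the last n_test points of y.

    At each origin, forecast_fn sees only y[:origin] and must return a
    single forecast for y[origin].
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= n_test < len(y):
        raise ValueError("n_test must be between 1 and len(y) - 1")
    errors = []
    for origin in range(len(y) - n_test, len(y)):
        history = y[:origin]  # data available at this origin
        forecast = float(forecast_fn(history))
        errors.append(y[origin] - forecast)
    return float(np.sqrt(np.mean(np.square(errors))))


# Placeholder forecasters; in the question's setting these would refit
# ARIMA / SARIMAX on `history` and return a one-step forecast.
def naive(history):
    return history[-1]


def seasonal_naive(history, period=52):
    return history[-period] if len(history) >= period else history[-1]
```

Refitting at every origin multiplies the fitting cost by n_test. For SARIMAX, one option is to choose the orders once on the data before the first origin and refit only the coefficients. Even with several origins, always picking whichever model wins on a short window can still overfit the choice, as the comments below point out.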

Any advice on forecasting many products at once would be great!
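In auto_arima, n_jobs only parallelizes the non-stepwise grid search. With stepwise=True, the main speed-up across 2000+ products is to process items in parallel instead. The sketch below uses the standard library's ProcessPoolExecutor. Each item is handled by one function that returns a row instead of appending to a DataFrame, and that records failures instead of hiding them behind a bare except. `evaluate_item`, `evaluate_all` and `series_by_item` (a dict mapping ItemNo to an array of weekly quantities) are hypothetical names. The seasonal-naive forecast inside is only a stand-in for the auto_arima/SARIMAX fit.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def evaluate_item(item, values, period=52):
    """Hold out the last week of one item and forecast it; never raises.

    The seasonal-naive forecast is a placeholder for the auto_arima /
    SARIMAX fit on `train` from the question.
    """
    values = np.asarray(values, dtype=float)
    row = {"ItemNo": item, "Predicted": np.nan, "Expected": np.nan, "error": None}
    try:
        if len(values) <= period:
            raise ValueError(f"only {len(values)} weeks of data")
        train, test = values[:-1], values[-1:]
        row["Expected"] = test[0]
        row["Predicted"] = train[-period]  # same week one year earlier
    except Exception as exc:
        row["error"] = repr(exc)  # keep the reason instead of a silent `continue`
    return row


def evaluate_all(series_by_item, max_workers=None):
    """Evaluate every item in a pool of worker processes; one row per item."""
    items = list(series_by_item)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(evaluate_item, items,
                             (series_by_item[i] for i in items)))
    return pd.DataFrame(rows)
```

With the question's data, series_by_item could be built as {item: g['Quantity'].values for item, g in grouped_df}. On platforms that start worker processes with "spawn" (Windows, macOS), call evaluate_all under an `if __name__ == "__main__":` guard.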

  • Did you search https://stats.stackexchange.com/ for best practices? – RichieV Sep 08 '20 at 07:51
  • This question talks about a similar method, but using AIC instead of MSE https://stats.stackexchange.com/q/78949/275865 – RichieV Sep 08 '20 at 07:57
  • Thanks for your responses. In my code, auto_arima already selects the parameters by AIC. But when I compare the two resulting models (ARIMA or SARIMAX), I compute the MSE of their predictions and choose the model with the lower MSE. As for best practices, thanks for the link; I will definitely check it. – chrismoltisanti Sep 08 '20 at 21:41
  • As they explain in that post, most of the time that leads to overfitting. The "convention" is to define the best model based on experience with the data (in this case perhaps SARIMAX, since nearly all sales data is seasonal) and stick with it, even if you find another model that happens to score better on some test sample. In a real-world production environment, that other model will most likely lose in the long run. – RichieV Sep 08 '20 at 22:07
  • Another source of variation is substitution or complementary interactions among products. Try to find positive and negative correlations between products, and possibly group the best candidates before fitting regressions. – RichieV Sep 08 '20 at 22:12
  • Actually, after posting this, I tried adding a Fourier featurizer to SARIMAX and using its output as exogenous variables (I will add holiday data to it later). A grid search for SARIMAX was not useful: it takes far too much time, and I am not sure it was good practice anyway. Now I think that adding seasonality as exogenous variables to the SARIMAX model will work for me. – chrismoltisanti Sep 08 '20 at 22:47
  • But I don't understand why I should insist on sticking with SARIMAX. For some items it is really hard to see any seasonality, and they follow a cyclical pattern, so ARIMA works better on those items. Anyway, this is my first experience with forecasting, so I will take your advice and stick with it :D Thanks – chrismoltisanti Sep 08 '20 at 22:50
  • SARIMAX is one of the most complex autoregressive models; it combines several components, including seasonality (S) and integration, i.e. differencing (I). Keep in mind that my comments are heuristic suggestions that generally work. If you find a different model that works for some product and you want to use it, go ahead; just keep in mind that you risk following an overfitted model. It can be a good idea to keep track of a suite of models going forward, so you can see which one performs better. – RichieV Sep 08 '20 at 23:30
  • Great, I understand. Thanks a lot! – chrismoltisanti Sep 09 '20 at 06:29
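The suggestion to look for positive and negative correlations between products can be tried with a pairwise correlation of weekly sales. The pandas sketch below assumes a wide table `weekly` with one column per product and one row per week. Such a table could be built with DataFrame.pivot_table from the resampled data, using whatever week column it has. `correlated_pairs` and the 0.5 threshold are hypothetical choices.

```python
import numpy as np
import pandas as pd


def correlated_pairs(weekly, threshold=0.5):
    """Product pairs whose weekly sales have |Pearson r| >= threshold.

    `weekly` has one column per product and one row per week. A positive r
    may hint at complements, a negative r at substitutes; a shared trend
    also produces large r, so differenced input (weekly.diff()) is safer.
    """
    corr = weekly.corr()
    cols = list(corr.columns)
    rows = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) >= threshold:
                rows.append({"item_a": a, "item_b": b, "corr": float(r)})
    out = pd.DataFrame(rows, columns=["item_a", "item_b", "corr"])
    # strongest relationships first, regardless of sign
    return out.sort_values("corr", key=np.abs, ascending=False, ignore_index=True)
```

Correlation alone does not establish substitution or complementarity. The pairs it returns are candidates for grouping or for use as exogenous inputs, to be checked against domain knowledge.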
