
I manually built 20 models and found that each one should use d=1 or D=1, but auto_arima never applies any differencing: not a single model has d or D set, and every trial looks like (1,0,1) x (0, 0, 1, 52). I checked this by setting trace=True.

I want auto_arima to run a grid search over pdq=(0~3, 0~1, 0~3) and PDQs=(0~3, 0~1, 0~3, 52).

I set params as below:

    start_p=0,
    start_q=0,
    max_p=3,
    max_d=1,
    max_q=3,
    start_P=0,
    D=None,
    start_Q=0,
    max_P=2,
    max_D=1,
    max_Q=2,
    max_order=10,
    m=52,
    seasonal=True,
    stationary=False,
    information_criterion='aic',
    alpha=0.05,
    test='kpss',
    seasonal_test='ocsb',
    stepwise=True,
    n_jobs=-1,
    start_params=None,
    trend=None,
    method=None,
    transparams=True,
    maxiter=None,
    n_fits=100,
    with_intercept=True,

How can I make auto_arima run the grid search I want?

Roy

1 Answer


There are several things you should know about pmdarima and its implementation of auto_arima. I'm experimenting with it at the moment, so I will try to answer your questions.

  1. Grid search: auto_arima uses a stepwise algorithm to identify the optimal parameters. This is controlled by stepwise in your params above, which is True by default. The API documentation says:

    The stepwise algorithm can be significantly faster than fitting all hyper-parameter combinations and is less likely to over-fit the model.

    If you want a grid search, you have to set this parameter to False.

  2. Differencing parameters: With stepwise=False, auto_arima should try all combinations of every parameter except two, d and D. These are estimated up front and are not part of the parameter search. The params listed in your question contain two tests, test and seasonal_test, which are used to select the values of d and D respectively.

    I would recommend reading the documentation section on Understanding p, d and q. It gives a better idea of how pmdarima estimates the differencing parameters.

    You can also run these tests directly (just change the value of test):

        from pmdarima.arima.utils import ndiffs

        # y is your time series (array-like); returns the estimated d
        ndiffs(y, test='kpss')
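To see how much the choice of test matters, the same idea can be extended to the seasonal side with nsdiffs. The following is only a sketch: the helper name estimate_differencing is made up for illustration, and the max_d=1 / max_D=1 limits are assumptions taken from the params in the question.

```python
def estimate_differencing(y, m, max_d=1, max_D=1):
    """Return the d and D that each pmdarima test would select for y.

    estimate_differencing is a hypothetical helper, not part of pmdarima.
    """
    # Imported lazily so the helper can be defined without pmdarima installed.
    from pmdarima.arima.utils import ndiffs, nsdiffs

    # Non-seasonal differencing order d, one estimate per unit-root test.
    d_by_test = {t: ndiffs(y, test=t, max_d=max_d) for t in ("kpss", "adf", "pp")}
    # Seasonal differencing order D, one estimate per seasonal test.
    D_by_test = {t: nsdiffs(y, m=m, test=t, max_D=max_D) for t in ("ocsb", "ch")}
    return d_by_test, D_by_test
```

Calling `estimate_differencing(y, m=52)` on your series shows whether the tests agree with each other. If they disagree, that is a hint that the automatic choice of d and D is not clear-cut for your data.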

I don't know of a way to include d in the grid search; I think the Python and R implementations both use the same or a similar estimation. You can therefore choose the differencing parameters yourself, pass them to auto_arima explicitly, and leave the rest to the grid search. Otherwise it will select the values of both d and D automatically. The question is: how do you know each model should use d=1 and D=1 when the automatic tests say something different?
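A minimal sketch of that setup, assuming d=1 and D=1 are the values you settled on yourself (they are not pmdarima defaults) and reusing the search limits from the question. The names search_kwargs and fit_fixed_differencing are made up for illustration.

```python
# Keyword arguments for pmdarima.auto_arima with differencing fixed by hand.
search_kwargs = dict(
    d=1, D=1,                    # assumption: chosen manually, so the tests are skipped
    start_p=0, max_p=3,
    start_q=0, max_q=3,
    start_P=0, max_P=2,
    start_Q=0, max_Q=2,
    max_order=10,                # upper bound on p+q+P+Q in the non-stepwise search
    m=52,
    seasonal=True,
    stepwise=False,              # fit the whole grid instead of the stepwise search
    information_criterion="aic",
    n_jobs=-1,                   # parallel fitting only applies when stepwise=False
    trace=True,
)


def fit_fixed_differencing(y, **overrides):
    """Run auto_arima on y with the settings above (hypothetical helper)."""
    import pmdarima as pm  # imported lazily; requires pmdarima to be installed
    return pm.auto_arima(y, **{**search_kwargs, **overrides})
```

With m=52 a full grid can take a long time, so it may be worth trying it first on a single series, or with smaller max_* values passed as overrides.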

Nerxis
  • Thanks for your reply. What confused me is that, based on AIC, the model should use D=1, d=1 (I built a model manually with `statsmodels` and got a good AIC score), but `auto_arima` doesn't use those params and returns best params such as D=0, q=0, even though the AIC difference is large (800 for the former vs 2100 for the latter). Hence I am curious why `auto_arima` doesn't use those params. – Roy Jul 18 '19 at 03:18
  • I think the reason is that `auto_arima` first estimates the differencing parameter (`d`), in your case with the `kpss` test, and then automatically fits the rest (`p` and `q`), choosing them by the selected `information_criterion`, which is AIC in your case. If you really want a full grid search, you will probably have to do it yourself. Either follow what I wrote in the answer or write it completely from scratch, as explained [here](https://machinelearningmastery.com/how-to-grid-search-sarima-model-hyperparameters-for-time-series-forecasting-in-python/) – Nerxis Jul 18 '19 at 07:40
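
The do-it-yourself grid search mentioned in the last comment could look roughly like the sketch below. It is not taken from the linked article; grid_search and sarimax_aic are hypothetical names, and the statsmodels backend is an assumption based on the library named in the comments. One caveat worth keeping in mind: AIC values are generally not comparable between models with different d or D, because differencing changes the data the likelihood is computed on, so it is safer to keep d_values and D_values fixed to a single value each.

```python
import itertools
import math


def grid_search(fit_aic, p_values, d_values, q_values,
                P_values, D_values, Q_values, m):
    """Try every (p,d,q) x (P,D,Q,m) combination and keep the lowest AIC.

    fit_aic(order, seasonal_order) must return the AIC of the fitted model,
    or raise if the model cannot be fitted.
    """
    best = (math.inf, None, None)
    grid = itertools.product(p_values, d_values, q_values,
                             P_values, D_values, Q_values)
    for p, d, q, P, D, Q in grid:
        order, seasonal_order = (p, d, q), (P, D, Q, m)
        try:
            aic = fit_aic(order, seasonal_order)
        except Exception:
            # Skip combinations that fail to fit (a deliberately broad catch).
            continue
        if aic < best[0]:
            best = (aic, order, seasonal_order)
    return best


def sarimax_aic(y):
    """Build a fit_aic callback backed by statsmodels (imported lazily)."""
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_aic(order, seasonal_order):
        model = SARIMAX(y, order=order, seasonal_order=seasonal_order)
        return model.fit(disp=False).aic

    return fit_aic
```

For example, `grid_search(sarimax_aic(y), range(4), [1], range(4), range(3), [1], range(3), 52)` searches p, q, P and Q with d=1 and D=1 held fixed.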