3

I am using the auto.arima from the forecast package in R to determine the optimal K-terms for fourier series.

After I do that, I want to then calculate the seasonality and plug that one seasonality variable into a multiple regression model.

Using the dataset from the forecast package, I was able to extract the optimal amount of fourier terms:

library(forecast)

##Public dataset from the forecast package
head(gas)

##Choose Optimal Amount of K-Terms
bestfit <- list(aicc=Inf)
for(i in 1:6)
{
  fit <- auto.arima(gas, xreg=fourier(gas, K=i), seasonal=FALSE)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break;
  optimal_k_value<-max(i)
  print(i)
}

##Extract Fourier Terms 
seasonality<-data.frame(fourier(gas, K=optimal_k_value))

##Convert Gas TS Data to Dataframe
gas_df <- data.frame(gas, year = trunc(time(gas)), 
                 month = month.abb[cycle(gas)])

##Extract True Seasonality by Taking Sum of Rows
seasonality$total<- rowSums(seasonality)

##Combine Seasonality to Month and Year
final_df<-cbind(gas_df, seasonality$total)

Would the seasonality$total column be considered by "seasonality variable" for later modelling or do I need to add coefficients to it?

Stedy
  • 7,359
  • 14
  • 57
  • 77
nak5120
  • 4,089
  • 4
  • 35
  • 94
  • I'm thinking I need to multiply it by the coefficients from the arima model for the seasonal portion only. – nak5120 Nov 07 '18 at 22:24

1 Answers1

1

No, seasonality$total is not the seasonality variable. To see that, note that each column of fourier(gas, K = optimal_k_value) is just a seasonal component going from -1 to 1 so that they are just sin(...) and cos(...) without any coefficients. Clearly, different seasonal components must have different coefficients, so you shouldn't just sum them up.

Side comment 1: since i is always just a single number, there is no point in using max(i), just optimal_k_value <- i is enough.

Side comment 2: I suggest to inspect

plot(resid(auto.arima(gas, xreg = fourier(gas, K = optimal_k_value), seasonal = FALSE)))

For one, there may be seasonality of lower than yearly frequency (it seems like fourier doesn't allow to consider that), although perhaps you are going to model it separately as a trend. Also, it may be a good idea to split the data to something like before and after 1970.

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • Would you agree that maybe the best approach to extract seasonality, would be to extract the coefficients from the arima model and then multiply it to each column in the seasonality output? There are coefficients for the columns in the seasonality output. – nak5120 Nov 09 '18 at 20:26
  • 1
    @nak5120, I'd say that's the *quickest* way to extract seasonality starting from what you've done so far. Also, that would indeed be exactly the seasonality that you are targeting with your current approach. (I guess that's the answer to your main question.) In general, however, I don't think it's the best way to extract seasonality, but that's a different, methodological question, and better suited for http://stats.stackexchange.com. – Julius Vainora Nov 09 '18 at 20:45
  • Ok that's good to know, thank you. I'll ask on stats.stackexchange.com also – nak5120 Nov 09 '18 at 21:51