1

I am new to R and I am using it to analyse time series data (I am also new to this).

I have quarterly data for 15 years and I am interested in exploring the interplay between drinking and smoking rates in young people - treating smoking as the outcome variable. I was advised to use the gls command in the nlme package as this would allow me to include AR and MA terms. I know I could use more complex approaches like ARIMAX but as a first step, I would like to use simpler models.

After loading the data, specify the time series

data.ts = ts(data=data$smoke, frequency=4, start=c(data[1, "Year"], data[1, "Quarter"]))
data.ts.dec = decompose(data.ts)

After decomposing the data and some tests (KPSS and ADF test), it is clear that the data are not stationary so I differenced the data:

diff_dv<-diff(data$smoke, difference=1)
plot.ts(diff_dv, main="differenced")
data.diff.ts = ts(diff_dv, frequency=4, start=c(hse[1, "Year"], hse[1, "Quarter"]))

The ACF and PACF plots suggest AR(2) should also be included so I set up the model as:

mod.gls = gls(diff_dv ~ drink+time , data = data, 
              correlation=corARMA(p=2), method="ML")

However, when I run this command I get the following:

"Error in model.frame.default: variable lengths differ".

I understand from previous posts that this is due to the differencing and the fact that the diff_dv is now shorter. I have attempted fixing this by modifying the code but neither approach works:

mod.gls = gls(diff_dv ~ drink+time , data = data[1:(length(data)-1), ], 
             correlation=corARMA(p=2), method="ML")

mod.gls = gls(I(c(diff(smoke), NA)) ~ drink+time+as.factor(quarterly) , data = data,
              correlation=corARMA(p=2), method="ML")

Can anyone help with this? Is there a workaround which would allow me to run the -gls- command or is there an alternative approach which would be equivalent to the -gls- command?

As a side question, is it OK to include time as I do - a variable with values 1 to 60? A similar question is for the quarters which I included as dummies to adjust for possible seasonality - is this OK?

Your help is greatly appreciated!

Phil
  • 7,287
  • 3
  • 36
  • 66
Francesca
  • 63
  • 1
  • 7

1 Answers1

0

Specify na.action = na.omit or na.action = na.exclude to omit the rows with NA's. Here is an example using the built-in Ovary data set. See ?na.fail for info on the differences between these two.

Ovary2 <- transform(Ovary, dfoll = c(NA, diff(follicles)))
gls(dfoll ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary2,
           correlation = corAR1(form = ~ 1 | Mare), na.action = na.exclude)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks, this is really helpful. I will add the code as you suggested. I have 2 follow-up questions - what do these 2 bit of code do? sin(2*pi*Time) + cos(2*pi*Time) corAR1(form = ~ 1 | Mare) Please feel free to point me to any relevant documentation. – Francesca Aug 11 '21 at 08:32
  • This is just a modification of the example on the ?gls page to add diff and na.action. – G. Grothendieck Aug 11 '21 at 15:46