This question builds on a previous one, which was nicely answered for me here:

R: Grouped rolling window linear regression with rollapply and ddply

Wouldn't you know that the code doesn't quite work when extended to the real data rather than the example data?

I have a somewhat large dataset with the following characteristics:

str(T0_satData_reduced)
'data.frame':   45537 obs. of  5 variables:
 $ date   : POSIXct, format: "2014-11-17 08:47:35" "2014-11-17 08:47:36" "2014-11-17 08:47:37" ...
 $ trial  : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ vial   : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ O2sat  : num  95.1 95.1 95.1 95.1 95 95.1 95.1 95.2 95.1 95 ...
 $ elapsed: num  20 20 20.1 20.1 20.1 ...

The previous question was about applying a rolling regression of O2sat as a function of elapsed, with the regressions grouped by the factors trial and vial.

The following code is drawn from the answer to my previous question (modified only to use the complete dataset rather than the practice one):

rolled <- function(df) {
   rollapplyr(df, width = 600, function(m) { 
   coef(lm(formula = O2sat ~ elapsed, data = as.data.frame(m)))
   }, by = 60, by.column = FALSE)
 }

T0_slopes <- ddply(T0_satData_reduced, .(trial,vial), function(d) rolled(d))

However, when I run this code I get a series of warnings (the first two are shown here):

Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors

I'm not sure where these warnings come from, since I have shown that both elapsed and O2sat are numeric, so I am not regressing on factors. However, if I force both to be numeric within the rolled function above, like this:

...
coef(lm(formula = as.numeric(O2sat) ~ as.numeric(elapsed), data = as.data.frame(m)))
...

I no longer get the warnings, but I don't know why this resolves them. Additionally, the resulting regressions appear suspect because the intercept terms seem inappropriately small.

Any thoughts on why I am getting these errors and why using as.numeric seems to eliminate the errors (if potentially still providing inappropriate regression terms)?

Thank you

Nate Miller

1 Answer

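rollapply passes a matrix to the function, and a matrix can hold only one type, so factor columns such as trial and vial force every column to character. Pass only the numeric columns. Using rolled from my prior answer and the setup in that question: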

do.call("rbind", by(dat[c("x", "y")], dat[c("w", "z")], rolled))
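The coercion behind the warnings can be reproduced in base R without zoo. The following is a minimal sketch on a made-up three-row data frame (the values are hypothetical, and as.matrix stands in for the conversion that rollapply performs). It also suggests why the as.numeric workaround probably gave odd intercepts: on a factor, as.numeric returns the integer level codes rather than the original values.

```r
# Toy data frame mixing a factor column with numeric columns (hypothetical values)
d <- data.frame(
  trial   = factor(c("1", "1", "2")),
  O2sat   = c(95.1, 95.0, 94.8),
  elapsed = c(20.0, 20.1, 20.2)
)

m_all <- as.matrix(d)                         # the factor column forces a character matrix
m_num <- as.matrix(d[c("O2sat", "elapsed")])  # numeric columns only stay numeric

typeof(m_all)  # "character"
typeof(m_num)  # "double"

# Before R 4.0, as.data.frame() converted character columns to factors by
# default; stringsAsFactors = TRUE reproduces that older behaviour here.
back <- as.data.frame(m_all, stringsAsFactors = TRUE)
class(back$O2sat)       # "factor"
as.numeric(back$O2sat)  # integer level codes, not the original O2sat values
```

With only the numeric columns in the matrix, as.data.frame(m) inside rolled yields numeric columns and lm behaves as expected.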

Added

Another way to do it is to perform the rollapply over the row indexes instead of over the data frame itself. In this example we have also added the conditioning variables as extra output columns:

rolli <- function(ix) {
   data.frame(coef = rollapplyr(ix, width = 6, function(ix) { 
         coef(lm(y ~ x, data = dat, subset = ix))[2]
      }, by = 3), w = dat$w[ix][1], z = dat$z[ix][1])
}
do.call("rbind", by(1:nrow(dat), dat[c("w", "z")], rolli))
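For a table with columns w, z, Intercept and slope, the same index-based idea can be sketched in base R. This is an illustration under stated assumptions, not the code above: dat is simulated here, roll_group is a hypothetical helper, and the windows are built by hand (starting at rows 1, 4, 7, ... of each group) rather than by rollapplyr, so the window placement is an assumption and may not match zoo's exactly.

```r
# Hypothetical data: two grouping factors (w, z) with 12 rows per group
set.seed(1)
dat <- expand.grid(i = 1:12, w = factor(1:2), z = factor(1:2))
dat$x <- dat$i
dat$y <- 2 + 0.5 * dat$x + rnorm(nrow(dat), sd = 0.1)

# Fit y ~ x on each window of row indexes within one group.
# Assumes every group has at least `width` rows.
roll_group <- function(ix, width = 6, by = 3) {
  starts <- seq(1, length(ix) - width + 1, by = by)
  rows <- lapply(starts, function(s) {
    win <- ix[s:(s + width - 1)]              # row indexes for this window
    co  <- coef(lm(y ~ x, data = dat[win, ]))
    data.frame(w = dat$w[win][1], z = dat$z[win][1],
               Intercept = co[[1]], slope = co[[2]])
  })
  do.call("rbind", rows)
}

# Split the row indexes by group, roll within each group, and stack the pieces
groups <- split(seq_len(nrow(dat)), dat[c("w", "z")], drop = TRUE)
result <- do.call("rbind", lapply(groups, roll_group))
rownames(result) <- NULL
head(result)
```

Each row of result is one window, so a group contributes several rows; the grouping columns are taken from the first row of each window, which is safe because they are constant within a group.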
G. Grothendieck
  • Great! That worked. Is it possible to bring along the grouping variables (w and z) so they are in the output from this do.call? Otherwise it looks great! – Nate Miller Feb 04 '15 at 23:44
  • Great! Good to see another way of setting it up. I'm not clear on how to add the conditioning variables to the final dataset even with the new formulation, but I'll explore that. Thanks – Nate Miller Feb 05 '15 at 00:40
  • `dat$w[ix]` and `dat$z[ix]` give the conditioning variables. These are constant vectors all of whose elements equal `dat$w[ix][1]` and `dat$z[ix][1]` respectively. – G. Grothendieck Feb 05 '15 at 00:49
  • Yes, that makes sense. Thank you. I am just unclear where to reference the conditioning variables in the code above. Is it within the rolli function, and if so, where? I have tried several different methods for bringing along these variables within the function and haven't been able to generate a final table with both the regression coefficients and the conditioning variables. Continuing to work on it... – Nate Miller Feb 05 '15 at 01:11
  • It really depends on what you want to do. – G. Grothendieck Feb 05 '15 at 03:04
  • Yes that is certainly true. What I would like to end up with is a data frame with columns for "w", "z", "Intercept" and "slope", one row per combination of "w" and "z". That would be useful. – Nate Miller Feb 05 '15 at 18:09
  • OK. I have modified the example to add the conditioning variables as output columns. – G. Grothendieck Feb 05 '15 at 20:52