Using the lme function, I fitted a model on a large data set with nearly 470K observations and about 40 variables. The fitted object (object.size(fit)) is close to 300 MB, which is too large to store on the server. The goal is to let a user interactively define newdata (at most 500 observations) and then call predict(fit, newdata, level = 0, na.action = na.omit) to output the predicted values. How can I reduce the size of the fit, given the limited storage space on the server?

I have already tried a couple of the approaches in this post, but they do not shrink the object down to the size I need.

Any thoughts? Thanks!


Isaac

1 Answer

lme objects, like objects of most model classes, are designed to carry everything that any function written for them might need. If you only want the bare bones, pull out just the components you need and reassign the class so that the correct S3 method is dispatched. To see which components are required, look at the source of nlme:::predict.lme. Here is an example with the Orthodont dataset.

library(nlme)
data(Orthodont)

# Just fit a model
fm1 <- lme(distance ~ age, data = Orthodont)

# pull out the minimal components needed for prediction
min_fm1 <- list(modelStruct = fm1$modelStruct, 
                dims = fm1$dims, 
                contrasts = fm1$contrasts, 
                coefficients = fm1$coefficients, 
                groups = fm1$groups, 
                call = fm1$call,
                terms = fm1$terms)

# assign class otherwise the default predict method would be called
class(min_fm1) <- "lme"

# By dropping things like fm1$data you trim it down quite a bit
object.size(fm1)
# 63880 bytes
object.size(min_fm1)
# 22992 bytes

# make sure output identical
identical(predict(min_fm1, Orthodont, level = 0, na.action = na.omit), 
          predict(fm1, Orthodont, level = 0, na.action = na.omit))
# [1] TRUE
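Since the real constraint is storage on the server, it may be worth checking the size of the serialized file as well as object.size(). object.size() only estimates in-memory size and does not include associated space such as environments. The following is a minimal sketch of that round trip, not a tested recipe for the 470K-row model. It assumes the stripped object is written with saveRDS() and read back with readRDS(); the names path, server_fit and newdata are hypothetical, and the component list is the one used above.

```r
library(nlme)

# Fit the example model (Orthodont ships with nlme)
fm1 <- lme(distance ~ age, data = Orthodont)

# Keep the same components as in the example above
min_fm1 <- list(modelStruct = fm1$modelStruct,
                dims = fm1$dims,
                contrasts = fm1$contrasts,
                coefficients = fm1$coefficients,
                groups = fm1$groups,
                call = fm1$call,
                terms = fm1$terms)
class(min_fm1) <- "lme"

# Hypothetical storage location; a temp file is used here for illustration
path <- tempfile(fileext = ".rds")
saveRDS(min_fm1, path)

# Size on disk in bytes; this is the number that matters for server storage
file.size(path)

# On the server: reload the stripped object and predict for user-supplied data
server_fit <- readRDS(path)
newdata <- data.frame(age = c(8, 10, 12, 14))
pred <- predict(server_fit, newdata, level = 0, na.action = na.omit)
pred
```

One caveat, from reading nlme:::predict.lme: the method evaluates the fixed formula stored in the call. If the model was fitted with the formula held in a variable rather than written inline, that variable would presumably also need to exist in the session that calls predict.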
cdeterman