
I have a very large data set that I extract from a data warehouse. Downloading it to the machine where I want to run lme4 takes a long time. Could I instead reduce the data to a covariance matrix on the warehouse side, download that (which is much smaller), and use it as the data input to lme4? I have done something similar for multiple regression models in SAS, and am hoping I can create this type of input for lme4.

Thanks.

  • I don't think so. To be honest, it doesn't seem to me that the variance-covariance matrix would actually contain enough information to run the mixed model ... ? I would suggest that you repost to the `r-sig-mixed-models@r-project.org` mailing list, where there are more expert eyes watching. – Ben Bolker Feb 06 '13 at 16:14

1 Answer


I don't know of any way to use the observed covariance matrix to fit an lmer model. But if the goal is to reduce data set size in order to speed up analysis, there may be simpler approaches. For example, if you don't need the conditional modes of the random effects, and you have a very large sample size, then you might try fitting the model to progressively larger subsets of the data until the estimates of the fixed effects and the covariance matrix of the random effects 'stabilize'. This approach has worked well in my experience, and has been discussed by others:

http://andrewgelman.com/2012/04/hierarchicalmultilevel-modeling-with-big-data/
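The subset-growing idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the names `full`, `y`, `x`, `g` and the subset sizes are hypothetical, and a random-intercept model stands in for whatever model is actually of interest. In practice the subsetting would presumably be done on the warehouse side so that only the subset is downloaded.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the warehouse extract (hypothetical names: y, x, g).
n_groups <- 400
n_per    <- 10
g <- factor(rep(seq_len(n_groups), each = n_per))
x <- rnorm(n_groups * n_per)
b <- rnorm(n_groups, sd = 1)  # true random intercepts
y <- 2 + 0.5 * x + b[as.integer(g)] + rnorm(n_groups * n_per, sd = 1)
full <- data.frame(y = y, x = x, g = g)

# Nested subsets made of whole groups, in a random order.
group_order <- sample(levels(full$g))
sizes <- c(50, 100, 200, 400)

fits <- lapply(sizes, function(k) {
  sub <- droplevels(full[full$g %in% group_order[seq_len(k)], ])
  m   <- lmer(y ~ x + (1 | g), data = sub)
  vc  <- as.data.frame(VarCorr(m))
  data.frame(n_groups  = k,
             intercept = unname(fixef(m)["(Intercept)"]),
             slope     = unname(fixef(m)["x"]),
             sd_group  = vc$sdcor[vc$grp == "g"],
             sd_resid  = vc$sdcor[vc$grp == "Residual"])
})

# One row per subset size; look for the point where the columns stop moving.
estimates <- do.call(rbind, fits)
print(estimates)
```

If the rows for the larger subsets agree to within the precision you care about, a subset of that size may be enough and the full download can be avoided.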

Here's a related quotation:

"Related to the “multiple model” approach are simple approximations that speed the computations. Computers are getting faster and faster—but models are getting more and more complicated! And so these general tricks might remain important. A simple and general trick is to break the data into subsets and analyze each subset separately. For example, break the 85 counties of radon data randomly into three sets of 30, 30, and 25 counties, and analyze each set separately." Gelman and Hill (2007), p.547.
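The split-and-analyze trick from the quotation can be sketched in the same spirit. This is a minimal example, not the book's code: it uses the `sleepstudy` data that ships with lme4 (18 subjects) rather than the radon data, splits the subjects at random into three sets of 6, and fits a deliberately simple random-intercept model to each set.

```r
library(lme4)

set.seed(42)

# sleepstudy ships with lme4: 18 subjects, 10 observations each.
subjects <- levels(sleepstudy$Subject)

# Randomly assign whole subjects to three disjoint sets of 6.
set_id <- sample(rep(1:3, length.out = length(subjects)))
names(set_id) <- subjects

per_set <- lapply(1:3, function(s) {
  keep <- unname(set_id[as.character(sleepstudy$Subject)]) == s
  sub  <- droplevels(sleepstudy[keep, ])
  m    <- lmer(Reaction ~ Days + (1 | Subject), data = sub)
  c(set = s, n_subjects = nlevels(sub$Subject), fixef(m))
})

# One row per subset: compare the fixed-effect estimates across sets.
subset_estimates <- do.call(rbind, per_set)
print(subset_estimates)
```

Each fit only needs its own subset in memory, so the subsets could be downloaded and analyzed one at a time and the estimates compared or combined afterwards.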

Steve Walker