2

I am working with a data set of 205 observations and 2 explanatory variables : site (two levels) and strain ( 21 levels ). I am trying to fit a mixed model to the data when strain is the fixed variable. the experiment is not balanced ( i.e. there are different numbers of repetitions in each strain*site combination). And I use Rstudio 3.0.2; windows 8.

I first tried to run the following -

 lme(dist_OFT_min1~strain , 
     random=~strain|site,data=data.frame(d1),
     method='REML')

Yet I got the following error :

Error in logLik.lmeStructInt(lmeSt, lmePars) : 'Calloc' could not allocate memory (730512784 of 8 bytes)
In addition: Warning messages:
1: In logLik.lmeStructInt(lmeSt, lmePars) : Reached total allocation of 3873Mb: see help(memory.size)
2: In logLik.lmeStructInt(lmeSt, lmePars) : Reached total allocation of 3873Mb: see help(memory.size)

I tried to increase the memory limit to 8000 . But then when I rerun the previous line Rstudio (and the pc in total ) get very slow and unresponsive, I let it run for over 30 minutes for no result.

I tried to fit the model on a cut-down data set - containing only 5 of the strains :

 lme(dist_OFT_min1~strain , 
     random=~strain|site,
     data=data.frame(d.test),
     method='REML',
     control = lmeControl(msMaxIter =90,maxIter=90)) 

and another kind of trouble popped :

Error in lme.formula(dist_OFT_min1 ~ strain, random = ~strain | site, :
nlminb problem, convergence error code = 1 message = iteration limit reached without convergence (10)

Can anyone please help me understand what kind of trouble I might be facing with my data ? How can I check for any ? (I'm still a beginner). Here I share similar data I simulated on my own, Yet I faced the same trouble as with the original data ( lab equals site ):

    library(nlme)
    u.strain=letters[1:10]
    u.lab=paste('L',letters[1:5],sep='')
    test.dat=data.frame(strain=sort(rep(u.strain,5)),lab=rep(u.lab,10),test=rep(7,50))
    test.dat[order(test.dat$strain),'test']=test.dat[order(test.dat$strain),'test']+sort(rep(rnorm(n=10,mean=0,sd=3),5)) #strain effect
    test.dat[order(test.dat$lab),'test']=test.dat[order(test.dat$lab),'test']+sort(rep(rnorm(n=5,mean=0,sd=4),10)) #lab effect
    test.dat[,'test']=test.dat[,'test']+ rnorm(n=50,mean=0,sd=5) # interaction
    test.dat=rbind(test.dat,test.dat,test.dat,test.dat,test.dat)
    test.dat[,'test']=test.dat[,'test']+rnorm(n=250,0,sd=2.5)

I figured that the model I actually need is as follows :

    lme(teat~strain , random=~strain+strain:lab|lab,
             data=data.frame(test.dat),method='REML')

Still the same trouble popped.

p.s. I mainly seek to estimate the variance of the interaction strain:lab. Any tips ?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
ImanJ
  • 23
  • 5
  • Parts of this might below on crossvalidated, BUT, in my experience such a small dataset could never need that much memory in lme. Can you share the data structure, or the data themselves? The convergence issue may be more related to the model structure. One issue may be that there's so little data, and only two sites. Have you tried fitting the model without the random effects, using lm or gls? – Jesse Anderson Mar 08 '14 at 18:47
  • 2
    Your problem is that `strain|site` is fitting a full (unstructured) variance-covariance matrix -- it has to fit 22*21/2 = 231 random effects. You might want `1|strain:site`, which is almost (but not quite) the same model. I'm still a little surprised you have memory problems. Also, as @blindJesse says, fitting a random effect using a grouping variable with only 2 levels is probably not going to work (see http://glmm.wikidot.com/faq for more information) – Ben Bolker Mar 08 '14 at 20:16
  • A model without the random effects is not the model I need. I tried the syntax @BenBolker suggested , Yet I got error message. I tried to form my own ideal data yet it did not work either. I will share the simulation below. – ImanJ Mar 09 '14 at 09:22

1 Answers1

2

Define interaction explicitly:

test.dat$strainlab <- with(test.dat,interaction(strain,lab))

Fit site (lab) as fixed effect (makes much more sense if you have only two sites), strain-by-site interaction as random:

m1 <- lme(test~strain+lab , random=~1|strainlab,
    data=data.frame(test.dat),method='REML')
VarCorr(m1)
## strainlab = pdLogChol(1) 
##             Variance  StdDev
## (Intercept) 29.286446 5.411695
## Residual     5.398621 2.323493

Approximate confidence intervals:

intervals(m1,which="var-cov")
## Approximate 95% confidence intervals
## 
##  Random Effects:
##   Level: strainlab 
##                    lower     est.    upper
## sd((Intercept)) 4.258995 5.411695 6.876374
## 
##  Within-group standard error:
##    lower     est.    upper 
## 2.106594 2.323493 2.562725 
##

Alternatively, fit lab and strain-by-lab interaction as random:

m2 <- lme(test~strain , random=~1|lab/strain,
data=data.frame(test.dat),method='REML')
VarCorr(m2)
##             Variance     StdDev  
## lab =       pdLogChol(1)         
## (Intercept) 18.608568    4.313765
## strain =    pdLogChol(1)         
## (Intercept) 29.286446    5.411695
## Residual     5.398621    2.323493
## 
intervals(m2,which="var-cov")
## Approximate 95% confidence intervals
## 
##  Random Effects:
##   Level: lab 
##                    lower     est.    upper
## sd((Intercept)) 1.925088 4.313765 9.666346
##   Level: strain 
##                    lower     est.    upper
## sd((Intercept)) 4.259026 5.411695 6.876324
## 
##  Within-group standard error:
##    lower     est.    upper 
## 2.106600 2.323493 2.562717 
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thank you @BenBolker , m2 seems to be what I need. – ImanJ Mar 10 '14 at 05:15
  • Just to be sure I got it right - m2 is an equivalent of of the famous 2-way ANOVA mixed effects ? and what is the difference between the two sd((Intercept)) lines that appear in the last part ? – ImanJ Mar 10 '14 at 05:25
  • not quite. strain is fixed, strain x site is random. the first sd((Intercept)) is among-lab, the second sd((Intercept)) is among-strain-within-lab – Ben Bolker Mar 10 '14 at 13:30