
I am trying to use R on my laptop to run an HLM regression on a large dataset of about 2 GB (500,000 rows), stored in SPSS format (.sav). Sorry, I cannot share the data, as required by my professor, but I will try my best to provide as many details as possible. Here is some of my code.

library(Hmisc)   # spss.get() to read the .sav file
library(lme4)    # lmer() for the multilevel model

data <- spss.get("Stanford Dataset .sav")
result1 <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                  Individualism*Gender + (1 + Gender|Country/School), data = data)
summary(result1)

The problem is that it takes about 5 minutes to run the regression and print the summary. Is there a faster way to deal with a model this large in memory?

I have already tried the following methods (a rough sketch of both attempts follows the list):

1) Converting the data with `data <- data.table(data)` (data.table package) before running the regression. However, I ended up waiting even longer for the results than before.

2) Converting the data with `as.big.matrix` from the bigmemory package, which gives the error:

Error in list2env(data) : first argument must be a named list

It seems that a big.matrix does not work as the `data` argument of `lmer`.
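In code, the two attempts looked roughly like this (just a sketch; the object names are illustrative):

# 1) data.table conversion before fitting -- this did not speed things up
library(data.table)
data <- data.table(data)
result1 <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                  Individualism*Gender + (1 + Gender|Country/School), data = data)

# 2) bigmemory conversion -- lmer rejects the big.matrix with the error above
library(bigmemory)
bigdata <- as.big.matrix(data)
result2 <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                  Individualism*Gender + (1 + Gender|Country/School), data = bigdata)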

So I am really out of ideas now; any relevant suggestions would be helpful.

Thanks a lot!

  • Maybe try running the model on a small sample of your data and then passing the parameters to the full model to get it started. But I don't think 5 minutes is unexpected for a big dataset and reasonably complex model like this, and things like `data.table` probably won't help as I think most of the slowness comes from `lmer` building and then operating on model matrices. – Marius May 17 '17 at 02:56
  • Thx @Marius, `data.table` takes me 1 more min to print out the result LOL. And could you be more detailed on running a sample model first and then passing the parameters to the full model? I just don't know how this would shorten the time I might use on `lmer` or `glmer` – exteral May 17 '17 at 03:11
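Following up on Marius's suggestion above, a minimal sketch of the sample-then-warm-start idea in lme4 could look like the following (the subsample size and object names are just placeholders): fit the model on a random subsample, pull out the estimated random-effects covariance parameters with getME(), and pass them to the full-data fit through lmer's start argument.

library(lme4)

# Fit the same model on a random subsample first (50,000 rows is arbitrary)
idx       <- sample(nrow(data), 50000)
fit_small <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                    Individualism*Gender + (1 + Gender|Country/School),
                  data = data[idx, ])

# Reuse its covariance-parameter estimates (theta) as starting values
# for the full-data fit, so the optimizer starts close to the optimum
fit_full <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                   Individualism*Gender + (1 + Gender|Country/School),
                 data = data,
                 start = list(theta = getME(fit_small, "theta")))

This mainly saves optimizer iterations; the per-iteration cost of building and operating on the model matrices for 500,000 rows stays the same, which is consistent with Marius's point that the 5-minute runtime may simply be what a model of this size costs.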
