I am trying to use R on my laptop to run an HLM (hierarchical linear model) regression on a large dataset of about 2 GB (500,000 rows), stored in SPSS format (.sav). Sorry, I cannot share the data, as required by my professor, but I will try my best to provide as many details as possible. Here is my code:
library(Hmisc)  # spss.get() comes from the Hmisc package
library(lme4)   # lmer() comes from the lme4 package

data <- spss.get("Stanford Dataset .sav")
result1 <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                Individualism*Gender + (1 + Gender | Country/School),
                data = data)
summary(result1)
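In case the import step matters: an alternative SPSS reader is `read_sav()` from the haven package, which returns a data frame (tibble). This is just an option I am aware of, not something I have benchmarked on this file:

```r
# Alternative SPSS reader (haven package); read_sav() returns a tibble.
# Untested on this particular file -- listed as an alternative, not a verified speed-up.
library(haven)
data <- read_sav("Stanford Dataset .sav")
```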
The problem is that it takes about 5 minutes to run the regression and print the summary. Is there a faster way to fit such a large, memory-heavy model?
I have already tried the following:

1) Using data.table (from the data.table package): I converted with data <- data.table(data) before running the regression. However, it actually took longer than before.

2) Using as.big.matrix from the bigmemory package, which throws the error:

Error in list2env(data) : first argument must be a named list

It seems that a big.matrix does not work with lmer.
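One thing I have read about but not yet tried on my data is skipping the finite-difference derivative calculation that lmer performs after fitting, which is supposed to save time on large models. This is a sketch using lme4's lmerControl() (calc.derivs is a real lmerControl argument, but I have not verified how much time it saves here):

```r
# Sketch: disable the post-fit derivative check, which can be slow on big models.
# Same model as above; only the control argument is new.
library(lme4)
result1 <- lmer(SCIENCE ~ GDP + Individualism + Gender + Gender*GDP +
                Individualism*Gender + (1 + Gender | Country/School),
                data = data,
                control = lmerControl(calc.derivs = FALSE))
```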
So I am really out of ideas now; any relevant suggestion would be helpful.
Thanks a lot!