Part of the project I'm working on involves determining residuals. I'm doing this by fitting linear models.
Unfortunately, the packages I have found either do not meet the requirements or are glitchy.
I have tried the following packages for my project.
lm - the standard linear modelling function in R
- Pros: none
- Cons: uses the standard statistics library, single core, cannot handle out-of-memory calculations
fastLm - part of the RcppArmadillo package
- Pros: multicore
- Cons: cannot handle out-of-memory calculations
biglm - part of the biglm package
- Pros: specially designed for handling out-of-memory calculations by splitting up the data in chunks
- Cons: single core
speedlm - part of the speedglm package
- Pros: multicore, should be able to handle out-of-memory calculations by splitting up the data in chunks
- Cons: some problems I personally ran into using speedlm (otherwise this would have been the package of choice):
  - updateWithMoreData seems to fail when a column contains non-numeric data
  - There is no default method to retrieve the residuals
After googling without success, I used the following code to search for other packages, trying different keywords, but I simply cannot find any appropriate ones.
library(sos)  # findFn() is provided by the sos package
find <- findFn("linear model lm", sortby = "function", maxPages = 10)
format(find)
Are there any linear model packages besides the ones I mentioned above that meet the following requirements?
- Ability to use multiple CPUs to calculate linear models
- Ability to split up the dataset and update the linear model with chunks of the dataset
- Ability to get fitted values
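
To make the last two requirements concrete, here is a minimal base-R sketch of the behaviour I am after. It is not taken from any of the packages above: it accumulates the cross-products X'X and X'y chunk by chunk, solves the normal equations, and then computes fitted values and residuals in a second pass over the chunks. The simulated data, the four-way split, and all variable names are assumptions made up for illustration, and solving the normal equations directly is numerically less robust than the QR decomposition that lm uses.

```r
# Sketch only: chunked least squares via accumulated cross-products.
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n)

# Hypothetical chunking; in practice each chunk would be read from disk.
chunks <- split(dat, rep(1:4, length.out = n))

form <- y ~ x1 + x2

# Per-chunk sufficient statistics; each chunk is processed independently.
stats <- lapply(chunks, function(ch) {
  X <- model.matrix(form, ch)
  list(xtx = crossprod(X), xty = crossprod(X, ch$y))
})

# Combine the chunk results and solve the normal equations.
xtx <- Reduce(`+`, lapply(stats, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(stats, `[[`, "xty"))
beta <- drop(solve(xtx, xty))

# Second pass: fitted values and residuals, again one chunk at a time.
fitted_by_chunk <- lapply(chunks, function(ch) {
  drop(model.matrix(form, ch) %*% beta)
})
resid_by_chunk <- Map(function(ch, f) ch$y - f, chunks, fitted_by_chunk)
```

Because the per-chunk step does not depend on the other chunks, the first lapply could be swapped for parallel::mclapply on Unix-like systems to spread the work over several cores. Categorical predictors would need the same factor levels in every chunk so that model.matrix produces identical columns.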