
Part of the project I'm working on involves determining residuals, which I'm doing by fitting linear models.
Unfortunately, the packages I have found either do not meet my requirements or are glitchy.


I have tried using the following packages for my project.

  1. lm - standard linear modelling function in R
      + pros -- none
      - cons -- uses the standard statistics library, single core, cannot handle out-of-memory calculations
  2. fastLm - part of the RcppArmadillo package
      + pros -- multicore
      - cons -- cannot handle out-of-memory calculations
  3. biglm - part of the biglm package
      + pros -- specially designed to handle out-of-memory calculations by splitting up the data in chunks
      - cons -- single core
  4. speedlm - part of the speedglm package
      + pros -- multicore, should be able to handle out-of-memory calculations by splitting up the data in chunks

Some problems I personally ran into using speedlm, otherwise this would have been the package of choice:


After googling without success, I used the following code in an attempt to find new packages, trying different keywords, but I simply cannot seem to find any appropriate packages.

    library(sos)  # provides findFn
    find <- findFn("linear model lm", sortby = "function", maxPages = 10)
    format(find)

Are there any linear model packages besides the ones I mentioned above that meet the following requirements:

  • Ability to use multiple CPUs to calculate linear models
  • Ability to split up the dataset and update the linear model with chunks of the dataset (the sketch below illustrates what I mean)
  • Ability to get fitted values
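
To make the second requirement concrete, this is roughly the chunked-update workflow I have in mind, illustrated with `biglm`; the dataset, formula, and chunk count here are arbitrary choices for the example:

    # Sketch of the chunked-update workflow, using biglm.
    # mtcars, the 4 chunks, and the formula are illustrative placeholders.
    library(biglm)

    chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

    # Fit on the first chunk, then fold in the remaining chunks one at a time.
    fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
    for (ch in chunks[-1]) {
      fit <- update(fit, ch)
    }

    # Fitted values and residuals from the final coefficients.
    X    <- model.matrix(mpg ~ wt + hp, data = mtcars)
    yhat <- drop(X %*% coef(fit))
    res  <- mtcars$mpg - yhat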
  • @The person who downvoted, would you mind telling me how I can improve this question? – Bas Oct 21 '15 at 09:06
  • [H2O](http://h2o.ai/product/algorithms/) has a GLM that can handle this. But that's not in R, though you can run everything from R. – phiver Oct 21 '15 at 09:09
  • You can't. It's off-topic. And of course your pros/cons are very subjective. E.g., if you have a way to get fitted values you don't need a `residuals` method. – Roland Oct 21 '15 at 09:11
  • @Roland I have edited the pros/cons. And you are right, if I have a way to get the fitted values I don't need a `residuals` method. – Bas Oct 21 '15 at 09:19
  • Revolution R's edition (even the community one) uses multiple cores, vectorized SIMD (SSE/AVX) CPU operations *and* can process more data than can fit in memory. `lm` may end up being much faster than any other option simply because it uses vectorized operations. I've seen the `svd` command perform 7 times faster. – Panagiotis Kanavos Oct 21 '15 at 09:28
  • @PanagiotisKanavos - the proof is in the pudding, and Revolution's R may prove a bit faster due to the use of another BLAS for the vector operations, but the advantage in lm probably won't be produced by multicore operations; see the comments in http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html and evaluate for yourself. – russellpierce Oct 21 '15 at 09:34
  • I have, 7x faster `svd` than plain R on a 2-year old desktop i7 running Windows 7, and all cores working instead of 1 (which means hyperthreading is also used). The proof is in the pudding indeed. Besides, you link to an *ancient* page, current CPUs have more and wider SIMD commands – Panagiotis Kanavos Oct 21 '15 at 09:36
  • I'll have to give it a try again. I attempted a recompile of R using a parallel BLAS (not Intel's) and didn't see any advantage for the problem I was working on; so I abandoned that approach. As far as I can tell the svd in R uses DGESDD and ZGESDD but DGESVD uses QR and QR is still not getting a multicore advantage (as far as I can tell). Maybe you can provide an answer with some benchmarks just for the sake of the record? – russellpierce Oct 21 '15 at 09:46

1 Answer


Typical estimation procedures for linear models, e.g. what R uses for `lm`, involve a QR decomposition, which appears (in most BLASes; see the discussion below for more details) to be an inherently sequential process and therefore bound to a single core.
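
For concreteness, here is a toy sketch of that estimation route using R's built-in QR routines; the simulated data is purely illustrative:

    # Least-squares coefficients via QR decomposition, which is
    # essentially the route lm() takes internally.
    set.seed(1)
    X <- cbind(1, matrix(rnorm(100 * 2), ncol = 2))  # intercept + 2 predictors
    y <- drop(X %*% c(1, 2, -1)) + rnorm(100)

    qr_X  <- qr(X)             # Householder QR, a sequential algorithm
    beta  <- qr.coef(qr_X, y)  # solve R %*% beta = Q'y
    resid <- qr.resid(qr_X, y) # residuals straight from the decomposition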

Other methods may be multicore, but they may not accomplish your real aim: a faster calculation. I'll note two.

  1. You could explore alternate BLASes for R. However, as noted there, "multi-threaded BLAS libraries make no significant difference to real-world analysis problems using R". Revolution, for example, does provide a modified version of R that uses multiple cores when fitting some linear models... and it may indeed prove a bit faster on the parts of the operation involving vector operations. See the comments on one of their pages talking about the speed advantage of using a multicore BLAS and evaluate for yourself. Ultimately, the proof will be in the pudding: try it with your real-world problem and see if it gives you what you want (although I gather from the existing comments that it does not).
  2. You could look at results using the search term stochastic gradient descent. That method, given enough resources, may be able to give you a multicore solution that yields a speed benefit (see the sketch just below this list).
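
For the record, a minimal single-core sketch of stochastic gradient descent for least squares; the learning rate, epoch count, and toy data are arbitrary choices, and a real multicore variant (e.g. averaging gradients across workers) would be built on top of this loop:

    # Naive stochastic gradient descent for linear regression.
    # Learning rate and epoch count are arbitrary illustrative choices.
    set.seed(1)
    n <- 1000
    X <- cbind(1, rnorm(n))            # intercept + one predictor
    y <- drop(X %*% c(2, 3)) + rnorm(n)

    beta <- rep(0, ncol(X))
    lr   <- 0.01
    for (epoch in 1:5) {
      for (i in sample(n)) {           # visit rows in random order
        err  <- y[i] - sum(X[i, ] * beta)
        beta <- beta + lr * err * X[i, ]
      }
    }
    beta                               # should be close to c(2, 3)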

As an aside, the two methods you endorsed as multicore don't, on quick review, seem to me to be truly multicore. In general, it is easy to split data into chunks, and again I might be wrong, but I don't think you'll be able to process those chunks in parallel and recombine the models... that is, unless you are willing to do something general (in which case the methods you reject will work just as well).

The something general you might do, if you are willing to be a bit imprecise, is (sketched in code after the caveat below):

  1. split your data up into samples
  2. run the samples separately and in parallel
  3. collect your regression coefficients and use the mean coefficients as actual coefficients
  4. calculate your predictions
  5. calculate your residuals

... but that doesn't solve your RAM issue, and again I question whether you'll find enough of a speed benefit to make it worth your while.
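
A rough sketch of that recipe, using base R's parallel package; the dataset, formula, and chunk count are placeholders, and note that `mc.cores > 1` is not supported on Windows:

    # Split / fit in parallel / average coefficients / predict / residuals.
    # Assumes the chunks are comparable random samples of the full data.
    library(parallel)

    chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))  # step 1

    fits  <- mclapply(chunks, function(d) lm(mpg ~ wt + hp, data = d),
                      mc.cores = 4)                               # step 2
    beta  <- rowMeans(sapply(fits, coef))                         # step 3

    X     <- model.matrix(mpg ~ wt + hp, data = mtcars)
    pred  <- drop(X %*% beta)                                     # step 4
    resid <- mtcars$mpg - pred                                    # step 5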


  • A badly formatted answer attracts downvotes because it's *very* difficult to read – Panagiotis Kanavos Oct 21 '15 at 09:26
  • Formatting was a work in progress. Hopefully the above is now improved. – russellpierce Oct 21 '15 at 09:27
  • I'm almost certain this is wrong - Revolution R's distribution uses SIMD operations through the Intel Primitives library which *does* provide QR decomposition. It also provides multicore processing although that's probably on top of the Intel libraries. Just because a calculation is sequential doesn't mean it can't be performed using SIMD – Panagiotis Kanavos Oct 21 '15 at 09:34
  • @Rpierce Thanks for your in-depth answer. I forgot to mention in the question that I'm using `RRO` (Revolution R Open), which has improved BLASes built in. About the multicore part: `fastLm` and `speedlm` use all of my CPUs and result in much faster calculations. So what you are saying is that this is likely to be less accurate than the default `lm` function in R? – Bas Oct 21 '15 at 09:34
  • @Bas RRO already uses multiple cores, did you check your CPU to see whether all cores are working? – Panagiotis Kanavos Oct 21 '15 at 09:35
  • @PanagiotisKanavos, You're right, I'll correct in line with my comment in response to the question. – russellpierce Oct 21 '15 at 09:36
  • @PanagiotisKanavos Yes, it does use multiple cores when performing `fastLm`/`speedlm` – Bas Oct 21 '15 at 09:37